Look under the Hood of Lookalike Modeling

In today’s fast-paced digital marketplace, staying ahead of the competition is a top priority. This means, as a media buyer, you’re always on the lookout for ways to improve your targeting so you get the most out of your ad dollars. 

Chances are, you’ve used lookalike modeling to optimize your ad spend. With lookalike modeling, you can use your first-party data to create an expanded audience of people who are more likely to convert or engage. But without a full understanding of how lookalike modeling works, you may not be leveraging the full capabilities of this powerful machine-learning-based tool. 

What Is Lookalike Modeling?

With the help of machine learning techniques, lookalike modeling takes a data set—referred to as a “seed set”—of your existing customers and uses this to create a new, larger audience of potential buyers who share similar attributes and behaviors.

These shared attributes and behaviors mean you now have a highly relevant pool of people who are more likely to be interested in your products or services. 

Behind the Scenes: How Lookalike Modeling Works

Now that we have a general definition of lookalike modeling, it’s time to look at what’s going on “under the hood.” By understanding how lookalike modeling works, you’ll be better able to optimize your advertising budget and reduce inefficient ad spend.

While each programmatic platform will have its own methods, there are a number of elements common to the lookalike modeling process overall:

Data Collection

Your “ideal customers” seed set is built using first-party data. You most likely won’t need to include all or even most of your first-party data to create your seed set. In fact, each programmatic platform has its own limitations on which data points it can match to your first-party data to find ideal customers in its network. But any first-party data you don’t use can still be useful during the pre-seed stage, for filtering and homing in on your most valuable customers.

The seed set data you use will depend on your campaign objectives. For example, the data used to build a seed set for a campaign targeting people most likely to purchase Product A might consist of a list of customers filtered down by behavioral or purchase history information related to Product A, such as recent purchases, repeat purchases, and the action of adding the product to their carts. 
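To make the Product A example concrete, here is a minimal Python sketch of filtering a customer list down to a seed set. The Customer fields, the 90-day recency window, and the two-purchase threshold for a “repeat” buyer are all illustrative assumptions, not rules from any particular platform.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Customer:
    customer_id: str
    product_a_purchases: int        # lifetime purchases of Product A
    last_purchase: Optional[date]   # most recent Product A purchase, if any
    added_to_cart: bool             # has added Product A to the cart


def build_seed_set(customers, today, recency_days=90):
    """Keep customers with a recent purchase, repeat purchases, or a cart add."""
    cutoff = today - timedelta(days=recency_days)
    seed = []
    for c in customers:
        recent = c.last_purchase is not None and c.last_purchase >= cutoff
        repeat = c.product_a_purchases >= 2
        if recent or repeat or c.added_to_cart:
            seed.append(c.customer_id)
    return seed


# Hypothetical first-party records.
customers = [
    Customer("c1", 3, date(2024, 1, 10), False),  # repeat buyer
    Customer("c2", 1, date(2024, 5, 20), False),  # recent buyer
    Customer("c3", 0, None, True),                # added to cart only
    Customer("c4", 1, date(2023, 6, 1), False),   # old single purchase
    Customer("c5", 0, None, False),               # no Product A signal
]

seed = build_seed_set(customers, today=date(2024, 6, 1))
print(seed)  # ['c1', 'c2', 'c3']
```

Tightening the rules (for example, requiring a recent purchase and a repeat purchase) would give a smaller, more focused seed set, while loosening them would give a larger, noisier one.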

Data Analysis

Because the attributes (and behaviors) you choose to focus on will form the basis of your lookalike audience, you’ll want to create audiences with attributes that are a good fit for your campaign objectives. Programmatic platforms will then use a number of techniques, including machine learning algorithms, to help analyze your seed set data and identify patterns and similarities in these attributes.

For example, suppose your objective is to raise brand awareness. In that case, the platform might identify patterns in attributes and behaviors related to your broader seed set’s interests, social media behavior, and brand affinity. For a conversion campaign, however, patterns in purchase history, online shopping behavior, product affinity, and in-platform conversions would most likely be better predictors. And with both objectives, certain demographic and geographic attributes could be relevant.

You’ll also need to balance quality versus scale when choosing the attributes you want to be analyzed and used for your lookalike model. More attributes will mean a more highly targeted audience, but you’ll likely end up with a lookalike audience that’s much smaller. 
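The quality-versus-scale tradeoff can be seen in a toy Python sketch. It assumes a made-up population with one person per attribute combination and uses strict exact matching. Real platforms score similarity rather than requiring exact matches, so treat this only as an illustration of why each added attribute shrinks the audience.

```python
from itertools import product

# Hypothetical attribute values; a real platform's taxonomy will differ.
AGE_BANDS = ["18-34", "35-54", "55+"]
REGIONS = ["north", "south", "east", "west"]
INTERESTS = ["cooking", "sports", "travel", "gaming", "fashion"]
DEVICES = ["mobile", "desktop"]

# Toy population: exactly one person per combination (3 * 4 * 5 * 2 = 120).
population = [
    {"age_band": a, "region": r, "interest": i, "device": d}
    for a, r, i, d in product(AGE_BANDS, REGIONS, INTERESTS, DEVICES)
]


def audience_size(people, required):
    """Count the people who match every required attribute value."""
    return sum(
        all(person[key] == value for key, value in required.items())
        for person in people
    )


# Attributes of a hypothetical ideal customer, added one at a time.
ideal = [
    ("age_band", "35-54"),
    ("region", "west"),
    ("interest", "cooking"),
    ("device", "mobile"),
]

sizes = []
required = {}
for key, value in ideal:
    required[key] = value
    sizes.append(audience_size(population, required))

print(sizes)  # [40, 10, 2, 1]
```

With one attribute the audience is 40 people out of 120. With all four it is a single person: highly targeted, but too small to be useful.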

Lookalike Model/Audience Creation

In this stage, the programmatic platform uses machine learning techniques to build and train the lookalike model. The model captures the attributes and behaviors identified in your seed set, and the platform then uses it to create a custom audience of people who share them.
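Platforms keep their actual models proprietary, so the following Python sketch swaps in a deliberately simple stand-in: it averages the seed customers into one profile and ranks candidates by cosine similarity to that profile. The feature names, the values, and the function names are hypothetical.

```python
import math


def centroid(vectors):
    """Average the seed vectors into a single 'ideal customer' profile."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lookalike_audience(seed_vectors, candidates, size):
    """Return the ids of the `size` candidates most similar to the seed profile."""
    profile = centroid(seed_vectors)
    ranked = sorted(
        candidates, key=lambda item: cosine(profile, item[1]), reverse=True
    )
    return [person_id for person_id, _ in ranked[:size]]


# Hypothetical features: [cooking interest, sports interest, purchase frequency]
seed_vectors = [[0.9, 0.1, 0.8], [0.8, 0.2, 0.9], [1.0, 0.0, 0.7]]
candidates = [
    ("u1", [0.85, 0.15, 0.75]),
    ("u2", [0.1, 0.9, 0.2]),
    ("u3", [0.7, 0.3, 0.9]),
    ("u4", [0.0, 1.0, 0.0]),
]

audience = lookalike_audience(seed_vectors, candidates, size=2)
print(audience)  # ['u1', 'u3']
```

The `size` parameter is where the quality-versus-scale choice shows up again: a larger value widens the audience but admits people who resemble the seed profile less closely.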

Once created, your lookalike model isn’t set in stone. Instead, the programmatic platform will continue to refine it, based on any new data or variables as they arise or are added. The platform may also adjust or modify its proprietary machine learning algorithms from time to time to reflect any changes in the target audience—for example, inflation may mean customers become more price-conscious, causing a shift in their buying behavior. 

Machine Learning & Lookalike Modeling

Machine learning drives the lookalike modeling process, and every programmatic platform has its own proprietary machine learning algorithms. 

Several types of machine learning techniques are commonly used in lookalike modeling. The following are simplified explanations of some of the most popular techniques:

  • PU Learning: PU learning (positive-unlabeled learning) works with data that contains only positive examples and unlabeled examples. Your seed set contains positive examples (for example, customers who add items to their cart). PU learning uses these positive examples to identify unlabeled examples in the programmatic platform’s data that are similar to the positive examples. 
  • Gradient Boosting Machines (GBMs): GBMs build a sequence of decision trees through an iterative process. Each tree predicts an outcome (for example, whether a person likes to cook), and each new tree is trained to correct the errors of the trees before it. These decision trees typically have a single root node from which other nodes branch out, and the user sets a maximum depth based on the objectives and the dataset.
  • Logistic Regression: Logistic regression is a machine learning technique that predicts the probability of an event occurring (for example, renewing a subscription) by identifying patterns between audience characteristics and the desired attribute (the subscription renewal).
  • Random Forests: As its name indicates, the random forests technique uses many decision trees, each trained on a random sample of the data, to make a prediction (for example, whether someone will click on a link) based on customers’ characteristics. It then combines the trees’ outputs, by majority vote or averaging, to form a final prediction.
  • Neural Networks: Loosely inspired by the human brain, neural networks use layers of interconnected nodes (“neurons”) to perform complex calculations on input data consisting of customers’ characteristics (e.g., age, gender, location). The result is a probability score that predicts the likelihood of the desired attribute (for example, engagement with a social media post) based on these characteristics.
  • Support Vector Machines (SVMs): SVMs work by finding the boundary (known as a hyperplane) that best separates customers known to exhibit the desired attribute (for example, an interest in sports) from those who do not. 
  • K-nearest Neighbors (KNN): KNN uses proximity to make predictions. The “k” refers to the number of nearest neighbors used to make the prediction (for example, whether a customer will fill out an online form), where “nearest” is measured by the distance (or similarity) between the customer being scored and customers whose outcome is already known.
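As one illustration of the techniques above, here is a small Python sketch of KNN used in a PU-learning style: seed customers are labeled 1, and a sample of the platform’s other users is provisionally labeled 0 because nothing is known about them. The two features and all data points are made up, and real platforms may combine or replace these techniques in ways they do not disclose.

```python
import math


def knn_score(candidate, labeled, k=3):
    """Fraction of the candidate's k nearest labeled neighbors that are seed members."""
    nearest = sorted(labeled, key=lambda row: math.dist(candidate, row[0]))[:k]
    return sum(label for _, label in nearest) / k


# Hypothetical features: [recipe-site visits, cookware purchases], scaled 0 to 1.
# Label 1 = seed customer (positive); label 0 = unlabeled user, treated as
# a provisional negative.
labeled = [
    ([0.9, 0.8], 1),
    ([0.8, 0.9], 1),
    ([0.7, 0.7], 1),
    ([0.2, 0.1], 0),
    ([0.1, 0.3], 0),
    ([0.3, 0.2], 0),
    ([0.6, 0.5], 0),
]

# People in the platform's network to be scored.
candidates = {"a": [0.85, 0.85], "b": [0.2, 0.2], "c": [0.6, 0.6]}

scores = {name: knn_score(vec, labeled) for name, vec in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['a', 'c', 'b']
```

Candidate “a” sits among the seed customers and scores 1.0, “b” sits among the unlabeled users and scores 0.0, and “c” falls in between with two seed members among its three nearest neighbors. Choosing a larger `k` smooths the scores but makes them less sensitive to small clusters of lookalikes.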

Pros and Cons of Lookalike Modeling

Lookalike modeling offers advertisers several pros:

  • expansion of limited first-party data
  • improved ad performance
  • efficient targeting
  • cost-effective customer acquisition
  • better conversion and lead generation

On the flip side, though, are the cons:

  • a lack of diversity in the seed set, or in the data used to train the machine learning algorithms, can lead to biases
  • requires quality, static seed set data
  • does not incorporate real-time first-party data to capture behavioral or preference changes
  • cannot identify untapped audiences

Tap into the Potential of Lookalike Modeling

With the power of lookalike modeling, you can build an audience of potential buyers who share attributes and behaviors similar to your most valuable customers—increasing the likelihood of conversions. 

You can also mitigate the cons of lookalike modeling by enriching your first-party data with ShareThis data. Our advanced data science capabilities use the latest best-in-class techniques to enrich your seed audience, making all the difference when it comes to lookalike modeling. Get in touch with us to learn more.

About ShareThis

ShareThis has unlocked the power of global digital behavior by synthesizing social share, interest, and intent data since 2007. Powered by consumer behavior on over three million global domains, ShareThis observes real-time actions from real people on real digital destinations.
