Look-alike Studio is the platform’s visual look-alike audience builder. It takes a small seed dataset of users you already value (converters, high-LTV customers, opted-in subscribers) and scores a larger population dataset to find users who resemble the seed — producing a new audience of likely-to-convert users without writing NQL by hand.

How Look-alike Studio works

A look-alike audience is built from three inputs:
  1. A seed dataset — Users who exhibit the behavior you want to find more of. Typically small (thousands to low millions of records).
  2. A population dataset — The larger universe to score against. Must share a join-key identifier with the seed (for example, both datasets map sha256_hashed_email).
  3. Shared attributes — Categorical or continuous fields present on both datasets that describe each user. These are the features used to score similarity.
The builder generates a multi-stage NQL workflow that computes per-attribute weights from the seed, scores every population user against the seed distribution, and materializes the highest-scoring users as a new audience dataset.

Scoring model

Look-alike Studio uses a Naive Bayes model with Gaussian terms for continuous attributes:
  • Categorical attributes (country, device type, product category) contribute log-likelihood weights based on how much more frequently each value appears in the seed versus the population.
  • Continuous attributes (age, lifetime value, days since last visit) contribute Gaussian density terms parameterized by the seed’s mean and variance for each attribute.
  • Attribute weights reduce the influence of low-information attributes: an attribute that has the same distribution in the seed and the population contributes little signal.
Each population user receives a single composite score between 0 and 1 representing how seed-like they are.
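The scoring described above can be illustrated with a minimal Python sketch. This is not the NQL that the builder generates. The function names, the Laplace smoothing, the use of a seed-versus-population density ratio for continuous attributes, and the logistic squash that maps the summed log terms into the 0 to 1 range are all assumptions made for illustration.

```python
import math
from collections import Counter


def categorical_weights(seed_values, population_values, smoothing=1.0):
    """Per-value weight: log P(value | seed) - log P(value | population).

    Laplace smoothing (an assumption) keeps unseen values from producing log(0).
    """
    seed_counts = Counter(seed_values)
    pop_counts = Counter(population_values)
    vocab = set(seed_counts) | set(pop_counts)
    seed_total = len(seed_values) + smoothing * len(vocab)
    pop_total = len(population_values) + smoothing * len(vocab)
    return {
        value: math.log((seed_counts[value] + smoothing) / seed_total)
        - math.log((pop_counts[value] + smoothing) / pop_total)
        for value in vocab
    }


def gaussian_log_density(x, mean, variance):
    """Log of the normal density N(x; mean, variance)."""
    return -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score_user(user, cat_weights, seed_stats, pop_stats):
    """Sum per-attribute log terms and squash the total into (0, 1).

    cat_weights: {attribute: {value: weight}}
    seed_stats / pop_stats: {attribute: (mean, variance)}
    """
    total = 0.0
    for attr, weights in cat_weights.items():
        # Values never seen in either dataset contribute nothing.
        total += weights.get(user[attr], 0.0)
    for attr, (seed_mean, seed_var) in seed_stats.items():
        pop_mean, pop_var = pop_stats[attr]
        # Assumption: compare the seed density against the population density.
        total += gaussian_log_density(user[attr], seed_mean, seed_var)
        total -= gaussian_log_density(user[attr], pop_mean, pop_var)
    return sigmoid(total)


country_weights = categorical_weights(
    ["US", "US", "US", "CA"],  # seed
    ["US", "CA", "CA", "CA"],  # population
)
seed_like = score_user(
    {"country": "US", "age": 30},
    {"country": country_weights},
    {"age": (30.0, 25.0)},
    {"age": (45.0, 100.0)},
)
```

Note that an attribute with identical seed and population distributions produces weights near zero in this sketch, which matches the down-weighting behavior described above.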

Builder steps

1. Seed selection

Choose the dataset of users you want to find more of. The dataset must have Rosetta Stone attribute mappings so the builder can identify join keys and shared features. Only datasets on the currently selected data plane are shown. Pipeline intermediates and other audiences are hidden from the picker.

2. Population selection

Choose the larger dataset to score against. The population must share at least one join-key identifier attribute with the seed (such as sha256_hashed_email or narrative_id) so the builder can identify which population users overlap with seed users and include or exclude them according to the output configuration. If a candidate population does not share a join-key attribute with the seed, it is shown as ineligible.
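The eligibility rule can be sketched as a set intersection over mapped attribute names. The JOIN_KEY_ATTRIBUTES set and both function names below are hypothetical; the actual list of identifier attributes recognized by the platform is not given here.

```python
# Hypothetical set of attribute names treated as join-key identifiers.
JOIN_KEY_ATTRIBUTES = {"sha256_hashed_email", "narrative_id"}


def shared_join_keys(seed_attributes, population_attributes):
    """Return the join-key attributes mapped on both datasets, sorted by name."""
    return sorted(
        JOIN_KEY_ATTRIBUTES & set(seed_attributes) & set(population_attributes)
    )


def is_eligible_population(seed_attributes, population_attributes):
    """A population is eligible only if it shares at least one join key with the seed."""
    return bool(shared_join_keys(seed_attributes, population_attributes))


seed_attrs = ["sha256_hashed_email", "country", "age"]
eligible = is_eligible_population(seed_attrs, ["sha256_hashed_email", "device"])
ineligible = is_eligible_population(seed_attrs, ["narrative_id", "country"])
```

In the second call the two datasets share the country attribute but no identifier, so the population would be shown as ineligible.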

3. Attribute selection

Look-alike Studio classifies every attribute that exists on both datasets into one of three roles:
| Role | Description | Used for scoring? |
| --- | --- | --- |
| Identity | Identifiers like hashed email, narrative_id, MAID | No (used only to join seed and population) |
| Categorical | Low-to-medium cardinality fields like country, device, gender | Yes (Naive Bayes) |
| Continuous | Numeric fields like age, LTV, frequency | Yes (Gaussian density) |
You select which classified attributes to include as scoring features. More attributes are not always better: irrelevant or correlated attributes add noise. Start with 3–8 attributes that you believe describe what makes the seed distinctive.

4. Output configuration

Choose how the look-alike audience is sized:
  • Limit by size — Return the top-N highest-scoring population users (for example, the top 100,000).
  • Limit by score — Return every population user whose score exceeds a threshold (for example, score ≥ 0.5).
You also choose whether to include the original seed users in the output:
  • New users only — Only population users not already in the seed are returned. Use when you want net-new reach.
  • New + original seed users — Seed users are added to the output. Use when you want a single combined audience to activate.
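The two sizing modes and the seed-inclusion choice can be sketched in Python as follows. The function name and parameters are hypothetical. The sketch assumes that seed users are excluded before the top-N cut is applied and that the score threshold is inclusive (score ≥ threshold, as in the example above); the builder's actual ordering of these steps is not specified here.

```python
def select_audience(scores, seed_ids, *, top_n=None, min_score=None, include_seed=False):
    """Pick audience members from scored population users.

    scores: {user_id: score} for the population.
    seed_ids: set of user ids that belong to the seed.
    Exactly one of top_n (limit by size) or min_score (limit by score) must be set.
    """
    if (top_n is None) == (min_score is None):
        raise ValueError("choose exactly one of top_n or min_score")

    # Net-new candidates: population users that are not already in the seed.
    candidates = [(uid, s) for uid, s in scores.items() if uid not in seed_ids]
    candidates.sort(key=lambda pair: pair[1], reverse=True)

    if top_n is not None:
        chosen = [uid for uid, _ in candidates[:top_n]]
    else:
        chosen = [uid for uid, s in candidates if s >= min_score]

    if include_seed:
        # "New + original seed users": add the seed users to the output.
        chosen = sorted(seed_ids) + chosen
    return chosen


scores = {"a": 0.9, "b": 0.7, "c": 0.4, "s1": 0.95}
seed = {"s1"}
by_size = select_audience(scores, seed, top_n=2)
by_score = select_audience(scores, seed, min_score=0.5)
combined = select_audience(scores, seed, top_n=1, include_seed=True)
```

With these assumptions, "New users only" never returns a seed user even when that user has the highest score in the population.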

5. Finalize

Configure the output dataset metadata:
  • Audience name (required) — Identifies the audience in the platform and downstream systems.
  • Description — Optional context.
  • Tags — Optional labels. The system automatically applies the _nio_audience and _nio_lookalike tags so the audience appears under My Audiences and is recognizable as look-alike-derived.
Name uniqueness is enforced against all existing datasets, including hidden interactive results and look-alike pipeline intermediates.

What Look-alike Studio creates

When you click Create Look-alike Audience, the builder:
  1. Assembles the seed, population, attribute selections, and output config into a multi-stage workflow YAML.
  2. Submits the workflow via the workflow engine with trigger_immediately.
  3. Materializes around a dozen intermediate views (canonical feature combos, per-attribute distributions, attribute weights, scored candidates) tagged _nio_lookalike_intermediate. These are hidden from the standard dataset catalog so only the final audience appears under My Audiences.
  4. Materializes a final audience dataset. The _nio_audience and _nio_lookalike tags, your display name, description, and any user-supplied tags are all applied as part of the CREATE MATERIALIZED VIEW statement, so the audience is fully tagged from creation even if you leave mid-run.
The run surfaces a dismissible progress dialog — closing it does not cancel the workflow, which continues server-side. When the run completes, the audience opens in the standard audience metadata drawer and is available for delivery through any compatible connector.

Resuming or restarting

  • Dismiss progress, keep running — Closing the progress dialog leaves the workflow running. The audience appears under My Audiences when it finishes.
  • Retry on error — A failed run shows an error dialog with a Retry action that resubmits the same workflow.
  • Start Over — Cancels polling and resets all builder state so you can build a different look-alike audience.
