How Look-alike Studio works
A look-alike audience is built from three inputs:
- A seed dataset: Users who exhibit the behavior you want to find more of. Typically small (thousands to low millions of records).
- A population dataset: The larger universe to score against. Must share a join-key identifier with the seed (for example, both datasets map sha256_hashed_email).
- Shared attributes: Categorical or continuous fields present on both datasets that describe each user. These are the features used to score similarity.
Scoring model
Look-alike Studio uses a Naive Bayes model with Gaussian terms for continuous attributes:
- Categorical attributes (country, device type, product category) contribute log-likelihood weights based on how much more frequently each value appears in the seed versus the population.
- Continuous attributes (age, lifetime value, days since last visit) contribute Gaussian density terms parameterized by the seed’s mean and variance for each attribute.
- Attribute weights down-weight low-information attributes — an attribute that has the same distribution in the seed and the population contributes little signal.
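
The scoring terms above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Look-alike Studio's implementation: the function names, the add-one smoothing, the default attribute weight of 1.0, and the sample records are all hypothetical. The resulting scores are raw log-scale values, and the source does not say how Look-alike Studio scales its scores.

```python
import math
from collections import Counter


def categorical_log_ratios(seed_values, population_values, alpha=1.0):
    """Log of (smoothed seed frequency / smoothed population frequency) per value.

    alpha is an assumed add-one style smoothing constant.
    """
    seed_counts = Counter(seed_values)
    pop_counts = Counter(population_values)
    vocab = set(seed_counts) | set(pop_counts)
    seed_total = len(seed_values) + alpha * len(vocab)
    pop_total = len(population_values) + alpha * len(vocab)
    return {
        value: math.log((seed_counts[value] + alpha) / seed_total)
        - math.log((pop_counts[value] + alpha) / pop_total)
        for value in vocab
    }


def mean_and_variance(values):
    """Seed mean and (population-style) variance, floored to avoid division by zero."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(variance, 1e-9)


def gaussian_log_density(x, mean, variance):
    """Log of the normal density with the given mean and variance, evaluated at x."""
    return -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)


def score_user(user, cat_tables, gauss_params, weights):
    """Weighted sum of categorical log-ratios and Gaussian log-densities."""
    score = 0.0
    for attr, table in cat_tables.items():
        # Values never seen in either dataset contribute nothing.
        score += weights.get(attr, 1.0) * table.get(user.get(attr), 0.0)
    for attr, (mean, variance) in gauss_params.items():
        if user.get(attr) is not None:
            score += weights.get(attr, 1.0) * gaussian_log_density(user[attr], mean, variance)
    return score


# Hypothetical sample data.
seed = [{"country": "US", "age": 30}, {"country": "US", "age": 34}, {"country": "CA", "age": 32}]
population = [
    {"country": "US", "age": 31},
    {"country": "DE", "age": 60},
    {"country": "DE", "age": 58},
    {"country": "CA", "age": 45},
]

cat_tables = {
    "country": categorical_log_ratios(
        [u["country"] for u in seed], [u["country"] for u in population]
    )
}
gauss_params = {"age": mean_and_variance([u["age"] for u in seed])}
scores = [score_user(u, cat_tables, gauss_params, weights={}) for u in population]
```

In this sketch, a categorical attribute whose values are distributed identically in the seed and the population yields a log-ratio of zero for every value, which matches the idea that low-information attributes contribute little signal.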
Builder steps
1. Seed selection
Choose the dataset of users you want to find more of. The dataset must have Rosetta Stone attribute mappings so the builder can identify join keys and shared features. Only datasets on the currently selected data plane are shown. Pipeline intermediates and other audiences are hidden from the picker.
2. Population selection
Choose the larger dataset to score against. The population must share at least one join-key identifier attribute with the seed (such as sha256_hashed_email or narrative_id) so the builder can identify which population users overlap with seed users and exclude or include them appropriately.
If the seed and population do not share a join-key attribute, the population dataset is shown as ineligible.
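
The eligibility rule amounts to a set intersection over mapped identifier attributes. The sketch below is illustrative only: the function name, the IDENTITY_ATTRIBUTES set, and the sample attribute sets are assumptions, and the builder's actual check may differ.

```python
# Illustrative subset of identifier attributes; the real list is not given here.
IDENTITY_ATTRIBUTES = {"sha256_hashed_email", "narrative_id"}


def shared_join_keys(seed_attributes, population_attributes, identity_attributes):
    """Return identifier attributes mapped on both datasets, sorted for stable output."""
    return sorted(set(seed_attributes) & set(population_attributes) & set(identity_attributes))


seed_attrs = {"sha256_hashed_email", "country", "age"}
eligible_population = {"sha256_hashed_email", "narrative_id", "country"}
ineligible_population = {"country", "age"}

print(shared_join_keys(seed_attrs, eligible_population, IDENTITY_ATTRIBUTES))
# ['sha256_hashed_email']
print(shared_join_keys(seed_attrs, ineligible_population, IDENTITY_ATTRIBUTES))
# []  (no shared join key, so this population would be ineligible)
```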
3. Attribute selection
Look-alike Studio classifies every attribute that exists on both datasets into one of three roles:

| Role | Description | Used for scoring? |
|---|---|---|
| Identity | Identifiers like hashed email, narrative_id, MAID | No — used only to join seed and population |
| Categorical | Low-to-medium cardinality fields like country, device, gender | Yes — Naive Bayes |
| Continuous | Numeric fields like age, LTV, frequency | Yes — Gaussian density |
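
How the builder assigns roles is not specified here. One plausible heuristic, shown purely as an assumption, is to treat known identifier names as identity, all-numeric fields as continuous, and everything else as categorical. The function name and the identifier list below are hypothetical.

```python
def classify_attribute(name, values, identity_names):
    """Assign a role using an assumed heuristic, not the builder's actual rules."""
    if name in identity_names:
        return "identity"
    non_null = [v for v in values if v is not None]
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_numeric = bool(non_null) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    return "continuous" if is_numeric else "categorical"


identity_names = {"sha256_hashed_email", "narrative_id"}  # illustrative subset
roles = {
    "narrative_id": classify_attribute("narrative_id", ["a1", "b2"], identity_names),
    "age": classify_attribute("age", [30, 41.5, None], identity_names),
    "country": classify_attribute("country", ["US", "CA"], identity_names),
}
```

A production classifier would likely also consider cardinality, since a numeric field with only a handful of distinct values may behave more like a categorical one.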
4. Output configuration
Choose how the look-alike audience is sized:
- Limit by size: Return the top-N highest-scoring population users (for example, the top 100,000).
- Limit by score: Return every population user whose score meets or exceeds a threshold (for example, score ≥ 0.5).

Then choose whether seed users are included in the output:
- New users only: Only population users not already in the seed are returned. Use when you want net-new reach.
- New + original seed users: Seed users are added to the output. Use when you want a single combined audience to activate.
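
The output options can be illustrated with a small selection function. This is a sketch under assumptions: the function and parameter names are hypothetical, and it assumes that size and score limits apply to non-seed users only, with seed users appended afterwards when requested. The source does not confirm that ordering.

```python
def select_audience(scores, seed_ids, top_n=None, min_score=None, include_seed=False):
    """Pick population users by rank or threshold, optionally adding seed users.

    scores maps a population user id to its look-alike score.
    """
    # Drop population users who are already in the seed.
    candidates = {uid: s for uid, s in scores.items() if uid not in seed_ids}
    ranked = sorted(candidates, key=lambda uid: candidates[uid], reverse=True)
    if min_score is not None:
        ranked = [uid for uid in ranked if candidates[uid] >= min_score]
    if top_n is not None:
        ranked = ranked[:top_n]
    if include_seed:
        return ranked + sorted(seed_ids)
    return ranked


scores_by_user = {"u1": 0.9, "u2": 0.7, "u3": 0.4, "s1": 0.95}
seed_ids = {"s1"}

print(select_audience(scores_by_user, seed_ids, top_n=2))
# ['u1', 'u2']
print(select_audience(scores_by_user, seed_ids, min_score=0.5))
# ['u1', 'u2']
print(select_audience(scores_by_user, seed_ids, top_n=1, include_seed=True))
# ['u1', 's1']
```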
5. Finalize
Configure the output dataset metadata:
- Audience name (required): Identifies the audience in the platform and downstream systems.
- Description: Optional context.
- Tags: Optional labels. The system automatically applies the _nio_audience and _nio_lookalike tags so the audience appears under My Audiences and is recognizable as look-alike-derived.
What Look-alike Studio creates
When you click Create Look-alike Audience, the builder:
- Assembles the seed, population, attribute selections, and output config into a multi-stage workflow YAML.
- Submits the workflow via the workflow engine with trigger_immediately.
- Materializes around a dozen intermediate views (canonical feature combos, per-attribute distributions, attribute weights, scored candidates) tagged _nio_lookalike_intermediate. These are hidden from the standard dataset catalog so only the final audience appears under My Audiences.
- Materializes a final audience dataset with the _nio_audience and _nio_lookalike tags, your display name, description, and any user-supplied tags applied as part of the CREATE MATERIALIZED VIEW so the audience is born fully tagged even if you leave mid-run.
Resuming or restarting
- Dismiss progress, keep running — Closing the progress dialog leaves the workflow running. The audience appears under My Audiences when it finishes.
- Retry on error — A failed run shows an error dialog with a Retry action that resubmits the same workflow.
- Start Over — Cancels polling and resets all builder state so you can build a different look-alike audience.
Related content
- Building a look-alike audience: Step-by-step guide to creating your first look-alike audience
- Audience Studio: Filter-based audience builder for direct segmentation
- Structuring Audiences: Strategies for organizing data for activation
- Connector Reference: Technical details for each destination connector

