Skip to main content
A mobile gaming company has rapidly grown to 2 million users but knows very little about who those users are. Since the game doesn’t require account registration, they lack basic demographic information needed for effective ad targeting and audience insights. Through Narrative, they can enrich their user base with age and gender data from identity providers, then automate monthly refreshes for new users.

What you will build

This cookbook walks through creating a demographic enrichment pipeline that:
  1. Uploads a seed dataset of mobile advertising IDs (MAIDs) from your customer base
  2. Matches those MAIDs against demographic data providers using Rosetta Stone
  3. Enriches records with raw age values (not pre-bucketed segments) and gender
  4. Automates recurring enrichment for new users added each month
The result is a continuously refreshed customer dataset with demographic attributes that you own outright and can use across any platform.

Prerequisites

Before starting, ensure you have:
  • A dataset of customer identifiers (MAIDs, hashed emails, or other supported identifier types)
  • Access to demographic data through data collaboration or marketplace providers
  • Familiarity with NQL syntax
  • Understanding of materialized views
  • Understanding of Mobile Ad IDs

Step 1: Upload your seed dataset

First, upload your customer identifier file to Narrative. This dataset serves as the seed list you want to enrich. Your dataset should include at minimum:
  • A unique identifier column (MAID, hashed email, etc.)
  • An identifier type column if you have mixed identifier types
Example schema for a MAID-based seed file:
ColumnTypeDescription
maidstringMobile advertising ID (IDFA or GAID)
maid_typestringIdentifier type: idfa or gaid
internal_user_idstringYour internal user ID for joining back to source systems
first_seen_datedateWhen the user first appeared in your system
Upload your seed dataset through the Narrative UI or SDK. Map the identifier column to the unique_identifier Rosetta Stone attribute for identity matching.

Normalize identifiers with Rosetta Stone

Ensure your identifier column is mapped to Rosetta Stone’s unique_identifier attribute:
-- Verify your dataset has Rosetta Stone mappings
SELECT
    seed."_rosetta_stone"."unique_identifier"."value" AS maid,
    seed."_rosetta_stone"."unique_identifier"."type" AS maid_type,
    seed.internal_user_id
FROM company_data."your_seed_dataset" seed
LIMIT 10

Step 2: Identify available demographic sources

Query the Rosetta Stone to understand which demographic attributes are available across your data collaborations:
-- Preview available demographic data
SELECT
    rs."unique_identifier"."type" AS id_type,
    rs."demographics"."age" AS age,
    rs."demographics"."gender" AS gender,
    COUNT(1) AS record_count
FROM narrative.rosetta_stone rs
WHERE
    rs."demographics"."age" IS NOT NULL
    OR rs."demographics"."gender" IS NOT NULL
GROUP BY
    rs."unique_identifier"."type",
    rs."demographics"."age",
    rs."demographics"."gender"
LIMIT 100
This shows you the identifier types and demographic coverage across available providers.

Step 3: Match and enrich with demographics

Create a materialized view that joins your seed dataset with demographic data through Rosetta Stone identity matching.
CREATE MATERIALIZED VIEW "customers_demographics_enriched"
REFRESH_SCHEDULE = '@monthly'
DISPLAY_NAME = 'Customer Demographics - Enriched'
DESCRIPTION = 'Seed customer list enriched with age and gender from data providers'
TAGS = ('enrichment', 'demographics', 'audience')
WRITE_MODE = 'overwrite'
AS
SELECT
    seed.internal_user_id,
    seed."_rosetta_stone"."unique_identifier"."value" AS maid,
    seed."_rosetta_stone"."unique_identifier"."type" AS maid_type,
    seed.first_seen_date,
    demo."demographics"."age" AS age,
    demo."demographics"."gender" AS gender,
    demo."_ni_event_timestamp" AS demo_observed_at
FROM company_data."your_seed_dataset" seed
INNER JOIN narrative.rosetta_stone demo
    ON seed."_rosetta_stone"."unique_identifier"."value" = demo."unique_identifier"."value"
    AND seed."_rosetta_stone"."unique_identifier"."type" = demo."unique_identifier"."type"
WHERE
    demo."demographics"."age" IS NOT NULL
    AND demo."demographics"."age" BETWEEN 13 AND 100
    AND demo."event_timestamp" > CURRENT_TIMESTAMP - INTERVAL '180' DAY
Key patterns used:
PatternPurpose
INNER JOIN narrative.rosetta_stoneMatches seed identifiers to demographic records
unique_identifier.value and unique_identifier.typeEnsures exact identifier matching
demographics.age BETWEEN 13 AND 100Filters to valid age ranges
INTERVAL '180' DAYUses recent demographic observations for accuracy
REFRESH_SCHEDULE = '@monthly'Automates recurring enrichment
Unlike pre-bucketed audience segments (e.g., “25-34”), Rosetta Stone provides raw age values. This gives you flexibility to create your own segments, perform precise analytics, or build custom targeting strategies.

Step 4: Handle multiple demographic observations

A single identifier may have demographic data from multiple sources or observation times. To get the most recent or most common value:

Most recent observation

CREATE MATERIALIZED VIEW "customers_demographics_latest"
REFRESH_SCHEDULE = '@monthly'
DISPLAY_NAME = 'Customer Demographics - Latest Observation'
DESCRIPTION = 'Enriched demographics using most recent observation per customer'
TAGS = ('enrichment', 'demographics', 'deduplicated')
WRITE_MODE = 'overwrite'
AS
SELECT
    internal_user_id,
    maid,
    maid_type,
    first_seen_date,
    age,
    gender,
    demo_observed_at
FROM (
    SELECT
        seed.internal_user_id,
        seed."_rosetta_stone"."unique_identifier"."value" AS maid,
        seed."_rosetta_stone"."unique_identifier"."type" AS maid_type,
        seed.first_seen_date,
        demo."demographics"."age" AS age,
        demo."demographics"."gender" AS gender,
        demo."_ni_event_timestamp" AS demo_observed_at,
        ROW_NUMBER() OVER (
            PARTITION BY seed.internal_user_id
            ORDER BY demo."_ni_event_timestamp" DESC
        ) AS row_num
    FROM company_data."your_seed_dataset" seed
    INNER JOIN narrative.rosetta_stone demo
        ON seed."_rosetta_stone"."unique_identifier"."value" = demo."unique_identifier"."value"
        AND seed."_rosetta_stone"."unique_identifier"."type" = demo."unique_identifier"."type"
    WHERE
        demo."demographics"."age" IS NOT NULL
        AND demo."demographics"."age" BETWEEN 13 AND 100
)
WHERE row_num = 1

Consensus across sources

When you want the most common demographic value across multiple providers:
CREATE MATERIALIZED VIEW "customers_demographics_consensus"
REFRESH_SCHEDULE = '@monthly'
DISPLAY_NAME = 'Customer Demographics - Consensus'
DESCRIPTION = 'Enriched demographics using most frequently observed values'
TAGS = ('enrichment', 'demographics', 'consensus')
WRITE_MODE = 'overwrite'
AS
WITH age_consensus AS (
    SELECT
        seed.internal_user_id,
        demo."demographics"."age" AS age,
        COUNT(1) AS age_count,
        ROW_NUMBER() OVER (
            PARTITION BY seed.internal_user_id
            ORDER BY COUNT(1) DESC
        ) AS age_rank
    FROM company_data."your_seed_dataset" seed
    INNER JOIN narrative.rosetta_stone demo
        ON seed."_rosetta_stone"."unique_identifier"."value" = demo."unique_identifier"."value"
        AND seed."_rosetta_stone"."unique_identifier"."type" = demo."unique_identifier"."type"
    WHERE demo."demographics"."age" IS NOT NULL
    GROUP BY seed.internal_user_id, demo."demographics"."age"
),
gender_consensus AS (
    SELECT
        seed.internal_user_id,
        demo."demographics"."gender" AS gender,
        COUNT(1) AS gender_count,
        ROW_NUMBER() OVER (
            PARTITION BY seed.internal_user_id
            ORDER BY COUNT(1) DESC
        ) AS gender_rank
    FROM company_data."your_seed_dataset" seed
    INNER JOIN narrative.rosetta_stone demo
        ON seed."_rosetta_stone"."unique_identifier"."value" = demo."unique_identifier"."value"
        AND seed."_rosetta_stone"."unique_identifier"."type" = demo."unique_identifier"."type"
    WHERE demo."demographics"."gender" IS NOT NULL
    GROUP BY seed.internal_user_id, demo."demographics"."gender"
)
SELECT
    seed.internal_user_id,
    seed."_rosetta_stone"."unique_identifier"."value" AS maid,
    seed."_rosetta_stone"."unique_identifier"."type" AS maid_type,
    ac.age,
    gc.gender
FROM company_data."your_seed_dataset" seed
LEFT JOIN age_consensus ac
    ON seed.internal_user_id = ac.internal_user_id
    AND ac.age_rank = 1
LEFT JOIN gender_consensus gc
    ON seed.internal_user_id = gc.internal_user_id
    AND gc.gender_rank = 1

Step 5: Automate incremental enrichment

For growing customer bases, create an incremental view that only processes new users added since the last refresh.

Create an incremental seed view

First, identify new users not yet in your enriched dataset:
CREATE MATERIALIZED VIEW "customers_new_since_last_enrichment"
REFRESH_SCHEDULE = '@monthly'
DISPLAY_NAME = 'New Customers - Pending Enrichment'
DESCRIPTION = 'Customers added since last demographic enrichment'
TAGS = ('enrichment', 'incremental', 'staging')
WRITE_MODE = 'overwrite'
AS
SELECT
    seed.internal_user_id,
    seed."_rosetta_stone"."unique_identifier"."value" AS maid,
    seed."_rosetta_stone"."unique_identifier"."type" AS maid_type,
    seed.first_seen_date
FROM company_data."your_seed_dataset" seed
WHERE NOT EXISTS (
    SELECT 1
    FROM company_data."customers_demographics_enriched" enriched
    WHERE seed.internal_user_id = enriched.internal_user_id
)

Enrich and merge new users

CREATE MATERIALIZED VIEW "customers_demographics_incremental"
REFRESH_SCHEDULE = '@monthly'
DISPLAY_NAME = 'Customer Demographics - Incremental Update'
DESCRIPTION = 'Demographic enrichment for newly added customers'
TAGS = ('enrichment', 'demographics', 'incremental')
WRITE_MODE = 'append'
AS
SELECT
    new_users.internal_user_id,
    new_users.maid,
    new_users.maid_type,
    new_users.first_seen_date,
    demo."demographics"."age" AS age,
    demo."demographics"."gender" AS gender,
    demo."_ni_event_timestamp" AS demo_observed_at
FROM company_data."customers_new_since_last_enrichment" new_users
INNER JOIN narrative.rosetta_stone demo
    ON new_users.maid = demo."unique_identifier"."value"
    AND new_users.maid_type = demo."unique_identifier"."type"
WHERE
    demo."demographics"."age" IS NOT NULL
    AND demo."demographics"."age" BETWEEN 13 AND 100
Use WRITE_MODE = 'append' for incremental views that add new records to an existing dataset without overwriting historical data.

Step 6: Create audience segments

With enriched demographics, create targetable audience segments:

Age-based segments

CREATE MATERIALIZED VIEW "audience_segment_age_buckets"
REFRESH_SCHEDULE = '@monthly'
DISPLAY_NAME = 'Audience Segments - Age Buckets'
DESCRIPTION = 'Customers segmented by age ranges for targeting'
TAGS = ('audience', 'segmentation', 'age')
WRITE_MODE = 'overwrite'
AS
SELECT
    internal_user_id,
    maid,
    maid_type,
    age,
    gender,
    CASE
        WHEN age BETWEEN 18 AND 24 THEN '18-24'
        WHEN age BETWEEN 25 AND 34 THEN '25-34'
        WHEN age BETWEEN 35 AND 44 THEN '35-44'
        WHEN age BETWEEN 45 AND 54 THEN '45-54'
        WHEN age >= 55 THEN '55+'
        ELSE 'unknown'
    END AS age_segment
FROM company_data."customers_demographics_enriched"
WHERE age IS NOT NULL

Gender-specific segments

CREATE MATERIALIZED VIEW "audience_female_25_44"
REFRESH_SCHEDULE = '@monthly'
DISPLAY_NAME = 'Audience - Female 25-44'
DESCRIPTION = 'Female customers aged 25-44'
TAGS = ('audience', 'segmentation', 'female', '25-44')
WRITE_MODE = 'overwrite'
AS
SELECT
    internal_user_id,
    maid,
    maid_type
FROM company_data."customers_demographics_enriched"
WHERE
    gender = 'female'
    AND age BETWEEN 25 AND 44

Summary

You have built a demographic enrichment pipeline with these components:
ViewPurpose
customers_demographics_enrichedBase enrichment with age and gender
customers_demographics_latestDeduplicated using most recent observation
customers_demographics_consensusDeduplicated using most common values
customers_new_since_last_enrichmentIdentifies users pending enrichment
customers_demographics_incrementalProcesses only new users
audience_segment_age_bucketsAge-based targeting segments

Use cases

  • Media targeting: Reach specific demographic segments with relevant creative
  • Audience analytics: Understand the composition of your customer base
  • Lookalike modeling: Build seed audiences based on demographic profiles
  • Personalization: Tailor app experiences or content by age group
  • Attribution: Analyze conversion rates across demographic segments

Key advantages

Raw values, not buckets

Receive exact age values instead of pre-defined segments, giving you flexibility to create custom ranges

Omni-use license

Data is enriched once and can be used across any platform or use case without per-use fees

Automated refresh

Monthly schedules keep your enriched data current as new users join

Transparent sourcing

See which providers contribute demographic data and evaluate match quality

Important considerations

MAID availability: iOS App Tracking Transparency and Android privacy changes have reduced MAID availability. Consider supplementing with hashed email or other identifier types for broader coverage. See Mobile Ad IDs for details.
Privacy compliance: Demographic data may be subject to GDPR, CCPA, and other privacy regulations depending on source and use case. Ensure your enrichment workflow complies with applicable requirements. See CCPA compliance and GDPR compliance for details.