Enrich customer lists with age and gender data using identity matching and automated refresh
A mobile gaming company has rapidly grown to 2 million users but knows very little about who those users are. Since the game doesn’t require account registration, they lack basic demographic information needed for effective ad targeting and audience insights. Through Narrative, they can enrich their user base with age and gender data from identity providers, then automate monthly refreshes for new users.
First, upload your customer identifier file to Narrative. This dataset serves as the seed list you want to enrich.Your dataset should include at minimum:
A unique identifier column (MAID, hashed email, etc.)
An identifier type column if you have mixed identifier types
Example schema for a MAID-based seed file:
Column
Type
Description
maid
string
Mobile advertising ID (IDFA or GAID)
maid_type
string
Identifier type: idfa or gaid
internal_user_id
string
Your internal user ID for joining back to source systems
first_seen_date
date
When the user first appeared in your system
Upload your seed dataset through the Narrative UI or SDK. Map the identifier column to the unique_identifier Rosetta Stone attribute for identity matching.
Ensure your identifier column is mapped to Rosetta Stone’s unique_identifier attribute:
Copy
Ask AI
-- Verify your dataset has Rosetta Stone mappingsSELECT seed."_rosetta_stone"."unique_identifier"."value" AS maid, seed."_rosetta_stone"."unique_identifier"."type" AS maid_type, seed.internal_user_idFROM company_data."your_seed_dataset" seedLIMIT 10
Query the Rosetta Stone to understand which demographic attributes are available across your data collaborations:
Copy
Ask AI
-- Preview available demographic dataSELECT rs."unique_identifier"."type" AS id_type, rs."demographics"."age" AS age, rs."demographics"."gender" AS gender, COUNT(1) AS record_countFROM narrative.rosetta_stone rsWHERE rs."demographics"."age" IS NOT NULL OR rs."demographics"."gender" IS NOT NULLGROUP BY rs."unique_identifier"."type", rs."demographics"."age", rs."demographics"."gender"LIMIT 100
This shows you the identifier types and demographic coverage across available providers.
Create a materialized view that joins your seed dataset with demographic data through Rosetta Stone identity matching.
Copy
Ask AI
CREATE MATERIALIZED VIEW "customers_demographics_enriched"REFRESH_SCHEDULE = '@monthly'DISPLAY_NAME = 'Customer Demographics - Enriched'DESCRIPTION = 'Seed customer list enriched with age and gender from data providers'TAGS = ('enrichment', 'demographics', 'audience')WRITE_MODE = 'overwrite'ASSELECT seed.internal_user_id, seed."_rosetta_stone"."unique_identifier"."value" AS maid, seed."_rosetta_stone"."unique_identifier"."type" AS maid_type, seed.first_seen_date, demo."demographics"."age" AS age, demo."demographics"."gender" AS gender, demo."_ni_event_timestamp" AS demo_observed_atFROM company_data."your_seed_dataset" seedINNER JOIN narrative.rosetta_stone demo ON seed."_rosetta_stone"."unique_identifier"."value" = demo."unique_identifier"."value" AND seed."_rosetta_stone"."unique_identifier"."type" = demo."unique_identifier"."type"WHERE demo."demographics"."age" IS NOT NULL AND demo."demographics"."age" BETWEEN 13 AND 100 AND demo."event_timestamp" > CURRENT_TIMESTAMP - INTERVAL '180' DAY
Key patterns used:
Pattern
Purpose
INNER JOIN narrative.rosetta_stone
Matches seed identifiers to demographic records
unique_identifier.value and unique_identifier.type
Ensures exact identifier matching
demographics.age BETWEEN 13 AND 100
Filters to valid age ranges
INTERVAL '180' DAY
Uses recent demographic observations for accuracy
REFRESH_SCHEDULE = '@monthly'
Automates recurring enrichment
Unlike pre-bucketed audience segments (e.g., “25-34”), Rosetta Stone provides raw age values. This gives you flexibility to create your own segments, perform precise analytics, or build custom targeting strategies.
CREATE MATERIALIZED VIEW "customers_demographics_latest"REFRESH_SCHEDULE = '@monthly'DISPLAY_NAME = 'Customer Demographics - Latest Observation'DESCRIPTION = 'Enriched demographics using most recent observation per customer'TAGS = ('enrichment', 'demographics', 'deduplicated')WRITE_MODE = 'overwrite'ASSELECT internal_user_id, maid, maid_type, first_seen_date, age, gender, demo_observed_atFROM ( SELECT seed.internal_user_id, seed."_rosetta_stone"."unique_identifier"."value" AS maid, seed."_rosetta_stone"."unique_identifier"."type" AS maid_type, seed.first_seen_date, demo."demographics"."age" AS age, demo."demographics"."gender" AS gender, demo."_ni_event_timestamp" AS demo_observed_at, ROW_NUMBER() OVER ( PARTITION BY seed.internal_user_id ORDER BY demo."_ni_event_timestamp" DESC ) AS row_num FROM company_data."your_seed_dataset" seed INNER JOIN narrative.rosetta_stone demo ON seed."_rosetta_stone"."unique_identifier"."value" = demo."unique_identifier"."value" AND seed."_rosetta_stone"."unique_identifier"."type" = demo."unique_identifier"."type" WHERE demo."demographics"."age" IS NOT NULL AND demo."demographics"."age" BETWEEN 13 AND 100)WHERE row_num = 1
When you want the most common demographic value across multiple providers:
Copy
Ask AI
CREATE MATERIALIZED VIEW "customers_demographics_consensus"REFRESH_SCHEDULE = '@monthly'DISPLAY_NAME = 'Customer Demographics - Consensus'DESCRIPTION = 'Enriched demographics using most frequently observed values'TAGS = ('enrichment', 'demographics', 'consensus')WRITE_MODE = 'overwrite'ASWITH age_consensus AS ( SELECT seed.internal_user_id, demo."demographics"."age" AS age, COUNT(1) AS age_count, ROW_NUMBER() OVER ( PARTITION BY seed.internal_user_id ORDER BY COUNT(1) DESC ) AS age_rank FROM company_data."your_seed_dataset" seed INNER JOIN narrative.rosetta_stone demo ON seed."_rosetta_stone"."unique_identifier"."value" = demo."unique_identifier"."value" AND seed."_rosetta_stone"."unique_identifier"."type" = demo."unique_identifier"."type" WHERE demo."demographics"."age" IS NOT NULL GROUP BY seed.internal_user_id, demo."demographics"."age"),gender_consensus AS ( SELECT seed.internal_user_id, demo."demographics"."gender" AS gender, COUNT(1) AS gender_count, ROW_NUMBER() OVER ( PARTITION BY seed.internal_user_id ORDER BY COUNT(1) DESC ) AS gender_rank FROM company_data."your_seed_dataset" seed INNER JOIN narrative.rosetta_stone demo ON seed."_rosetta_stone"."unique_identifier"."value" = demo."unique_identifier"."value" AND seed."_rosetta_stone"."unique_identifier"."type" = demo."unique_identifier"."type" WHERE demo."demographics"."gender" IS NOT NULL GROUP BY seed.internal_user_id, demo."demographics"."gender")SELECT seed.internal_user_id, seed."_rosetta_stone"."unique_identifier"."value" AS maid, seed."_rosetta_stone"."unique_identifier"."type" AS maid_type, ac.age, gc.genderFROM company_data."your_seed_dataset" seedLEFT JOIN age_consensus ac ON seed.internal_user_id = ac.internal_user_id AND ac.age_rank = 1LEFT JOIN gender_consensus gc ON seed.internal_user_id = gc.internal_user_id AND gc.gender_rank = 1
CREATE MATERIALIZED VIEW "audience_segment_age_buckets"REFRESH_SCHEDULE = '@monthly'DISPLAY_NAME = 'Audience Segments - Age Buckets'DESCRIPTION = 'Customers segmented by age ranges for targeting'TAGS = ('audience', 'segmentation', 'age')WRITE_MODE = 'overwrite'ASSELECT internal_user_id, maid, maid_type, age, gender, CASE WHEN age BETWEEN 18 AND 24 THEN '18-24' WHEN age BETWEEN 25 AND 34 THEN '25-34' WHEN age BETWEEN 35 AND 44 THEN '35-44' WHEN age BETWEEN 45 AND 54 THEN '45-54' WHEN age >= 55 THEN '55+' ELSE 'unknown' END AS age_segmentFROM company_data."customers_demographics_enriched"WHERE age IS NOT NULL
MAID availability: iOS App Tracking Transparency and Android privacy changes have reduced MAID availability. Consider supplementing with hashed email or other identifier types for broader coverage. See Mobile Ad IDs for details.
Privacy compliance: Demographic data may be subject to GDPR, CCPA, and other privacy regulations depending on source and use case. Ensure your enrichment workflow complies with applicable requirements. See CCPA compliance and GDPR compliance for details.