An identity graph is a data structure that connects identifiers—emails, device IDs, phone numbers, cookies—across devices, channels, and platforms into unified profiles. Each cluster of connected identifiers (a connected component) represents an individual or household. Organizations use identity graphs to move from fragmented, device-level data to a coherent view of their customers. The quality of that graph directly determines the quality of downstream analytics, segmentation, and activation.

What is an identity graph

An identity graph consists of three elements:
  • Nodes are individual identifiers: a hashed email, a mobile advertising ID (MAID), a cookie, a phone number, a postal address
  • Edges are linkages between identifiers, representing evidence that two identifiers belong to the same person or household
  • Connected components are clusters of nodes where every identifier is reachable from every other through some path of edges—each component represents a resolved identity
For example, a single customer might generate nodes for their work email, personal email, iPhone IDFA, Android GAID, and home address. As the graph discovers linkages between these identifiers, they merge into a single connected component.
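
The node, edge, and connected-component model above can be sketched with a union-find (disjoint-set) structure. This is a minimal illustration, not a description of any particular product's implementation; the class name `IdentityGraph` and the identifier strings are hypothetical.

```python
from collections import defaultdict


class IdentityGraph:
    """Toy identity graph: nodes are identifier strings, edges are linkages."""

    def __init__(self):
        self._parent = {}  # identifier -> parent identifier in the union-find forest

    def add_node(self, identifier):
        # A new identifier starts as its own single-node component.
        self._parent.setdefault(identifier, identifier)

    def _find(self, identifier):
        # Locate the component root, then point every visited node at it.
        self.add_node(identifier)
        root = identifier
        while self._parent[root] != root:
            root = self._parent[root]
        while self._parent[identifier] != root:
            self._parent[identifier], identifier = root, self._parent[identifier]
        return root

    def add_edge(self, a, b):
        # A linkage merges the components that contain a and b.
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def components(self):
        # Group identifiers by root; each group is one resolved identity.
        groups = defaultdict(set)
        for identifier in list(self._parent):
            groups[self._find(identifier)].add(identifier)
        return list(groups.values())


# Hypothetical identifiers for one customer, plus an unlinked cookie.
graph = IdentityGraph()
graph.add_edge("email:work_hash", "idfa:iphone-1")
graph.add_edge("email:personal_hash", "idfa:iphone-1")
graph.add_edge("email:personal_hash", "gaid:android-1")
graph.add_node("cookie:abc")
resolved = graph.components()
```

Here the four linked identifiers collapse into one component, while the cookie stays a separate single-node component until some linkage connects it.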

Why graph quality matters

Graph quality is a balance between two failure modes:
| Problem | Cause | Consequence |
| Under-connected | Too few or too conservative linkages | Fragmented user profiles, duplicated outreach, inflated audience counts |
| Over-connected | Too many or too liberal linkages | Merged distinct individuals, corrupted targeting, wasted spend on wrong audiences |
Under-connected graphs treat the same person as multiple people. Over-connected graphs treat multiple people as one. Both degrade every downstream use case—segmentation, frequency capping, attribution, and personalization.
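
One way to quantify the two failure modes is pairwise precision and recall against a small labeled sample. The sketch below assumes you have such a truth set, which is often the hard part in practice. The function name `pairwise_quality` and the sample data are hypothetical. Low precision suggests an over-connected graph (false merges); low recall suggests an under-connected one (missed linkages).

```python
from itertools import combinations


def pairwise_quality(predicted, truth):
    """Both arguments map identifier -> cluster label.

    Returns (precision, recall) over all pairs of identifiers present in both.
    """
    shared = sorted(set(predicted) & set(truth))
    linked_pred = linked_true = linked_both = 0
    for a, b in combinations(shared, 2):
        same_pred = predicted[a] == predicted[b]
        same_true = truth[a] == truth[b]
        linked_pred += same_pred
        linked_true += same_true
        linked_both += same_pred and same_true
    # With no linked pairs at all, treat the score as perfect by convention.
    precision = linked_both / linked_pred if linked_pred else 1.0
    recall = linked_both / linked_true if linked_true else 1.0
    return precision, recall


# Hypothetical labeled sample: two real people, two identifiers each.
truth = {"email:1": "alice", "maid:1": "alice", "email:2": "bob", "maid:2": "bob"}
# The graph wrongly merged bob's email into alice and left bob's device alone.
predicted = {"email:1": "c1", "maid:1": "c1", "email:2": "c1", "maid:2": "c2"}

precision, recall = pairwise_quality(predicted, truth)
```

In this sample the graph shows both problems at once: two of its three linked pairs are wrong, and it misses one of the two true linkages.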

How the graph grows

Identity graphs typically build in layers:
  1. First-party data as foundation. Login events, CRM records, and transaction data create high-confidence, deterministic linkages. A customer who logs into your app with their email on their iPhone creates a direct edge between that hashed email and that device’s IDFA.
  2. Third-party data adding edges. External identity providers contribute additional linkages that your first-party data cannot observe. A provider might link your customer’s email to a second device ID or a postal address that you have never seen.
  3. Deterministic vs. probabilistic linkages. Deterministic linkages come from direct observation, such as the same user logging in on two devices. Probabilistic linkages are inferred from signals like shared IP addresses, co-location patterns, or behavioral similarity. Deterministic linkages are more reliable but harder to scale; probabilistic linkages offer broader reach but carry a higher false-match risk.
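
The trade-off between the two linkage types can be sketched as an acceptance rule: always keep deterministic linkages, and keep probabilistic ones only above a confidence threshold. The `Linkage` fields, the confidence scores, and the 0.9 default are all illustrative assumptions, not values taken from any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Linkage:
    a: str
    b: str
    method: str        # "deterministic" or "probabilistic"
    confidence: float  # hypothetical score in [0, 1]


def accepted_linkages(linkages, min_probabilistic_confidence=0.9):
    """Keep every deterministic linkage; filter probabilistic ones by score."""
    kept = []
    for link in linkages:
        if link.method == "deterministic":
            kept.append(link)
        elif link.confidence >= min_probabilistic_confidence:
            kept.append(link)
    return kept


candidates = [
    Linkage("email:1", "idfa:1", "deterministic", 1.0),   # observed login
    Linkage("idfa:1", "cookie:7", "probabilistic", 0.95),  # e.g. strong co-location signal
    Linkage("idfa:1", "cookie:9", "probabilistic", 0.40),  # e.g. shared IP only
]

strict = accepted_linkages(candidates)
loose = accepted_linkages(candidates, min_probabilistic_confidence=0.3)
```

Lowering the threshold extends reach (more edges survive) at the cost of a higher false-match risk, which is the same balance described under graph quality.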

Two approaches to using third-party identity data

When organizations purchase third-party identity data, the data serves one of two distinct purposes:
| | Graph enrichment | Addressability expansion |
| Primary benefit | Improved identity resolution | Improved media activation reach |
| Effect on graph structure | Adds or strengthens edges between nodes | Appends identifiers to existing nodes |
| Impact on segmentation | Can change segment composition | Does not change segment composition |
| Impact on match rates | Indirect (better resolution enables better matching) | Direct (more identifiers per record increases match rates) |
| Risk profile | Higher (bad linkages corrupt graph structure) | Lower (bad appends reduce match rates but don’t corrupt structure) |
Understanding which approach you need—or whether you need both—is the first decision in any identity data strategy.
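
The structural difference between the two approaches can be shown on a toy list of profiles, each modeled as a set of identifiers. This is a simplified sketch with hypothetical function names (`enrich`, `expand`); it ignores edge weights and conflict handling, which a real pipeline would need.

```python
def enrich(profiles, linkage):
    """Graph enrichment: a new edge may merge two profiles into one."""
    a, b = linkage
    touched = [p for p in profiles if a in p or b in p]
    untouched = [p for p in profiles if not (a in p or b in p)]
    merged = set().union(*touched, {a, b})
    return untouched + [merged]


def expand(profiles, known_identifier, appended_identifier):
    """Addressability expansion: attach an identifier, never merge profiles."""
    return [
        p | {appended_identifier} if known_identifier in p else set(p)
        for p in profiles
    ]


profiles = [{"email:a"}, {"maid:1"}, {"email:b"}]

# Enrichment: the supplier asserts email:a and maid:1 are the same person.
enriched = enrich(profiles, ("email:a", "maid:1"))

# Expansion: the supplier supplies an extra activation identifier for email:a.
expanded = expand(profiles, "email:a", "maid:9")
```

Enrichment reduces three profiles to two, so any segment built on profile counts can change. Expansion leaves three profiles in place and only gives one of them another identifier to match on.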

Identity graphs in Narrative

Narrative’s platform connects to identity graph concepts in several ways:

Rosetta Stone and unique identifiers

Rosetta Stone’s unique_identifier attribute provides a standardized way to match identifiers across suppliers. When you join on unique_identifier.value and unique_identifier.type, Narrative handles format normalization—consistent MAID casing, standardized email hashing—so that identifiers from different sources match correctly.
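
To illustrate why normalization matters for matching, the sketch below applies two common conventions: lowercasing MAIDs, and hashing emails with SHA-256 after trimming and lowercasing. These rules and the type labels are assumptions for illustration only; they are not a statement of the exact normalization Narrative applies.

```python
import hashlib


def normalize_maid(raw):
    # Assumed convention: strip whitespace and lowercase the device ID.
    return raw.strip().lower()


def hash_email(raw):
    # Assumed convention: trim, lowercase, then SHA-256 as a hex digest.
    cleaned = raw.strip().lower()
    return hashlib.sha256(cleaned.encode("utf-8")).hexdigest()


def to_unique_identifier(id_type, raw):
    """Build a record shaped like a type/value pair (hypothetical type labels)."""
    value = hash_email(raw) if id_type == "email" else normalize_maid(raw)
    return {"type": id_type, "value": value}


# The same identifiers as two suppliers might deliver them.
supplier_a = to_unique_identifier("idfa", "6D92078A-8246-4BA4-AE5B-76104861E7DC")
supplier_b = to_unique_identifier("idfa", " 6d92078a-8246-4ba4-ae5b-76104861e7dc ")
email_a = to_unique_identifier("email", "Jane.Doe@Example.com")
email_b = to_unique_identifier("email", "jane.doe@example.com ")
```

Without a shared normalization step, the raw values above would fail an equality join even though they refer to the same device and the same mailbox.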

Cross-supplier identity resolution

Different data suppliers contribute different identity linkages. Narrative enables you to evaluate and combine these linkages through NQL queries, comparing coverage, overlap, and incremental contribution across providers before committing to a purchase.
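
The three comparison metrics can be sketched with plain set operations. The example below is ordinary Python rather than NQL, and its definitions are one reasonable reading of the terms: coverage is the share of your first-party identifiers a supplier can link, overlap is the number of linkages also offered by another supplier, and incremental contribution is the number of linkages no other supplier offers. Supplier names and data are hypothetical.

```python
def edge(a, b):
    # Linkages are undirected, so store each one as an unordered pair.
    return frozenset((a, b))


def compare_suppliers(first_party_ids, suppliers):
    """suppliers maps supplier name -> set of edges (frozenset pairs)."""
    report = {}
    for name, edges in suppliers.items():
        seen_ids = set().union(*edges)
        other_edges = set().union(
            *(e for other, e in suppliers.items() if other != name)
        )
        covered = len(seen_ids & first_party_ids)
        report[name] = {
            "coverage": covered / len(first_party_ids) if first_party_ids else 0.0,
            "overlap": len(edges & other_edges),
            "incremental": len(edges - other_edges),
        }
    return report


first_party_ids = {"email:a", "email:b", "email:c", "email:d"}
suppliers = {
    "supplier_x": {edge("email:a", "maid:1"), edge("email:b", "maid:2")},
    "supplier_y": {
        edge("email:a", "maid:1"),
        edge("email:c", "maid:3"),
        edge("email:d", "maid:4"),
    },
}

report = compare_suppliers(first_party_ids, suppliers)
```

In this toy data, supplier_y covers more of the first-party base and contributes two linkages that supplier_x lacks, while one linkage is offered by both.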

Narrative ID

Narrative ID provides a privacy-preserving mechanism for cross-partner matching. It enables identity resolution across organizations without exposing raw identifiers, maintaining graph connectivity while protecting PII.