How it works
Building an identity graph is a two-step process:
- Define edges: tell the platform which identifiers in your data should be used to connect records. Two records that share the same identifier value (for example, the same email address) are linked together.
- Build the graph: the platform runs a connected components algorithm over your edges to discover which records belong to the same person or household, resolving transitive connections across multiple hops. For example, if record A shares an email with record B, and B shares a phone number with record C, then A, B, and C are grouped together even though A and C share nothing directly.
Edges
An edge connects a source record to a target identifier. The Edge Builder creates an edges dataset by combining your data sources with the identity attributes you choose as connection points.
Key concepts
| Term | Description | Example |
|---|---|---|
| Source ID type | A label you choose to identify which system a record comes from | CRM, Website, Partner |
| Source ID | The field that uniquely identifies each record in that system | CUSTOMER_ID |
| Target ID type | The category of identifier used as a connection point — always a Rosetta Stone attribute | normalized_email, clear_text_e164_phone_number |
| Target ID | The actual identifier value for a given record, derived from the attribute mapping | name@example.com, +15705551234 |
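As a rough illustration of how the terms in the table fit together, the following sketch turns records from one source into edge rows with one row per record per available identifier. The `build_edges` function, the `Edge` row shape, and the `email`/`phone` field names are hypothetical, and the sketch assumes identifier values have already been normalized by the attribute mapping.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source_id_type: str  # label for the originating system, e.g. "CRM"
    source_id: str       # value of the field that uniquely identifies the record
    target_id_type: str  # Rosetta Stone attribute used as the connection point
    target_id: str       # identifier value for this record


def build_edges(records, source_id_type, source_id_field, target_fields):
    """Emit one edge per record per non-empty target identifier.

    target_fields maps a target ID type to the record field holding its
    (already normalized) value. Hypothetical helper, for illustration only.
    """
    edges = []
    for rec in records:
        source_id = str(rec[source_id_field])
        for target_id_type, field in target_fields.items():
            value = rec.get(field)
            if value:  # a missing identifier produces no edge
                edges.append(Edge(source_id_type, source_id, target_id_type, str(value)))
    return edges


crm_rows = [
    {"CUSTOMER_ID": 1001, "email": "ann@example.com", "phone": "+15705551234"},
    {"CUSTOMER_ID": 1002, "email": "bob@example.com", "phone": None},
]
edges = build_edges(
    crm_rows,
    source_id_type="CRM",
    source_id_field="CUSTOMER_ID",
    target_fields={"normalized_email": "email", "clear_text_e164_phone_number": "phone"},
)
for e in edges:
    print(e)
```

Record 1002 has no phone number, so it contributes only an email edge; records that lack every chosen identifier would not appear in the edges dataset at all.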
Target ID groups
Target IDs are organized into groups. Each group defines one type of connection. A group can contain a single attribute or multiple attributes; when a group has multiple attributes, all values in the group must match for two records to be connected. For example:
- A group with only normalized email connects any two records that share the same email. This is a high-confidence connection, since an email address is typically unique to one person.
- A group with phone number plus first name requires both values to match. This is more precise than phone alone and is useful when a phone number might be shared across a household.
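One way to picture a multi-attribute group is as a composite key: the group yields a value only when every one of its attributes is present, and two records connect on that group only if their composite values are equal. This is a minimal sketch under that assumption; the group names, the `first_name` attribute, and both helper functions are hypothetical.

```python
def group_target_ids(record, groups):
    """Return {group_name: composite value} for each group whose attributes are all present."""
    out = {}
    for name, attrs in groups.items():
        values = tuple(record.get(a) for a in attrs)
        if all(values):  # skip the group if any attribute is missing
            out[name] = values
    return out


def shared_groups(a, b, groups):
    """Names of the groups on which records a and b would be connected."""
    ga, gb = group_target_ids(a, groups), group_target_ids(b, groups)
    return sorted(name for name in ga.keys() & gb.keys() if ga[name] == gb[name])


# Two members of one household: same phone, different people.
ann = {"normalized_email": "ann@example.com",
       "clear_text_e164_phone_number": "+15705551234", "first_name": "ann"}
ben = {"normalized_email": "ben@example.com",
       "clear_text_e164_phone_number": "+15705551234", "first_name": "ben"}

phone_only = {"phone": ("clear_text_e164_phone_number",)}
precise = {
    "email": ("normalized_email",),
    "phone_first_name": ("clear_text_e164_phone_number", "first_name"),
}
print(shared_groups(ann, ben, phone_only))  # ['phone']
print(shared_groups(ann, ben, precise))     # []
```

Phone alone would link Ann and Ben; requiring the first name as well keeps them apart, which is the household case described above.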
Data sources
The Edge Builder accepts two types of sources:
- Datasets (first-party): your own data, mapped to Rosetta Stone attributes.
- Access rules (third-party): data shared with you by other companies. Third-party sources introduce connections that your first-party data cannot see on its own.
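To see why a third-party source matters, consider a first-party CRM record known only by email and a website record known only by phone. This sketch uses union-find, a standard connected-components technique swapped in here for brevity (not necessarily what the platform uses), and all node names are hypothetical; the `Partner` record stands in for data received through an access rule.

```python
def find(parent, x):
    """Return the root of x's set, halving the path as it goes."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def same_identity(edges, a, b):
    """True if nodes a and b end up in the same connected component."""
    parent = {}
    for u, v in edges:
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv  # merge the two sets
    return find(parent, a) == find(parent, b)


crm = ("CRM", "1001")
web = ("Website", "w7")
first_party = [
    (crm, ("normalized_email", "ann@example.com")),
    (web, ("clear_text_e164_phone_number", "+15705551234")),
]
# A partner record that carries both identifiers bridges the two.
third_party = [
    (("Partner", "p1"), ("normalized_email", "ann@example.com")),
    (("Partner", "p1"), ("clear_text_e164_phone_number", "+15705551234")),
]
print(same_identity(first_party, crm, web))                # False
print(same_identity(first_party + third_party, crm, web))  # True
```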
Graph
The Graph Builder takes one or more edges datasets and runs a Label Connected Components algorithm. It follows connections between records, including transitive chains, and groups every connected record into a single identity.
Algorithm parameters
| Parameter | Default | Description |
|---|---|---|
| Max Component Size | 100 | Caps how many records can merge into one identity. Prevents over-connection. |
| Max Iterations | 10 | How many passes the algorithm makes to resolve transitive chains. |
| Max Degree Threshold | 100 | Excludes nodes with too many connections (e.g., shared corporate emails) to avoid merging unrelated records. |
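The Graph Builder's internals are not described here, so the following is only a minimal sketch, assuming edges are (record, identifier) pairs, of how a label-propagation connected-components pass could apply these three parameters. The function name, node encoding, the choice to apply the degree threshold to identifier nodes only, and the way oversized components are handled (their records stay unmerged) are all assumptions for illustration.

```python
from collections import Counter


def label_components(edges, max_component_size=100, max_iterations=10, max_degree=100):
    """Hypothetical label-propagation sketch; not the Graph Builder's actual code.

    edges: list of (record, identifier) pairs; each node is a (type, value)
    tuple such as ("CRM", "1001") or ("normalized_email", "ann@example.com").
    Returns {record: component_label}.
    """
    edges = list(edges)
    records = {r for r, _ in edges}

    # Max Degree Threshold (assumed to apply to identifier nodes): drop edges
    # into identifiers with too many edges, e.g. a shared corporate inbox.
    degree = Counter(ident for _, ident in edges)
    kept = [(r, i) for r, i in edges if degree[i] <= max_degree]

    # Every node starts labeled with itself.
    label = {r: r for r in records}
    for _, ident in kept:
        label.setdefault(ident, ident)

    # Max Iterations: each synchronous pass lets the smallest label travel
    # one hop, so long transitive chains need more passes.
    for _ in range(max_iterations):
        new = dict(label)
        for r, i in kept:
            smallest = min(label[r], label[i])
            new[r] = min(new[r], smallest)
            new[i] = min(new[i], smallest)
        if new == label:  # converged before hitting the cap
            break
        label = new

    # Max Component Size (assumed behavior): records in an oversized
    # component are not merged and keep their own label.
    sizes = Counter(label[r] for r in records)
    return {r: label[r] if sizes[label[r]] <= max_component_size else r for r in records}


example_edges = [
    (("CRM", "1001"), ("normalized_email", "ann@example.com")),
    (("Website", "w1"), ("normalized_email", "ann@example.com")),
    (("Website", "w1"), ("clear_text_e164_phone_number", "+15705551234")),
    (("Partner", "p9"), ("clear_text_e164_phone_number", "+15705551234")),
    (("CRM", "1002"), ("normalized_email", "bob@example.com")),
]
components = label_components(example_edges)
print(components[("Partner", "p9")])  # ('CRM', '1001'): linked through Website w1
```

Lowering `max_iterations` to 1 in this sketch leaves `("Website", "w1")` in its own component, because its link to CRM record 1001 is two hops away (record, email, record). The returned labels play the role of component IDs and can be joined back to source records by record ID.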
Output
The graph produces a dataset in which each record is assigned a component ID. All records with the same component ID belong to the same resolved identity. You can join this back to your original data for analytics, segmentation, and activation.
Automation
You can run the graph build once or set a refresh schedule to keep it current as source data changes. Optionally, you can also encode identifiers in the output using your company’s encryption material.
Data plane support
Graph Studio is available on Snowflake data planes today. AWS data plane support is coming soon.
Related content
Identity Graphs
How connected components and graph structure unify identifiers
Building an Identity Graph
Step-by-step guide to creating your first graph
Graph Enrichment
Strengthen graph structure with third-party linkage data
Mapping Schemas
Map your data to Rosetta Stone attributes

