Graph Studio is the platform’s identity graph building tool. It provides two builders — Edge Builder and Graph Builder — that work together to transform your raw data into a connected identity graph.

How it works

Building an identity graph is a two-step process:
  1. Define edges — Tell the platform which identifiers in your data should be used to connect records. Two records that share the same identifier value (like the same email address) are linked together.
  2. Build the graph — The platform runs a connected components algorithm over your edges to discover which records belong to the same person or household, resolving transitive connections across multiple hops.

Edges

An edge connects a source record to a target identifier. The Edge Builder creates an edges dataset by combining your data sources with the identity attributes you choose as connection points.

Key concepts

Term | Description | Example
--- | --- | ---
Source ID type | A label you choose to identify which system a record comes from | CRM, Website, Partner
Source ID | The field that uniquely identifies each record in that system | CUSTOMER_ID
Target ID type | The category of identifier used as a connection point; always a Rosetta Stone attribute | normalized_email, clear_text_e164_phone_number
Target ID | The actual identifier value for a given record, derived from the attribute mapping | [email protected], +15705551234

Each source record produces one edge per target ID group. When two records from any source share the same target ID value, the graph recognizes them as connected.
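
To make the four terms concrete, here is a minimal Python sketch of how source records could be turned into edges. It is an illustration, not the Edge Builder's implementation: the `Edge` tuple, the `build_edges` helper, the `VISITOR_ID` field, and the sample email value are all hypothetical, and each target ID group is assumed to hold a single attribute.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source_id_type: str   # label for the originating system, e.g. "CRM"
    source_id: str        # record key within that system
    target_id_type: str   # attribute used as the connection point
    target_id: str        # the identifier value itself


def build_edges(records, source_id_type, source_id_field, target_fields):
    """Emit one edge per record for every populated target attribute."""
    edges = []
    for record in records:
        for field in target_fields:
            value = record.get(field)
            if value:  # records without this identifier produce no edge
                edges.append(
                    Edge(source_id_type, str(record[source_id_field]), field, value)
                )
    return edges


# Hypothetical sample rows; field names follow the examples in the table above.
crm = [{
    "CUSTOMER_ID": "C-1",
    "normalized_email": "a@example.com",
    "clear_text_e164_phone_number": "+15705551234",
}]
web = [{"VISITOR_ID": "W-9", "normalized_email": "a@example.com"}]

edges = (
    build_edges(crm, "CRM", "CUSTOMER_ID",
                ["normalized_email", "clear_text_e164_phone_number"])
    + build_edges(web, "Website", "VISITOR_ID", ["normalized_email"])
)
```

Here `C-1` and `W-9` both point at the same `normalized_email` value, which is what lets the graph treat them as connected.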

Target ID groups

Target IDs are organized into groups. Each group defines one type of connection. A group can contain a single attribute or multiple attributes — when a group has multiple attributes, all values in the group must match for two records to be connected. For example:
  • A group with just normalized email connects any two records sharing the same email — high confidence, since email is typically unique to a person
  • A group with phone number + first name requires both values to match, which is more precise than phone alone — useful when a phone number might be shared across a household
You can define multiple target ID groups to give the graph different ways to find connections. The algorithm considers all groups when resolving identities.
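
The all-values-must-match rule for multi-attribute groups can be sketched as a composite key. This is an assumed model rather than the platform's actual logic; `group_key`, `shares_group`, the `first_name` attribute name, and the sample records are hypothetical.

```python
def group_key(record, group):
    """Return the composite target ID for a group, or None if any attribute is missing."""
    values = [record.get(attr) for attr in group]
    if any(v is None or v == "" for v in values):
        return None
    return tuple(values)


def shares_group(a, b, groups):
    """True if the two records match on every attribute of at least one group."""
    for group in groups:
        key_a = group_key(a, group)
        if key_a is not None and key_a == group_key(b, group):
            return True
    return False


# Two groups: email alone, and phone number combined with first name.
groups = [
    ("normalized_email",),
    ("clear_text_e164_phone_number", "first_name"),
]

alice = {"normalized_email": "alice@example.com",
         "clear_text_e164_phone_number": "+15705551234", "first_name": "alice"}
bob = {"normalized_email": "bob@example.com",
       "clear_text_e164_phone_number": "+15705551234", "first_name": "bob"}
alice_web = {"clear_text_e164_phone_number": "+15705551234", "first_name": "alice"}

same_household_phone = shares_group(alice, bob, groups)   # phone matches, name does not
same_person = shares_group(alice, alice_web, groups)      # phone and name both match
```

In this sketch the shared household phone alone does not link `alice` and `bob`, while `alice` and `alice_web` are linked through the phone + first name group even though one of them has no email.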

Data sources

The Edge Builder accepts two types of sources:
  • Datasets (first-party) — Your own data, mapped to Rosetta Stone attributes
  • Access rules (third-party) — Data shared with you by other companies. Third-party sources introduce connections that your first-party data cannot see on its own.

Graph

The Graph Builder takes one or more edges datasets and runs a Label Connected Components algorithm. It follows connections between records — including transitive chains — and groups every connected record into a single identity.
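
The idea of following transitive chains can be sketched with a small union-find routine. Note that this is a swapped-in technique chosen for brevity; it is not confirmed to be how the Graph Builder's Label Connected Components algorithm works, and the `(source, target)` edge pairs and identifiers below are hypothetical.

```python
def connected_components(edges):
    """Group source records that are linked, directly or transitively, via shared targets."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Sources and targets are both nodes; tagging them avoids name collisions.
    for source, target in edges:
        union(("source", source), ("target", target))

    components = {}
    for node in list(parent):
        kind, name = node
        if kind == "source":
            components.setdefault(find(node), []).append(name)
    return sorted(sorted(members) for members in components.values())


edges = [
    ("CRM:1", "email:a"),
    ("WEB:7", "email:a"),      # WEB:7 shares an email with CRM:1
    ("WEB:7", "phone:1"),
    ("PARTNER:3", "phone:1"),  # PARTNER:3 shares a phone with WEB:7
    ("CRM:2", "email:b"),
]
identities = connected_components(edges)
```

`CRM:1` and `PARTNER:3` share no identifier directly, but both end up in the same identity because each is connected to `WEB:7`.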

Algorithm parameters

Parameter | Default | Description
--- | --- | ---
Max Component Size | 100 | Caps how many records can merge into one identity. Prevents over-connection.
Max Iterations | 10 | How many passes the algorithm makes to resolve transitive chains.
Max Degree Threshold | 100 | Excludes nodes with too many connections (e.g., shared corporate emails) to avoid merging unrelated records.

The defaults work well for most use cases.
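
To build intuition for two of these parameters, the following sketch runs a simple minimum-label propagation over `(source, target)` pairs. It is an assumed model, not the platform's implementation: `label_components` and its argument names are hypothetical, and Max Component Size is not modeled because the source does not say how the cap is enforced.

```python
from collections import defaultdict


def label_components(edges, max_iterations=10, max_degree=100):
    """Assign each source record a component label by propagating the smallest label."""
    # Degree of a target = number of edges pointing at it.
    degree = defaultdict(int)
    for _, target in edges:
        degree[target] += 1
    # Drop over-connected targets (e.g. a shared corporate email).
    kept = [(s, t) for s, t in edges if degree[t] <= max_degree]

    # Every node starts in its own component; sources sort before targets.
    labels = {}
    for source, _ in edges:
        labels[("s", source)] = ("s", source)
    for _, target in kept:
        labels[("t", target)] = ("t", target)

    for _ in range(max_iterations):
        changed = False
        for source, target in kept:
            a, b = ("s", source), ("t", target)
            low = min(labels[a], labels[b])
            if labels[a] != low or labels[b] != low:
                labels[a] = labels[b] = low
                changed = True
        if not changed:  # converged before hitting the iteration cap
            break

    return {node[1]: label[1] for node, label in labels.items() if node[0] == "s"}


edges = [
    ("r1", "shared"), ("r2", "shared"), ("r3", "shared"),
    ("r3", "personal"), ("r4", "personal"),
]
strict = label_components(edges, max_degree=2)  # "shared" has degree 3 and is excluded
loose = label_components(edges)                 # defaults keep every edge
```

With `max_degree=2` only `r3` and `r4` merge; with the defaults all four records collapse into one component. In this model, a low `max_iterations` can likewise leave long chains only partially merged.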

Output

The graph produces a dataset where each record is assigned a component ID — all records with the same component ID belong to the same resolved identity. You can join this back to your original data for analytics, segmentation, and activation.
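
Joining the output back to source data might look like the sketch below, which rolls a CRM metric up to the resolved identity. The output column names (`source_id_type`, `source_id`, `component_id`), the `lifetime_value` field, and all sample values are assumptions for illustration; check the actual schema of your graph dataset.

```python
from collections import defaultdict

# Hypothetical graph output: one row per source record with its component ID.
graph_rows = [
    {"source_id_type": "CRM", "source_id": "C-1", "component_id": "cmp-001"},
    {"source_id_type": "CRM", "source_id": "C-2", "component_id": "cmp-001"},
    {"source_id_type": "CRM", "source_id": "C-3", "component_id": "cmp-002"},
    {"source_id_type": "Website", "source_id": "W-9", "component_id": "cmp-001"},
]

# Hypothetical original CRM data.
crm_rows = [
    {"CUSTOMER_ID": "C-1", "lifetime_value": 120.0},
    {"CUSTOMER_ID": "C-2", "lifetime_value": 80.0},
    {"CUSTOMER_ID": "C-3", "lifetime_value": 40.0},
]

# Lookup from CRM source ID to component ID.
component_by_customer = {
    row["source_id"]: row["component_id"]
    for row in graph_rows
    if row["source_id_type"] == "CRM"
}

# Aggregate the metric per resolved identity rather than per CRM record.
value_by_identity = defaultdict(float)
for row in crm_rows:
    component_id = component_by_customer.get(row["CUSTOMER_ID"])
    if component_id is not None:
        value_by_identity[component_id] += row["lifetime_value"]
```

Because `C-1` and `C-2` resolve to the same component, their values are summed under a single identity instead of being counted as two customers.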

Automation

You can run the graph build once or set a refresh schedule to keep it current as source data changes. You can also encode identifiers in the output using your company’s encryption material.

Data plane support

Graph Studio is available on Snowflake data planes today. AWS data plane support is coming soon.
