Graph Studio is available on Snowflake data planes. For a conceptual overview, see Graph Studio.
Prerequisites
- A dataset with Rosetta Stone attribute mappings for a unique identifier and at least one identity attribute (e.g., normalized email, phone number)
- A Snowflake data plane
Example dataset
This guide uses a CRM dataset called OFFICE_CRM:
| CUSTOMER_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE |
|---|---|---|---|---|
| CRM-001 | Michael | Scott | [email protected] | (570) 555-1234 |
| CRM-002 | Dwight | Schrute | [email protected] | 15705552345 |
| CRM-003 | Jim | Halpert | [email protected] | 5705553456 |
| CRM-004 | Michael | Scott | [email protected] | (570) 555-9876 |
| CRM-005 | Pam | Beesly | [email protected] | (570) 555-4567 |
| CRM-006 | Michael | Scott | [email protected] | (570) 555-9876 |
The dataset has Rosetta Stone attribute mappings for a unique identifier (CUSTOMER_ID), Normalized Email, Clear Text E.164 Phone Number, and Person Name.
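The PHONE column above mixes three formats, which is why matching relies on a normalized E.164 attribute rather than the raw value. The following is a minimal sketch of that kind of normalization, assuming US numbers only. The function name `to_e164_us` is hypothetical and is not part of the product; it only illustrates the idea.

```python
import re
from typing import Optional

def to_e164_us(raw: str) -> Optional[str]:
    """Normalize a US phone string to E.164, or return None if it does not fit.

    Assumption: every number is a US number (country code +1).
    """
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    if len(digits) != 10:
        return None  # not a 10-digit national number
    return "+1" + digits

for raw in ["(570) 555-1234", "15705552345", "5705553456"]:
    print(raw, "->", to_e164_us(raw))
```

All three sample formats reduce to the same `+1` plus ten digits shape, so records with the same underlying number can match regardless of how the number was typed.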
Step 1: Build edges
Edges define how records connect to each other through shared identifiers. Navigate to My Data > Graph Studio and select the Edge builder tab.
Add a source dataset
Click Select Sources and choose your dataset. Set a source ID type (a label like OFFICE_CRM that identifies this system) and choose the source ID field (CUSTOMER_ID).
Choose target IDs
Target IDs are the Rosetta Stone attributes that serve as connection points. When two records share the same target ID value, the graph connects them.
Target IDs are grouped: each group acts as a single connection type. For this example, add two target ID groups:
- Normalized Email — connects records that share the same email address
- Clear Text E.164 Phone Number + Person Name > first_name — connects records that share both the same phone number and first name. Combining these into one group means both values must match for a connection, which is more precise than matching on phone alone.
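The grouping behavior can be illustrated with a short sketch. This is not the product's implementation: the record layout, the `target_id_groups` mapping, and both helper functions are hypothetical. The email values are placeholders because the real addresses are not shown, and the phone numbers are assumed to be already normalized to E.164.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: emails are placeholders, phones are already E.164.
records = [
    {"id": "CRM-001", "first_name": "michael", "email": "email-a", "phone": "+15705551234"},
    {"id": "CRM-004", "first_name": "michael", "email": "email-b", "phone": "+15705559876"},
    {"id": "CRM-006", "first_name": "michael", "email": "email-a", "phone": "+15705559876"},
]

# Each group lists the fields that must ALL match for a connection.
target_id_groups = {
    "email": ("email",),
    "phone+first_name": ("phone", "first_name"),
}

def build_edges(records, groups):
    """Return (record_id, group_name, key) rows, one per record per group."""
    edges = []
    for rec in records:
        for name, fields in groups.items():
            key = tuple(rec.get(f) for f in fields)
            if all(key):  # skip records missing any field in the group
                edges.append((rec["id"], name, key))
    return edges

def linked_pairs(edges):
    """Pair up records that share the same (group, key) target ID."""
    by_target = defaultdict(list)
    for rec_id, name, key in edges:
        by_target[(name, key)].append(rec_id)
    pairs = set()
    for ids in by_target.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

pairs = linked_pairs(build_edges(records, target_id_groups))
print(sorted(pairs))
```

Because the phone and first name sit in one group, the key is the pair of both values, so two different people sharing a household phone would not be linked by that group.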
Step 2: Build the graph
Switch to the Graph builder tab. This takes your edges and runs a connected components algorithm to discover which records belong to the same person.
Review algorithm parameters
The defaults work well for most use cases. You can adjust max component size (caps how many records can merge into one identity), max iterations, and max degree threshold (excludes overly-connected nodes like shared corporate emails) if needed later.
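To make the connected components step concrete, here is a minimal sketch using union-find, which may differ from the algorithm the product actually runs. The function `resolve_identities`, its `max_degree` parameter, and the edge labels are all hypothetical. Email values are placeholders, and the max component size and max iterations settings are not modeled here.

```python
from collections import Counter

def resolve_identities(edges, max_degree=None):
    """Group record IDs that are linked through shared target IDs (union-find)."""
    edges = set(edges)  # drop duplicate (record, target) rows
    degree = Counter(target for _, target in edges)
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for record, target in edges:
        record_root = find(("record", record))  # registers the record even if unlinked
        if max_degree is not None and degree[target] > max_degree:
            continue  # skip overly connected target IDs (assumed rule)
        target_root = find(("target", target))
        if record_root != target_root:
            parent[target_root] = record_root

    identities = {}
    for kind, name in list(parent):
        if kind == "record":
            identities.setdefault(find((kind, name)), []).append(name)
    return sorted(sorted(ids) for ids in identities.values())

# Hypothetical edges for the six sample records (placeholder email labels).
edges = [
    ("CRM-001", "email:a"), ("CRM-001", "phone+name:+15705551234|michael"),
    ("CRM-002", "email:b"), ("CRM-002", "phone+name:+15705552345|dwight"),
    ("CRM-003", "email:c"), ("CRM-003", "phone+name:+15705553456|jim"),
    ("CRM-004", "email:d"), ("CRM-004", "phone+name:+15705559876|michael"),
    ("CRM-005", "email:e"), ("CRM-005", "phone+name:+15705554567|pam"),
    ("CRM-006", "email:a"), ("CRM-006", "phone+name:+15705559876|michael"),
]
identities = resolve_identities(edges)
print(identities)
```

In this sketch, setting `max_degree=1` drops every target ID shared by more than one record, which leaves all six records as separate identities. That shows why an overly strict threshold can undo legitimate matches.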
Results
The graph resolves the six CRM records into four identities:
| Identity | Records | How they connected |
|---|---|---|
| Identity 1 | CRM-001, CRM-004, CRM-006 | All three Michael Scott records — CRM-001 and CRM-006 share the same email; CRM-004 and CRM-006 share the same phone + first name |
| Identity 2 | CRM-002 | Dwight Schrute |
| Identity 3 | CRM-003 | Jim Halpert |
| Identity 4 | CRM-005 | Pam Beesly |
Next steps
- Add third-party data — Include an access rule as an additional source in the Edge Builder to introduce connections your first-party data cannot see on its own. See Graph Enrichment.
- Set up recurring builds — Use a refresh schedule to keep your graph current as new data arrives.

