This guide walks through building an identity graph from a CRM dataset using Graph Studio. By the end, you will have resolved duplicate customer records into unified identities based on shared emails and phone numbers.
Graph Studio is available on Snowflake data planes. For a conceptual overview, see Graph Studio.

Prerequisites

  • A dataset with Rosetta Stone attribute mappings for a unique identifier and at least one identity attribute (e.g., normalized email, phone number)
  • A Snowflake data plane

Example dataset

This guide uses a CRM dataset called OFFICE_CRM:
| CUSTOMER_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE |
|---|---|---|---|---|
| CRM-001 | Michael | Scott | [email protected] | (570) 555-1234 |
| CRM-002 | Dwight | Schrute | [email protected] | 15705552345 |
| CRM-003 | Jim | Halpert | [email protected] | 5705553456 |
| CRM-004 | Michael | Scott | [email protected] | (570) 555-9876 |
| CRM-005 | Pam | Beesly | [email protected] | (570) 555-4567 |
| CRM-006 | Michael | Scott | [email protected] | (570) 555-9876 |
Michael Scott appears three times with different email and phone combinations. The goal is to resolve all three records into a single identity. The dataset is mapped to Rosetta Stone attributes for Unique Identifier (using CUSTOMER_ID), Normalized Email, Clear Text E.164 Phone Number, and Person Name.
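The PHONE column stores numbers in several formats, while the mapped attribute is a Clear Text E.164 Phone Number. The sketch below shows one way raw values like these could be converted to E.164 form so that they compare equal. It assumes every number uses the North American +1 country code. The function `to_e164_us` is hypothetical and only illustrates the idea; it is not how Rosetta Stone performs normalization.

```python
import re


def to_e164_us(raw: str) -> str:
    """Convert a North American phone number to E.164 form (+1 followed by 10 digits).

    Assumption: every number uses country code 1. Anything else raises ValueError.
    """
    digits = re.sub(r"\D", "", raw)  # strip parentheses, spaces, dashes, etc.
    if len(digits) == 10:            # national format: prepend the country code
        digits = "1" + digits
    if len(digits) != 11 or not digits.startswith("1"):
        raise ValueError(f"not a +1 number: {raw!r}")
    return "+" + digits


# PHONE values from the OFFICE_CRM example table.
raw_phones = {
    "CRM-001": "(570) 555-1234",
    "CRM-002": "15705552345",
    "CRM-003": "5705553456",
    "CRM-004": "(570) 555-9876",
    "CRM-005": "(570) 555-4567",
    "CRM-006": "(570) 555-9876",
}
normalized = {cid: to_e164_us(phone) for cid, phone in raw_phones.items()}

for cid, phone in normalized.items():
    print(cid, phone)
```

After this conversion, CRM-002 becomes +15705552345, CRM-003 becomes +15705553456, and CRM-004 and CRM-006 both become +15705559876.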

Step 1: Build edges

Edges define how records connect to each other through shared identifiers. Navigate to My Data > Graph Studio and select the Edge builder tab.

1. Add a source dataset

Click Select Sources and choose your dataset. Set a source ID type (a label like OFFICE_CRM that identifies this system) and choose the source ID field (CUSTOMER_ID).

2. Choose target IDs

Target IDs are the Rosetta Stone attributes that serve as connection points. When two records share the same target ID value, the graph connects them. Target IDs are grouped, and each group acts as a single connection type. For this example, add two target ID groups:
  1. Normalized Email — connects records that share the same email address
  2. Clear Text E.164 Phone Number + Person Name > first_name — connects records that share both the same phone number and first name. Combining these into one group means both values must match for a connection, which is more precise than matching on phone alone.
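To make the two target ID groups concrete, here is a minimal sketch of the kind of edge records they could produce for OFFICE_CRM. The `Edge` fields, the group labels, and the functions `build_edges` and `shared_targets` are hypothetical; they do not describe Graph Studio's actual edge schema. The real email addresses are not shown in this guide, so placeholder values stand in, chosen so that only CRM-001 and CRM-006 share an email, as the Results section states.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    source_type: str     # the source ID type label, e.g. "OFFICE_CRM"
    source_id: str       # value of the source ID field (CUSTOMER_ID)
    target_group: str    # hypothetical name for the target ID group
    target_value: tuple  # every value in the group; all must match to connect


# Placeholder emails (the real ones are not shown); phones already in E.164 form.
RECORDS = [
    {"CUSTOMER_ID": "CRM-001", "first_name": "Michael", "email": "michael-a", "phone": "+15705551234"},
    {"CUSTOMER_ID": "CRM-002", "first_name": "Dwight", "email": "dwight", "phone": "+15705552345"},
    {"CUSTOMER_ID": "CRM-003", "first_name": "Jim", "email": "jim", "phone": "+15705553456"},
    {"CUSTOMER_ID": "CRM-004", "first_name": "Michael", "email": "michael-b", "phone": "+15705559876"},
    {"CUSTOMER_ID": "CRM-005", "first_name": "Pam", "email": "pam", "phone": "+15705554567"},
    {"CUSTOMER_ID": "CRM-006", "first_name": "Michael", "email": "michael-a", "phone": "+15705559876"},
]


def build_edges(records, source_type):
    """Emit one edge per record per target ID group that has all its values."""
    edges = []
    for r in records:
        sid = r["CUSTOMER_ID"]
        # Group 1: Normalized Email on its own.
        if r.get("email"):
            edges.append(Edge(source_type, sid, "email", (r["email"],)))
        # Group 2: phone and first name together form one composite value,
        # so two records connect only when both parts match.
        if r.get("phone") and r.get("first_name"):
            edges.append(Edge(source_type, sid, "phone+first_name", (r["phone"], r["first_name"])))
    return edges


def shared_targets(edges):
    """Map each target value shared by two or more records to those records."""
    by_target = defaultdict(set)
    for e in edges:
        by_target[(e.target_group, e.target_value)].add(e.source_id)
    return {key: ids for key, ids in by_target.items() if len(ids) > 1}


edges = build_edges(RECORDS, "OFFICE_CRM")
for key, ids in shared_targets(edges).items():
    print(key, sorted(ids))
```

Only two target values are shared: the placeholder email linking CRM-001 and CRM-006, and the phone-plus-first-name pair linking CRM-004 and CRM-006. In this small dataset, matching on phone alone would give the same links. The composite group matters when people with different first names share a number.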

3. Finalize and build

Name the edge dataset (e.g., office_crm_edges) and click Build Edges.

Step 2: Build the graph

Switch to the Graph builder tab. The graph builder takes your edges and runs a connected components algorithm to discover which records belong to the same person.
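This guide does not describe how Graph Studio implements connected components internally. As a rough illustration of what such a pass computes, the sketch below applies a union-find (disjoint set) structure to edges shaped after the example. Records and target values both become nodes, and any records reachable from one another through shared target values end up in the same component. The edge tuples, the placeholder email values, and the names `UnionFind` and `connected_components` are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def connected_components(edges):
    """edges: (record_id, target_key) pairs. Returns sorted lists of record IDs."""
    uf = UnionFind()
    records = set()
    for record_id, target_key in edges:
        records.add(record_id)
        # Tag node kinds so a record ID can never collide with a target value.
        uf.union(("record", record_id), ("target", target_key))
    groups = defaultdict(set)
    for record_id in records:
        groups[uf.find(("record", record_id))].add(record_id)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Placeholder emails; phones in E.164 form with first names as the composite group.
EDGES = [
    ("CRM-001", ("email", "michael-a")),
    ("CRM-001", ("phone+first_name", "+15705551234", "Michael")),
    ("CRM-002", ("email", "dwight")),
    ("CRM-002", ("phone+first_name", "+15705552345", "Dwight")),
    ("CRM-003", ("email", "jim")),
    ("CRM-003", ("phone+first_name", "+15705553456", "Jim")),
    ("CRM-004", ("email", "michael-b")),
    ("CRM-004", ("phone+first_name", "+15705559876", "Michael")),
    ("CRM-005", ("email", "pam")),
    ("CRM-005", ("phone+first_name", "+15705554567", "Pam")),
    ("CRM-006", ("email", "michael-a")),
    ("CRM-006", ("phone+first_name", "+15705559876", "Michael")),
]

for component in connected_components(EDGES):
    print(component)
```

CRM-001 and CRM-004 share no value directly, but both connect to CRM-006, so union-find places all three in one component. The other three records each form a component of their own. This matches the four identities in the Results section.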
1. Select input sources

Click Select Sources and choose the edges dataset you just created.

2. Review algorithm parameters

The defaults work well for most use cases. If you need to tune the graph later, you can adjust max component size (caps how many records can merge into one identity), max iterations, and max degree threshold (excludes overly connected nodes, such as shared corporate emails).
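This guide does not spell out exactly how max degree threshold is applied. Assuming it discards any target value that links more than a given number of records, a pre-filter might look like the sketch below. The function `drop_high_degree_targets`, the record IDs `R1` to `R4`, and the `shared-inbox` and `personal-x` values are all hypothetical.

```python
from collections import defaultdict


def drop_high_degree_targets(edges, max_degree):
    """Drop edges whose target value is shared by more than max_degree records.

    Hypothetical reading of the max degree threshold parameter: a target such as
    a shared corporate inbox would otherwise merge many unrelated people.
    """
    records_per_target = defaultdict(set)
    for record_id, target_key in edges:
        records_per_target[target_key].add(record_id)
    return [
        (record_id, target_key)
        for record_id, target_key in edges
        if len(records_per_target[target_key]) <= max_degree
    ]


edges = [
    ("R1", ("email", "shared-inbox")),
    ("R2", ("email", "shared-inbox")),
    ("R3", ("email", "shared-inbox")),
    ("R4", ("email", "shared-inbox")),
    ("R1", ("email", "personal-x")),
    ("R2", ("email", "personal-x")),
]

kept = drop_high_degree_targets(edges, max_degree=3)
print(kept)
```

With a threshold of 3, the inbox shared by four records is removed. Only the `personal-x` link between R1 and R2 survives, so R3 and R4 no longer merge into that identity.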

3. Finalize and build

Name the graph (e.g., office_crm_graph), choose a refresh schedule, and click Build Graph +.

Results

The graph resolves the six CRM records into four identities:
| Identity | Records | How they connected |
|---|---|---|
| Identity 1 | CRM-001, CRM-004, CRM-006 | All three Michael Scott records: CRM-001 and CRM-006 share the same email; CRM-004 and CRM-006 share the same phone number and first name |
| Identity 2 | CRM-002 | Dwight Schrute |
| Identity 3 | CRM-003 | Jim Halpert |
| Identity 4 | CRM-005 | Pam Beesly |
Michael Scott’s three records resolve into a single identity even though no two of them share both the same email and the same phone number. The graph follows transitive connections: CRM-001 links to CRM-006 through email, and CRM-006 links to CRM-004 through phone number and first name, so all three land in the same connected component. Dwight, Jim, and Pam remain separate identities because none of their identifiers overlap with another record in this dataset.

Next steps

  • Add third-party data: include an access rule as an additional source in the Edge builder to introduce connections that your first-party data cannot reveal on its own. See Graph Enrichment.
  • Set up recurring builds: use a refresh schedule to keep your graph current as new data arrives.