Familiarize yourself with basic mapping before tackling these advanced scenarios.
Multi-value fields
When a single source column contains multiple values, often comma-separated or otherwise delimited.

Scenario

Your source data has an interests column with values like:

"sports,music,travel"
"cooking"
"tech,gaming,music,movies"
Solution: Split to array
Map to an array attribute using SPLIT:

"sports,music,travel" becomes ["sports", "music", "travel"].
Handling whitespace
If values might have spaces around delimiters, trim each element after splitting.

Handling empty values
Filter out empty strings that result from trailing delimiters.

Conditional mappings
When different source values require completely different transformation logic.

Scenario: Different ID types in one column
Your identifier column contains mixed ID types:

"email:[email protected]"
"phone:5551234567"
"cookie:abc123xyz"
Solution: Parse and route
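One way to sketch the routing, assuming Snowflake-style SPLIT_PART and LIKE are supported; splitting the routing across per-attribute mappings is illustrative:

```sql
-- Mapping for an email attribute: only rows prefixed "email:"
CASE WHEN identifier LIKE 'email:%' THEN SPLIT_PART(identifier, ':', 2) END

-- Mapping for a phone attribute: only rows prefixed "phone:"
CASE WHEN identifier LIKE 'phone:%' THEN SPLIT_PART(identifier, ':', 2) END

-- Mapping for a cookie attribute: only rows prefixed "cookie:"
CASE WHEN identifier LIKE 'cookie:%' THEN SPLIT_PART(identifier, ':', 2) END
```

Rows that don't match a mapping's prefix produce NULL for that attribute, so each value lands only where it belongs.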
Scenario: Different date formats in one column
Some rows use MM/DD/YYYY, others use YYYY-MM-DD:
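A sketch using TRY_TO_DATE, which returns null rather than erroring when a format doesn't match (function availability is assumed; the date_col column name is illustrative):

```sql
COALESCE(
  TRY_TO_DATE(date_col, 'MM/DD/YYYY'),
  TRY_TO_DATE(date_col, 'YYYY-MM-DD')
)
```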
Missing or null data
Handling columns that are frequently null or contain placeholder values.

Default values
Provide a fallback when data is missing.

Coalescing from multiple columns
Try multiple columns in order.

Treating placeholders as null
Your data uses "N/A" or "-" as placeholders:
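Sketches for the three patterns above, assuming Snowflake-style COALESCE and NULLIF; column names and fallback values are illustrative:

```sql
-- Default value when the column is null
COALESCE(city, 'unknown')

-- Try multiple columns in preference order
COALESCE(mobile_phone, home_phone, work_phone)

-- Treat "N/A" and "-" as null, then apply a default
COALESCE(NULLIF(NULLIF(city, 'N/A'), '-'), 'unknown')
```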
Conditional defaults
Different defaults based on context.

Type coercion challenges
When source data types don’t cleanly match target types.

String to numeric with formatting
Source has currency formatting: "$1,234.56"
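A sketch that strips the formatting before converting, assuming Snowflake-style REPLACE and TRY_TO_NUMBER (which returns null on parse failure); the price column name and the precision/scale are illustrative:

```sql
-- "$1,234.56" -> 1234.56 (the explicit scale of 2 preserves the cents)
TRY_TO_NUMBER(REPLACE(REPLACE(price, '$', ''), ',', ''), 12, 2)
```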
Boolean interpretations
Sources represent booleans differently:

| Source values | Transformation |
|---|---|
| "yes"/"no" | LOWER(col) = 'yes' |
| "Y"/"N" | UPPER(col) = 'Y' |
| "1"/"0" | col = '1' |
| "true"/"false" | LOWER(col) = 'true' |
| 1/0 (numeric) | col = 1 |
Timestamp with multiple formats
Parse dates that might be in several formats. The TRY_TO_TIMESTAMP function returns null instead of erroring on parse failure, allowing COALESCE to try the next format.
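For example (the format strings and the event_ts column are illustrative; adjust them to the formats in your data):

```sql
COALESCE(
  TRY_TO_TIMESTAMP(event_ts, 'MM/DD/YYYY HH24:MI:SS'),
  TRY_TO_TIMESTAMP(event_ts, 'YYYY-MM-DD"T"HH24:MI:SS'),
  TRY_TO_TIMESTAMP(event_ts)  -- fall back to default parsing
)
```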
One-to-many relationships
When a single source column maps to multiple parts of a struct attribute.

Scenario
Your full_address column contains: "123 Main St, Boston, MA 02101"
You need to map to a struct with separate fields for street, city, state, and postal code.
Solution: Parse components
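A sketch of the component expressions, assuming Snowflake-style SPLIT_PART and TRIM and that each struct field is populated by its own expression (field assignments are illustrative):

```sql
-- Parsing "123 Main St, Boston, MA 02101":
TRIM(SPLIT_PART(full_address, ',', 1))                      -- street: "123 Main St"
TRIM(SPLIT_PART(full_address, ',', 2))                      -- city: "Boston"
SPLIT_PART(TRIM(SPLIT_PART(full_address, ',', 3)), ' ', 1)  -- state: "MA"
SPLIT_PART(TRIM(SPLIT_PART(full_address, ',', 3)), ' ', 2)  -- postal code: "02101"
```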
Many-to-one relationships
When multiple source columns combine into a single attribute.

Scenario: Separate date and time columns
Source has event_date and event_time as separate columns:
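A sketch that concatenates the two columns and parses the result; the format string assumes values like "2024-01-15" and "14:30:00", so adjust it to your data:

```sql
TRY_TO_TIMESTAMP(event_date || ' ' || event_time, 'YYYY-MM-DD HH24:MI:SS')
```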
Scenario: First name and last name
Combine into a full name.

Scenario: Building a struct from multiple columns
Create a unique_identifier from separate columns:
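Sketches for the two scenarios above, assuming Snowflake-style string concatenation and OBJECT_CONSTRUCT; the column and field names are illustrative:

```sql
-- Full name from separate columns
TRIM(first_name || ' ' || last_name)

-- Struct built from separate columns
OBJECT_CONSTRUCT('type', id_type, 'value', id_value)
```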
Multiple columns mapped to the same attribute
When multiple source columns map to the same Rosetta Stone attribute, each mapping produces a value. The NQL compiler collects these into an array and automatically expands (unnests) them into separate rows.

Scenario
Your dataset has multiple email columns:

| user_id | name | email_1 | email_2 |
|---|---|---|---|
| 1 | Alice | [email protected] | [email protected] |
| 2 | Bob | [email protected] | NULL |
email_1 and email_2 are mapped to the raw_email attribute.
Query behavior
When you query, the result expands to one row per mapped value:

| user_id | raw_email |
|---|---|
| 1 | [email protected] |
| 1 | [email protected] |
| 2 | [email protected] |
Why this happens
The NQL compiler:

- Evaluates each mapping to raw_email (both email_1 and email_2)
- Collects the results into an array: ['[email protected]', '[email protected]']
- Automatically unnests the array, creating one output row per array element
- Filters out NULL values during expansion
When to use alternatives
Use COALESCE when you want only one value per row. If your source columns represent “primary email” and “backup email” with a preference order, map them with COALESCE(email_1, email_2) instead of mapping each column separately.

Performance considerations
Row expansion multiplies result set size. If a dataset has 5 columns mapped to the same attribute and 1 million source rows, queries could return up to 5 million rows. Consider:

- Filtering early to reduce the row count before expansion
- Using COALESCE if you only need one value per source row
- Being explicit about which specific columns you need
Version handling
When your source data schema changes over time.

Scenario: Column renamed in new data
Old data has sex, new data has gender:
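A sketch: prefer the new column and fall back to the old one (this assumes both columns exist in the combined schema, with the absent one null):

```sql
COALESCE(gender, sex)
```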
Scenario: New values added to enum
Old data:"M", "F"
New data: "M", "F", "NB", "O"
Update your transformation to handle all values:
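For example, with CASE (the normalized output values are illustrative; match them to your target attribute's allowed values):

```sql
CASE gender
  WHEN 'M'  THEN 'male'
  WHEN 'F'  THEN 'female'
  WHEN 'NB' THEN 'nonbinary'
  WHEN 'O'  THEN 'other'
END
```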
Scenario: Data format changed
Old timestamps in epoch milliseconds, new data in ISO 8601: handle both formats in one transformation, as in the multi-format timestamp example above.

Handling arrays of structs
When your data contains nested structures.

Scenario: Multiple identifiers per row
Source column identifiers contains JSON like:
Solution: Parse JSON array
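A sketch, assuming Snowflake-style PARSE_JSON and variant path syntax, and a JSON shape like [{"type": "email", "value": "[email protected]"}] (the shape is an assumption, since the sample payload is not shown here):

```sql
-- Parse the JSON string into a variant array
PARSE_JSON(identifiers)

-- Pull one field out of the first element
PARSE_JSON(identifiers)[0]:value::STRING
```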
Handling special characters
When source data contains characters that need escaping or encoding.

URL encoding
Source has URL-encoded values.

HTML entities
Source has HTML entities like &amp;:
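A minimal sketch using nested REPLACE calls for the most common entities; the description column is illustrative, and the list can be extended as needed:

```sql
REPLACE(REPLACE(REPLACE(description, '&amp;', '&'), '&lt;', '<'), '&gt;', '>')
```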
Unicode normalization
Ensure consistent Unicode representation.

Performance considerations
Complex transformations can impact query performance.

Avoid expensive operations in hot paths
If a transformation is complex, consider:

- Pre-computing values in a materialized view
- Simplifying the logic where possible
- Breaking it into multiple simpler mappings
Test with realistic data volumes
Edge case handling that works in tests might be slow at scale. Test with production-like data volumes.

Related content
- Mapping Schemas: basic mapping creation
- Validating Mappings: test your edge case handling
- Transformation Functions: complete function reference
- The Normalization Model: understand attribute types and validations

