This guide covers solutions for complex mapping scenarios that go beyond simple column-to-attribute translations. Use these patterns when your data doesn’t fit neatly into the standard mapping model.
Familiarize yourself with basic mapping before tackling these advanced scenarios.
Multi-value fields
When a single source column contains multiple values (often comma-separated or otherwise delimited).
Scenario
Your source data has an interests column with values like:
"sports,music,travel"
"cooking"
"tech,gaming,music,movies"
Solution: Split to array
Map to an array attribute using SPLIT:
SPLIT(interests, ',')
This transforms "sports,music,travel" into ["sports", "music", "travel"].
Handling whitespace
If values might have spaces around delimiters:
TRANSFORM(SPLIT(interests, ','), x -> TRIM(x))
Handling empty values
Filter out empty strings that result from trailing delimiters:
FILTER(SPLIT(interests, ','), x -> LENGTH(TRIM(x)) > 0)
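The full SPLIT, TRIM, and FILTER pipeline can be sketched in Python. This is only an illustration of the semantics; the actual mapping stays in NQL:

```python
def parse_multi_value(raw, delimiter=","):
    """Split a delimited string, trim whitespace, drop empty entries."""
    if raw is None:
        return []
    parts = [p.strip() for p in raw.split(delimiter)]
    # Empty strings come from trailing or doubled delimiters
    return [p for p in parts if p]
```

For example, `parse_multi_value("sports, music,travel,")` yields `["sports", "music", "travel"]`: the stray space is trimmed and the empty entry from the trailing comma is dropped.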
Conditional mappings
When different source values require completely different transformation logic.
Scenario: Different ID types in one column
Your identifier column contains mixed ID types, each value prefixed with its kind (email:, phone:, or cookie:).
Solution: Parse and route
STRUCT(
CASE
WHEN identifier LIKE 'email:%' THEN 'email'
WHEN identifier LIKE 'phone:%' THEN 'phone'
WHEN identifier LIKE 'cookie:%' THEN 'cookie_id'
ELSE 'unknown'
END AS type,
SUBSTRING(identifier, POSITION(':' IN identifier) + 1) AS value,
CASE
WHEN identifier LIKE 'email:%' THEN 'hashed_email'
WHEN identifier LIKE 'phone:%' THEN 'hashed_phone'
WHEN identifier LIKE 'cookie:%' THEN 'advertising'
ELSE 'other'
END AS context
)
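The parse-and-route logic above can be mirrored in Python to show what each branch produces (an illustrative sketch, not NQL):

```python
def route_identifier(identifier):
    """Split 'prefix:value' and route the prefix to a type and context."""
    routes = {
        "email": ("email", "hashed_email"),
        "phone": ("phone", "hashed_phone"),
        "cookie": ("cookie_id", "advertising"),
    }
    # partition mirrors SUBSTRING(... POSITION(':' ...) + 1)
    prefix, _, value = identifier.partition(":")
    id_type, context = routes.get(prefix, ("unknown", "other"))
    return {"type": id_type, "value": value, "context": context}
```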
Scenario: Mixed date formats
Some rows use MM/DD/YYYY, others use YYYY-MM-DD:
CASE
WHEN date_col LIKE '____-__-__' THEN
TO_TIMESTAMP(date_col, 'YYYY-MM-DD')
WHEN date_col LIKE '__/__/____' THEN
TO_TIMESTAMP(date_col, 'MM/DD/YYYY')
ELSE
NULL
END
Missing or null data
Handling columns that are frequently null or contain placeholder values.
Default values
Provide a fallback when data is missing:
COALESCE(gender, 'unknown')
Coalescing from multiple columns
Try multiple columns in order:
COALESCE(primary_email, secondary_email, backup_email)
Treating placeholders as null
Your data uses "N/A" or "-" as placeholders:
CASE
WHEN UPPER(value) IN ('N/A', 'NA', '-', 'NONE', 'NULL', '') THEN NULL
ELSE value
END
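The placeholder-to-null pattern, sketched in Python for illustration (the uppercase comparison matches the CASE expression above):

```python
PLACEHOLDERS = {"N/A", "NA", "-", "NONE", "NULL", ""}

def clean_value(value, default=None):
    """Map placeholder strings to None, then fall back to a default."""
    if value is None or value.upper() in PLACEHOLDERS:
        return default
    return value
```

Passing a `default` combines this with the COALESCE-style fallback shown earlier.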
Conditional defaults
Different defaults based on context:
CASE
WHEN country = 'US' AND state IS NULL THEN 'unknown_us_state'
WHEN country != 'US' AND state IS NULL THEN NULL -- Non-US doesn't need state
ELSE state
END
Type coercion challenges
When source data types don’t cleanly match target types.
Source has currency formatting: "$1,234.56"
CAST(
REPLACE(REPLACE(amount, '$', ''), ',', '')
AS FLOAT
)
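The same strip-then-cast logic in Python, for illustration only:

```python
def parse_currency(amount):
    """Strip the currency symbol and thousands separators, then cast."""
    return float(amount.replace("$", "").replace(",", ""))
```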
Boolean interpretations
Sources represent booleans differently:
| Source values | Transformation |
|---|---|
| "yes"/"no" | LOWER(col) = 'yes' |
| "Y"/"N" | UPPER(col) = 'Y' |
| "1"/"0" | col = '1' |
| "true"/"false" | LOWER(col) = 'true' |
| 1/0 (numeric) | col = 1 |
Handle multiple formats:
CASE
WHEN UPPER(col) IN ('Y', 'YES', 'TRUE', '1') THEN true
WHEN UPPER(col) IN ('N', 'NO', 'FALSE', '0') THEN false
ELSE NULL
END
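A Python sketch of the multi-format boolean CASE, shown only to make the three-way outcome (true, false, null) explicit:

```python
def parse_bool(col):
    """Normalize mixed truthy/falsy encodings; None when unrecognized."""
    v = str(col).upper()
    if v in ("Y", "YES", "TRUE", "1"):
        return True
    if v in ("N", "NO", "FALSE", "0"):
        return False
    return None  # mirrors ELSE NULL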
Parse dates that might be in several formats:
COALESCE(
TRY_TO_TIMESTAMP(date_col, 'YYYY-MM-DD''T''HH24:MI:SS'),
TRY_TO_TIMESTAMP(date_col, 'MM/DD/YYYY'),
TRY_TO_TIMESTAMP(date_col, 'DD-Mon-YYYY'),
TRY_TO_TIMESTAMP(date_col, 'YYYYMMDD')
)
The TRY_TO_TIMESTAMP function returns null instead of erroring on parse failure, allowing the COALESCE to try the next format.
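The try-each-format-in-order cascade can be sketched in Python with `datetime.strptime` (an illustration of the semantics, not the NQL itself):

```python
from datetime import datetime

FORMATS = ["%Y-%m-%dT%H:%M:%S", "%m/%d/%Y", "%d-%b-%Y", "%Y%m%d"]

def try_parse(date_str):
    """Return the first format that parses; None if all fail,
    mirroring COALESCE over TRY_TO_TIMESTAMP calls."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(date_str, fmt)
        except ValueError:
            continue
    return None
```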
One-to-many relationships
When a single source column maps to multiple parts of a struct attribute.
Scenario
Your full_address column contains: "123 Main St, Boston, MA 02101"
You need to map to a struct with separate fields for street, city, state, and postal code.
Solution: Parse components
STRUCT(
TRIM(SPLIT_PART(full_address, ',', 1)) AS street,
TRIM(SPLIT_PART(full_address, ',', 2)) AS city,
TRIM(SPLIT_PART(
SPLIT_PART(full_address, ',', 3), ' ', 1
)) AS state,
TRIM(SPLIT_PART(
SPLIT_PART(full_address, ',', 3), ' ', 2
)) AS postal_code
)
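A Python sketch of the same component parsing. Like the NQL version, it is deliberately brittle: an address with a different comma count fails rather than silently misparsing:

```python
def parse_address(full_address):
    """Split 'street, city, STATE zip' into struct-like components."""
    street, city, region = [p.strip() for p in full_address.split(",", 2)]
    # region is e.g. "MA 02101"; first token is state, rest is postal code
    state, postal_code = region.split(" ", 1)
    return {"street": street, "city": city,
            "state": state, "postal_code": postal_code.strip()}
```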
Parsing unstructured text is error-prone. Consider whether the complexity is worth it, or if the source data can be improved.
Many-to-one relationships
When multiple source columns combine into a single attribute.
Scenario: Separate date and time columns
Source has event_date and event_time as separate columns:
TO_TIMESTAMP(
CONCAT(event_date, ' ', event_time),
'YYYY-MM-DD HH24:MI:SS'
)
Scenario: First name and last name
Combine into a full name:
TRIM(CONCAT(COALESCE(first_name, ''), ' ', COALESCE(last_name, '')))
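The COALESCE-plus-TRIM combination matters for null handling: if one name is missing, the result is the other name without a stray space. A Python sketch of the same behavior:

```python
def full_name(first_name, last_name):
    """Concatenate with null-safe defaults, trimming leftover spaces."""
    return f"{first_name or ''} {last_name or ''}".strip()
```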
Scenario: Building a struct from multiple columns
Create a unique_identifier from separate columns:
STRUCT(
id_type AS type,
id_value AS value,
COALESCE(id_context, 'default') AS context
)
Multiple columns mapped to the same attribute
When multiple source columns map to the same Rosetta Stone attribute, each mapping produces a value. The NQL compiler collects these into an array and automatically expands (unnests) them into separate rows.
Scenario
Your dataset has multiple email columns:
Both email_1 and email_2 are mapped to the raw_email attribute.
Query behavior
When you query:
SELECT
user_id,
rosetta_stone.raw_email
FROM narrative.rosetta_stone
WHERE ...
The result contains one row per non-null email:
Why this happens
The NQL compiler:
- Evaluates each mapping to raw_email (both email_1 and email_2)
- Collects the results into an array: ['[email protected]', '[email protected]']
- Automatically unnests the array, creating one output row per array element
- Filters out NULL values during expansion
This behavior enables queries that need all available values for an attribute without requiring you to know the source schema.
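The collect-and-unnest behavior can be sketched in Python (illustrative only; the compiler does this for you, and the column names here are examples):

```python
def expand_mappings(row, columns, key="user_id"):
    """Collect values from several source columns, drop nulls,
    and emit one output row per remaining value."""
    values = [row[c] for c in columns if row.get(c) is not None]
    return [{key: row[key], "raw_email": v} for v in values]
```

A row with two non-null emails expands to two rows; a null column contributes nothing.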
When to use alternatives
Use COALESCE when you want only one value per row:
If your source columns represent “primary email” and “backup email” with a preference order, map them using:
COALESCE(primary_email, secondary_email)
This returns only the first non-null value instead of expanding to multiple rows.
Row expansion multiplies result set size. If a dataset has 5 columns mapped to the same attribute and 1 million source rows, queries could return up to 5 million rows. Consider:
- Filtering early to reduce the row count before expansion
- Using COALESCE if you only need one value per source row
- Being explicit about which specific columns you need
Version handling
When your source data schema changes over time.
Scenario: Column renamed in new data
Old data has sex, new data has gender. Coalesce across both so either column populates the attribute:
COALESCE(gender, sex)
Scenario: New values added to enum
Old data: "M", "F"
New data: "M", "F", "NB", "O"
Update your transformation to handle all values:
CASE UPPER(COALESCE(gender, sex))
WHEN 'M' THEN 'male'
WHEN 'F' THEN 'female'
WHEN 'NB' THEN 'other'
WHEN 'O' THEN 'other'
ELSE 'unknown'
END
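A Python sketch of the coalesce-then-map pattern, useful for seeing how old and new enum values land in the same output vocabulary:

```python
GENDER_MAP = {"M": "male", "F": "female", "NB": "other", "O": "other"}

def normalize_gender(gender=None, sex=None):
    """COALESCE the new and old columns, then map known codes."""
    raw = gender if gender is not None else sex
    if raw is None:
        return "unknown"
    return GENDER_MAP.get(raw.upper(), "unknown")
```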
Scenario: Timestamp format changed
Old timestamps in epoch milliseconds, new data in ISO 8601:
CASE
WHEN timestamp_col ~ '^\d+$' THEN
-- Numeric: assume epoch milliseconds
TO_TIMESTAMP(CAST(timestamp_col AS BIGINT) / 1000)
ELSE
-- String: assume ISO 8601
TO_TIMESTAMP(timestamp_col, 'YYYY-MM-DD''T''HH24:MI:SS')
END
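The digits-only check that routes between epoch and ISO parsing, sketched in Python for illustration:

```python
from datetime import datetime, timezone

def parse_mixed_timestamp(ts):
    """Digits-only strings are epoch milliseconds; otherwise ISO 8601."""
    if ts.isdigit():
        return datetime.fromtimestamp(int(ts) / 1000, tz=timezone.utc)
    return datetime.fromisoformat(ts)
```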
Handling arrays of structs
When your data contains nested structures.
Scenario: Multiple identifiers per row
Source column identifiers contains JSON like:
[
{"type": "email", "value": "[email protected]"},
{"type": "phone", "value": "5551234567"}
]
Solution: Parse JSON array
TRANSFORM(
JSON_PARSE(identifiers),
x -> STRUCT(
x.type AS type,
x.value AS value,
'parsed' AS context
)
)
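The parse-and-reshape step in Python, to show the intended output shape (illustrative; the constant 'parsed' context matches the NQL above):

```python
import json

def parse_identifiers(raw_json):
    """Parse a JSON array and reshape each object into the target struct."""
    return [
        {"type": x["type"], "value": x["value"], "context": "parsed"}
        for x in json.loads(raw_json)
    ]
```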
Handling special characters
When source data contains characters that need escaping or encoding.
URL encoding
Source has URL-encoded values:
URL_DECODE(encoded_value)
HTML entities
Source has HTML entities like &amp;:
REPLACE(
REPLACE(
REPLACE(value, '&amp;', '&'),
'&lt;', '<'
),
'&gt;', '>'
)
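Nested REPLACEs only cover the entities you list. If the decoding happens outside the mapping layer, a general-purpose decoder handles named and numeric entities alike; Python's standard library has one:

```python
import html

def decode_entities(value):
    """Decode HTML entities (&amp;, &lt;, &gt;, numeric refs, ...)."""
    return html.unescape(value)
```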
Unicode normalization
Ensure consistent unicode representation:
NORMALIZE(unicode_string, 'NFC')
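NFC matters because the same visible character can arrive as one code point or as a base letter plus a combining mark, and the two forms compare unequal. Python's standard library shows the effect:

```python
import unicodedata

def normalize_nfc(s):
    """Compose combining sequences into canonical (NFC) form."""
    return unicodedata.normalize("NFC", s)

# "e" + combining acute accent vs. precomposed "é":
# unequal as raw strings, equal after NFC normalization.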
Performance considerations
Complex transformations can impact query performance.
Avoid expensive operations in hot paths
If a transformation is complex, consider:
- Pre-computing values in a materialized view
- Simplifying the logic where possible
- Breaking into multiple simpler mappings
Test with realistic data volumes
Edge case handling that works in tests might be slow at scale. Test with production-like data volumes.