The two core primitives
Attributes
An attribute is a standardized field definition in the common schema. Each attribute specifies:

| Property | Description |
|---|---|
| Name | A unique identifier (e.g., hl7_gender, event_timestamp) |
| Description | Human-readable explanation of what the attribute represents |
| Type | The data type: string, long, double, boolean, timestamptz, object, or array |
| Validations | Rules that data must satisfy (as an array of validation strings) |
For example, the unique_identifier attribute captures identity data from various sources. It’s defined as an object with three properties.

The hl7_gender attribute normalizes gender data using the HL7 standard. It’s a string type with restricted enum values.
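To make the four properties concrete, here is a minimal Python sketch of an attribute definition. The `Attribute` class, the validation-string syntax, and the enum members shown for `hl7_gender` are illustrative assumptions (the members follow HL7's administrative gender codes); none of this is Narrative's actual schema format.

```python
from dataclasses import dataclass

# The type names listed in the properties table.
ALLOWED_TYPES = {"string", "long", "double", "boolean", "timestamptz", "object", "array"}


@dataclass(frozen=True)
class Attribute:
    """Hypothetical in-memory model of an attribute (not Narrative's real API)."""
    name: str
    description: str
    type: str
    validations: tuple = ()  # validation rules, stored as strings

    def __post_init__(self):
        # Reject any type that is not one of the documented data types.
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported attribute type: {self.type!r}")


# Assumed enum members and a made-up validation-string syntax.
hl7_gender = Attribute(
    name="hl7_gender",
    description="Gender normalized to the HL7 standard",
    type="string",
    validations=("enum: male, female, other, unknown",),
)
```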
Mappings
A mapping connects a specific column in a dataset to an attribute. Each mapping includes:

| Property | Description |
|---|---|
| Source column | The column in the provider’s dataset |
| Target attribute | The Rosetta Stone attribute to map to |
| Transformation | An optional expression to convert the data |
| Dataset | The specific dataset this mapping applies to |
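For example, suppose a provider’s dataset stores gender as "M" or "F" in a column called sex. A mapping would connect that column to the hl7_gender attribute, with a transformation that converts the provider’s codes into the attribute’s enum values.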
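The sketch below models such a mapping in Python. It is only an illustration: the `Mapping` class, the dataset name, and the `SEX_TO_HL7` table are hypothetical, and the transformation is written as a Python callable rather than as the expression syntax the platform actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Mapping:
    """Hypothetical model of a mapping; field names mirror the properties table."""
    dataset: str
    source_column: str
    target_attribute: str
    # Optional conversion; None means the value passes through unchanged.
    transformation: Optional[Callable[[object], object]] = None

    def apply(self, row: dict) -> object:
        value = row.get(self.source_column)
        return value if self.transformation is None else self.transformation(value)


# Assumed code table: the text only says the column holds "M" or "F".
SEX_TO_HL7 = {"M": "male", "F": "female"}

sex_mapping = Mapping(
    dataset="provider_b_demographics",  # made-up dataset name
    source_column="sex",
    target_attribute="hl7_gender",
    transformation=lambda v: SEX_TO_HL7.get(v, "unknown"),
)

print(sex_mapping.apply({"sex": "F"}))  # female
```

Falling back to "unknown" for unrecognized codes is one possible design choice; a stricter mapping could leave the value unmapped and let validation handle it.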
The normalization pipeline
Rosetta Stone normalizes data through a three-stage pipeline.

Stage 1: Schema inference
When data is uploaded to Narrative, the system analyzes it to understand its structure:

- Column detection: Identifies column names and data types
- Pattern recognition: Detects common patterns (dates, identifiers, categorical data)
- Attribute suggestion: Uses machine learning to suggest which attributes each column maps to
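As a rough illustration of the first two steps, the following Python sketch profiles sample rows with simple rules. It is a rule-based stand-in, not the platform's inference logic: the regular expressions, the `infer_column` and `infer_schema` helpers, and the "categorical" heuristic are all assumptions, and the machine-learning attribute suggestion step is not modeled.

```python
import re

# Illustrative patterns only; a real system would recognize many more.
PATTERNS = {
    "iso_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def infer_column(values, max_categories=5):
    """Guess a coarse type and pattern for one column from sample values."""
    present = [v for v in values if v is not None]
    if not present:
        return {"type": "unknown", "pattern": None}
    if all(isinstance(v, bool) for v in present):
        return {"type": "boolean", "pattern": None}
    if all(isinstance(v, int) and not isinstance(v, bool) for v in present):
        return {"type": "long", "pattern": None}
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return {"type": "double", "pattern": None}
    texts = [str(v) for v in present]
    # Pattern recognition: every value must match for the label to apply.
    for label, regex in PATTERNS.items():
        if all(regex.fullmatch(t) for t in texts):
            return {"type": "string", "pattern": label}
    # A small set of repeated values suggests categorical data.
    distinct = set(texts)
    if len(distinct) <= max_categories and len(distinct) < len(texts):
        return {"type": "string", "pattern": "categorical"}
    return {"type": "string", "pattern": None}


def infer_schema(rows):
    """Column detection: collect every column name, then profile each column."""
    columns = {name for row in rows for name in row}
    return {name: infer_column([row.get(name) for row in rows]) for name in sorted(columns)}


sample = [
    {"sex": "M", "dt": "2024-01-15", "age": 34},
    {"sex": "F", "dt": "2024-01-16", "age": 51},
    {"sex": "M", "dt": "2024-01-17", "age": 29},
]
schema = infer_schema(sample)
```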
Stage 2: Mapping creation
Mappings are created through a combination of machine learning and human curation:

- Auto-generated mappings: The system proposes mappings based on schema inference
- Human review: Data owners review suggestions and refine as needed
- Transformation definition: Complex mappings include transformation expressions
AI-assisted quality evaluation
After mappings are created, you can use AI to evaluate their quality and suggest improvements:

- Confidence scoring: AI analyzes each mapping and assigns a confidence score (0-100%) indicating how likely the mapping is to produce correct results
- Issue identification: The system highlights potential problems with transformations, such as missing case handling or type mismatches
- Suggestion generation: AI proposes new mappings for columns that aren’t yet normalized
Stage 3: Query-time translation
When you query the narrative.rosetta_stone table:
- Query analysis: The system identifies which attributes you’re requesting
- Dataset discovery: Finds all datasets with mappings for those attributes
- Query translation: Rewrites your query for each dataset’s native schema
- Execution: Runs the translated queries against source data
- Normalization: Applies transformations and unions results
- Return: Delivers data in the consistent, normalized format
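The steps above can be sketched in Python with in-memory data. This is a toy model under stated assumptions: the `DATASETS` and `MAPPINGS` structures, the provider names, and the sample rows are invented, and the sketch only includes datasets that have a mapping for every requested attribute, which may differ from how the real system handles partial coverage.

```python
# Hypothetical source datasets, each with its own native schema.
DATASETS = {
    "provider_a": [{"gender": "female"}],
    "provider_b": [{"sex": "M", "email": "pat@example.com"}],
}

# attribute -> dataset -> (source column, transformation); all names are made up.
MAPPINGS = {
    "hl7_gender": {
        "provider_a": ("gender", lambda v: v),
        "provider_b": ("sex", lambda v: {"M": "male", "F": "female"}.get(v, "unknown")),
    },
    "raw_email": {
        "provider_b": ("email", lambda v: v),
    },
}


def query(attributes):
    """Return normalized rows for the requested attributes across all datasets."""
    # Dataset discovery: keep datasets that have a mapping for every attribute.
    candidates = [
        name for name in DATASETS
        if all(name in MAPPINGS.get(attr, {}) for attr in attributes)
    ]
    results = []
    for name in candidates:
        # Query translation: resolve each attribute to this dataset's column.
        plan = {attr: MAPPINGS[attr][name] for attr in attributes}
        # Execution and normalization: read native rows, apply transformations.
        for row in DATASETS[name]:
            results.append({attr: fn(row[col]) for attr, (col, fn) in plan.items()})
    # Return: a single unioned result set in the normalized shape.
    return results


print(query(["hl7_gender"]))
# [{'hl7_gender': 'female'}, {'hl7_gender': 'male'}]
```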
Multiple mappings to the same attribute
When a dataset has multiple columns mapped to the same attribute, the normalization process:

- Evaluates each mapping independently
- Collects results into an array
- Expands the array to produce one output row per value
- Filters NULL values during expansion
For example, with email_1 and email_2 both mapped to raw_email, a query selecting that attribute returns one row per non-null email address. See Multiple columns mapped to the same attribute for detailed examples and when to use alternatives like COALESCE.
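A minimal Python sketch of this collect, expand, and filter behavior follows. The `expand_attribute` helper and the sample rows are hypothetical, and `None` stands in for NULL.

```python
def expand_attribute(row, source_columns, attribute):
    """Return one output row per non-null value among the mapped columns."""
    # Evaluate each mapping independently and collect the results into an array.
    values = [row.get(col) for col in source_columns]
    # Expand the array, dropping NULLs (None) along the way.
    return [{attribute: v} for v in values if v is not None]


rows = [
    {"email_1": "a@example.com", "email_2": "b@example.com"},
    {"email_1": "c@example.com", "email_2": None},
    {"email_1": None, "email_2": None},
]

output = [
    out
    for row in rows
    for out in expand_attribute(row, ["email_1", "email_2"], "raw_email")
]
print(len(output))  # 3
```

Three input rows produce three output rows here, but not one per input: the first row yields two, the second yields one, and the third yields none.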
Normalization examples
Date normalization
Different providers store dates in various formats:

| Provider | Column | Sample value |
|---|---|---|
| Provider A | event_date | 01/15/2024 |
| Provider B | timestamp | 2024-01-15T14:30:00Z |
| Provider C | dt | 15-Jan-2024 |
All three columns map to the event_timestamp attribute. The mappings include transformations that parse each format and output ISO 8601.

When you query event_timestamp, you receive consistent ISO 8601 timestamps regardless of source.
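To illustrate what those transformations do, here is a Python sketch that parses the three sample formats with the standard library. The provider keys and the `to_event_timestamp` helper are hypothetical, and treating date-only values as midnight UTC is an assumption of this sketch rather than documented behavior.

```python
from datetime import datetime, timezone

# One format string per provider, matching the sample values in the table.
FORMATS = {
    "provider_a": "%m/%d/%Y",            # 01/15/2024
    "provider_b": "%Y-%m-%dT%H:%M:%SZ",  # 2024-01-15T14:30:00Z
    "provider_c": "%d-%b-%Y",            # 15-Jan-2024 (English month names)
}


def to_event_timestamp(provider, raw):
    """Parse a provider-specific date string and return an ISO 8601 timestamp."""
    parsed = datetime.strptime(raw, FORMATS[provider])
    # Assumption: values without a time or zone are interpreted as UTC.
    return parsed.replace(tzinfo=timezone.utc).isoformat()


print(to_event_timestamp("provider_a", "01/15/2024"))            # 2024-01-15T00:00:00+00:00
print(to_event_timestamp("provider_b", "2024-01-15T14:30:00Z"))  # 2024-01-15T14:30:00+00:00
print(to_event_timestamp("provider_c", "15-Jan-2024"))           # 2024-01-15T00:00:00+00:00
```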
Gender normalization
Providers represent gender in many ways:

| Provider | Column | Values |
|---|---|---|
| Provider A | gender | "male", "female" |
| Provider B | sex | "M", "F" |
| Provider C | gender_code | 1, 2, 0 |
| Provider D | gndr | "m", "f", "nb" |
Each of these columns maps to the hl7_gender enum.
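One way to picture these transformations is a lookup table per provider, as in the Python sketch below. The target values assume the enum follows HL7's administrative gender codes (male, female, other, unknown). The meanings assigned to Provider C's numeric codes and to "nb" are guesses made for illustration, since the table above does not define them.

```python
# Per-provider lookup tables. The interpretations of 1, 2, 0 and "nb" are
# assumptions of this sketch, not documented semantics.
GENDER_TABLES = {
    "provider_a": {"male": "male", "female": "female"},
    "provider_b": {"M": "male", "F": "female"},
    "provider_c": {1: "male", 2: "female", 0: "unknown"},
    "provider_d": {"m": "male", "f": "female", "nb": "other"},
}


def to_hl7_gender(provider, raw):
    """Map a provider-specific gender value to an assumed hl7_gender member."""
    # Unrecognized values fall back to "unknown".
    return GENDER_TABLES[provider].get(raw, "unknown")


print(to_hl7_gender("provider_b", "F"))  # female
print(to_hl7_gender("provider_c", 1))    # male
print(to_hl7_gender("provider_d", "x"))  # unknown
```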
Validation and quality
Mappings aren’t just translations; they’re quality gates. Each mapping can enforce validations:

- Type checking: Ensures values can be cast to the target type
- Enum validation: Confirms values match allowed enum members
- Range checking: Verifies numeric values fall within acceptable bounds
- Pattern matching: Validates strings match expected formats (e.g., email patterns)

When data fails validation, the system can:

- Reject the record
- Map to a default value (like unknown for invalid gender)
- Flag the record for review
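The four checks and three failure outcomes can be sketched in Python as follows. Every name here (`check_type`, `check_enum`, `check_range`, `check_pattern`, `validate`, and the `on_failure` policy strings) is hypothetical, the enum members are assumed, and the email pattern is deliberately loose.

```python
import re

HL7_GENDER_VALUES = {"male", "female", "other", "unknown"}  # assumed enum members
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")      # deliberately loose


def check_type(value, target=float):
    """Type checking: can the value be cast to the target type?"""
    try:
        target(value)
        return True
    except (TypeError, ValueError):
        return False


def check_enum(value, allowed=HL7_GENDER_VALUES):
    """Enum validation: is the value one of the allowed members?"""
    return value in allowed


def check_range(value, low, high):
    """Range checking: does the number fall within [low, high]?"""
    return low <= value <= high


def check_pattern(value, pattern=EMAIL_PATTERN):
    """Pattern matching: does the whole string match the expected format?"""
    return isinstance(value, str) and pattern.fullmatch(value) is not None


def validate(value, check, on_failure="reject", default=None):
    """Apply one check and return (value, status) according to the failure policy."""
    if check(value):
        return value, "ok"
    if on_failure == "default":
        return default, "defaulted"
    if on_failure == "flag":
        return value, "flagged"
    return None, "rejected"


print(validate("robot", check_enum, on_failure="default", default="unknown"))
# ('unknown', 'defaulted')
```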

