The attribute hierarchy
Attributes exist at different scopes within the platform:

Global attributes
Global attributes are available to all organizations on the platform. They represent standardized concepts that are commonly used across data collaboration scenarios:

- Identity attributes: email_sha256, phone_sha256, unique_identifier
- Demographic attributes: hl7_gender, age, birth_year
- Temporal attributes: event_timestamp, nio_last_modified
- Geographic attributes: country_code, postal_code, geo_coordinates
Organization attributes
Organizations can create custom attributes for concepts specific to their domain or use case. Organization attributes:

- Are visible only within your organization (and to partners you explicitly share with)
- Can extend or specialize global attributes
- Follow the same type system and validation rules
Attribute composition
Attributes can reference other attributes, enabling complex data structures. "$ref": 100 references the unique_identifier attribute (ID 100). This allows reuse of standardized definitions while building domain-specific structures.
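For example, a hypothetical organization attribute could embed the global unique_identifier attribute by ID. In this sketch the attribute name loyalty_member and its other fields are illustrative; the object syntax is described under The type system below:

```json
{
  "name": "loyalty_member",
  "type": "object",
  "properties": {
    "member_id": { "$ref": 100 },
    "tier": { "type": "string" }
  }
}
```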
The type system
Attributes use a type system that supports both primitive and complex data.

Primitive types
| Type | Description | Example values |
|---|---|---|
| string | Text data of variable length | "hello", "user@example.com" |
| long | Whole numbers (64-bit integer) | 42, -17, 0 |
| double | Decimal numbers (64-bit float) | 3.14, -0.001, 1000.0 |
| boolean | True/false values | true, false |
| timestamptz | Date and time with timezone (ISO 8601) | 2024-01-15T14:30:00Z |
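To illustrate the definition format used in the examples that follow, a primitive attribute might be defined like this (the name and description fields are illustrative; only type comes from the table above):

```json
{
  "name": "birth_year",
  "description": "Four-digit year of birth",
  "type": "long"
}
```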
Enum type
Enums restrict string values to a predefined set. An enum is a string type with an enum property listing the allowed values; values outside the set fail validation and can be handled as described under Handling validation failures below (for example, defaulted to unknown).
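A sketch of an enum attribute, using the hl7_gender attribute as an illustration (the exact value list is illustrative apart from unknown):

```json
{
  "name": "hl7_gender",
  "type": "string",
  "enum": ["male", "female", "other", "unknown"]
}
```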
Object type
Objects group related fields into a single attribute using properties:
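A sketch of an object attribute, using the geo_coordinates attribute as an illustration (the property names are assumed for this example):

```json
{
  "name": "geo_coordinates",
  "type": "object",
  "properties": {
    "latitude": { "type": "double" },
    "longitude": { "type": "double" }
  },
  "required": ["latitude", "longitude"]
}
```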
The required array specifies which properties must be present. When mapping to an object, the transformation must produce all required fields.
Array type
Arrays contain multiple values of the same type, specified with items:
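For example, a hypothetical organization attribute holding a list of strings might look like this:

```json
{
  "name": "interests",
  "type": "array",
  "items": { "type": "string" }
}
```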
Reference type
References link to other attribute definitions using $ref with the numeric attribute ID:
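A sketch of an attribute that references two global attributes by ID (the wrapper attribute login_event and its property names are illustrative):

```json
{
  "name": "login_event",
  "type": "object",
  "properties": {
    "user_id": { "$ref": 100 },
    "occurred_at": { "$ref": 300 }
  }
}
```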
$ref: 300 references the event_timestamp attribute and $ref: 100 references the unique_identifier attribute. References inherit the type, validations, and semantics of the referenced attribute.
Validations
Validations enforce data quality at the attribute level. Validations are specified as an array of strings.

Required fields in objects
For object types, the required array specifies which properties must be present:
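A sketch consistent with the note that follows (the attribute name full_name is illustrative):

```json
{
  "name": "full_name",
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "middle_name": { "type": "string" },
    "last_name": { "type": "string" }
  },
  "required": ["first_name", "last_name"]
}
```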
middle_name is optional because it’s not in the required array.
Range constraints
Numeric fields can have minimum and maximum bounds using the validations array:
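A sketch of a range-constrained attribute; the validation string names (min, max) are assumed for illustration, not a documented syntax:

```json
{
  "name": "age",
  "type": "long",
  "validations": ["min:0", "max:120"]
}
```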
String length
Constrain string length:
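A sketch using the same illustrative validation-string style as above (min_length and max_length are assumed names):

```json
{
  "name": "country_code",
  "type": "string",
  "validations": ["min_length:2", "max_length:2"]
}
```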
Pattern matching

Validate strings against regular expressions:
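A sketch with an assumed matches validation carrying a regular expression:

```json
{
  "name": "postal_code",
  "type": "string",
  "validations": ["matches:^[0-9]{5}$"]
}
```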
Combining validations

Multiple validations can be combined in a single array:
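For example, combining length and pattern constraints on a single string attribute (the validation string names remain illustrative):

```json
{
  "name": "campaign_id",
  "type": "string",
  "validations": ["min_length:1", "max_length:64", "matches:^[A-Za-z0-9_-]+$"]
}
```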
Schema presets

Schema presets are curated collections of attributes designed for common use cases.

Available presets
| Preset | Description | Key attributes |
|---|---|---|
| Demographics | Consumer demographic data | hl7_gender, age, birth_year, country_code |
| Identity | User identification | unique_identifier, email_sha256, phone_sha256 |
| Events | Timestamped occurrences | event_timestamp, event_type, event_properties |
| Location | Geographic data | geo_coordinates, country_code, postal_code |
| Marketing | Campaign and engagement | campaign_id, channel, conversion_timestamp |
Using a preset
When configuring a new dataset, you can select a preset to automatically suggest mappings for standard attributes. This accelerates the mapping process for common data types.

Creating custom presets
Organizations can create private presets that bundle:

- A set of attributes (global or organization-specific)
- Default transformation templates
- Validation rules
The narrative.rosetta_stone table
The narrative.rosetta_stone table is a virtual table that provides unified access to all normalized data.
How it works
When you query narrative.rosetta_stone:
- The query planner identifies which attributes you’re selecting
- It finds all datasets with active mappings for those attributes
- It rewrites queries for each dataset using the mapping transformations
- Results are unioned and returned in the normalized format, as sketched below
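Conceptually, the rewrite looks like the following. This is an illustrative sketch only: the source table names (dataset_123, dataset_456) and transformation expressions are hypothetical, and the actual generated SQL may differ.

```sql
-- User query against the virtual table
SELECT email_sha256, hl7_gender
FROM narrative.rosetta_stone;

-- Conceptual expansion: each mapped dataset is rewritten using its
-- mapping transformations, then the results are unioned.
SELECT SHA2(LOWER(TRIM(email)), 256) AS email_sha256,     -- hash raw email per mapping
       CASE gender WHEN 'M' THEN 'male'
                   WHEN 'F' THEN 'female'
                   ELSE 'unknown' END   AS hl7_gender     -- normalize gender codes
FROM dataset_123                                          -- hypothetical source dataset
UNION ALL
SELECT email_hash   AS email_sha256,                      -- already hashed in this source
       gender_value AS hl7_gender
FROM dataset_456;                                         -- hypothetical source dataset
```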
Querying the table
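A minimal example, assuming standard SQL and the global attribute names listed earlier as selectable columns:

```sql
-- Select normalized identity, demographic, and temporal attributes
SELECT unique_identifier,
       hl7_gender,
       age,
       event_timestamp
FROM narrative.rosetta_stone
WHERE country_code = 'US'
  AND event_timestamp >= TIMESTAMP '2024-01-01 00:00:00';
```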
System columns
The table includes system columns for traceability:

| Column | Description |
|---|---|
| _nio_source_dataset_id | ID of the source dataset |
| _nio_source_row_id | Original row identifier |
| _nio_mapping_version | Version of the mapping used |
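For example, the system columns can be used to audit where normalized records come from (same assumptions as the query sketch above):

```sql
-- Count normalized records per source dataset and mapping version
SELECT _nio_source_dataset_id,
       _nio_mapping_version,
       COUNT(*) AS record_count
FROM narrative.rosetta_stone
GROUP BY _nio_source_dataset_id, _nio_mapping_version;
```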
Data quality implications
Normalization through Rosetta Stone improves data quality in several ways:

- Consistency: All data adheres to the same type system and validations, regardless of source.
- Completeness: Missing or malformed data is flagged during mapping, enabling targeted data quality improvements.
- Comparability: Data from different sources can be meaningfully combined because it shares a common semantic model.
- Traceability: System columns maintain lineage back to source data.

Handling validation failures
When data fails validation during mapping, the system can:

- Reject: Exclude the record from the normalized view
- Default: Map to a default value (e.g., unknown for invalid enums)
- Flag: Include the record but mark it for review

