The attribute hierarchy
Attributes exist at different scopes within the platform:
Global attributes
Global attributes are available to all organizations on the platform. They represent standardized concepts that are commonly used across data collaboration scenarios:
- Identity attributes: email_sha256, phone_sha256, unique_identifier
- Demographic attributes: hl7_gender, age, birth_year
- Temporal attributes: event_timestamp, nio_last_modified
- Geographic attributes: country_code, postal_code, geo_coordinates
Organization attributes
Organizations can create custom attributes for concepts specific to their domain or use case. Organization attributes:
- Are visible only within your organization (and to partners you explicitly share with)
- Can extend or specialize global attributes
- Follow the same type system and validation rules
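For illustration, a custom organization attribute could be defined much like a global one. The name and fields below are hypothetical, not part of the platform's standard set:

```json
{
  "name": "loyalty_tier",
  "type": "string",
  "description": "Customer loyalty program tier",
  "validations": ["$this IS NOT NULL"]
}
```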
Attribute composition
Attributes can reference other attributes, enabling complex data structures: a "$ref": 100 entry references the unique_identifier attribute (ID 100). This allows reuse of standardized definitions while building domain-specific structures.
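A sketch of composition — the wrapper attribute and its field names are illustrative; only the "$ref": 100 convention comes from the text:

```json
{
  "name": "purchase_event",
  "type": "object",
  "properties": {
    "buyer_id": { "$ref": 100 },
    "amount": { "type": "double" }
  }
}
```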
The type system
Attributes use a type system that supports both primitive and complex data.
Primitive types
| Type | Description | Example values |
|---|---|---|
| string | Text data of variable length | "hello", "user@example.com" |
| long | Whole numbers (64-bit integer) | 42, -17, 0 |
| double | Decimal numbers (64-bit float) | 3.14, -0.001, 1000.0 |
| boolean | True/false values | true, false |
| timestamptz | Date and time with timezone (ISO 8601) | 2024-01-15T14:30:00Z |
Enum type
Enums restrict string values to a predefined set. An enum is a string type with an enum property that lists the allowed values.
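A sketch of an enum definition; the value set shown is illustrative, modeled on HL7 administrative gender, which the global hl7_gender attribute suggests:

```json
{
  "name": "hl7_gender",
  "type": "string",
  "enum": ["male", "female", "other", "unknown"]
}
```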
Object type
Objects group related fields into a single attribute using properties:
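For example, a coordinates-style object might be declared as follows (the exact shape of the global geo_coordinates attribute is an assumption):

```json
{
  "name": "geo_coordinates",
  "type": "object",
  "properties": {
    "latitude": { "type": "double" },
    "longitude": { "type": "double" }
  },
  "required": ["latitude", "longitude"]
}
```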
The required array specifies which properties must be present. When mapping to an object, the transformation must produce all required fields.
Array type
Arrays contain multiple values of the same type, specified with items:
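A sketch (the attribute name is hypothetical):

```json
{
  "name": "interests",
  "type": "array",
  "items": { "type": "string" }
}
```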
Reference type
References link to other attribute definitions using $ref with the numeric attribute ID:
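For example, an attribute might combine two references (the wrapper object and its field names are illustrative):

```json
{
  "name": "identified_event",
  "type": "object",
  "properties": {
    "occurred_at": { "$ref": 300 },
    "user_id": { "$ref": 100 }
  }
}
```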
$ref: 300 references the event_timestamp attribute and $ref: 100 references the unique_identifier attribute. References inherit the type, validations, and semantics of the referenced attribute.
Validations
Validations enforce data quality at the attribute level. Validations are NQL expressions where $this represents the value being validated. These expressions are injected into compiled NQL queries to filter out invalid values.
Required fields in objects
For object types, the required array specifies which properties must be present:
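For example (field names illustrative):

```json
{
  "name": "person_name",
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "middle_name": { "type": "string" },
    "last_name": { "type": "string" }
  },
  "required": ["first_name", "last_name"]
}
```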
middle_name is optional because it’s not in the required array.
Range constraints
Numeric fields can have minimum and maximum bounds using comparison operators with $this:
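For instance, an age value might be bounded as follows (the limits are illustrative):

```sql
$this >= 0 AND $this <= 120
```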
String length
Use the LENGTH() function to constrain string length:
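For example, a two-letter country code could require an exact length:

```sql
LENGTH($this) = 2
```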
Pattern matching
Use LIKE for pattern validation or combine with length checks:
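For example, a plausibility check for an email-style string might combine a pattern with a length bound (the exact rule is illustrative):

```sql
$this LIKE '%@%' AND LENGTH($this) <= 254
```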
Combining validations
Multiple validations can be combined in a single array.
Schema presets
Schema presets are curated collections of attributes designed for common use cases.
Available presets
| Preset | Description | Key attributes |
|---|---|---|
| Demographics | Consumer demographic data | hl7_gender, age, birth_year, country_code |
| Identity | User identification | unique_identifier, email_sha256, phone_sha256 |
| Events | Timestamped occurrences | event_timestamp, event_type, event_properties |
| Location | Geographic data | geo_coordinates, country_code, postal_code |
| Marketing | Campaign and engagement | campaign_id, channel, conversion_timestamp |
Using a preset
When configuring a new dataset, you can select a preset to automatically suggest mappings for standard attributes. This accelerates the mapping process for common data types.
Creating custom presets
Organizations can create private presets that bundle:
- A set of attributes (global or organization-specific)
- Default transformation templates
- Validation rules
The narrative.rosetta_stone table
The narrative.rosetta_stone table is a virtual table that provides unified access to all normalized data.
How it works
When you query narrative.rosetta_stone:
- The query planner identifies which attributes you’re selecting
- It finds all datasets with active mappings for those attributes
- It rewrites queries for each dataset using the mapping transformations
- Results are unioned and returned in the normalized format
Querying the table
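For example, assuming the attribute names shown earlier, a query might look like:

```sql
SELECT unique_identifier, event_timestamp, country_code
FROM narrative.rosetta_stone
WHERE country_code = 'US'
```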
Scoping your queries
When querying Rosetta Stone, you can control which data sources are included by choosing an appropriate scope level.
Global scope
Use narrative.rosetta_stone when you want to query all normalized data available to you:
- Includes data from all companies that have shared data with you
- Best for broad analysis across your entire data ecosystem
- Returns the widest possible result set
Company scope
Use company_data._rosetta_stone or <company_slug>._rosetta_stone when you want data from a specific company:
- Limits results to datasets owned by that company
- Useful when you need data from a specific partner or your own organization
- Reduces the search space for faster queries
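For example, to query only your own organization's datasets (the attribute selection is illustrative):

```sql
SELECT unique_identifier, age
FROM company_data._rosetta_stone
```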
Dataset scope
Use company_data.<dataset>._rosetta_stone when you need to work with a specific dataset:
- Required when combining normalized attributes with non-normalized columns from the same dataset
- Gives you precise control over exactly which data is queried
- Enables joining Rosetta Stone attributes back to source-specific columns
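For example, with a hypothetical dataset ID of 123:

```sql
SELECT unique_identifier, event_timestamp
FROM company_data."123"._rosetta_stone
```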
Choosing the right scope
| I want to… | Use this scope |
|---|---|
| Query all available normalized data | Global (narrative.rosetta_stone) |
| Query data from my company only | Company (company_data._rosetta_stone) |
| Query data from a specific partner | Company (partner_slug._rosetta_stone) |
| Join normalized and original columns from one dataset | Dataset (company_data."123"._rosetta_stone) |
| Control exactly which datasets contribute to results | Dataset |
System columns
The table includes system columns for traceability:
| Column | Description |
|---|---|
| _nio_source_dataset_id | ID of the source dataset |
| _nio_source_row_id | Original row identifier |
| _nio_mapping_version | Version of the mapping used |
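For example, to trace which dataset and mapping version produced each row (attribute selection illustrative):

```sql
SELECT unique_identifier, _nio_source_dataset_id, _nio_mapping_version
FROM narrative.rosetta_stone
```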
Data quality implications
Normalization through Rosetta Stone improves data quality in several ways:
- Consistency: All data adheres to the same type system and validations, regardless of source.
- Completeness: Missing or malformed data is flagged during mapping, enabling targeted data quality improvements.
- Comparability: Data from different sources can be meaningfully combined because it shares a common semantic model.
- Traceability: System columns maintain lineage back to source data.
Handling validation failures
When data fails validation during mapping, the system can:
- Reject: Exclude the record from the normalized view
- Default: Map to a default value (e.g., unknown for invalid enums)
- Flag: Include the record but mark it for review

