Datasets - Narrative I/O Knowledge Base

A dataset is a structured collection of data registered in Narrative. Think of it as a table in a database—it has a defined schema, holds rows of data, and can be queried, shared, and collaborated on. Datasets are one of the core primitives in the Narrative platform.

What datasets are for

Datasets serve as the primary way to bring data into Narrative and make it available for collaboration:

Storage and organization. Datasets provide a structured container for your data. Each dataset has a defined schema that specifies what fields exist, their data types, and how they should be validated.
Querying. Once data is in a dataset, you can query it using NQL. Datasets are the foundation of all query operations in Narrative.
Collaboration. Through access rules, you can grant other organizations permission to query your datasets, enabling data sharing, monetization, or joint analysis.

How datasets are structured

Schema

Every dataset has a schema that acts as its structural blueprint. The schema defines:

Field names — The columns that exist in the dataset
Field types — The data type for each field (string, number, timestamp, etc.)
Descriptions — Documentation explaining what each field contains
Validations — Rules that ensure data integrity when records are added

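// Example schema definition: each key under `properties` is a field name,
// and `type` is that field's data type.
const schema = {
  properties: {
    customer_id: { type: 'string' },
    event_type: { type: 'string' },
    event_timestamp: { type: 'timestamptz' },
    event_value: { type: 'double' },
  },
};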

Schemas are designed to be stable. While you can add new fields to a schema, changing or removing existing fields requires careful consideration to avoid breaking queries or integrations that depend on them.

Records and snapshots

Data in a dataset is organized into records (rows) and snapshots:

Records are individual data entries that conform to the dataset’s schema
Snapshots represent a point-in-time collection of files that were ingested together

When you upload data, the ingestion process validates each record against the schema and adds it to the dataset as part of a new snapshot.
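The validation step can be pictured with a small local sketch. The type names (`string`, `double`, `timestamptz`) come from the example schema earlier in this section. Everything else here is an assumption made for illustration: the `Schema` and `DataRecord` shapes, the `validateRecord` helper, the rule that a missing value is an error, and the specific checks. None of it is Narrative's actual ingestion logic.

```typescript
// Illustrative only: a local approximation of validating records against a schema.
// Type names come from the example schema; the checks themselves are assumptions.
type FieldType = 'string' | 'double' | 'timestamptz';

interface Schema {
  properties: { [field: string]: { type: FieldType } };
}

type DataRecord = { [field: string]: unknown };

// Returns a list of problems; an empty list means the record conforms.
function validateRecord(schema: Schema, record: DataRecord): string[] {
  const errors: string[] = [];
  const fields = Object.keys(schema.properties);
  for (let i = 0; i < fields.length; i++) {
    const field = fields[i];
    const expected = schema.properties[field].type;
    const value = record[field];
    if (value === undefined || value === null) {
      // Assumption: every schema field is required.
      errors.push(field + ': missing value');
      continue;
    }
    let ok = false;
    if (expected === 'string') {
      ok = typeof value === 'string';
    } else if (expected === 'double') {
      ok = typeof value === 'number' && isFinite(value);
    } else {
      // timestamptz: accept any string that Date.parse can read.
      ok = typeof value === 'string' && !isNaN(Date.parse(value));
    }
    if (!ok) {
      errors.push(field + ': expected ' + expected);
    }
  }
  return errors;
}

const eventSchema: Schema = {
  properties: {
    customer_id: { type: 'string' },
    event_type: { type: 'string' },
    event_timestamp: { type: 'timestamptz' },
    event_value: { type: 'double' },
  },
};

const goodRecord: DataRecord = {
  customer_id: 'c-1',
  event_type: 'purchase',
  event_timestamp: '2024-01-15T10:30:00Z',
  event_value: 19.99,
};

const badRecord: DataRecord = {
  customer_id: 'c-2',
  event_type: 'purchase',
  event_timestamp: 'not a date',
  event_value: 'abc',
};

const goodErrors = validateRecord(eventSchema, goodRecord); // no problems reported
const badErrors = validateRecord(eventSchema, badRecord);   // two problems reported
```

Returning a list of problems rather than throwing on the first one makes it easier to report every issue in a record at once. Whether the platform rejects a whole upload or only the failing records is not covered here.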

Adding data to datasets

Datasets support multiple ways to add data:

Append mode. New data is added alongside existing data. Use this for event-style data where each upload contains new records.
Overwrite mode. New data replaces existing data. Use this when you want to refresh the entire dataset with an updated version.

For procedural details on uploading data, see Uploading Data.
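The difference between the two modes can be sketched with a small in-memory model. The names `UploadSnapshot`, `DatasetState`, `applyUpload`, and `visibleRows` are hypothetical and exist only for this illustration. The model treats each upload as one snapshot and captures only what a query would see afterwards. It does not describe how the platform stores or retires earlier snapshots.

```typescript
// Illustrative model only; all names here are hypothetical.
type WriteMode = 'append' | 'overwrite';

interface UploadSnapshot {
  id: number;
  rows: string[];
}

interface DatasetState {
  snapshots: UploadSnapshot[];
}

// Returns a new dataset state after an upload, leaving the input untouched.
function applyUpload(dataset: DatasetState, rows: string[], mode: WriteMode): DatasetState {
  const count = dataset.snapshots.length;
  const nextId = count === 0 ? 1 : dataset.snapshots[count - 1].id + 1;
  const snapshot: UploadSnapshot = { id: nextId, rows: rows };
  if (mode === 'append') {
    // Append: the new snapshot sits alongside the existing ones.
    return { snapshots: dataset.snapshots.concat([snapshot]) };
  }
  // Overwrite: only the new snapshot remains visible.
  return { snapshots: [snapshot] };
}

// Rows a query would see: everything in the snapshots currently kept.
function visibleRows(dataset: DatasetState): string[] {
  let rows: string[] = [];
  for (let i = 0; i < dataset.snapshots.length; i++) {
    rows = rows.concat(dataset.snapshots[i].rows);
  }
  return rows;
}

const emptyDataset: DatasetState = { snapshots: [] };
const afterFirst = applyUpload(emptyDataset, ['a', 'b'], 'append');
const afterAppend = applyUpload(afterFirst, ['c'], 'append');        // visible rows: a, b, c
const afterOverwrite = applyUpload(afterAppend, ['x'], 'overwrite'); // visible rows: x
```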

Retention policies

Datasets can have retention policies that automatically manage data lifecycle. A retention policy defines how long data is kept before automatic deletion, helping you manage storage costs and comply with data governance requirements. Common retention configurations include:

Time-based retention — Automatically remove data older than a specified period (e.g., 90 days, 1 year)
Retain everything — Keep all data indefinitely until manually deleted

For details on how retention policies work, including differences between data planes and how to configure them, see Dataset Retention Policies.
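As a rough sketch of the two configurations above, the following models a retention policy as a cutoff applied to ingested batches. The `RetentionPolicy` and `IngestedBatch` shapes and the `applyRetention` helper are hypothetical. The unit at which data expires (batch, snapshot, or record) is also an assumption and may differ between data planes.

```typescript
// Illustrative only: the policy and batch shapes are hypothetical.
type RetentionPolicy =
  | { kind: 'retain_everything' }
  | { kind: 'time_based'; maxAgeDays: number };

interface IngestedBatch {
  label: string;
  ingestedAt: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the batches the policy would keep as of `now`.
function applyRetention(batches: IngestedBatch[], policy: RetentionPolicy, now: Date): IngestedBatch[] {
  if (policy.kind === 'retain_everything') {
    return batches.slice();
  }
  // Anything ingested before the cutoff is older than the allowed period.
  const cutoff = now.getTime() - policy.maxAgeDays * MS_PER_DAY;
  return batches.filter(function (batch) {
    return batch.ingestedAt.getTime() >= cutoff;
  });
}

const batches: IngestedBatch[] = [
  { label: 'january', ingestedAt: new Date('2024-01-01T00:00:00Z') },
  { label: 'may', ingestedAt: new Date('2024-05-15T00:00:00Z') },
  { label: 'june', ingestedAt: new Date('2024-06-20T00:00:00Z') },
];
const asOf = new Date('2024-06-30T00:00:00Z');

// With a 90-day policy, only the May and June batches remain.
const keptNinetyDays = applyRetention(batches, { kind: 'time_based', maxAgeDays: 90 }, asOf);
// With retain-everything, nothing is removed.
const keptAll = applyRetention(batches, { kind: 'retain_everything' }, asOf);
```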

Ownership and access

Single-company ownership

Every dataset is owned by exactly one company. The owner has full control over:

The dataset’s schema and configuration
Who can access the data and under what terms
Whether to archive or delete the dataset

This ownership model ensures clear accountability and prevents ambiguity about who controls sensitive data.

Access through access rules

By default, a dataset is private to its owner. To share data with other organizations, you create access rules that define:

Which organizations can query the dataset
Which fields and records they can access
What pricing applies (if any)

This separation between ownership and access provides flexibility—you retain full control while selectively enabling collaboration.
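To make the three parts of an access rule concrete, here is a minimal local model. The `AccessRule` shape, its field names, and the `queryAs` helper are hypothetical and are not the platform's API. The sketch only mirrors the behavior described above: the owner sees everything, other organizations see nothing by default, and a rule narrows both fields and records.

```typescript
// Illustrative only: AccessRule and queryAs are hypothetical, not the platform's API.
type Row = { [field: string]: string | number };

interface AccessRule {
  granteeCompanyIds: number[];         // which organizations can query
  allowedFields: string[];             // which fields they can access
  rowFilter: (row: Row) => boolean;    // which records they can access
  pricePerThousandRows: number | null; // pricing, or null if none applies
}

// Returns the rows a given company would see when querying the dataset.
function queryAs(companyId: number, ownerCompanyId: number, rules: AccessRule[], rows: Row[]): Row[] {
  if (companyId === ownerCompanyId) {
    return rows.slice(); // the owner has full access
  }
  let matched: AccessRule | null = null;
  for (let i = 0; i < rules.length; i++) {
    if (rules[i].granteeCompanyIds.indexOf(companyId) !== -1) {
      matched = rules[i];
      break;
    }
  }
  if (matched === null) {
    return []; // private by default
  }
  const rule = matched;
  return rows.filter(rule.rowFilter).map(function (row) {
    const projected: Row = {};
    for (let j = 0; j < rule.allowedFields.length; j++) {
      const field = rule.allowedFields[j];
      if (field in row) {
        projected[field] = row[field];
      }
    }
    return projected;
  });
}

const ownerId = 1;
const partnerId = 2;
const strangerId = 3;

const datasetRows: Row[] = [
  { customer_id: 'c-1', event_type: 'purchase', event_value: 10 },
  { customer_id: 'c-2', event_type: 'view', event_value: 0 },
];

const accessRules: AccessRule[] = [
  {
    granteeCompanyIds: [partnerId],
    allowedFields: ['event_type', 'event_value'],
    rowFilter: function (row) { return row.event_type === 'purchase'; },
    pricePerThousandRows: null,
  },
];

const ownerRows = queryAs(ownerId, ownerId, accessRules, datasetRows);       // both rows, all fields
const partnerRows = queryAs(partnerId, ownerId, accessRules, datasetRows);   // one row, two fields
const strangerRows = queryAs(strangerId, ownerId, accessRules, datasetRows); // nothing
```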

Where datasets live

Datasets are scoped to a specific data plane. The data plane determines:

Where the data physically resides (Narrative-hosted or your own infrastructure)
Which query engine processes queries against the dataset
What data residency and compliance requirements are met

When you create a dataset, you specify which data plane it belongs to. The control plane maintains metadata about the dataset—its schema, access rules, and statistics—while the actual data remains in the data plane.

View datasets

A view dataset is a dataset backed by an NQL query rather than uploaded data. When you execute a query with the create_as_view option, the result is stored as a view dataset. Unlike a materialized view, a view dataset does not refresh on a schedule — the stored NQL is inlined and re-evaluated at query time whenever the view dataset is referenced. View datasets are useful when you want a reusable, queryable subset of your data without duplicating it into a separate physical dataset. The underlying NQL query can reference other datasets (including other view datasets), and the platform resolves those dependencies automatically.
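The idea of inlining a stored query at reference time can be sketched with a toy textual resolver. This is only an analogy: `ViewDefinitions` and `inlineViews` are hypothetical, the `company_data."name"` reference form is borrowed from the SDK example in this document, and the platform presumably resolves views on a parsed query rather than by string replacement. The expanded text is meant to show the dependency chain and is not guaranteed to be valid NQL.

```typescript
// Illustrative only: a toy textual inliner, not how the platform resolves views.
type ViewDefinitions = { [viewName: string]: string };

// Replaces each reference to a known view with its (recursively resolved) definition.
function inlineViews(nql: string, views: ViewDefinitions, inProgress: string[] = []): string {
  let resolved = nql;
  const viewNames = Object.keys(views);
  for (let i = 0; i < viewNames.length; i++) {
    const viewName = viewNames[i];
    const reference = 'company_data."' + viewName + '"';
    if (resolved.indexOf(reference) === -1) {
      continue;
    }
    if (inProgress.indexOf(viewName) !== -1) {
      throw new Error('Circular view reference: ' + viewName);
    }
    // Resolve the view's own dependencies first, then substitute it as a subquery.
    const body = inlineViews(views[viewName], views, inProgress.concat([viewName]));
    resolved = resolved.split(reference).join('(' + body + ') AS "' + viewName + '"');
  }
  return resolved;
}

const viewDefinitions: ViewDefinitions = {
  // A view over a regular dataset.
  purchases: 'SELECT user_id FROM company_data."my_dataset" WHERE event_type = \'purchase\'',
  // A view that references another view.
  purchasers: 'SELECT user_id FROM company_data."purchases"',
};

// Every view reference is expanded until only the regular dataset remains.
const expandedQuery = inlineViews('SELECT user_id FROM company_data."purchasers"', viewDefinitions);
```

Because the definition is expanded each time the view is referenced, results always reflect the current contents of the underlying dataset, and nothing is copied into a separate physical table.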

Restrictions

View datasets have specific restrictions compared to regular datasets:

No access rules. You cannot create access rules on a view dataset. If you need to share the data with other organizations, create a regular dataset or materialized view instead.
No connections. You cannot create connections to deliver a view dataset to an external platform.
No forecasting. Query cost forecasting is not available for queries against view datasets.

For the full feature comparison with materialized views — including unsupported NQL features like MERGE ON, PARTITIONED_BY, and chunking strategies — see view dataset limitations.

Creating a view dataset

Use the create_as_view option when executing an NQL query through the SDK:

// Assumes `api` is an already-initialized SDK client.
const result = await api.executeNql({
  nql: 'SELECT user_id, email, event_type FROM company_data."my_dataset" WHERE event_type = \'purchase\'',
  data_plane_id: null,
  create_as_view: true, // store the query as a view dataset
});

For details on executing queries with this option, see Executing NQL Queries.

Datasets, materialized views, and view datasets

Narrative supports three types of data containers:

Type	Source	Data storage	Updates
Dataset	External data you upload	Physical table	Manual uploads or automated ingestion
Materialized view	NQL query results	Physical table (cached results)	Automatic refresh on schedule
View dataset	NQL query definition	No physical storage	Re-evaluated at query time

Materialized views are created from NQL queries and automatically refresh their contents. Regular datasets require you to explicitly add data through uploads or ingestion. View datasets store only the NQL query definition and re-evaluate it at query time — see View datasets above.

Related content

Retention Policies

Configure automatic data lifecycle management

Access Rules

Control who can query your datasets and at what price

Data Planes

Understand where your datasets physically reside

Dataset Statistics

Column-level metrics computed over your dataset contents

Managing Datasets

Create and manage datasets with the SDK

Name Conflict Errors

Resolve HTTP 409 errors when a dataset name is already in use

Schema Incompatibility Errors

Resolve HTTP 400 errors when a dataset schema doesn’t match a connector
