A dataset is a structured collection of data registered in Narrative. Think of it as a table in a database—it has a defined schema, holds rows of data, and can be queried, shared, and collaborated on. Datasets are one of the core primitives in the Narrative platform.

What datasets are for

Datasets are the primary way to bring data into Narrative and make it available for collaboration:
  • Storage and organization. Datasets provide a structured container for your data. Each dataset has a defined schema that specifies which fields exist, their data types, and how they are validated.
  • Querying. Once data is in a dataset, you can query it using NQL. Datasets are the foundation of all query operations in Narrative.
  • Collaboration. Through access rules, you can grant other organizations permission to query your datasets, enabling data sharing, monetization, or joint analysis.

How datasets are structured

Schema

Every dataset has a schema that acts as its structural blueprint. The schema defines:
  • Field names — The columns that exist in the dataset
  • Field types — The data type for each field (string, number, timestamp, etc.)
  • Descriptions — Documentation explaining what each field contains
  • Validations — Rules that ensure data integrity when records are added
// Example schema definition
{
  properties: {
    customer_id: { type: 'string' },
    event_type: { type: 'string' },
    event_timestamp: { type: 'timestamptz' },
    event_value: { type: 'double' },
  }
}
Schemas are designed to be stable. While you can add new fields to a schema, changing or removing existing fields requires careful consideration to avoid breaking queries or integrations that depend on them.
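The stability rule above can be checked mechanically before a schema change is rolled out. The following is a minimal sketch, not part of the Narrative SDK: the `DatasetSchema` type and the `findBreakingChanges` helper are hypothetical names introduced for illustration, and the field types are limited to the three that appear in the example schema.

```typescript
// Hypothetical in-memory model of a dataset schema (not an SDK type).
type FieldType = "string" | "double" | "timestamptz";

interface DatasetSchema {
  properties: Record<string, { type: FieldType }>;
}

// Lists changes that could break existing queries: removed fields and
// changed field types. Newly added fields are treated as safe.
function findBreakingChanges(current: DatasetSchema, proposed: DatasetSchema): string[] {
  const problems: string[] = [];
  for (const name of Object.keys(current.properties)) {
    const before = current.properties[name];
    if (before === undefined) continue;
    const after = proposed.properties[name];
    if (after === undefined) {
      problems.push(`field "${name}" was removed`);
    } else if (after.type !== before.type) {
      problems.push(`field "${name}" changed type: ${before.type} -> ${after.type}`);
    }
  }
  return problems;
}

const currentSchema: DatasetSchema = {
  properties: {
    customer_id: { type: "string" },
    event_value: { type: "double" },
  },
};

// Adding a field is compatible with existing queries.
const additiveChange: DatasetSchema = {
  properties: {
    customer_id: { type: "string" },
    event_value: { type: "double" },
    event_type: { type: "string" },
  },
};

// Retyping an existing field is reported as a breaking change.
const retypedField: DatasetSchema = {
  properties: {
    customer_id: { type: "string" },
    event_value: { type: "string" },
  },
};

console.log(findBreakingChanges(currentSchema, additiveChange)); // []
console.log(findBreakingChanges(currentSchema, retypedField));
```

A check like this only covers structural changes. Whether a given change is actually acceptable still depends on the queries and integrations that read the dataset.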

Records and snapshots

Data in a dataset is organized into records (rows) and snapshots:
  • Records are individual data entries that conform to the dataset’s schema
  • Snapshots represent a point-in-time collection of files that were ingested together
When you upload data, the ingestion process validates each record against the schema and adds it to the dataset as part of a new snapshot.
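To make the validate-then-snapshot flow concrete, here is a simplified sketch. It is an illustration under stated assumptions rather than the platform's ingestion code: all type and function names are hypothetical, every schema field is assumed to be required, `timestamptz` values are assumed to be ISO 8601 strings, and invalid records are assumed to be set aside rather than failing the whole upload.

```typescript
// Hypothetical types for illustration only; not SDK types.
type SchemaFieldType = "string" | "double" | "timestamptz";
type RecordSchema = { properties: Record<string, { type: SchemaFieldType }> };
type DataRecord = Record<string, unknown>;

interface Snapshot {
  id: number;
  records: DataRecord[];
}

function matchesType(value: unknown, type: SchemaFieldType): boolean {
  if (type === "string") return typeof value === "string";
  if (type === "double") return typeof value === "number" && isFinite(value);
  // timestamptz: assumed to be a string that Date.parse can read
  return typeof value === "string" && !isNaN(Date.parse(value));
}

// Returns the reason a record fails validation, or null if it conforms.
function validateRecord(schema: RecordSchema, record: DataRecord): string | null {
  for (const name of Object.keys(schema.properties)) {
    const field = schema.properties[name];
    if (field === undefined) continue;
    if (!matchesType(record[name], field.type)) {
      return `field "${name}" is not a valid ${field.type}`;
    }
  }
  return null;
}

// Validates one upload and groups the accepted records into a new snapshot.
function ingest(schema: RecordSchema, upload: DataRecord[], snapshotId: number) {
  const accepted: DataRecord[] = [];
  const rejected: { record: DataRecord; reason: string }[] = [];
  for (const record of upload) {
    const reason = validateRecord(schema, record);
    if (reason === null) accepted.push(record);
    else rejected.push({ record, reason });
  }
  const snapshot: Snapshot = { id: snapshotId, records: accepted };
  return { snapshot, rejected };
}

const eventSchema: RecordSchema = {
  properties: {
    customer_id: { type: "string" },
    event_timestamp: { type: "timestamptz" },
    event_value: { type: "double" },
  },
};

const ingestResult = ingest(
  eventSchema,
  [
    { customer_id: "c-1", event_timestamp: "2024-01-15T10:30:00Z", event_value: 19.99 },
    { customer_id: "c-2", event_timestamp: "2024-01-15T10:31:00Z", event_value: "abc" },
  ],
  1,
);

console.log(ingestResult.snapshot.records.length); // 1
console.log(ingestResult.rejected[0].reason);
```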

Adding data to datasets

Datasets support multiple ways to add data:
  • Append mode. New data is added alongside existing data. Use this for event-style data where each upload contains new records.
  • Overwrite mode. New data replaces existing data. Use this when you want to refresh the entire dataset with an updated version.
For procedural details on uploading data, see Uploading Data.
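The difference between the two modes comes down to what happens to the rows that are already there. The sketch below models a dataset as a plain array to show the semantics. `WriteMode` and `applyUpload` are hypothetical names for illustration and do not correspond to SDK calls.

```typescript
// Hypothetical model of the two write modes; not an SDK API.
type WriteMode = "append" | "overwrite";

// Returns the dataset contents after an upload, without mutating the inputs.
function applyUpload<T>(existing: readonly T[], incoming: readonly T[], mode: WriteMode): T[] {
  return mode === "append" ? [...existing, ...incoming] : [...incoming];
}

const existingRows = [{ customer_id: "c-1" }, { customer_id: "c-2" }];
const uploadedRows = [{ customer_id: "c-3" }];

const afterAppend = applyUpload(existingRows, uploadedRows, "append");
const afterOverwrite = applyUpload(existingRows, uploadedRows, "overwrite");

console.log(afterAppend.length); // 3
console.log(afterOverwrite.length); // 1
```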

Retention policies

Datasets can have retention policies that automatically manage data lifecycle. A retention policy defines how long data is kept before automatic deletion, helping you manage storage costs and comply with data governance requirements. Common retention configurations include:
  • Time-based retention — Automatically remove data older than a specified period (e.g., 90 days, 1 year)
  • Retain everything — Keep all data indefinitely until manually deleted
For details on how retention policies work, including differences between data planes and how to configure them, see Dataset Retention Policies.
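As a rough illustration of time-based retention, the sketch below drops snapshots whose ingestion time falls outside the retention window. It rests on assumptions that this page does not confirm: that retention is evaluated per snapshot and that age is measured from ingestion time. The `RetentionPolicy` shape and `applyRetention` helper are hypothetical, and actual behavior may differ between data planes.

```typescript
// Hypothetical policy model; not the platform's configuration format.
type RetentionPolicy =
  | { kind: "retain_everything" }
  | { kind: "time_based"; maxAgeDays: number };

interface StoredSnapshot {
  id: number;
  ingestedAt: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the snapshots that survive the policy as of `now`.
function applyRetention(snapshots: StoredSnapshot[], policy: RetentionPolicy, now: Date): StoredSnapshot[] {
  if (policy.kind === "retain_everything") return snapshots;
  const cutoff = now.getTime() - policy.maxAgeDays * MS_PER_DAY;
  return snapshots.filter((snapshot) => snapshot.ingestedAt.getTime() >= cutoff);
}

const storedSnapshots: StoredSnapshot[] = [
  { id: 1, ingestedAt: new Date("2024-01-01T00:00:00Z") }, // 152 days before `asOf`
  { id: 2, ingestedAt: new Date("2024-05-01T00:00:00Z") }, // 31 days before `asOf`
];
const asOf = new Date("2024-06-01T00:00:00Z");

const keptWith90Days = applyRetention(storedSnapshots, { kind: "time_based", maxAgeDays: 90 }, asOf);
const keptForever = applyRetention(storedSnapshots, { kind: "retain_everything" }, asOf);

console.log(keptWith90Days.map((s) => s.id)); // [ 2 ]
console.log(keptForever.length); // 2
```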

Ownership and access

Single-company ownership

Every dataset is owned by exactly one company. The owner has full control over:
  • The dataset’s schema and configuration
  • Who can access the data and under what terms
  • Whether to archive or delete the dataset
This ownership model ensures clear accountability and prevents ambiguity about who controls sensitive data.

Access through access rules

By default, a dataset is private to its owner. To share data with other organizations, you create access rules that define:
  • Which organizations can query the dataset
  • Which fields and records they can access
  • What pricing applies (if any)
This separation between ownership and access provides flexibility—you retain full control while selectively enabling collaboration.
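One way to picture how an access rule narrows what a grantee sees is as a row filter plus a field projection. The sketch below is a conceptual model only. The `AccessRule` shape, the numeric company IDs, and `visibleRows` are hypothetical, it assumes at most one rule per grantee, and it leaves pricing out entirely.

```typescript
type Row = Record<string, unknown>;

// Hypothetical shape of an access rule; pricing is omitted.
interface AccessRule {
  granteeCompanyId: number;
  fields: string[]; // which fields the grantee may read
  rowFilter: (row: Row) => boolean; // which records the grantee may read
}

// Returns the rows a requester can see. The owner sees everything;
// anyone without a matching rule sees nothing (private by default).
function visibleRows(ownerCompanyId: number, requesterCompanyId: number, rows: Row[], rules: AccessRule[]): Row[] {
  if (requesterCompanyId === ownerCompanyId) return rows;
  const rule = rules.filter((r) => r.granteeCompanyId === requesterCompanyId)[0];
  if (rule === undefined) return [];
  const result: Row[] = [];
  for (const row of rows) {
    if (!rule.rowFilter(row)) continue;
    const projected: Row = {};
    for (const field of rule.fields) projected[field] = row[field];
    result.push(projected);
  }
  return result;
}

const datasetRows: Row[] = [
  { customer_id: "c-1", email: "a@example.com", event_type: "purchase" },
  { customer_id: "c-2", email: "b@example.com", event_type: "page_view" },
];

// Company 2 may read purchase events, without the email field.
const accessRules: AccessRule[] = [
  { granteeCompanyId: 2, fields: ["customer_id", "event_type"], rowFilter: (row) => row.event_type === "purchase" },
];

console.log(visibleRows(1, 1, datasetRows, accessRules).length); // 2 (owner)
console.log(visibleRows(1, 2, datasetRows, accessRules)); // one row, no email
console.log(visibleRows(1, 3, datasetRows, accessRules).length); // 0 (no rule)
```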

Where datasets live

Datasets are scoped to a specific data plane. The data plane determines:
  • Where the data physically resides (Narrative-hosted or your own infrastructure)
  • Which query engine processes queries against the dataset
  • What data residency and compliance requirements are met
When you create a dataset, you specify which data plane it belongs to. The control plane maintains metadata about the dataset—its schema, access rules, and statistics—while the actual data remains in the data plane.

View datasets

A view dataset is a dataset backed by an NQL query rather than uploaded data. When you execute a query with the create_as_view option, the result is stored as a view dataset. Unlike a materialized view, a view dataset does not refresh on a schedule: the stored NQL is inlined and re-evaluated at query time whenever the view dataset is referenced.
View datasets are useful when you want a reusable, queryable subset of your data without duplicating it into a separate physical dataset. The underlying NQL query can reference other datasets (including other view datasets), and the platform resolves those dependencies automatically.
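The idea of inlining a stored query, including views that reference other views, can be sketched with simple text substitution. This is a toy model and not how the platform resolves views: the `inlineViews` helper, the view names, and the `(...) AS "name"` subquery form are assumptions made for illustration, and a real resolver would operate on a parsed query rather than on strings.

```typescript
// Hypothetical registry of view datasets: view name -> stored NQL text.
const viewDefinitions: Record<string, string> = {
  purchases: `SELECT user_id, email FROM company_data."events" WHERE event_type = 'purchase'`,
  purchaser_emails: `SELECT email FROM company_data."purchases"`,
};

// Replaces each reference to a view with its stored query, recursively.
// `active` tracks the views currently being expanded to detect cycles.
function inlineViews(nql: string, views: Record<string, string>, active: string[] = []): string {
  return nql.replace(/company_data\."([^"]+)"/g, (reference: string, name: string) => {
    const isView = Object.prototype.hasOwnProperty.call(views, name);
    if (!isView) return reference; // a regular dataset: leave the reference as-is
    if (active.indexOf(name) !== -1) {
      throw new Error(`circular view reference: ${[...active, name].join(" -> ")}`);
    }
    const definition = views[name] as string;
    return `(${inlineViews(definition, views, [...active, name])}) AS "${name}"`;
  });
}

const resolvedQuery = inlineViews(`SELECT email FROM company_data."purchaser_emails" LIMIT 10`, viewDefinitions);
console.log(resolvedQuery);
```

Because nothing is stored beyond the query text, every query against the view pays the cost of the underlying query again. That is the trade-off against a materialized view.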

Restrictions

View datasets have specific restrictions compared to regular datasets:
  • No access rules. You cannot create access rules on a view dataset. If you need to share the data with other organizations, create a regular dataset or materialized view instead.
  • No connections. You cannot create connections to deliver a view dataset to an external platform.
  • No forecasting. Query cost forecasting is not available for queries against view datasets.
For the full feature comparison with materialized views — including unsupported NQL features like MERGE ON, PARTITIONED_BY, and chunking strategies — see view dataset limitations.

Creating a view dataset

Use the create_as_view option when executing an NQL query through the SDK:
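// Assumes `api` is an already-initialized Narrative SDK client.
const result = await api.executeNql({
  // The query that defines the view
  nql: `SELECT user_id, email, event_type FROM company_data."my_dataset" WHERE event_type = 'purchase'`,
  data_plane_id: null,
  // Store the query as a view dataset
  create_as_view: true,
});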
For details on executing queries with this option, see Executing NQL Queries.

Datasets, materialized views, and view datasets

Narrative supports three types of data containers:
| Type | Source | Data storage | Updates |
| --- | --- | --- | --- |
| Dataset | External data you upload | Physical table | Manual uploads or automated ingestion |
| Materialized view | NQL query results | Physical table (cached results) | Automatic refresh on schedule |
| View dataset | NQL query definition | No physical storage | Re-evaluated at query time |
Materialized views are created from NQL queries and automatically refresh their contents. Regular datasets require you to explicitly add data through uploads or ingestion. View datasets store only the NQL query definition and re-evaluate it at query time — see View datasets above.

Related content

  • Retention Policies: Configure automatic data lifecycle management
  • Access Rules: Control who can query your datasets and at what price
  • Data Planes: Understand where your datasets physically reside
  • Dataset Statistics: Column-level metrics computed over your dataset contents
  • Managing Datasets: Create and manage datasets with the SDK
  • Name Conflict Errors: Resolve HTTP 409 errors when a dataset name is already in use
  • Schema Incompatibility Errors: Resolve HTTP 400 errors when a dataset schema doesn’t match a connector