Configuring dataset statistics - Narrative I/O Knowledge Base

Dataset statistics configuration lets you control which column-level metrics are computed for a dataset, how often they refresh, and whether specific fields get different treatment than the defaults. You configure all of this from a side panel on the dataset details page.

This guide covers the UI-based configuration. You can also manage statistics configuration programmatically through the API or the TypeScript SDK.

Prerequisites

A dataset you own or have write access to
The dataset must be on an active data plane

Opening the configuration panel

Navigate to the dataset

Go to My Data > Datasets and select the dataset you want to configure.

Open the actions menu

Click the Actions button on the dataset details page to open the actions dropdown.

Select Configure statistics

Under the Stats group, click Configure statistics. A side panel opens with the configuration form. If a configuration already exists, the panel loads it in edit mode. Otherwise, you see the create form with sensible defaults pre-selected.

Setting default statistics

The Defaults section controls which statistics are computed for all eligible fields across the dataset. Select or deselect individual statistics, or use the category-level and global toggles to select groups at once. Available statistics are organized into five categories:

Category	Statistics
Counts	Value count, Null value count, NaN value count
Bounds	Lower bound, Upper bound
Distinctness	Approx count distinct, Count distinct
Distribution	Histogram, Mean, Standard deviation
Quality	Completeness

Approx count distinct and Count distinct are mutually exclusive: you can enable one or the other, but not both. Approximate counting uses HyperLogLog, which estimates the distinct count instead of computing it exactly, and is significantly faster on large datasets.

Not all statistics apply to all data types. For example, Mean and Standard deviation only apply to numeric columns (double and long). The UI automatically filters out incompatible statistics when you configure per-field overrides. See the type compatibility matrix for the full mapping.
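The two rules above (mutual exclusivity and numeric-only statistics) can be sketched as a small client-side check. This is an illustration only: apart from `value_count` and `histogram`, the snake_case identifiers are assumed spellings of the statistics in the table, and `validateSelection` is a hypothetical helper, not part of the SDK.

```typescript
// Assumed identifiers for the statistics listed in the categories table.
// The real configuration schema may spell these differently.
type Stat =
  | "value_count" | "null_value_count" | "nan_value_count"
  | "lower_bound" | "upper_bound"
  | "approx_count_distinct" | "count_distinct"
  | "histogram" | "mean" | "standard_deviation"
  | "completeness";

// The guide names double and long as the numeric column types.
const NUMERIC_TYPES: string[] = ["double", "long"];
// Statistics the guide says apply only to numeric columns.
const NUMERIC_ONLY: Stat[] = ["mean", "standard_deviation"];

// Hypothetical helper: returns a list of problems, empty when the selection is valid.
function validateSelection(stats: Stat[], columnType?: string): string[] {
  const errors: string[] = [];

  // Rule 1: the two distinct-count statistics cannot be enabled together.
  if (
    stats.indexOf("approx_count_distinct") !== -1 &&
    stats.indexOf("count_distinct") !== -1
  ) {
    errors.push("approx_count_distinct and count_distinct are mutually exclusive");
  }

  // Rule 2: numeric-only statistics need a numeric column.
  // Only checked when a column type is supplied (per-field overrides).
  if (columnType !== undefined && NUMERIC_TYPES.indexOf(columnType) === -1) {
    stats
      .filter((stat) => NUMERIC_ONLY.indexOf(stat) !== -1)
      .forEach((stat) => {
        errors.push(stat + " requires a numeric column, got " + columnType);
      });
  }

  return errors;
}
```

Calling `validateSelection(["mean"], "string")` returns one error, while the same selection on a `double` column returns none. The full type compatibility matrix covers more cases than this sketch.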

Histogram options

When you enable Histogram, additional options appear:

Option	Description	Default
Max bins	Maximum number of histogram buckets (2–100,000)	50
Overflow	How to handle values exceeding the bin count: `none` or `truncate`	none
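As a rough sketch, the histogram options and their defaults could be modeled like this. The property names `maxBins` and `overflow` and the function `resolveHistogramOptions` are assumptions made for illustration, and the 2 to 100,000 range is treated as inclusive on both ends.

```typescript
// Assumed shape for the histogram options; the real schema may differ.
type Overflow = "none" | "truncate";

interface HistogramOptions {
  maxBins: number;
  overflow: Overflow;
}

// Defaults taken from the options table above.
const HISTOGRAM_DEFAULTS: HistogramOptions = { maxBins: 50, overflow: "none" };

// Hypothetical helper: fills in defaults and rejects out-of-range values.
function resolveHistogramOptions(
  input: Partial<HistogramOptions> = {}
): HistogramOptions {
  const maxBins = input.maxBins ?? HISTOGRAM_DEFAULTS.maxBins;
  const overflow = input.overflow ?? HISTOGRAM_DEFAULTS.overflow;

  // Max bins must be a whole number between 2 and 100,000 (assumed inclusive).
  if (Math.floor(maxBins) !== maxBins || maxBins < 2 || maxBins > 100000) {
    throw new RangeError("maxBins must be an integer between 2 and 100000");
  }

  // Guards against untyped input, such as values parsed from JSON.
  if (overflow !== "none" && overflow !== "truncate") {
    throw new TypeError("overflow must be 'none' or 'truncate'");
  }

  return { maxBins, overflow };
}
```

Calling `resolveHistogramOptions()` with no arguments yields the defaults from the table, and `resolveHistogramOptions({ maxBins: 1 })` throws a `RangeError`.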

Setting a refresh schedule

The Refresh Trigger section controls when statistics are recomputed. Choose one of three options:

Trigger	Behavior
Manual	Statistics are only recomputed when you manually trigger an update from the actions menu
On update	Statistics are automatically recomputed whenever new data is ingested into the dataset
Cron	Statistics are recomputed on a UTC cron schedule (e.g., `0 0 * * *` for daily at midnight)

For most datasets, a daily cron schedule (`0 0 * * *`) provides a good balance between freshness and compute cost. Use On update only for datasets where you need statistics to reflect every data change immediately.
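One way to picture the three triggers is as a discriminated union. The sketch below is not the platform's schema: the `type` tags, the `expression` property, and both helper functions are assumed names. It also assumes five-field cron expressions, because that is the form of the example above.

```typescript
// Assumed representation of the three refresh triggers.
type RefreshTrigger =
  | { type: "manual" }
  | { type: "on_update" }
  | { type: "cron"; expression: string };

// Hypothetical helper: summarizes when statistics are recomputed.
function describeTrigger(trigger: RefreshTrigger): string {
  switch (trigger.type) {
    case "manual":
      return "recomputed only when triggered from the actions menu";
    case "on_update":
      return "recomputed whenever new data is ingested";
    case "cron":
      return "recomputed on the UTC schedule " + trigger.expression;
  }
}

// Shallow sanity check: counts whitespace-separated fields only.
// It does not validate the contents of each field.
function looksLikeFiveFieldCron(expression: string): boolean {
  return expression.trim().split(/\s+/).length === 5;
}

// The daily-at-midnight schedule recommended for most datasets.
const daily: RefreshTrigger = { type: "cron", expression: "0 0 * * *" };
```

Here `looksLikeFiveFieldCron("0 0 * * *")` returns `true`, and a four-field string such as `"0 0 * *"` returns `false`.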

Adding field overrides

By default, all eligible fields inherit the statistics you set in the Defaults section. You can override this at two levels:

Namespace scope overrides

Expand the Dataset Field Overrides or Rosetta Stone Overrides section and enable the scope override toggle. This lets you set different default statistics for all fields within that namespace, overriding the global defaults.

Per-field overrides

Within each namespace section, click Add Field Override to configure statistics for a specific column or attribute. Each override row lets you:

Select a field from the dropdown
Choose which statistics to compute for that field
Optionally configure histogram options specific to that field

For non-primitive fields (objects and arrays), only Null value count is available as a self-statistic. Statistics on nested child fields within objects and arrays are not yet configurable from the UI.

A per-field override completely replaces the set of statistics inherited from the parent level. If you want a field to compute only `value_count` and `histogram`, select just those two; the field will not also inherit the defaults. For details on how inheritance works, see the inheritance model in the reference.
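The replace-not-merge behavior can be sketched as a lookup that returns the most specific level that is set. The configuration shape, the namespace keys `dataset` and `rosetta_stone`, the field name `price`, and `effectiveStats` are all hypothetical; the reference documents the actual schema.

```typescript
// Assumed configuration shape, for illustration only.
interface NamespaceConfig {
  scopeOverride?: string[];          // namespace-level replacement for the defaults
  fields?: Record<string, string[]>; // per-field replacements
}

interface StatsConfig {
  defaults: string[];
  namespaces?: Record<string, NamespaceConfig>;
}

// Hypothetical helper: returns the stat set that applies to one field.
// The most specific level wins outright; levels are never merged.
function effectiveStats(
  config: StatsConfig,
  namespace: string,
  field: string
): string[] {
  const ns = config.namespaces?.[namespace];
  return ns?.fields?.[field] ?? ns?.scopeOverride ?? config.defaults;
}

const example: StatsConfig = {
  defaults: ["value_count", "null_value_count", "completeness"],
  namespaces: {
    dataset: {
      scopeOverride: ["value_count"],
      fields: { price: ["value_count", "histogram"] },
    },
    rosetta_stone: {},
  },
};
```

With this example, `price` resolves to only `value_count` and `histogram`, any other field in the `dataset` namespace resolves to the scope override, and fields in `rosetta_stone` fall back to the global defaults.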

Saving and deleting

Click Save to create or update the configuration. A confirmation toast appears on success.
In edit mode, click Delete Configuration to remove the configuration entirely. This stops automatic statistics computation for the dataset.

Manually triggering a statistics update

Separately from the configuration, you can trigger an immediate one-off recomputation. From the dataset actions menu under the Stats group, click Update statistics. This runs the datasets_calculate_column_stats job regardless of the refresh trigger setting.

Related content

Dataset Statistics

Why statistics matter and where they appear in the platform

Dataset Statistics Reference

Full API reference, configuration schema, and validation errors

Managing Datasets (SDK)

Manage datasets programmatically including reading statistics

Job Types

Job types including statistics computation