Dataset statistics configuration lets you control which column-level metrics are computed for a dataset, how often they refresh, and whether specific fields get different treatment than the defaults. You configure all of this from a side panel on the dataset details page.
This guide covers the UI-based configuration. You can also manage statistics configuration programmatically through the API or the TypeScript SDK.

Prerequisites

  • A dataset you own or have write access to
  • The dataset must be on an active data plane

Opening the configuration panel

  1. Navigate to the dataset. Go to My Data > Datasets and select the dataset you want to configure.
  2. Open the actions menu. Click the Actions button on the dataset details page to open the actions dropdown.
  3. Select Configure statistics. Under the Stats group, click Configure statistics. A side panel opens with the configuration form. If a configuration already exists, the panel loads it in edit mode. Otherwise, you see the create form with sensible defaults pre-selected.

Setting default statistics

The Defaults section controls which statistics are computed for all eligible fields across the dataset. Select or deselect individual statistics, or use the category-level and global toggles to select groups at once. Available statistics are organized into five categories:
| Category | Statistics |
| --- | --- |
| Counts | Value count, Null value count, NaN value count |
| Bounds | Lower bound, Upper bound |
| Distinctness | Approx count distinct, Count distinct |
| Distribution | Histogram, Mean, Standard deviation |
| Quality | Completeness |
Approx count distinct and Count distinct are mutually exclusive: you can enable one or the other, but not both. Approximate counting uses HyperLogLog, which accepts a small estimation error in exchange for significantly faster computation on large datasets.
Not all statistics apply to all data types. For example, Mean and Standard deviation only apply to numeric columns (double and long). The UI automatically filters out incompatible statistics when you configure per-field overrides. See the type compatibility matrix for the full mapping.
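The two rules above (mutual exclusivity and numeric-only statistics) can be sketched as a small validation helper. This is an illustrative TypeScript sketch, not the platform's SDK. The identifiers value_count and histogram appear in this guide; every other snake_case name and both function names are assumptions made for the example. Only the Mean and Standard deviation rule is encoded, since the full type mapping lives in the type compatibility matrix.

```typescript
// Hypothetical statistic identifiers. Only value_count and histogram are
// named in this guide; the rest are assumed by analogy with the UI labels.
type Statistic =
  | "value_count" | "null_value_count" | "nan_value_count"
  | "lower_bound" | "upper_bound"
  | "approx_count_distinct" | "count_distinct"
  | "histogram" | "mean" | "standard_deviation"
  | "completeness";

// Statistics this guide says apply only to numeric columns (double and long).
const NUMERIC_ONLY: Statistic[] = ["mean", "standard_deviation"];

// Returns a list of human-readable problems; an empty list means the selection is valid.
function validateSelection(selected: Statistic[]): string[] {
  const errors: string[] = [];
  const hasApprox = selected.indexOf("approx_count_distinct") !== -1;
  const hasExact = selected.indexOf("count_distinct") !== -1;
  if (hasApprox && hasExact) {
    errors.push("approx_count_distinct and count_distinct are mutually exclusive");
  }
  return errors;
}

// Drops numeric-only statistics when the field type is not double or long.
function filterForType(selected: Statistic[], fieldType: string): Statistic[] {
  const isNumeric = fieldType === "double" || fieldType === "long";
  return selected.filter((s) => isNumeric || NUMERIC_ONLY.indexOf(s) === -1);
}
```

For instance, calling filterForType with mean and value_count on a string field would keep only value_count, which mirrors how the UI hides incompatible statistics in per-field overrides.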

Histogram options

When you enable Histogram, additional options appear:
| Option | Description | Default |
| --- | --- | --- |
| Max bins | Maximum number of histogram buckets (2–100,000) | 50 |
| Overflow | How to handle values exceeding the bin count: none or truncate | none |
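The limits and defaults above can be expressed as a small options resolver. This is a sketch only: the field names maxBins and overflow and the function resolveHistogramOptions are hypothetical, and the integer check is an assumption, since this guide states only the 2–100,000 range.

```typescript
// Hypothetical shape for the histogram options described above.
interface HistogramOptions {
  maxBins: number;
  overflow: "none" | "truncate";
}

// Defaults taken from the options table: 50 bins, overflow "none".
const DEFAULT_HISTOGRAM: HistogramOptions = { maxBins: 50, overflow: "none" };

// Fills in defaults for missing options and rejects out-of-range bin counts.
function resolveHistogramOptions(partial: Partial<HistogramOptions> = {}): HistogramOptions {
  const maxBins = partial.maxBins ?? DEFAULT_HISTOGRAM.maxBins;
  const overflow = partial.overflow ?? DEFAULT_HISTOGRAM.overflow;
  // Assumption: bin counts are whole numbers.
  const isWholeNumber = Math.floor(maxBins) === maxBins;
  if (!isWholeNumber || maxBins < 2 || maxBins > 100000) {
    throw new RangeError("maxBins must be an integer between 2 and 100000, got " + maxBins);
  }
  return { maxBins, overflow };
}
```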

Setting a refresh schedule

The Refresh Trigger section controls when statistics are recomputed. Choose one of three options:
| Trigger | Behavior |
| --- | --- |
| Manual | Statistics are only recomputed when you manually trigger an update from the actions menu |
| On update | Statistics are automatically recomputed whenever new data is ingested into the dataset |
| Cron | Statistics are recomputed on a UTC cron schedule (e.g., 0 0 * * * for daily at midnight) |
For most datasets, a daily cron schedule (0 0 * * *) provides a good balance between freshness and compute cost. Use On update only for datasets where you need statistics to reflect every data change immediately.
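One way to picture the three triggers is as a discriminated union, sketched below. The type name RefreshTrigger, the kind values, and both functions are hypothetical. The check for five whitespace-separated fields is an assumption based on the example expression 0 0 * * *; this guide does not say whether other cron formats are accepted.

```typescript
// Hypothetical model of the three refresh triggers described above.
type RefreshTrigger =
  | { kind: "manual" }
  | { kind: "on_update" }
  | { kind: "cron"; expression: string };

// Shape check only: verifies the expression has five whitespace-separated
// fields (minute, hour, day of month, month, day of week). It does not
// validate the values inside each field.
function hasFiveCronFields(expression: string): boolean {
  return expression.trim().split(/\s+/).length === 5;
}

// Produces a short description of when statistics would be recomputed.
function describeTrigger(trigger: RefreshTrigger): string {
  switch (trigger.kind) {
    case "manual":
      return "recomputed only when triggered from the actions menu";
    case "on_update":
      return "recomputed whenever new data is ingested";
    case "cron":
      if (!hasFiveCronFields(trigger.expression)) {
        throw new Error("expected a five-field cron expression, got: " + trigger.expression);
      }
      return "recomputed on UTC cron schedule " + trigger.expression;
  }
}

// The daily-at-midnight schedule suggested above for most datasets.
const dailyAtMidnight: RefreshTrigger = { kind: "cron", expression: "0 0 * * *" };
```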

Adding field overrides

By default, all eligible fields inherit the statistics you set in the Defaults section. You can override this at two levels:

Namespace scope overrides

Expand the Dataset Field Overrides or Rosetta Stone Overrides section and enable the scope override toggle. This lets you set different default statistics for all fields within that namespace, overriding the global defaults.

Per-field overrides

Within each namespace section, click Add Field Override to configure statistics for a specific column or attribute. Each override row lets you:
  1. Select a field from the dropdown
  2. Choose which statistics to compute for that field
  3. Optionally configure histogram options specific to that field
For non-primitive fields (objects and arrays), only Null value count is available as a self-statistic. Statistics on nested child fields within objects and arrays are not yet configurable from the UI.
Under the inheritance model, a per-field override completely replaces the statistics set from the parent level rather than adding to it. If you want a field to compute only value_count and histogram, select just those two; the field won't also inherit the defaults. For details on how inheritance works, see the inheritance model in the reference.
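The replace-not-merge behavior described above can be sketched as a lookup that returns the most specific statistics set available. The structure below is hypothetical and is not the platform's configuration schema: StatsConfig, NamespaceConfig, effectiveStats, and the namespace key dataset_fields are names invented for the example, as are the field names price and sku.

```typescript
type StatSet = string[];

// Hypothetical per-namespace settings: an optional scope override plus
// per-field overrides keyed by field name.
interface NamespaceConfig {
  scopeOverride?: StatSet;
  fieldOverrides: Record<string, StatSet>;
}

interface StatsConfig {
  defaults: StatSet;
  namespaces: Record<string, NamespaceConfig>;
}

// Resolution order: per-field override, then namespace scope override,
// then global defaults. The first match is returned as-is, never merged.
function effectiveStats(config: StatsConfig, namespace: string, field: string): StatSet {
  const ns: NamespaceConfig | undefined = config.namespaces[namespace];
  if (ns === undefined) {
    return config.defaults;
  }
  const fieldOverride: StatSet | undefined = ns.fieldOverrides[field];
  if (fieldOverride !== undefined) {
    return fieldOverride;
  }
  return ns.scopeOverride ?? config.defaults;
}

const exampleConfig: StatsConfig = {
  defaults: ["value_count", "null_value_count", "completeness"],
  namespaces: {
    dataset_fields: {
      fieldOverrides: { price: ["value_count", "histogram"] },
    },
  },
};
```

With this example configuration, the price field resolves to value_count and histogram only, while a field without an override, such as sku, falls back to the three defaults.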

Saving and deleting

  • Click Save to create or update the configuration. A confirmation toast appears on success.
  • In edit mode, click Delete Configuration to remove the configuration entirely. This stops automatic statistics computation for the dataset.

Manually triggering a statistics update

Separately from the configuration, you can trigger an immediate one-off recomputation. From the dataset actions menu under the Stats group, click Update statistics. This runs the datasets_calculate_column_stats job regardless of the refresh trigger setting.

  • Dataset Statistics: Why statistics matter and where they appear in the platform
  • Dataset Statistics Reference: Full API reference, configuration schema, and validation errors
  • Managing Datasets (SDK): Manage datasets programmatically, including reading statistics
  • Job Types: Job types, including statistics computation