# Configuring dataset statistics

> Choose which column-level metrics to compute, set refresh schedules, and add per-field overrides from the dataset details page

Dataset statistics configuration lets you control which column-level metrics are computed for a dataset, how often they refresh, and whether specific fields get different treatment than the defaults. You configure all of this from a side panel on the dataset details page.

<Info>
  This guide covers the UI-based configuration. You can also manage statistics configuration programmatically through the [API](/reference/architecture/dataset-statistics#api-endpoints) or the [TypeScript SDK](/guides/sdk/managing-datasets#getting-dataset-statistics).
</Info>

## Prerequisites

* A dataset you own or have write access to
* The dataset must be on an active [data plane](/concepts/primitives/data-planes)

## Opening the configuration panel

<Steps>
  <Step title="Navigate to the dataset">
    Go to **My Data > Datasets** and select the dataset you want to configure.
  </Step>

  <Step title="Open the actions menu">
    Click the **Actions** button on the dataset details page to open the actions dropdown.
  </Step>

  <Step title="Select Configure statistics">
    Under the **Stats** group, click **Configure statistics**. A side panel opens with the configuration form.

    If a configuration already exists, the panel loads it in edit mode. Otherwise, you see the create form with sensible defaults pre-selected.
  </Step>
</Steps>

## Setting default statistics

The **Defaults** section controls which statistics are computed for all eligible fields across the dataset. Select or deselect individual statistics, or use the category-level and global toggles to select groups at once.

Available statistics are organized into five categories:

| Category         | Statistics                                     |
| ---------------- | ---------------------------------------------- |
| **Counts**       | Value count, Null value count, NaN value count |
| **Bounds**       | Lower bound, Upper bound                       |
| **Distinctness** | Approx count distinct, Count distinct          |
| **Distribution** | Histogram, Mean, Standard deviation            |
| **Quality**      | Completeness                                   |

<Warning>
  **Approx count distinct** and **Count distinct** are mutually exclusive: you can enable one or the other, but not both. Approximate counting uses HyperLogLog, which accepts a small estimation error in exchange for lower memory use and faster computation on large datasets.
</Warning>
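
As a rough sketch of the exclusivity rule, the check below rejects a selection that contains both distinctness statistics. The identifiers `approx_count_distinct` and `count_distinct` are assumed snake_case forms of the UI labels, and `validateDistinctness` is a hypothetical helper, not part of the Narrative API or SDK.

```typescript
// Hypothetical identifiers: assumed snake_case forms of the UI labels.
const APPROX_DISTINCT = "approx_count_distinct";
const EXACT_DISTINCT = "count_distinct";

// Returns an error message when both distinctness statistics are selected,
// or null when the selection is valid.
function validateDistinctness(selected: string[]): string | null {
  const chosen = new Set(selected);
  if (chosen.has(APPROX_DISTINCT) && chosen.has(EXACT_DISTINCT)) {
    return `${APPROX_DISTINCT} and ${EXACT_DISTINCT} are mutually exclusive`;
  }
  return null;
}
```

A selection such as `["value_count", "count_distinct"]` passes this check, while one that lists both distinctness statistics returns the error message.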

Not all statistics apply to all data types. For example, Mean and Standard deviation apply only to numeric columns (`double` and `long`). The UI automatically filters out incompatible statistics when you configure per-field overrides. See the [type compatibility matrix](/reference/architecture/dataset-statistics#type-compatibility-matrix) for the full mapping.
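
The filtering the UI performs could look roughly like the sketch below. It encodes only the two rules stated on this page (the numeric-only rule above and the non-primitive rule described under per-field overrides); the full matrix in the reference has more. The statistic identifiers, the `string` type name, and `compatibleStats` itself are assumptions made for illustration.

```typescript
// `double` and `long` come from this page; the other type names are
// assumptions made for illustration.
type ColumnType = "double" | "long" | "string" | "object" | "array";

// Assumed snake_case identifiers for Mean and Standard deviation.
const NUMERIC_ONLY = new Set<string>(["mean", "standard_deviation"]);
const NUMERIC_TYPES = new Set<ColumnType>(["double", "long"]);
const NON_PRIMITIVE_TYPES = new Set<ColumnType>(["object", "array"]);

// Keeps only the statistics that the rules stated on this page allow for
// the given column type. This is not the complete compatibility matrix.
function compatibleStats(type: ColumnType, requested: string[]): string[] {
  return requested.filter((stat) => {
    if (NON_PRIMITIVE_TYPES.has(type)) return stat === "null_value_count";
    if (NUMERIC_ONLY.has(stat)) return NUMERIC_TYPES.has(type);
    return true;
  });
}
```

Under these assumptions, requesting `mean` for a `string` column drops it, while the same request for a `double` column keeps it.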

### Histogram options

When you enable **Histogram**, additional options appear:

| Option       | Description                                                         | Default |
| ------------ | ------------------------------------------------------------------- | ------- |
| **Max bins** | Maximum number of histogram buckets (2–100,000)                     | 50      |
| **Overflow** | How to handle values exceeding the bin count — `none` or `truncate` | none    |
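
To make the defaults and limits concrete, here is a minimal sketch that merges user input with the documented defaults and checks the documented range. The field names `maxBins` and `overflow`, the integer requirement, and the `resolveHistogramOptions` helper are assumptions; the actual configuration schema is defined in the reference.

```typescript
// Field names are hypothetical; the range, default, and overflow values
// come from the table above.
type Overflow = "none" | "truncate";

interface HistogramOptions {
  maxBins: number;
  overflow: Overflow;
}

const HISTOGRAM_DEFAULTS: HistogramOptions = { maxBins: 50, overflow: "none" };

// Merges partial input with the documented defaults and rejects bin counts
// outside the documented range of 2 to 100,000. Requiring an integer is an
// assumption, since a bin count is a whole number of buckets.
function resolveHistogramOptions(
  input: Partial<HistogramOptions> = {}
): HistogramOptions {
  const options: HistogramOptions = { ...HISTOGRAM_DEFAULTS, ...input };
  if (
    !Number.isInteger(options.maxBins) ||
    options.maxBins < 2 ||
    options.maxBins > 100000
  ) {
    throw new RangeError("maxBins must be an integer between 2 and 100,000");
  }
  return options;
}
```

Calling `resolveHistogramOptions()` with no arguments yields the defaults from the table, and `resolveHistogramOptions({ maxBins: 1 })` throws.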

## Setting a refresh schedule

The **Refresh Trigger** section controls when statistics are recomputed. Choose one of three options:

| Trigger       | Behavior                                                                                   |
| ------------- | ------------------------------------------------------------------------------------------ |
| **Manual**    | Statistics are only recomputed when you manually trigger an update from the actions menu   |
| **On update** | Statistics are automatically recomputed whenever new data is ingested into the dataset     |
| **Cron**      | Statistics are recomputed on a UTC cron schedule (e.g., `0 0 * * *` for daily at midnight) |

<Tip>
  For most datasets, a daily cron schedule (`0 0 * * *`) provides a good balance between freshness and compute cost. Use **On update** only for datasets where you need statistics to reflect every data change immediately.
</Tip>
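
The three triggers can be modeled as a discriminated union, sketched below. The `type` values, the `expression` field, and both helpers are hypothetical names, and the shape check assumes five-field cron expressions like the example in the table; it is not a full cron parser and does not reflect how the platform validates schedules.

```typescript
// Hypothetical model of the three refresh triggers described above.
type RefreshTrigger =
  | { type: "manual" }
  | { type: "on_update" }
  | { type: "cron"; expression: string };

// Minimal shape check: five whitespace-separated fields, as in `0 0 * * *`.
// It does not validate the contents of each field.
function hasFiveCronFields(expression: string): boolean {
  return expression.trim().split(/\s+/).length === 5;
}

// Summarizes when a trigger causes statistics to be recomputed.
function describeTrigger(trigger: RefreshTrigger): string {
  switch (trigger.type) {
    case "manual":
      return "recomputed only when triggered from the actions menu";
    case "on_update":
      return "recomputed whenever new data is ingested";
    case "cron":
      return `recomputed on the UTC schedule ${trigger.expression}`;
  }
}
```

With this model, the daily schedule from the tip is `{ type: "cron", expression: "0 0 * * *" }`, which passes the five-field check.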

## Adding field overrides

By default, all eligible fields inherit the statistics you set in the Defaults section. You can override this at two levels:

### Namespace scope overrides

Expand the **Dataset Field Overrides** or **Rosetta Stone Overrides** section and enable the scope override toggle. This lets you set different default statistics for all fields within that namespace, overriding the global defaults.

### Per-field overrides

Within each namespace section, click **Add Field Override** to configure statistics for a specific column or attribute. Each override row lets you:

1. Select a field from the dropdown
2. Choose which statistics to compute for that field
3. Optionally configure histogram options specific to that field

<Note>
  For non-primitive fields (objects and arrays), only **Null value count** is available as a self-statistic. Statistics on nested child fields within objects and arrays are not yet configurable from the UI.
</Note>

Inheritance replaces rather than merges: a per-field override completely replaces the set of statistics from the parent level. If you want a field to compute only `value_count` and `histogram`, select just those two; the field won't also inherit the defaults. For details on how inheritance works, see the [inheritance model](/reference/architecture/dataset-statistics#inheritance-model) in the reference.
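
The replace-not-merge behavior can be sketched as a lookup that returns the most specific set it finds. The `StatsConfig` shape, the namespace keys, and `effectiveStats` are hypothetical and only illustrate the precedence this page describes (per-field override, then namespace scope override, then global defaults); the reference documents the actual schema.

```typescript
// Hypothetical namespace keys for the two override sections in the UI.
type Namespace = "dataset" | "rosetta_stone";

// Hypothetical configuration shape, for illustration only.
interface StatsConfig {
  defaults: string[];
  scopeOverrides: Partial<Record<Namespace, string[]>>;
  fieldOverrides: Partial<Record<Namespace, Record<string, string[]>>>;
}

// Returns the statistics computed for one field. The most specific level
// wins and replaces the parent set; nothing is merged between levels.
function effectiveStats(
  config: StatsConfig,
  namespace: Namespace,
  field: string
): string[] {
  const fieldLevel = config.fieldOverrides[namespace]?.[field];
  if (fieldLevel !== undefined) return fieldLevel;
  const scopeLevel = config.scopeOverrides[namespace];
  if (scopeLevel !== undefined) return scopeLevel;
  return config.defaults;
}
```

For example, if the defaults include `completeness` and a per-field override lists only `value_count` and `histogram`, the field computes just those two and `completeness` is not inherited.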

## Saving and deleting

* Click **Save** to create or update the configuration. A confirmation toast appears on success.
* In edit mode, click **Delete Configuration** to remove the configuration entirely. This stops automatic statistics computation for the dataset.

## Manually triggering a statistics update

Independently of the configuration, you can trigger an immediate one-off recomputation. Open the dataset's **Actions** menu and, under the **Stats** group, click **Update statistics**. This runs the `datasets_calculate_column_stats` [job](/reference/architecture/job-types) regardless of the refresh trigger setting.

## Related content

<CardGroup cols={2}>
  <Card title="Dataset Statistics" icon="lightbulb" href="/concepts/primitives/dataset-statistics">
    Why statistics matter and where they appear in the platform
  </Card>

  <Card title="Dataset Statistics Reference" icon="book" href="/reference/architecture/dataset-statistics">
    Full API reference, configuration schema, and validation errors
  </Card>

  <Card title="Managing Datasets (SDK)" icon="code" href="/guides/sdk/managing-datasets">
    Manage datasets programmatically including reading statistics
  </Card>

  <Card title="Job Types" icon="list-check" href="/reference/architecture/job-types">
    Job types including statistics computation
  </Card>
</CardGroup>
