Dataset statistics are column-level metrics computed over the contents of a dataset. They provide a quantitative profile of each column—counts, bounds, distributions, and quality indicators—so you can understand what your data looks like without querying it directly.

Why statistics matter

Data quality assessment. Statistics give you an immediate picture of data health. You can see how many values are null, whether numeric ranges look reasonable, and how complete each column is, all without writing a query.

Schema understanding. When working with a new dataset, statistics help you understand the shape of the data. Cardinality tells you whether a column has a few categories or millions of unique values. Histograms show you how values are distributed.

Query context. Statistics surface in the platform UI to help you make better decisions. For example, when filtering on a field in Data Studio, histogram statistics populate a dropdown with the field’s actual values, so you can select from real data rather than guessing.

Where statistics appear in the platform

Statistics are visible in two places:
  • Dataset details page — shows column-level statistics for the full dataset
  • Dataset sample page — shows statistics computed over the sample
They also power UI features elsewhere. In Data Studio, filtering on a field that has histogram statistics displays the histogram values in a dropdown for easy selection.

Configuring what gets computed

You can control which statistics are computed, set refresh schedules, and add per-field overrides directly from the dataset details page. The configuration supports three levels of specificity: global defaults that apply to all fields, namespace-scoped overrides for dataset columns or Rosetta Stone attributes, and individual field-level overrides. For step-by-step instructions, see the Configuring Dataset Statistics guide. For the full configuration schema and API details, see the Dataset Statistics Reference.
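
To make the three levels of specificity concrete, here is a minimal sketch of how such a layered configuration could be resolved for one field. The dictionary layout, the key names (`defaults`, `namespaces`, `overrides`, `fields`), the namespace name `dataset_columns`, and the precedence order (field over namespace over global) are all assumptions made for illustration. They are not the platform's configuration schema, which is documented in the reference.

```python
# Hypothetical configuration layout, invented for this example only.
example_config = {
    "defaults": {"histogram": False, "countDistinct": False, "mean": True},
    "namespaces": {
        "dataset_columns": {
            "overrides": {"histogram": True},
            "fields": {
                "user_id": {"histogram": False, "countDistinct": True},
            },
        },
    },
}


def resolve_field_config(config, namespace, field):
    """Merge settings from least to most specific (assumed precedence)."""
    resolved = dict(config.get("defaults", {}))            # global defaults
    ns = config.get("namespaces", {}).get(namespace, {})
    resolved.update(ns.get("overrides", {}))               # namespace-scoped overrides
    resolved.update(ns.get("fields", {}).get(field, {}))   # field-level overrides
    return resolved


user_id_settings = resolve_field_config(example_config, "dataset_columns", "user_id")
country_settings = resolve_field_config(example_config, "dataset_columns", "country")
```

In this sketch, `user_id` ends up with its own field-level choices, while `country` inherits the namespace override for `histogram` and the global defaults for everything else.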

What gets computed

The platform computes 12 statistics per column, grouped into three categories.

Counts

Metrics that describe how many values exist and their uniqueness.
| Statistic | What it tells you |
| --- | --- |
| valueCount | Total number of non-null values |
| nullValueCount | Number of null values |
| nanValueCount | Number of NaN (not-a-number) values; applies only to floating-point columns |
| approxCountDistinct | Approximate number of unique values, using a probabilistic algorithm for efficiency |
| countDistinct | Exact number of unique values |
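
As an illustration of what these counts mean, the sketch below computes them for a small in-memory column. It is not the platform's implementation. Two choices are assumptions: NaN values are counted as non-null but excluded from the distinct count, and the approximate count uses a k-minimum-values sketch, which is only one of several probabilistic techniques (the source does not say which algorithm the platform uses).

```python
import hashlib
import heapq
import math


def _is_nan(value):
    return isinstance(value, float) and math.isnan(value)


def count_statistics(values):
    """Exact count statistics for one column given as a Python list."""
    non_null = [v for v in values if v is not None]
    # NaN never equals itself, so leave NaNs out of the distinct set (assumption).
    distinct = {v for v in non_null if not _is_nan(v)}
    return {
        "valueCount": len(non_null),
        "nullValueCount": len(values) - len(non_null),
        "nanValueCount": sum(1 for v in non_null if _is_nan(v)),
        "countDistinct": len(distinct),
    }


def approx_count_distinct(values, k=256):
    """Estimate the distinct count by tracking only the k smallest hashes."""
    smallest = []    # max-heap (stored negated) of the k smallest hashes seen
    members = set()  # hashes currently in the heap, used to skip duplicates
    for v in values:
        if v is None:
            continue
        digest = hashlib.sha256(repr(v).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big") / 2**64  # map hash into [0, 1)
        if h in members:
            continue
        if len(smallest) < k:
            heapq.heappush(smallest, -h)
            members.add(h)
        elif h < -smallest[0]:
            evicted = -heapq.heappushpop(smallest, -h)
            members.discard(evicted)
            members.add(h)
    if len(smallest) < k:
        return len(smallest)  # fewer than k distinct hashes: the count is exact
    return round((k - 1) / -smallest[0])


column = [1.5, 2.0, None, float("nan"), 2.0, None]
stats = count_statistics(column)
# stats == {"valueCount": 4, "nullValueCount": 2, "nanValueCount": 1, "countDistinct": 2}
```

The approximate version keeps at most `k` hashes in memory regardless of column size, which is the trade-off the table describes: bounded cost in exchange for a small estimation error.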

Bounds and distribution

Metrics that describe how values are distributed across the column.
| Statistic | What it tells you |
| --- | --- |
| lowerBound | Minimum value in the column |
| upperBound | Maximum value in the column |
| histogram | Frequency distribution of values across distinct buckets |
| mean | Average value; applies only to numeric columns |
| standardDeviation | Spread of values around the mean; applies only to numeric columns |
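
The following sketch shows one way these metrics could be derived for a small column. It is an illustration under stated assumptions, not the platform's method: the histogram uses one bucket per distinct value, the standard deviation is the population form (`statistics.pstdev`), nulls are ignored, and the column is assumed to contain no NaN values.

```python
import statistics
from collections import Counter


def distribution_statistics(values):
    """Bounds and distribution metrics for one column given as a list."""
    present = [v for v in values if v is not None]
    if not present:
        return {}
    result = {
        "lowerBound": min(present),
        "upperBound": max(present),
        # One bucket per distinct value, mapped to its frequency (assumption).
        "histogram": dict(Counter(present)),
    }
    # mean and standardDeviation only make sense for numeric columns.
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        result["mean"] = statistics.fmean(present)
        result["standardDeviation"] = statistics.pstdev(present)
    return result


numeric = distribution_statistics([2, 4, 4, 4, 5, 5, 7, 9, None])
text = distribution_statistics(["b", "a", "b"])
```

For the numeric column above, the bounds are 2 and 9, the mean is 5.0, and the population standard deviation is 2.0. The string column gets bounds and a histogram but no `mean` or `standardDeviation`.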

Storage and quality

Metrics that describe the physical footprint and overall completeness of the column.
| Statistic | What it tells you |
| --- | --- |
| columnStoredBytes | Bytes of storage consumed by the column |
| completeness | Ratio of non-null values to total rows, expressed as a value between 0 and 1 |
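
As a small worked example of the completeness definition, the sketch below divides the number of non-null values by the total row count. Returning `None` for an empty column is an assumption, since the source does not say how that case is handled. `columnStoredBytes` depends on the storage format and is not sketched here.

```python
def completeness(values):
    """Ratio of non-null values to total rows, between 0 and 1."""
    total_rows = len(values)
    if total_rows == 0:
        return None  # assumption: undefined for a column with no rows
    non_null = sum(1 for v in values if v is not None)
    return non_null / total_rows


ratio = completeness(["a", None, "b", "c"])  # 3 non-null out of 4 rows -> 0.75
```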

How data types affect available statistics

Not all statistics apply to all data types. Numeric columns support the full set of 12 statistics, with nanValueCount limited to floating-point columns, while non-numeric types such as strings and booleans have no mean or standardDeviation. Complex types such as arrays and objects support only basic counts and storage metrics. For the full mapping of which statistics are available for each data type, see the type compatibility matrix in the reference.
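
The rules in this section can be sketched as a lookup from a type category to the statistics that apply. This is a rough approximation, not the compatibility matrix itself. The category names are invented for the example, and the set chosen for complex types (`valueCount`, `nullValueCount`, `columnStoredBytes`) is an assumption about what "basic counts and storage metrics" covers. Consult the reference for the authoritative mapping.

```python
ALL_STATISTICS = frozenset({
    "valueCount", "nullValueCount", "nanValueCount",
    "approxCountDistinct", "countDistinct",
    "lowerBound", "upperBound", "histogram", "mean", "standardDeviation",
    "columnStoredBytes", "completeness",
})

NUMERIC_ONLY = frozenset({"mean", "standardDeviation"})
FLOAT_ONLY = frozenset({"nanValueCount"})
# Assumed reading of "basic counts and storage metrics" for arrays and objects.
COMPLEX_SUBSET = frozenset({"valueCount", "nullValueCount", "columnStoredBytes"})


def applicable_statistics(type_category):
    """Return the statistics assumed to apply to a hypothetical type category."""
    if type_category == "float":
        return ALL_STATISTICS
    if type_category == "integer":
        return ALL_STATISTICS - FLOAT_ONLY
    if type_category in ("string", "boolean"):
        return ALL_STATISTICS - NUMERIC_ONLY - FLOAT_ONLY
    if type_category == "complex":
        return COMPLEX_SUBSET
    raise ValueError(f"unknown type category: {type_category!r}")
```

For example, `applicable_statistics("string")` keeps `histogram` and the bounds but drops `mean`, `standardDeviation`, and `nanValueCount`.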

Datasets

How datasets store, organize, and protect your data

Configuring Dataset Statistics

Set up statistics through the platform UI

Dataset Statistics Reference

Full reference for each statistic and type compatibility matrix

Managing Datasets

Create and manage datasets with the SDK

Sample Data

How samples cross to the control plane