Dataset Statistics

Dataset statistics are column-level metrics computed over the contents of a dataset. They provide a quantitative profile of each column—counts, bounds, distributions, and quality indicators—so you can understand what your data looks like without querying it directly.

Why statistics matter

Data quality assessment. Statistics give you an immediate picture of data health. You can see how many values are null, whether numeric ranges look reasonable, and how complete each column is, all without writing a query.

Schema understanding. When working with a new dataset, statistics help you understand the shape of the data. Cardinality tells you whether a column has a few categories or millions of unique values. Histograms show you how values are distributed.

Query context. Statistics surface in the platform UI to help you make better decisions. For example, when filtering on a field in Data Studio, histogram statistics populate a dropdown with the field’s actual values, so you can select from real data rather than guessing.

Where statistics appear in the platform

Statistics are visible in two places:

Dataset details page — shows column-level statistics for the full dataset
Dataset sample page — shows statistics computed over the sample

They also power UI features elsewhere. In Data Studio, filtering on a field that has histogram statistics displays the histogram values in a dropdown for easy selection.

Configuring what gets computed

You can control which statistics are computed, set refresh schedules, and add per-field overrides directly from the dataset details page. The configuration supports three levels of specificity: global defaults that apply to all fields, namespace-scoped overrides for dataset columns or Rosetta Stone attributes, and individual field-level overrides. For step-by-step instructions, see the configuring dataset statistics guide. For the full configuration schema and API details, see the reference.
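
The three levels of specificity can be pictured as layered overrides. The sketch below is illustrative only: the key names (`defaults`, `namespaces`, `fields`), the namespace names, and the assumption that more specific levels win over less specific ones are all hypothetical and are not taken from the platform's configuration schema. Consult the reference for the real shape.

```python
def resolve_field_config(config, namespace, field):
    """Merge hypothetical config layers, most general first.

    Assumed precedence (not confirmed by the platform docs):
    global defaults < namespace overrides < field overrides.
    """
    resolved = dict(config.get("defaults", {}))
    # Namespace-scoped overrides replace matching global defaults.
    resolved.update(config.get("namespaces", {}).get(namespace, {}))
    # Field-level overrides are applied last, so they win.
    resolved.update(config.get("fields", {}).get(f"{namespace}.{field}", {}))
    return resolved


# Hypothetical configuration: every key and value here is made up.
example_config = {
    "defaults": {"histogram": True, "countDistinct": False},
    "namespaces": {"columns": {"countDistinct": True}},
    "fields": {"columns.user_id": {"histogram": False}},
}

user_id_config = resolve_field_config(example_config, "columns", "user_id")
country_config = resolve_field_config(example_config, "columns", "country")
```

In this sketch, `user_id` picks up the namespace override for `countDistinct` and its own field override for `histogram`, while `country` only inherits the global and namespace settings.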

What gets computed

The platform computes 12 statistics per column, grouped into three categories.

Counts

Metrics that describe how many values exist and their uniqueness.

Statistic	What it tells you
`valueCount`	Total number of non-null values
`nullValueCount`	Number of null values
`nanValueCount`	Number of NaN (not-a-number) values—applies only to floating-point columns
`approxCountDistinct`	Approximate number of unique values, using a probabilistic algorithm for efficiency
`countDistinct`	Exact number of unique values
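
To make the count definitions concrete, here is a minimal Python sketch that derives them from an in-memory list. It is not the platform's implementation. The function name is hypothetical, and two behaviors are assumptions: NaN values are treated as non-null (so they count toward `valueCount`), and NaN is excluded from the distinct count. `approxCountDistinct` is omitted because the source does not say which probabilistic algorithm is used.

```python
import math


def count_stats(values):
    """Compute count-style statistics for one column given as a list."""
    non_null = [v for v in values if v is not None]

    def is_nan(v):
        # NaN only exists for floats, and NaN != NaN, so use math.isnan.
        return isinstance(v, float) and math.isnan(v)

    # Assumption: NaN is left out of the distinct set.
    distinct = {v for v in non_null if not is_nan(v)}
    return {
        "valueCount": len(non_null),  # assumption: NaN counts as non-null
        "nullValueCount": len(values) - len(non_null),
        "nanValueCount": sum(1 for v in non_null if is_nan(v)),
        "countDistinct": len(distinct),
    }


column_counts = count_stats([1.0, 2.0, 2.0, None, float("nan")])
```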

Bounds and distribution

Metrics that describe how values are distributed across the column.

Statistic	What it tells you
`lowerBound`	Minimum value in the column
`upperBound`	Maximum value in the column
`histogram`	Frequency distribution of values across distinct buckets
`mean`	Average value—applies only to numeric columns
`standardDeviation`	Spread of values around the mean—applies only to numeric columns
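
The following sketch shows one way these metrics could be derived for a small in-memory column. It is an illustration under stated assumptions, not the platform's method: the function name is hypothetical, the histogram is modeled as a simple value-to-frequency map, and the population standard deviation is used because the source does not say whether the platform reports the population or sample form.

```python
import statistics
from collections import Counter


def distribution_stats(values):
    """Compute bounds and distribution statistics, ignoring nulls."""
    present = [v for v in values if v is not None]
    if not present:
        return {}  # assumption: nothing to report for an all-null column

    result = {
        "lowerBound": min(present),
        "upperBound": max(present),
        # Assumption: one bucket per distinct value.
        "histogram": dict(Counter(present)),
    }
    # mean and standardDeviation apply only to numeric columns.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if numeric:
        result["mean"] = statistics.fmean(present)
        result["standardDeviation"] = statistics.pstdev(present)
    return result


numeric_stats = distribution_stats([2, 4, 4, 4, 5, 5, 7, 9, None])
string_stats = distribution_stats(["a", "b", "a"])
```

For the numeric column the sketch reports bounds of 2 and 9 and a mean of 5.0. The string column gets bounds and a histogram but no `mean` or `standardDeviation`.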

Storage and quality

Metrics that describe the physical footprint and overall completeness of the column.

Statistic	What it tells you
`columnStoredBytes`	Bytes of storage consumed by the column
`completeness`	Ratio of non-null values to total rows, expressed as a value between 0 and 1
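
As a worked example of the ratio, the sketch below derives `completeness` from the two count statistics. It assumes that total rows equals `valueCount` plus `nullValueCount`, which follows from the definitions above but is not stated explicitly. The function name and the choice to return `None` for an empty column are hypothetical.

```python
def completeness(value_count, null_value_count):
    """Ratio of non-null values to total rows, between 0 and 1."""
    total_rows = value_count + null_value_count
    if total_rows == 0:
        return None  # assumption: undefined when the column has no rows
    return value_count / total_rows


# 75 non-null values out of 100 rows gives a completeness of 0.75.
ratio = completeness(75, 25)
```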

How data types affect available statistics

Not all statistics apply to all data types. Numeric columns support the full set of 12 statistics (with `nanValueCount` limited to floating-point columns), while non-numeric types like strings and booleans don’t have `mean` or `standardDeviation`. Complex types like arrays and objects only support basic counts and storage metrics. For the full mapping of which statistics are available for each data type, see the type compatibility matrix in the reference.
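
The rules in this section can be sketched as a lookup. This is a rough approximation, not the compatibility matrix itself: the type names are hypothetical, "basic counts" for complex types is assumed to mean `valueCount` and `nullValueCount`, and every type not listed is treated like a string or boolean. The reference remains the authoritative mapping.

```python
ALL_STATISTICS = frozenset({
    "valueCount", "nullValueCount", "nanValueCount", "approxCountDistinct",
    "countDistinct", "lowerBound", "upperBound", "histogram", "mean",
    "standardDeviation", "columnStoredBytes", "completeness",
})

# Hypothetical type names, for illustration only.
FLOAT_TYPES = {"float", "double"}
INTEGER_TYPES = {"int", "long"}
COMPLEX_TYPES = {"array", "object"}


def applicable_statistics(data_type):
    """Return the statistics assumed to apply to a data type."""
    if data_type in FLOAT_TYPES:
        return set(ALL_STATISTICS)
    if data_type in INTEGER_TYPES:
        # nanValueCount applies only to floating-point columns.
        return set(ALL_STATISTICS - {"nanValueCount"})
    if data_type in COMPLEX_TYPES:
        # Assumption: "basic counts" means these two count statistics.
        return {"valueCount", "nullValueCount", "columnStoredBytes"}
    # Strings, booleans, and other non-numeric types: no numeric-only metrics.
    return set(ALL_STATISTICS - {"nanValueCount", "mean", "standardDeviation"})


string_statistics = applicable_statistics("string")
```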
