Dataset statistics are column-level metrics computed over the contents of a dataset. They provide a quantitative profile of each column—counts, bounds, distributions, and quality indicators—so you can understand what your data looks like without querying it directly.

Why statistics matter

Data quality assessment. Statistics give you an immediate picture of data health. You can see how many values are null, whether numeric ranges look reasonable, and how complete each column is, all without writing a query.

Schema understanding. When working with a new dataset, statistics help you understand the shape of the data. Cardinality tells you whether a column has a few categories or millions of unique values. Histograms show you how values are distributed.

Query context. Statistics surface in the platform UI to help you make better decisions. For example, when filtering on a field in Data Studio, histogram statistics populate a dropdown with the field’s actual values, so you can select from real data rather than guessing.

Where statistics appear in the platform

Statistics are visible in two places:
  • Dataset details page — shows column-level statistics for the full dataset
  • Dataset sample page — shows statistics computed over the sample
They also power UI features elsewhere. In Data Studio, filtering on a field that has histogram statistics displays the histogram values in a dropdown for easy selection.

What gets computed

The platform computes 12 statistics per column, grouped into three categories.

Counts

Metrics that describe how many values exist and their uniqueness.
| Statistic | What it tells you |
| --- | --- |
| valueCount | Total number of non-null values |
| nullValueCount | Number of null values |
| nanValueCount | Number of NaN (not-a-number) values; applies only to floating-point columns |
| approxCountDistinct | Approximate number of unique values, using a probabilistic algorithm for efficiency |
| countDistinct | Exact number of unique values |
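
To make the count definitions concrete, here is a minimal Python sketch that computes them for a single column held in memory. It is an illustration only, not the platform's implementation: the function name `count_statistics` is hypothetical, `None` stands in for a null value, and the choice to leave NaN out of the distinct count is an assumption. `approxCountDistinct` is omitted because the probabilistic algorithm used is not named here.

```python
import math


def count_statistics(values):
    """Compute count-style statistics for one column given as a list.

    Assumption of this sketch: None represents a null value.
    """
    non_null = [v for v in values if v is not None]

    def is_nan(v):
        # Only floats can be NaN; math.isnan detects them.
        return isinstance(v, float) and math.isnan(v)

    # Assumption: NaN is not counted as a distinct value.
    distinct = {v for v in non_null if not is_nan(v)}

    return {
        "valueCount": len(non_null),                    # non-null values
        "nullValueCount": len(values) - len(non_null),  # null values
        "nanValueCount": sum(1 for v in non_null if is_nan(v)),
        "countDistinct": len(distinct),                 # exact unique values
    }


stats = count_statistics([1.0, 2.0, 2.0, None, float("nan")])
print(stats)
```

For this five-row column, the sketch reports four non-null values, one null, one NaN, and two distinct values (1.0 and 2.0).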

Bounds and distribution

Metrics that describe how values are distributed across the column.
| Statistic | What it tells you |
| --- | --- |
| lowerBound | Minimum value in the column |
| upperBound | Maximum value in the column |
| histogram | Frequency distribution of values across distinct buckets |
| mean | Average value; applies only to numeric columns |
| standardDeviation | Spread of values around the mean; applies only to numeric columns |
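
The bounds and distribution statistics can be sketched the same way. The Python below is a rough illustration under stated assumptions, not the platform's method: `distribution_statistics` is a hypothetical name, `None` stands in for null, each distinct value is treated as its own histogram bucket, and the standard deviation is the population form (whether the platform uses the population or sample form is not stated).

```python
import statistics
from collections import Counter


def distribution_statistics(values):
    """Compute bounds and distribution statistics for one column.

    Assumption of this sketch: None represents a null value.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return {}  # nothing to describe for an empty or all-null column

    result = {
        "lowerBound": min(non_null),
        "upperBound": max(non_null),
        # Assumption: one bucket per distinct value, mapped to its frequency.
        "histogram": dict(Counter(non_null)),
    }

    # mean and standardDeviation apply only to numeric columns.
    # bool is excluded explicitly because it is a subclass of int in Python.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in non_null
    )
    if numeric:
        result["mean"] = statistics.fmean(non_null)
        # Assumption: population standard deviation.
        result["standardDeviation"] = statistics.pstdev(non_null)
    return result


numeric_stats = distribution_statistics([2, 4, 4, 4, 5, 5, 7, 9, None])
string_stats = distribution_statistics(["a", "b", "a"])
```

Here `numeric_stats` contains all five statistics, while `string_stats` contains only the bounds and the histogram, mirroring the numeric-only restriction on `mean` and `standardDeviation`.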

Storage and quality

Metrics that describe the physical footprint and overall completeness of the column.
| Statistic | What it tells you |
| --- | --- |
| columnStoredBytes | Bytes of storage consumed by the column |
| completeness | Ratio of non-null values to total rows, expressed as a value between 0 and 1 |
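
As a worked example of the completeness ratio, the sketch below derives it from the two count statistics. It assumes that total rows equal `valueCount` plus `nullValueCount`, which follows from the definitions above if every row contributes exactly one value to the column. The function name `completeness` and the choice to return `None` for an empty column are illustrative assumptions.

```python
def completeness(value_count, null_value_count):
    """Ratio of non-null values to total rows, between 0 and 1.

    Assumption: total rows = non-null values + null values.
    """
    total_rows = value_count + null_value_count
    if total_rows == 0:
        # Assumption: the ratio is undefined for a column with no rows.
        return None
    return value_count / total_rows


# A column with 75 non-null values and 25 nulls is 75% complete.
ratio = completeness(75, 25)
print(ratio)  # 0.75
```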

How data types affect available statistics

Not all statistics apply to all data types. Numeric columns support the full set of 12 statistics, while non-numeric types like strings and booleans don’t have mean or standardDeviation. Complex types like arrays and objects only support basic counts and storage metrics. For the full mapping of which statistics are available for each data type, see the type compatibility matrix in the reference.