Why chunking matters
Very large datasets—from tens of gigabytes to hundreds of terabytes—present challenges for single-shot query execution:

- **Stability.** Long-running scans can fail due to resource pressure, network timeouts, or cluster issues. A failure 4 hours into a 5-hour query means starting over from scratch.
- **Cost control.** When a query fails late in execution, you've already consumed compute resources with nothing to show for it. The later the failure, the greater the waste.
- **Scalability.** Single-pass processing doesn't scale predictably. A query that works for 10GB may fail at 100GB or behave differently at 1TB.

Chunking addresses these problems by capping the work performed in any single step.

How chunking works
When NQL determines a query is eligible for chunked execution:

- **Evaluation.** The query planner analyzes the query structure to determine if chunking can apply without changing results.
- **Time-based splitting.** The platform divides the scan into ranges based on a last-modified timestamp column (typically `nio_last_modified_at` or `last_modified_at`).
- **Independent execution.** Each chunk runs as a separate operation. If one chunk fails, only that chunk needs to retry—not the entire query.
- **Result combination.** Results from all chunks are combined to produce the final output, semantically equivalent to running the query as a single operation.
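Conceptually, the steps above rewrite one wide scan into several bounded scans. As a sketch (the `company_data.events` dataset, its columns, and the specific time ranges are illustrative, not real objects):

```sql
-- Single-pass form: one scan over the full dataset.
SELECT user_id, event_type
FROM company_data.events
WHERE event_type = 'purchase';

-- Chunked form: the planner runs equivalent scans bounded by the
-- last-modified timestamp, one per range, then combines the results.
SELECT user_id, event_type
FROM company_data.events
WHERE event_type = 'purchase'
  AND nio_last_modified_at >= TIMESTAMP '2024-01-01 00:00:00'
  AND nio_last_modified_at <  TIMESTAMP '2024-02-01 00:00:00';
-- ...repeated for each subsequent range; a failed range retries alone.
```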
Eligible queries
Chunking applies to queries that primarily scan a single dataset with standard projections and filters. These patterns work well:

- Simple scans with filters and projections from `company_data` datasets
- Queries against `narrative.rosetta_stone` with standard selection patterns
- Materialized views that read from a single source with straightforward transformations
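A query matching these patterns might look like the following sketch (the dataset, view, and column names are hypothetical):

```sql
-- A single-source scan with a simple filter and projection:
-- each time-based chunk can evaluate the filter independently.
CREATE MATERIALIZED VIEW recent_purchases AS
SELECT user_id,
       purchase_amount,
       event_timestamp
FROM company_data.events
WHERE event_type = 'purchase';
```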
Example: A chunkable query
A materialized view that selects a filtered projection from a single dataset is a good candidate for chunking: it reads one source, and its filter and projection apply identically within each time range.

Queries not eligible for chunking
To preserve correctness and keep the execution model predictable, certain query patterns are not chunked:

- **Complex joins.** Multi-table joins, especially when join semantics span large time ranges, require seeing all matching rows together. Chunking could produce incorrect results.
- **Unions and set operations.** Operations like `UNION`, `INTERSECT`, and `EXCEPT` combine results from multiple queries in ways that don't partition cleanly by time.
- **Global aggregations.** Operations like `SUM`, `AVG`, or `COUNT` over the entire dataset need all rows in a single pass to produce correct results. (Group-by aggregations within chunks may still work.)
- **Sorting and windowing.** `ORDER BY` across the full result set and window functions that depend on complete data can't be chunked without changing semantics.
- **Table functions.** Functions that change scan behavior may not be compatible with time-based partitioning.
Queries that include these operations still run—they just execute as a single operation rather than being split into chunks.
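For instance, a global aggregation like the following sketch (hypothetical dataset and columns) would run as one operation, since its result must reflect every row at once:

```sql
-- AVG over the whole dataset cannot be computed per time range
-- and naively combined, so this query is executed single-pass.
SELECT COUNT(*) AS total_purchases,
       AVG(purchase_amount) AS avg_purchase
FROM company_data.events
WHERE event_type = 'purchase';
```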
Result consistency
Chunked execution aims to produce results equivalent to a single-pass run. In practice:

- Results are materially the same for typical analytical workloads
- Minor differences can occur at chunk boundaries due to processing order or concurrent updates to source data
- Deterministic queries over stable data produce identical results regardless of chunking

