How query planners optimize JOINs
When you write a JOIN in NQL, the query planner analyzes the join condition and chooses an execution strategy. For simple equality conditions likeA.key = B.key, the planner can use highly optimized algorithms.
Hash joins
The most efficient join algorithm for equality conditions is the hash join:- The planner builds a hash table from the smaller table’s join keys
- It scans the larger table and probes the hash table for each key
- Matches are found in near-constant time (O(1) per lookup)
Index utilization
When tables have indexes on join columns, the planner can use index lookups instead of full scans. Like hash joins, index-based strategies require predictable, single-condition lookups to work effectively.What makes a join condition optimizable
A join condition is optimizable when the planner can:- Build a hash table or index on one side
- Probe that structure efficiently from the other side
- Match rows without examining every possible pair
A.key = B.key) satisfy all these requirements. The planner knows exactly how to partition the data and find matches efficiently.
Why OR conditions prevent optimization
When you introduce OR into a join condition, the optimization breaks down:Query planner limitations
Most query planners treat OR joins as a single complex condition rather than recognizing they could be split into two efficient joins. The planner isn’t sophisticated enough to automatically:- Run two separate joins and merge results
- Build multiple hash tables and probe them in parallel
- Recognize that UNION would produce the same result more efficiently
What happens internally
When faced with an OR join, the query planner typically resorts to: Nested loop joins: The planner iterates through one table and, for each row, scans the other table checking both conditions. This approach has O(n × m) complexity—for tables with millions of rows, this becomes computationally expensive. Spool operations: Some planners create temporary storage structures (spools) to materialize intermediate results. In extreme cases, these spools can grow to enormous sizes. One documented example showed a spool containing 7 billion rows (112 GB) for what should have been a straightforward join. Constant scans and multiple passes: The planner may create multiple internal join passes or constant scan operations to handle the OR logic, adding overhead with each pass.The exponential cost problem
The fundamental issue is that an OR join isn’t really one join—it’s conceptually two joins that need to be merged. When the planner treats it as one operation, the cost compounds. Instead ofcost(join1) + cost(join2), you get something closer to cost(join1) × cost(join2) due to the nested evaluation required.
Performance impact in practice
The theoretical problems translate to dramatic real-world performance differences.Orders of magnitude slower
Queries that should complete in seconds can run for hours with OR in the join:| Scenario | With OR join | Without OR join | Improvement |
|---|---|---|---|
| Two 10k-row tables | ~4 hours | ~2 minutes | 120x faster |
| Simple match query | 4 min 37 sec | 12 seconds | 23x faster |
| Large dataset join | 3 hr 55 min | under 2 min | 118x faster |
Resource consumption
Beyond time, OR joins consume disproportionate resources:- Memory: Spool operations and intermediate results can consume gigabytes
- CPU: Nested loop evaluation keeps processors busy with redundant work
- I/O: Multiple passes over data increase disk reads dramatically
When queries never finish
For sufficiently large tables, OR joins may never complete. The exponential cost growth means that doubling the data size more than doubles the execution time. What works on test data may be completely impractical on production volumes.Patterns that enable optimization
Understanding what works helps you write efficient queries.Single-key equality: the optimal path
The ideal join condition is a simple equality on one column:How UNNEST transforms OR into single-key joins
The UNNEST technique works by restructuring your data so that multiple possible keys become multiple rows, each with a single key:key column, and the planner can apply its efficient algorithms. You’ve done the work of separating the conditions that the planner couldn’t do automatically.
How UNION separates work efficiently
The UNION approach explicitly runs two optimized joins and combines results:cost(join1) + cost(join2) + cost(dedup)—linear, not exponential.

