A compute pool determines the compute resources allocated to process your queries within a data plane. When you execute a query, the compute pool controls how much processing power is available and whether those resources are shared with other users or dedicated to your workload. Compute pools are one of the four dimensions of your execution context, alongside data plane, database, and schema.

Compute pool types

Dedicated

Dedicated compute pools provide isolated resources reserved for your workloads. Your queries don’t compete with other users for processing power, which results in more predictable performance. Use dedicated compute pools when:
  • Running production workloads where performance consistency matters
  • Processing large or complex queries that need guaranteed resources
  • Operating time-sensitive pipelines where latency must stay predictable

Shared

Shared compute pools use pooled resources across multiple users. This is more cost-effective but means your query performance may vary depending on current platform load. Use shared compute pools when:
  • Running exploratory queries or ad-hoc analysis
  • Developing and testing queries before promoting to production
  • Working with smaller datasets where performance variability is acceptable

Snowflake warehouse

On Snowflake-based data planes, each compute pool maps to a Snowflake virtual warehouse. When you register warehouses through the Snowflake Native App, each warehouse becomes a compute pool on your data plane. You can register multiple warehouses to separate workloads—for example, a smaller warehouse for exploratory queries and a larger one for production pipelines. Each Snowflake compute pool has a collaboration policy that controls which companies can use it, and one pool can be designated as the default for the data plane.

AWS EMR

On AWS-based data planes, compute pools map to Amazon EMR Spark clusters that the data plane operator provisions, reuses across jobs targeting the same pool, and terminates when idle. Each pool has a configured size that determines the cluster’s worker memory budget and vCPU count. The operator schedules NQL jobs (including materialize-view, nql-forecast, datasets_sample, and datasets_execute_dml) onto the appropriate cluster based on the compute pool selected for the workload.
Sizes mirror Snowflake’s warehouse vocabulary (x_small through 6x_large) and target the same worker memory budget as the equivalent Snowflake warehouse, so a workload that fits in a Snowflake warehouse of size N gets equivalent RAM on the EMR cluster of size N. vCPU counts won’t match Snowflake because EMR uses memory-optimized instances (8 GiB per vCPU), so the same memory budget delivers fewer vCPUs.
| Size | Worker memory | vCPU (max) | Notes |
| --- | --- | --- | --- |
| x_small | ~32 GiB | ~4 | Fixed size; rounded up from Snowflake’s 16 GiB minimum |
| small | ~32 GiB | ~4 | Fixed size |
| medium | ~64 GiB | ~8 | Fixed size |
| large | ~128 GiB | ~16 | Fixed size |
| x_large | ~256 GiB | ~32 | EMR Managed Scaling |
| 2x_large | ~512 GiB | ~64 | EMR Managed Scaling |
| 3x_large | ~1 TiB | ~128 | EMR Managed Scaling |
| 4x_large | ~2 TiB | ~256 | EMR Managed Scaling |
| 5x_large | ~4 TiB | ~512 | EMR Managed Scaling |
| 6x_large | ~8 TiB | ~1024 | EMR Managed Scaling |
Sizes x_large and above use EMR Managed Scaling: the cluster boots small and expands toward the maximum based on YARN load, then contracts back down when idle. Sizes large and below run at a fixed instance count.
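As a rough illustration of how the sizing figures relate, the sketch below derives the vCPU ceiling from the worker memory budget at 8 GiB per vCPU and flags which sizes use managed scaling. The type and function names are hypothetical and not part of any SDK; the numbers are the approximate values from the table.

```typescript
// Illustrative only: names are hypothetical, figures are the approximate
// values from the sizing table.
type EmrPoolSize =
  | "x_small" | "small" | "medium" | "large" | "x_large"
  | "2x_large" | "3x_large" | "4x_large" | "5x_large" | "6x_large";

interface EmrSizeSpec {
  workerMemoryGiB: number; // approximate worker memory budget
  maxVcpu: number;         // approximate vCPU ceiling
  managedScaling: boolean; // true for x_large and above
}

// Memory-optimized instances provide 8 GiB of memory per vCPU.
const GIB_PER_VCPU = 8;

const WORKER_MEMORY_GIB: Record<EmrPoolSize, number> = {
  x_small: 32,
  small: 32,
  medium: 64,
  large: 128,
  x_large: 256,
  "2x_large": 512,
  "3x_large": 1024,
  "4x_large": 2048,
  "5x_large": 4096,
  "6x_large": 8192,
};

function describeEmrSize(size: EmrPoolSize): EmrSizeSpec {
  const workerMemoryGiB = WORKER_MEMORY_GIB[size];
  return {
    workerMemoryGiB,
    maxVcpu: workerMemoryGiB / GIB_PER_VCPU,
    // Sizes large and below are fixed; x_large (256 GiB) and above scale.
    managedScaling: workerMemoryGiB >= WORKER_MEMORY_GIB.x_large,
  };
}
```

For example, `describeEmrSize("medium")` yields 64 GiB, 8 vCPUs, and no managed scaling, matching the corresponding table row.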

Idle and job execution timeouts

EMR-backed pools expose two optional tunables that control cluster lifetime and per-job runtime. Both are set on the aws_emr provider when you create or update a pool, and both are validated server-side — invalid values return HTTP 400.
| Field | Range | Default | What it controls |
| --- | --- | --- | --- |
| idle_timeout_seconds | 0, or 60–604800 (7 days) | Operator default (15 minutes) | How long an EMR cluster sits idle before auto-termination. 0 disables idle-termination entirely; the cluster runs until it is explicitly terminated or recycled. |
| job_execution_timeout_seconds | 60–604800 (7 days) | No enforcement | Maximum time a single job’s EMR step may stay in RUNNING. When exceeded, the operator cancels the step and fails the job with an explanatory message. |
The shared Narrative compute pool is configured with a 1-hour job execution timeout and is intended for small jobs. Pools you provision for yourself can set their own (longer) limit, or omit the field entirely to let jobs run indefinitely.
Omitting either field (or setting it to null) keeps the operator’s built-in behavior. Set idle_timeout_seconds to 0 only when you want a long-lived cluster — for example, a pool dedicated to back-to-back batch jobs where boot latency dominates run time.
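Because invalid values are rejected with HTTP 400, it can be convenient to check the ranges before sending a request. The sketch below is a client-side mirror of the documented ranges, not the server's actual validation logic; the function name is hypothetical, and the requirement that values be whole seconds is an assumption.

```typescript
// Client-side pre-check mirroring the documented ranges. The server remains
// the source of truth; this function name is hypothetical.
const MIN_TIMEOUT_SECONDS = 60;
const MAX_TIMEOUT_SECONDS = 604800; // 7 days

interface EmrTimeouts {
  idle_timeout_seconds?: number | null;
  job_execution_timeout_seconds?: number | null;
}

// Assumption: values must be whole seconds.
function inTimeoutRange(value: number): boolean {
  return (
    Number.isInteger(value) &&
    value >= MIN_TIMEOUT_SECONDS &&
    value <= MAX_TIMEOUT_SECONDS
  );
}

function validateEmrTimeouts(timeouts: EmrTimeouts): string[] {
  const errors: string[] = [];

  // Omitted or null keeps the operator's built-in behavior; 0 disables
  // idle termination entirely.
  const idle = timeouts.idle_timeout_seconds;
  if (idle !== undefined && idle !== null && idle !== 0 && !inTimeoutRange(idle)) {
    errors.push("idle_timeout_seconds must be 0 or between 60 and 604800");
  }

  // Omitted or null means no per-job limit; 0 is not a valid value here.
  const job = timeouts.job_execution_timeout_seconds;
  if (job !== undefined && job !== null && !inTimeoutRange(job)) {
    errors.push("job_execution_timeout_seconds must be between 60 and 604800");
  }

  return errors;
}
```

An empty array means both fields fall inside the documented ranges; any entries describe what the server would be expected to reject.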

Which compute pools are available

The compute pool options available to you depend on your data plane’s underlying provider:
| Provider | Available compute pools | Notes |
| --- | --- | --- |
| Snowflake | Snowflake warehouse | One compute pool per registered warehouse |
| Narrative (shared AWS) | Dedicated, Shared | Choose based on workload requirements |
| Customer AWS | AWS EMR, Dedicated, Shared | EMR-backed pools support sized Spark clusters for NQL workloads |
You select your compute pool through the context selector in the platform’s top navigation.

Default compute pool resolution

When a job is created without an explicit compute pool, the platform resolves one through a four-level fallback chain. Resolution happens at job-creation time, so a job’s compute_pool_id is fixed before it lands in Pending. The first level present wins.
| Level | Source | When it applies |
| --- | --- | --- |
| 1. Job-specific | The computePoolId passed in the request body (or a workflow task input) | A specific job needs a non-default pool |
| 2. Dataset default | dataset.computePoolConfig.defaultComputePoolId | Dataset-scoped operations (refresh, sample, execute-DML, stats) when the request didn’t pin a pool |
| 3. Company default (per data plane) | companies.compute_pool_config.by_data_plane[<dataPlaneId>].default_compute_pool_id | The company has a catch-all default for the job’s data plane and nothing earlier supplied a pool. Scoped per data plane because a company can own pools across multiple data planes. Covers the wider surface (model training, model inference, healthcheck) that has no dataset analog. |
| 4. Data plane default | dataPlane.defaultComputePoolId | Catch-all when nothing else resolved |
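The fallback chain can be pictured as a first-match lookup over the four levels. The following is a minimal sketch of that idea, not the platform's implementation: the input field names are hypothetical, and returning null when no level supplies a pool is a choice made for this example, since the behavior in that case is not described here.

```typescript
// Hypothetical shape: one optional pool id per level of the chain.
interface ResolutionInputs {
  jobComputePoolId?: string | null;       // level 1: request body or task input
  datasetDefaultPoolId?: string | null;   // level 2: dataset default
  companyDefaultPoolId?: string | null;   // level 3: company default for the job's data plane
  dataPlaneDefaultPoolId?: string | null; // level 4: data plane default
}

type ResolutionLevel = "job" | "dataset" | "company" | "data_plane";

interface ResolvedPool {
  level: ResolutionLevel;
  computePoolId: string;
}

function resolveComputePool(inputs: ResolutionInputs): ResolvedPool | null {
  const chain: Array<[ResolutionLevel, string | null | undefined]> = [
    ["job", inputs.jobComputePoolId],
    ["dataset", inputs.datasetDefaultPoolId],
    ["company", inputs.companyDefaultPoolId],
    ["data_plane", inputs.dataPlaneDefaultPoolId],
  ];

  // The first level that supplies a pool wins. Empty strings are treated as
  // absent in this sketch.
  for (const [level, id] of chain) {
    if (id) {
      return { level, computePoolId: id };
    }
  }
  return null; // sketch-only choice; not documented platform behavior
}
```

With only a dataset default and a data plane default set, the function returns the dataset default, because level 2 is checked before level 4.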
A company admin can set the level-3 default for a given data plane with:
PUT /company/{companyId}/data-planes/{dataPlaneId}/default-compute-pool/{computePoolId}
and clear it with:
DELETE /company/{companyId}/data-planes/{dataPlaneId}/default-compute-pool
Both endpoints require the Company Info write permission and return 204 No Content on success.
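To make the two calls concrete, the sketch below builds the method and path for each and checks for the 204 response. It deliberately stops short of sending anything: the base URL, authentication, and HTTP client are left to the caller, and the helper names are hypothetical.

```typescript
// Hypothetical helpers that build the documented requests. Sending them
// (base URL, auth headers, HTTP client) is left to the caller.
interface ApiRequest {
  method: "PUT" | "DELETE";
  path: string;
}

function companyDefaultPoolPath(companyId: string, dataPlaneId: string): string {
  return (
    `/company/${encodeURIComponent(companyId)}` +
    `/data-planes/${encodeURIComponent(dataPlaneId)}` +
    `/default-compute-pool`
  );
}

// Set the level-3 (company) default for one data plane.
function setCompanyDefaultPool(
  companyId: string,
  dataPlaneId: string,
  computePoolId: string,
): ApiRequest {
  return {
    method: "PUT",
    path: `${companyDefaultPoolPath(companyId, dataPlaneId)}/${encodeURIComponent(computePoolId)}`,
  };
}

// Clear the level-3 (company) default for one data plane.
function clearCompanyDefaultPool(companyId: string, dataPlaneId: string): ApiRequest {
  return { method: "DELETE", path: companyDefaultPoolPath(companyId, dataPlaneId) };
}

// Both endpoints return 204 No Content on success.
function isDefaultPoolUpdateSuccessful(status: number): boolean {
  return status === 204;
}
```

Keeping request construction separate from transport makes the paths easy to verify and lets the same helpers work with whichever HTTP client the caller already uses.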
Use a dataset default for “all my materialize-view refreshes on this dataset run on a large pool”, and a company default for the broader set of operations that have no dataset to attach to — model inference, healthchecks, or non-dataset NQL workloads. The two configs are intentionally separate so each can grow operation-specific knobs without polluting the other.

When to use each type

| Scenario | Recommended pool | Why |
| --- | --- | --- |
| Production data pipelines | Dedicated | Predictable performance, no resource contention |
| Ad-hoc data exploration | Shared | Cost-effective for variable, low-priority workloads |
| Testing queries before production | Shared | Saves dedicated resources for production use |
| Time-sensitive audience builds | Dedicated | Guaranteed resources ensure timely completion |
| Snowflake data planes | Snowflake warehouse | Register one or more warehouses sized for your workload |

Archiving compute pools

Deleting a compute pool is a soft delete: the pool’s status changes to archived, the record stays in place, and it is filtered out of list responses but still resolvable by id. Archival behavior diverges by provider because each integration has different operational realities:
| Reference path | Snowflake data planes | AWS data planes |
| --- | --- | --- |
| Pool is the data plane default | The platform re-elects another active pool as the new default (or clears the default if none remain), then archives the pool | The archive is rejected; you must clear or change the data plane default before retrying |
| Pool is set as a dataset default | The platform clears the default on every affected dataset, then archives the pool. Affected datasets fall through to the rest of the chain on their next job | The archive is rejected; you must clear the dataset defaults before retrying |
| Pool is set as a company default (per data plane) | The platform clears the per-data-plane entry on every affected company, then archives the pool. Companies fall through to the data plane default on their next job | The archive is rejected; you must clear the company defaults before retrying |
| Pool has in-flight Pending or Running jobs | Archive succeeds; jobs are handled at runtime (see below) | Archive succeeds; jobs are handled at runtime (see below) |
Snowflake users frequently mutate their warehouse list outside the platform — through the Snowflake UI, permissions, or infrastructure as code. When a backing warehouse disappears, the platform has to track reality, so the archive must succeed and dependent references are updated automatically. AWS compute pools are wholly managed inside the platform, so the API rejects an archive that would leave a dangling default and forces an explicit decision.
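The provider split can be summarized as a small decision function: on AWS any remaining default reference blocks the archive, while on Snowflake the same references are cleaned up and the archive proceeds. This is an illustrative model of the rules described here, with hypothetical type and field names; it is not the platform's code.

```typescript
// Illustrative model of the archive rules; all names are hypothetical.
type PoolProvider = "snowflake" | "aws";

interface PoolReferences {
  isDataPlaneDefault: boolean;
  datasetDefaultCount: number; // datasets using this pool as their default
  companyDefaultCount: number; // companies using this pool as a per-data-plane default
}

type ArchiveOutcome =
  | { archived: true; cleanedUp: string[] }
  | { archived: false; blockedBy: string[] };

function planArchive(provider: PoolProvider, refs: PoolReferences): ArchiveOutcome {
  const held: string[] = [];
  if (refs.isDataPlaneDefault) held.push("data plane default");
  if (refs.datasetDefaultCount > 0) held.push("dataset defaults");
  if (refs.companyDefaultCount > 0) held.push("company defaults");

  // AWS: reject until the caller clears every remaining default reference.
  if (provider === "aws" && held.length > 0) {
    return { archived: false, blockedBy: held };
  }

  // Snowflake: references are updated automatically, then the pool is archived.
  // In-flight jobs never block the archive on either provider.
  return { archived: true, cleanedUp: held };
}
```

For a pool that is both the data plane default and a dataset default, the AWS branch reports two blockers, whereas the Snowflake branch archives the pool and lists the same two references as cleaned up.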

Runtime impact on in-flight jobs

Neither archive flow blocks in-flight jobs at archive time. Instead, the data plane operator fails any job whose compute pool resolves to a non-active status on its next polling iteration. The job reports as failed with an actionable message similar to:
compute pool '<id>' could not be resolved (it may have been archived or does not exist).
Resubmit the job with an active compute pool.
This is uniform across Pending, Running, and PendingCancellation jobs. Jobs in PendingCancellation are failed rather than cancelled because the missing pool is the actionable signal — a successful cancellation report would mask the root cause. See the Archive a compute pool guide for the API workflow and the order in which to clear references.
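The runtime rule can be sketched as a check applied to each in-flight job on a polling pass. The code below is a simplified model under stated assumptions: the job and pool shapes are hypothetical, the "Failed" status name is assumed for this example, and the message text follows the sample shown earlier rather than a guaranteed format.

```typescript
// Simplified model of the polling check; shapes and the "Failed" status
// name are assumptions made for this example.
type JobStatus = "Pending" | "Running" | "PendingCancellation" | "Failed";
type PoolStatus = "active" | "archived";

interface Job {
  id: string;
  status: JobStatus;
  computePoolId: string;
  failureMessage?: string;
}

function failIfPoolUnavailable(
  job: Job,
  poolStatusById: Record<string, PoolStatus | undefined>,
): Job {
  const inFlight =
    job.status === "Pending" ||
    job.status === "Running" ||
    job.status === "PendingCancellation";
  if (!inFlight) {
    return job;
  }

  // A missing entry (pool does not exist) is treated the same as archived.
  if (poolStatusById[job.computePoolId] === "active") {
    return job;
  }

  // PendingCancellation jobs are failed too, so the missing pool stays visible.
  return {
    ...job,
    status: "Failed",
    failureMessage:
      `compute pool '${job.computePoolId}' could not be resolved ` +
      `(it may have been archived or does not exist). ` +
      `Resubmit the job with an active compute pool.`,
  };
}
```

Applying this to a job in PendingCancellation whose pool has been archived produces a failed job rather than a cancelled one, which is the behavior the section describes.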
On AWS data planes, the first compute pool you create is not automatically promoted to the data plane default. This is the permanent behavior: jobs targeting an AWS data plane must pin an explicit computePoolId, resolve to a dataset-level default, or have a data plane default set explicitly via PUT /data-planes/{id}/default-compute-pool/{poolId}. The intent is that AWS workloads make a deliberate choice about where they run rather than implicitly routing through a “first pool wins” default.
On Snowflake data planes, the first compute pool you create is currently auto-promoted to the data plane default for backward compatibility while the Snowflake-side migration off implicit defaults is in flight. This behavior is temporary and will be removed once every Snowflake workload pins its compute pool explicitly.
Every newly registered company is provisioned with a private x_small AWS EMR compute pool on the Narrative data plane, and that pool is set as the company-level default for that data plane. Jobs that don’t pin a pool explicitly resolve to this default through the pool resolution chain (job → dataset → company → data plane). You can rename, archive, or replace the default at any time.

external_id is optional for aws_emr providers

When creating a compute pool with the aws_emr provider via POST /data-planes/{id}/compute-pools, you can omit external_id from the request body — the server fills in the trivial {"type": "aws_emr"} payload automatically. Supplying it explicitly still works, and the type is validated against the provider. For the snowflake_warehouse provider, external_id is still required because it carries the warehouse name and alias that the platform uses to dispatch queries to the correct Snowflake warehouse.
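The defaulting rule can be modeled in a few lines. The sketch below imitates what the server is described as doing so the behavior is easy to reason about; it is not the server's code. The request shape, including the `provider` field name, is hypothetical, and the Snowflake payload is passed through untouched because its exact fields are not specified here.

```typescript
// Hypothetical request shape; only external_id and the {"type": "aws_emr"}
// payload come from the documentation.
interface ExternalId {
  type: string;
  [key: string]: unknown;
}

interface CreatePoolRequest {
  provider: "aws_emr" | "snowflake_warehouse";
  external_id?: ExternalId;
}

function resolveExternalId(request: CreatePoolRequest): ExternalId {
  if (request.external_id === undefined) {
    if (request.provider === "aws_emr") {
      // The trivial payload is filled in automatically for aws_emr.
      return { type: "aws_emr" };
    }
    // snowflake_warehouse needs it to identify the backing warehouse.
    throw new Error("external_id is required for the snowflake_warehouse provider");
  }

  // An explicitly supplied aws_emr payload must carry the matching type.
  if (request.provider === "aws_emr" && request.external_id.type !== "aws_emr") {
    throw new Error("external_id.type does not match the aws_emr provider");
  }
  return request.external_id;
}
```

Calling `resolveExternalId({ provider: "aws_emr" })` returns the default payload, while the same call with `provider: "snowflake_warehouse"` throws because nothing identifies the warehouse.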

How compute pools relate to the SDK

When executing queries through the TypeScript SDK, the execution_cluster parameter maps to the compute pool concept:
const result = await api.executeNql({
  nql: 'SELECT _nio_id, _nio_updated_at FROM company_data."my_dataset" LIMIT 100',
  data_plane_id: null,
  execution_cluster: { type: 'dedicated' },
});
The execution_cluster.type accepts 'dedicated' or 'shared', corresponding directly to the Dedicated and Shared compute pool types. If execution_cluster is omitted, the data plane’s default compute pool is used; on Snowflake-based data planes, that is the warehouse you’ve designated as default.

Execution Context

How data plane, compute pool, database, and schema work together

Data Planes

Where your data lives and is processed

Executing NQL Queries

Run queries programmatically with the TypeScript SDK

Migrate to Compute Pools

Transition from a single Snowflake warehouse to compute pools

Archive a Compute Pool

Safely retire a compute pool and clear its references