Compute pool types
Dedicated
Dedicated compute pools provide isolated resources reserved for your workloads. Your queries don’t compete with other users for processing power, which results in more predictable performance. Use dedicated compute pools when:
- Running production workloads where performance consistency matters
- Processing large or complex queries that need guaranteed resources
- Operating time-sensitive pipelines where latency must stay predictable
Shared
Shared compute pools use pooled resources across multiple users. This is more cost-effective but means your query performance may vary depending on current platform load. Use shared compute pools when:
- Running exploratory queries or ad-hoc analysis
- Developing and testing queries before promoting to production
- Working with smaller datasets where performance variability is acceptable
Snowflake warehouse
On Snowflake-based data planes, each compute pool maps to a Snowflake virtual warehouse. When you register warehouses through the Snowflake Native App, each warehouse becomes a compute pool on your data plane. You can register multiple warehouses to separate workloads—for example, a smaller warehouse for exploratory queries and a larger one for production pipelines. Each Snowflake compute pool has a collaboration policy that controls which companies can use it, and one pool can be designated as the default for the data plane.
AWS EMR
On AWS-based data planes, compute pools map to Amazon EMR Spark clusters that the data plane operator provisions, reuses across jobs targeting the same pool, and terminates when idle. Each pool has a configured size that determines the cluster’s worker memory budget and vCPU count. The operator schedules NQL jobs (including materialize-view, nql-forecast, datasets_sample, and datasets_execute_dml) onto the appropriate cluster based on the compute pool selected for the workload.
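The provision, reuse, and terminate lifecycle described above can be modeled as a registry keyed by compute pool. This is a minimal sketch, not the operator's code. `EmrClusterRegistry`, `acquire`, and `reapIdle` are hypothetical names, time is passed in as plain milliseconds, and cluster provisioning and termination are stubbed out.

```typescript
// Hypothetical model of an operator that keeps one EMR cluster per compute pool.
// Provisioning and termination are stubbed: no AWS calls are made.
interface Cluster {
  clusterId: string;
  poolId: string;
  lastUsedMs: number;
}

class EmrClusterRegistry {
  private clusters: { [poolId: string]: Cluster } = {};
  private nextId = 1;

  // Reuses the pool's running cluster if there is one; otherwise "provisions" a new one.
  acquire(poolId: string, nowMs: number): Cluster {
    let cluster = this.clusters[poolId];
    if (!cluster) {
      cluster = { clusterId: "cluster-" + this.nextId++, poolId: poolId, lastUsedMs: nowMs };
      this.clusters[poolId] = cluster;
    }
    cluster.lastUsedMs = nowMs;
    return cluster;
  }

  // "Terminates" clusters idle for at least idleTimeoutMs and returns their ids.
  // An idleTimeoutMs of 0 disables idle-termination, mirroring idle_timeout_seconds = 0.
  reapIdle(nowMs: number, idleTimeoutMs: number): string[] {
    const terminated: string[] = [];
    if (idleTimeoutMs === 0) return terminated;
    for (const poolId of Object.keys(this.clusters)) {
      const cluster = this.clusters[poolId];
      if (nowMs - cluster.lastUsedMs >= idleTimeoutMs) {
        delete this.clusters[poolId];
        terminated.push(cluster.clusterId);
      }
    }
    return terminated;
  }
}

// Usage: two jobs on the same pool share a cluster; another pool gets its own.
const registry = new EmrClusterRegistry();
const first = registry.acquire("pool-a", 0);
const second = registry.acquire("pool-a", 1000);
const other = registry.acquire("pool-b", 2000);
console.log(first.clusterId === second.clusterId, first.clusterId !== other.clusterId); // true true
const fifteenMinutesMs = 15 * 60 * 1000;
// pool-a's cluster has been idle for the full timeout and is reaped; pool-b's has not.
console.log(registry.reapIdle(fifteenMinutesMs + 1000, fifteenMinutesMs));
```

Keying the registry by pool id means that jobs targeting the same pool share a cluster, while separate pools never share one.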
Sizes mirror Snowflake’s warehouse vocabulary (x_small through 6x_large) and target the same worker memory budget as the equivalent Snowflake warehouse, so a workload that fits in a Snowflake warehouse of size N gets equivalent RAM on the EMR cluster of size N. vCPU counts won’t match Snowflake because EMR uses memory-optimized instances (8 GiB per vCPU), so the same memory budget delivers fewer vCPUs.
| Size | Worker memory | vCPU (max) | Notes |
|---|---|---|---|
| x_small | ~32 GiB | ~4 | Fixed size; rounded up from Snowflake’s 16 GiB minimum |
| small | ~32 GiB | ~4 | Fixed size |
| medium | ~64 GiB | ~8 | Fixed size |
| large | ~128 GiB | ~16 | Fixed size |
| x_large | ~256 GiB | ~32 | EMR Managed Scaling |
| 2x_large | ~512 GiB | ~64 | EMR Managed Scaling |
| 3x_large | ~1 TiB | ~128 | EMR Managed Scaling |
| 4x_large | ~2 TiB | ~256 | EMR Managed Scaling |
| 5x_large | ~4 TiB | ~512 | EMR Managed Scaling |
| 6x_large | ~8 TiB | ~1024 | EMR Managed Scaling |
x_large and above use EMR Managed Scaling: the cluster boots small and expands toward the maximum based on YARN load, then contracts back down when idle. Sizes large and below run at a fixed instance count.
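The size table follows from two rules. Worker memory doubles at each step from `small` upward. The vCPU ceiling is that memory divided by 8 GiB, the memory-optimized instance ratio. The sketch below derives the nominal (non-approximate) values from those rules. The `EmrSize` type and `sizeSpec` function are illustrative only and are not part of any published API.

```typescript
// Illustrative model of the EMR size table; not a published API.
type EmrSize =
  | "x_small" | "small" | "medium" | "large" | "x_large"
  | "2x_large" | "3x_large" | "4x_large" | "5x_large" | "6x_large";

const SIZES: EmrSize[] = [
  "x_small", "small", "medium", "large", "x_large",
  "2x_large", "3x_large", "4x_large", "5x_large", "6x_large",
];

const GIB_PER_VCPU = 8; // memory-optimized EMR instances

interface SizeSpec {
  workerMemoryGiB: number;
  maxVcpu: number;
  managedScaling: boolean; // true for x_large and above
}

function sizeSpec(size: EmrSize): SizeSpec {
  const index = SIZES.indexOf(size);
  // x_small shares small's 32 GiB (rounded up from Snowflake's 16 GiB minimum);
  // every size from small upward doubles the previous one.
  const workerMemoryGiB = 32 * Math.pow(2, Math.max(0, index - 1));
  return {
    workerMemoryGiB: workerMemoryGiB,
    maxVcpu: workerMemoryGiB / GIB_PER_VCPU,
    managedScaling: index >= SIZES.indexOf("x_large"),
  };
}

console.log(sizeSpec("medium"));   // { workerMemoryGiB: 64, maxVcpu: 8, managedScaling: false }
console.log(sizeSpec("6x_large")); // { workerMemoryGiB: 8192, maxVcpu: 1024, managedScaling: true }
```

For example, 6x_large has 8,192 GiB (8 TiB) of worker memory. Dividing by 8 GiB per vCPU gives the 1,024-vCPU ceiling shown in the table.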
Idle and job execution timeouts
EMR-backed pools expose two optional tunables that control cluster lifetime and per-job runtime. Both are set on the aws_emr provider when you create or update a pool, and both are validated server-side: invalid values return HTTP 400.
| Field | Range | Default | What it controls |
|---|---|---|---|
| idle_timeout_seconds | 0, or 60–604800 (7 days) | Operator default (15 minutes) | How long an EMR cluster sits idle before auto-termination. 0 disables idle-termination entirely; the cluster runs until it is explicitly terminated or recycled. |
| job_execution_timeout_seconds | 60–604800 (7 days) | No enforcement | Maximum time a single job’s EMR step may stay in RUNNING. When exceeded, the operator cancels the step and fails the job with an explanatory message. |
The shared Narrative compute pool is configured with a 1-hour job execution timeout and is intended for small jobs. Pools you provision for yourself can set their own (longer) limit, or omit the field entirely to let jobs run indefinitely.
Omitting either field (or setting it to null) keeps the operator’s built-in behavior. Set idle_timeout_seconds to 0 only when you want a long-lived cluster, for example a pool dedicated to back-to-back batch jobs where cluster boot latency would otherwise dominate run time.
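The ranges above can be checked on the client before a request is sent, which catches mistakes before the server returns HTTP 400. This is a minimal sketch that assumes a plain object carries the two fields. `AwsEmrTimeouts` and `validateTimeouts` are hypothetical names, and the server remains the authority.

```typescript
// Hypothetical client-side pre-check of the documented ranges. The server stays
// authoritative and rejects out-of-range values with HTTP 400.
interface AwsEmrTimeouts {
  idle_timeout_seconds?: number | null;
  job_execution_timeout_seconds?: number | null;
}

const MIN_TIMEOUT_SECONDS = 60;
const MAX_TIMEOUT_SECONDS = 604800; // 7 days

function inRange(value: number): boolean {
  return value >= MIN_TIMEOUT_SECONDS && value <= MAX_TIMEOUT_SECONDS;
}

// Returns a list of problems; an empty list means both values are within the documented ranges.
function validateTimeouts(t: AwsEmrTimeouts): string[] {
  const errors: string[] = [];
  const idle = t.idle_timeout_seconds;
  // Omitted or null keeps the operator default (15 minutes); 0 disables idle-termination.
  if (idle !== undefined && idle !== null && idle !== 0 && !inRange(idle)) {
    errors.push("idle_timeout_seconds must be 0 or between 60 and 604800");
  }
  const job = t.job_execution_timeout_seconds;
  // Omitted or null means no per-job limit; unlike the idle timeout, 0 is not accepted.
  if (job !== undefined && job !== null && !inRange(job)) {
    errors.push("job_execution_timeout_seconds must be between 60 and 604800");
  }
  return errors;
}

// A long-lived batch pool: never idle-terminate, cap each job at 1 hour.
console.log(validateTimeouts({ idle_timeout_seconds: 0, job_execution_timeout_seconds: 3600 })); // []
console.log(validateTimeouts({ idle_timeout_seconds: 30 })); // one idle_timeout_seconds error
```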
Which compute pools are available
The compute pool options available to you depend on your data plane’s underlying provider:
| Provider | Available compute pools | Notes |
|---|---|---|
| Snowflake | Snowflake warehouse | One compute pool per registered warehouse |
| Narrative (shared AWS) | Dedicated, Shared | Choose based on workload requirements |
| Customer AWS | AWS EMR, Dedicated, Shared | EMR-backed pools support sized Spark clusters for NQL workloads |
Default compute pool resolution
When a job is created without an explicit compute pool, the platform resolves one through a four-level fallback chain. Resolution happens at job-creation time, so a job’s compute_pool_id is fixed before it lands in Pending. The first level that supplies a pool wins.
| Level | Source | When it applies |
|---|---|---|
| 1. Job-specific | The computePoolId passed in the request body (or a workflow task input) | A specific job needs a non-default pool |
| 2. Dataset default | dataset.computePoolConfig.defaultComputePoolId | Dataset-scoped operations (refresh, sample, execute-DML, stats) when the request didn’t pin a pool |
| 3. Company default (per data plane) | companies.compute_pool_config.by_data_plane[<dataPlaneId>].default_compute_pool_id | The company has a catch-all default for the job’s data plane and nothing earlier supplied a pool. Scoped per data plane because a company can own pools across multiple data planes. Covers the wider surface (model training, model inference, healthcheck) that has no dataset analog. |
| 4. Data plane default | dataPlane.defaultComputePoolId | Catch-all when nothing else resolved |
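The fallback chain reduces to an ordered series of lookups in which the first hit wins. This is a minimal sketch that assumes in-memory maps stand in for the dataset, company, and data plane records. The `resolveComputePool` function and its input shapes are hypothetical. Only the four sources and their order come from the table above.

```typescript
// Hypothetical shapes standing in for the four sources in the resolution table.
interface JobRequest {
  computePoolId?: string;
  datasetId?: string; // present only for dataset-scoped operations
  companyId: string;
  dataPlaneId: string;
}

interface ResolutionContext {
  datasetDefaults: { [datasetId: string]: string | undefined };
  // Mirrors companies.compute_pool_config.by_data_plane[<dataPlaneId>].default_compute_pool_id
  companyDefaults: { [companyId: string]: { [dataPlaneId: string]: string | undefined } | undefined };
  dataPlaneDefaults: { [dataPlaneId: string]: string | undefined };
}

type Resolution = { poolId: string; level: "job" | "dataset" | "company" | "data_plane" };

// Walks the chain in order; the first level that supplies a pool wins.
function resolveComputePool(req: JobRequest, ctx: ResolutionContext): Resolution | undefined {
  if (req.computePoolId) return { poolId: req.computePoolId, level: "job" };
  if (req.datasetId) {
    const fromDataset = ctx.datasetDefaults[req.datasetId];
    if (fromDataset) return { poolId: fromDataset, level: "dataset" };
  }
  const companyConfig = ctx.companyDefaults[req.companyId];
  // Company defaults are scoped per data plane, so look up the job's data plane only.
  const fromCompany = companyConfig ? companyConfig[req.dataPlaneId] : undefined;
  if (fromCompany) return { poolId: fromCompany, level: "company" };
  const fromDataPlane = ctx.dataPlaneDefaults[req.dataPlaneId];
  if (fromDataPlane) return { poolId: fromDataPlane, level: "data_plane" };
  return undefined; // nothing resolved; how the platform reports this is not covered here
}

const ctx: ResolutionContext = {
  datasetDefaults: { "ds-1": "pool-dataset" },
  companyDefaults: { "co-1": { "dp-aws": "pool-company" } },
  dataPlaneDefaults: { "dp-aws": "pool-plane" },
};
console.log(resolveComputePool({ companyId: "co-1", dataPlaneId: "dp-aws", datasetId: "ds-1" }, ctx)); // dataset level
console.log(resolveComputePool({ companyId: "co-1", dataPlaneId: "dp-aws" }, ctx)); // company level
```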
When to use each type
| Scenario | Recommended pool | Why |
|---|---|---|
| Production data pipelines | Dedicated | Predictable performance, no resource contention |
| Ad-hoc data exploration | Shared | Cost-effective for variable, low-priority workloads |
| Testing queries before production | Shared | Saves dedicated resources for production use |
| Time-sensitive audience builds | Dedicated | Guaranteed resources ensure timely completion |
| Snowflake data planes | Snowflake warehouse | Register one or more warehouses sized for your workload |
Archiving compute pools
Deleting a compute pool is a soft delete: the pool’s status changes to archived, the record stays in place, and it is filtered out of list responses but still resolvable by id. Archival behavior diverges by provider because each integration has different operational realities:
| Reference path | Snowflake data planes | AWS data planes |
|---|---|---|
| Pool is the data plane default | The platform re-elects another active pool as the new default (or clears the default if none remain), then archives the pool | The archive is rejected; you must clear or change the data plane default before retrying |
| Pool is set as a dataset default | The platform clears the default on every affected dataset, then archives the pool. Affected datasets fall through to the rest of the chain on their next job | The archive is rejected; you must clear the dataset defaults before retrying |
| Pool is set as a company default (per data plane) | The platform clears the per-data-plane entry on every affected company, then archives the pool. Companies fall through to the data plane default on their next job | The archive is rejected; you must clear the company defaults before retrying |
| Pool has in-flight Pending or Running jobs | Archive succeeds; jobs are handled at runtime (see below) | Archive succeeds; jobs are handled at runtime (see below) |
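The provider split in the table comes down to one decision. On AWS, any remaining reference blocks the archive. On Snowflake, the platform cleans up references and then archives. This is a minimal sketch of that decision, not the platform's implementation. The `planArchive` function, the `PoolReferences` shape, and the `"snowflake" | "aws"` labels are hypothetical.

```typescript
// Illustrative model of the archive rules; not the platform's implementation.
type Provider = "snowflake" | "aws";

interface PoolReferences {
  isDataPlaneDefault: boolean;
  datasetIdsUsingAsDefault: string[];
  companyIdsUsingAsDefault: string[]; // per-data-plane company defaults
}

type ArchiveOutcome =
  | { kind: "rejected"; blockers: string[] }
  | { kind: "archived"; cleanup: string[] };

function planArchive(provider: Provider, refs: PoolReferences): ArchiveOutcome {
  const found: string[] = [];
  if (refs.isDataPlaneDefault) found.push("data plane default");
  if (refs.datasetIdsUsingAsDefault.length > 0) {
    found.push("dataset defaults: " + refs.datasetIdsUsingAsDefault.join(", "));
  }
  if (refs.companyIdsUsingAsDefault.length > 0) {
    found.push("company defaults: " + refs.companyIdsUsingAsDefault.join(", "));
  }
  // AWS: any remaining reference blocks the archive; clear them first, then retry.
  if (provider === "aws" && found.length > 0) return { kind: "rejected", blockers: found };
  // Snowflake: the platform re-elects or clears each reference, then archives.
  // (On AWS, reaching this line means there was nothing to clean up.)
  // In-flight jobs never block either path; they are handled at runtime.
  return { kind: "archived", cleanup: found };
}

const refsOnDefault = { isDataPlaneDefault: true, datasetIdsUsingAsDefault: [], companyIdsUsingAsDefault: [] };
console.log(planArchive("aws", refsOnDefault).kind);       // rejected
console.log(planArchive("snowflake", refsOnDefault).kind); // archived
```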
Runtime impact on in-flight jobs
Neither archive flow blocks in-flight jobs at archive time. Instead, on its next polling iteration the data plane operator fails any job whose compute pool resolves to a non-active status, and the job reports as failed with an actionable message. This applies to Pending, Running, and PendingCancellation jobs. Jobs in PendingCancellation are failed rather than cancelled because the missing pool is the actionable signal; a successful cancellation report would mask the root cause.
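The runtime check above can be modeled as a pass over in-flight jobs. This is a hypothetical sketch, not the operator's code. The terminal status names (`Completed`, `Failed`, `Cancelled`), the `failJobsOnInactivePools` function, and the failure message text are assumptions. The platform's actual wording is not reproduced here.

```typescript
// Hypothetical sketch of the operator's per-iteration pool check; not the operator's code.
// Status names other than Pending, Running, and PendingCancellation are assumptions.
type JobStatus = "Pending" | "Running" | "PendingCancellation" | "Completed" | "Failed" | "Cancelled";
type PoolStatus = "active" | "archived";

interface Job {
  id: string;
  status: JobStatus;
  computePoolId: string;
  failureMessage?: string;
}

const STATUSES_CHECKED: JobStatus[] = ["Pending", "Running", "PendingCancellation"];

function failJobsOnInactivePools(
  jobs: Job[],
  lookupPoolStatus: (poolId: string) => PoolStatus | undefined,
): Job[] {
  return jobs.map((job): Job => {
    if (STATUSES_CHECKED.indexOf(job.status) === -1) return job; // other jobs are left alone
    const poolState = lookupPoolStatus(job.computePoolId);
    if (poolState === "active") return job;
    // PendingCancellation jobs are failed, not cancelled, so the pool problem stays visible.
    // Placeholder message text.
    return {
      ...job,
      status: "Failed",
      failureMessage: "compute pool " + job.computePoolId + " is " + (poolState || "missing"),
    };
  });
}

const result = failJobsOnInactivePools(
  [
    { id: "j1", status: "PendingCancellation", computePoolId: "pool-old" },
    { id: "j2", status: "Running", computePoolId: "pool-live" },
  ],
  (poolId) => (poolId === "pool-live" ? "active" : "archived"),
);
console.log(result.map((j) => j.id + ": " + j.status).join(", ")); // j1: Failed, j2: Running
```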
See the Archive a compute pool guide for the API workflow and the order in which to clear references.
On AWS data planes, the first compute pool you create is not automatically promoted to the data plane default. This is permanent behavior: jobs targeting an AWS data plane must pin an explicit computePoolId, resolve to a dataset-level or company-level default, or have a data plane default set explicitly via PUT /data-planes/{id}/default-compute-pool/{poolId}. The intent is that AWS workloads make a deliberate choice about where they run rather than implicitly routing through a “first pool wins” default.
On Snowflake data planes, the first compute pool you create is currently auto-promoted to the data plane default for backward compatibility while the Snowflake-side migration away from implicit defaults is in progress. This behavior is temporary and will be removed once every Snowflake workload pins its compute pool explicitly.
Every newly registered company is provisioned a private x_small AWS EMR compute pool on the Narrative data plane, and that pool is set as the company-level default for the same data plane. Jobs that don’t pin a pool explicitly resolve to this default through the pool resolution chain (job → dataset → company → data plane), unless a dataset default takes precedence. You can rename, archive, or replace the default at any time.
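The provider-specific promotion rule can be sketched as a hook that runs after a pool is created. The sketch also builds the documented path for setting a default explicitly. It assumes that "first pool" means "no default is set yet". The `afterPoolCreated` and `setDefaultPoolPath` names and the `"snowflake" | "aws"` labels are hypothetical. The host, authentication, and HTTP client are left to you.

```typescript
// Illustrative model of first-pool promotion; names and provider labels are assumptions.
type DataPlaneProvider = "snowflake" | "aws";

interface DataPlane {
  id: string;
  provider: DataPlaneProvider;
  defaultComputePoolId?: string;
}

// Runs after a pool is created on the data plane; returns the (possibly updated) data plane.
function afterPoolCreated(plane: DataPlane, newPoolId: string): DataPlane {
  // Snowflake (temporary, for backward compatibility): the first pool becomes the default.
  if (plane.provider === "snowflake" && !plane.defaultComputePoolId) {
    return { ...plane, defaultComputePoolId: newPoolId };
  }
  // AWS: never auto-promoted; set the default explicitly instead.
  return plane;
}

// Path for PUT /data-planes/{id}/default-compute-pool/{poolId}.
function setDefaultPoolPath(dataPlaneId: string, poolId: string): string {
  return "/data-planes/" + encodeURIComponent(dataPlaneId) +
    "/default-compute-pool/" + encodeURIComponent(poolId);
}

console.log(afterPoolCreated({ id: "dp-aws", provider: "aws" }, "pool-1").defaultComputePoolId); // undefined
console.log(setDefaultPoolPath("dp-123", "pool-456")); // /data-planes/dp-123/default-compute-pool/pool-456
```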
external_id is optional for aws_emr providers
When creating a compute pool with the aws_emr provider via POST /data-planes/{id}/compute-pools, you can omit external_id from the request body — the server fills in the trivial {"type": "aws_emr"} payload automatically. Supplying it explicitly still works, and the type is validated against the provider.
For the snowflake_warehouse provider, external_id is still required because it carries the warehouse name and alias that the platform uses to dispatch queries to the correct Snowflake warehouse.
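The server-side handling of external_id can be modeled as a normalization step on the request body. The following is a hedged sketch. From this page, it takes only the endpoint, the provider names, the trivial `{"type": "aws_emr"}` payload, and the aws_emr timeout fields. The flat `provider` discriminator, the Snowflake `external_id` keys (`type`, `warehouse_name`, `alias`), and the `normalizeExternalId` function are all assumptions.

```typescript
// Hypothetical request-body shapes for POST /data-planes/{id}/compute-pools.
// The flat "provider" field and the Snowflake external_id keys are assumptions.
interface AwsEmrPoolInput {
  provider: "aws_emr";
  idle_timeout_seconds?: number;
  job_execution_timeout_seconds?: number;
  external_id?: { type: string }; // optional for aws_emr
}

interface SnowflakePoolInput {
  provider: "snowflake_warehouse";
  // Required: carries the warehouse name and alias used to dispatch queries.
  external_id: { type: string; warehouse_name: string; alias: string };
}

type CreatePoolInput = AwsEmrPoolInput | SnowflakePoolInput;

// Models the documented behavior: fill in the trivial aws_emr external_id when omitted,
// require it for snowflake_warehouse, and check its type against the provider.
function normalizeExternalId(input: CreatePoolInput): CreatePoolInput {
  if (input.provider === "aws_emr") {
    if (input.external_id === undefined) {
      return { ...input, external_id: { type: "aws_emr" } };
    }
    if (input.external_id.type !== "aws_emr") throw new Error("external_id.type must be aws_emr");
    return input;
  }
  if (!input.external_id) throw new Error("external_id is required for snowflake_warehouse");
  if (input.external_id.type !== "snowflake_warehouse") {
    throw new Error("external_id.type must be snowflake_warehouse");
  }
  return input;
}

console.log(JSON.stringify(normalizeExternalId({ provider: "aws_emr" })));
// {"provider":"aws_emr","external_id":{"type":"aws_emr"}}
```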
How compute pools relate to the SDK
When executing queries through the TypeScript SDK, the execution_cluster parameter maps to the compute pool concept:
execution_cluster.type accepts 'dedicated' or 'shared', corresponding directly to the Dedicated and Shared compute pool types. If omitted, the data plane’s default compute pool is used.
On Snowflake-based data planes, omitting execution_cluster uses the warehouse you’ve designated as the data plane’s default compute pool.
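The mapping can be illustrated with a local type that mirrors the documented parameter. This sketch does not call the SDK, because this page does not name the query method. `QueryOptions`, `ExecutionCluster`, and `describeTarget` are hypothetical names, and the SDK's own type names may differ.

```typescript
// Local type mirroring the documented parameter; the SDK's own type names may differ.
type ExecutionCluster = { type: "dedicated" | "shared" };

interface QueryOptions {
  execution_cluster?: ExecutionCluster;
}

// Describes which compute pool a query would run on, following the rules above.
function describeTarget(options: QueryOptions): string {
  if (!options.execution_cluster) {
    // On Snowflake data planes this is the warehouse designated as the default.
    return "data plane default compute pool";
  }
  return options.execution_cluster.type + " compute pool";
}

const productionRun: QueryOptions = { execution_cluster: { type: "dedicated" } };
const exploration: QueryOptions = {}; // falls back to the data plane default
console.log(describeTarget(productionRun)); // dedicated compute pool
console.log(describeTarget(exploration));   // data plane default compute pool
```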
Related content
Execution Context
How data plane, compute pool, database, and schema work together
Data Planes
Where your data lives and is processed
Executing NQL Queries
Run queries programmatically with the TypeScript SDK
Migrate to Compute Pools
Transition from a single Snowflake warehouse to compute pools
Archive a Compute Pool
Safely retire a compute pool and clear its references

