The job queue coordinates work between the control plane and data planes. This reference describes the different types of jobs that flow through the queue.

Overview

Jobs are the unit of work exchanged between the control plane and data plane operators. When the control plane needs something executed in a data plane, it creates a job and enqueues it. Data plane operators poll the queue, claim jobs, execute them, and report results.
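The claim-and-report cycle described above can be sketched as a small dispatch loop. The `queue` and `executors` objects and their method names (`claim_next`, `report_success`, `report_failure`) are illustrative stand-ins, not the actual operator API:

```python
import time

def process_next(queue, executors):
    """Claim and execute a single job; returns True if a job was handled.

    `queue` and `executors` are hypothetical stand-ins for the real
    operator interfaces, used only to illustrate the cycle.
    """
    job = queue.claim_next()                  # atomically moves pending -> running
    if job is None:
        return False                          # queue empty; caller backs off
    handler = executors.get(job["type"])      # the `type` field drives routing
    if handler is None:
        queue.report_failure(job["id"], f"no executor for {job['type']}")
        return True
    try:
        result = handler(job["payload"])
        queue.report_success(job["id"], result)    # running -> completed
    except Exception as exc:
        queue.report_failure(job["id"], str(exc))  # running -> failed
    return True

def run_operator(queue, executors, poll_interval=5.0):
    """Long-running poll loop: sleep only when the queue is empty."""
    while True:
        if not process_next(queue, executors):
            time.sleep(poll_interval)
```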

Job states

Every job progresses through a lifecycle tracked by its state.
State                  Description
pending                Job is queued and waiting for execution
running                Job is currently being executed
completed              Job finished successfully
pending_cancellation   Job is marked for cancellation but still running
cancelled              Job was cancelled before completion
failed                 Job execution failed
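The lifecycle implies a set of legal state transitions. A sketch of a transition check, with the transition table inferred from the state descriptions above (the service may enforce different or additional rules):

```python
# Allowed transitions inferred from the state descriptions; illustrative only.
TRANSITIONS = {
    "pending": {"running", "cancelled"},
    "running": {"completed", "failed", "pending_cancellation"},
    "pending_cancellation": {"cancelled", "completed", "failed"},
    "completed": set(),   # terminal
    "cancelled": set(),   # terminal
    "failed": set(),      # terminal
}

def can_transition(current, target):
    """Return True if a job may move from `current` to `target`."""
    return target in TRANSITIONS.get(current, set())
```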

Core job structure

All jobs share common fields that identify and track them through the system.
Field           Description
id              Unique identifier for the job
type            Job type string that determines routing and execution
state           Current lifecycle state
company_id      Company that owns the job
data_plane_id   Target data plane for execution (if specified)
created_at      When the job was created
updated_at      When the job was last modified
completed_at    When the job finished (if applicable)
failures        List of failure records if the job encountered errors
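The field table maps naturally onto a record type. A minimal sketch in Python, with the concrete types assumed for illustration (the field names come from the table above):

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Job:
    """In-memory shape of the core job record; types are assumptions."""
    id: str
    type: str          # job type string that determines routing and execution
    state: str         # current lifecycle state
    company_id: int
    created_at: datetime
    updated_at: datetime
    data_plane_id: Optional[str] = None      # only set when a target is specified
    completed_at: Optional[datetime] = None  # set once the job finishes
    failures: list = field(default_factory=list)  # failure records, if any

    @property
    def is_terminal(self) -> bool:
        """True once the job can no longer change state."""
        return self.state in ("completed", "cancelled", "failed")
```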

Forecasting jobs

Jobs that estimate data volumes and costs before committing to operations.

forecast

Estimates data volumes and costs for a marketplace subscription before purchase. This job runs when you preview a subscription in the marketplace or use the forecasting API. It samples data to estimate costs quickly without scanning the full dataset.
Executor: Spark Data Plane
Example payload:
{
  "as_of": "2024-01-01T00:00:00Z",
  "details": {
    "type": "marketplace",
    "company_constraint": {
      "type": "inclusion",
      "company_ids": [56]
    },
    "data_rules": {
      "attributes": [
        {
          "attribute_id": 1,
          "fields": [
            { "field": "unique_id.type", "filter": null, "exported": true },
            { "field": "unique_id.value", "filter": "include_only_if_not_null_filter", "exported": true }
          ],
          "optional": false
        }
      ],
      "ingestion_timestamp_filter": {
        "recency": "P30D",
        "from": { "type": "inclusive", "value": "2024-01-01T00:00:00Z" }
      }
    },
    "pricing": { "micro_cents_usd": 1000000000 }
  },
  "dimensions": { "distinct_counts": [], "group_by": null },
  "sample_rate_denominator": 128,
  "schedule_unsampled_forecast": false
}
Example result:
{
  "type": "success",
  "cost": 5000000000,
  "rows": 1500000,
  "datasets": [
    { "dataset_id": 424, "rows": 1500000, "cost": 5000000000 }
  ],
  "estimator": { "type": "sample", "rate": 0.0078125 },
  "scan_cost": {
    "rows": 190001186647,
    "size": 6286551251334,
    "sources": [
      { "rows": 190001186647, "size": 6286551251334, "source": { "type": "dataset", "dataset_id": 424 } }
    ]
  }
}
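The sampling relationship is visible in the example above: a `sample_rate_denominator` of 128 yields the reported estimator rate of 0.0078125 (that is, 1/128), and counts observed in the sample are scaled back up by the denominator. A sketch of that arithmetic, as an illustration of the relationship rather than the actual estimator:

```python
def estimator_rate(sample_rate_denominator):
    """The sample rate reported in the result is 1/denominator."""
    return 1 / sample_rate_denominator

def scale_sampled_count(sampled_rows, sample_rate_denominator):
    """Scale a count observed in the sample up to a full-dataset estimate."""
    return sampled_rows * sample_rate_denominator

# With the example payload: a denominator of 128 gives the reported rate
# 0.0078125, and ~11,719 sampled rows scale to roughly the estimated 1.5M rows.
```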

forecast-internal

This is an internal job type used by the system. You may see it in job listings but cannot create it directly.
Internal variant of the forecast job used for unsampled follow-up forecasts and accuracy comparisons. This job runs automatically after a sampled forecast job completes when the system needs precise cost estimates. It uses the same input and output format as the forecast job.
Executor: Spark Data Plane

costs

Calculates scan costs for a subscription query without performing a full forecast. This job produces a quick cost estimate by analyzing the query plan without sampling data; it is faster than a full forecast but provides less detail.
Executor: Spark Data Plane
Example payload:
{
  "as_of": null,
  "details": {
    "type": "marketplace",
    "company_constraint": {
      "type": "inclusion",
      "company_ids": [56]
    },
    "data_rules": {
      "attributes": [
        {
          "attribute_id": 1,
          "fields": [
            { "field": "unique_id.type", "filter": null, "exported": true },
            { "field": "unique_id.value", "filter": "include_only_if_not_null_filter", "exported": true }
          ]
        }
      ]
    },
    "pricing": { "micro_cents_usd": 1000000000 }
  }
}
Example result:
{
  "type": "success",
  "cost": {
    "rows": 190001186647,
    "size": 6286551251334,
    "sources": [
      { "rows": 0, "size": 0, "source": { "type": "dataset", "dataset_id": 340 } },
      { "rows": 190001186647, "size": 6286551251334, "source": { "type": "dataset", "dataset_id": 424 } }
    ]
  }
}
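In the result above, the top-level `rows` and `size` of the cost object are the sums of the per-source values. A sketch of that roll-up:

```python
def total_scan_cost(sources):
    """Aggregate per-source scan costs into the top-level totals, mirroring
    how the cost object's `rows` and `size` relate to its `sources` list."""
    return {
        "rows": sum(s["rows"] for s in sources),
        "size": sum(s["size"] for s in sources),
        "sources": sources,
    }
```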

nql-forecast

Executes an NQL forecast query to estimate results without materializing data. This job runs when you preview an NQL query to see estimated row counts and structure before creating a materialized view.
Executor: Spark Data Plane
Example payload:
{
  "nql": "SELECT COUNT(*) FROM narrative.rosetta WHERE country = 'US'",
  "compiled_sql": "SELECT COUNT(*) FROM rosetta_table WHERE country = 'US'"
}
Example result (success):
{
  "type": "Success",
  "result": {
    "columns": ["count"],
    "rows": [[1500000]]
  }
}
Example result (failure):
{
  "type": "Failure",
  "msg": "Query execution timed out after 300 seconds"
}
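Consumers dispatch on the `type` discriminator to handle the two result variants shown above. A sketch of one way to consume them:

```python
def handle_nql_forecast_result(result):
    """Dispatch on the `type` discriminator of an nql-forecast result.

    Returns the row preview on success and raises on failure; purely an
    illustration of consuming the two result variants, not a real client.
    """
    if result["type"] == "Success":
        return result["result"]["rows"]
    if result["type"] == "Failure":
        raise RuntimeError(result["msg"])
    raise ValueError(f"unknown result type: {result['type']}")
```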

Materialization jobs

Jobs that execute queries and persist results to datasets.

materialize-view

Executes NQL to refresh a materialized view dataset. This is the core job type for dataset refreshes. It runs the compiled query against source data and writes results to the target dataset as an Iceberg table snapshot.
Executor: Spark Data Plane
Example payload:
{
  "nql": "SELECT unique_id FROM narrative.rosetta WHERE timestamp > _nio_start_ts",
  "compiled_select": "SELECT unique_id FROM rosetta_table WHERE timestamp > '2024-01-01'",
  "dataset_id": 1234,
  "stats_enabled": true,
  "contains_delta_syntax": true,
  "first_run": false,
  "merge": false,
  "partitions": [
    { "field": "timestamp", "fn": "day", "args": [] }
  ],
  "chunk_metadata": {
    "batch_id": "550e8400-e29b-41d4-a716-446655440000",
    "chunk_sequence": { "number": 1, "of": 5 }
  }
}
Example result:
{
  "dataset_id": 1234,
  "snapshot_id": 9876543210,
  "recalculation_id": "550e8400-e29b-41d4-a716-446655440001"
}
Related: Materialized Views, Query Processing

Dataset operations

Jobs that manage dataset tables and data within the data plane.

datasets_create_table

Creates a new table in the data plane for a dataset. This job runs when provisioning a new dataset. It creates the underlying Iceberg table in the company’s schema.
Executor: Snowflake Data Plane
Example payload:
{
  "dataset_id": 1234,
  "compiled_sql": "CREATE TABLE company_56.dataset_1234 (id STRING, name STRING, created_at TIMESTAMP_TZ) CLUSTER BY (created_at)"
}
Example result:
{
  "type": "datasets_create_table"
}

datasets_delete_table

Deletes a dataset’s table from the data plane. This job runs when deprovisioning a dataset. It drops the table and removes associated Iceberg metadata.
Executor: Snowflake Data Plane
Example payload:
{
  "external_id": "company_56.dataset_1234",
  "is_narrative_managed": true
}
Example result:
{
  "type": "datasets_delete_table"
}

datasets_truncate_table

Truncates all data from a dataset’s table while preserving the table structure. This job removes all rows from the dataset but keeps the schema intact for future use.
Executor: Snowflake Data Plane
Example payload:
{
  "dataset_id": 1234
}
Example result:
{
  "type": "datasets_truncate_table"
}

datasets_sample

Generates a sample of data from a dataset for preview purposes. This job retrieves a small subset of rows, including any Rosetta Stone normalized attributes, for display in the UI. The sample data is stored separately from the main dataset.
Executor: Snowflake Data Plane, AWS Data Plane
Example payload:
{
  "dataset_id": 1234
}
Example result:
{
  "type": "datasets_sample"
}
Related: Data Flow, Managing Datasets

datasets_calculate_column_stats

Calculates column-level statistics for a dataset. This job computes statistics like null counts, value ranges, distinct counts, and histograms. Stats are persisted to the dataset metrics store for use in query optimization and data profiling.
Executor: Snowflake Data Plane
Example payload:
{
  "dataset_id": 1234,
  "enabled_column_stats": {
    "user_id": {
      "data_type": "string",
      "enabled_column_stats": {
        "null_value_count": true,
        "value_count": true,
        "lower_bound": true,
        "upper_bound": true,
        "approx_count_distinct": true,
        "completeness": true
      }
    },
    "amount": {
      "data_type": "double",
      "enabled_column_stats": {
        "nan_value_count": true,
        "null_value_count": true,
        "value_count": true,
        "lower_bound": true,
        "upper_bound": true,
        "histogram": true,
        "mean": true,
        "standard_deviation": true,
        "completeness": true
      }
    }
  }
}
Example result:
{
  "type": "datasets_calculate_column_stats"
}
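The statistics named in the payload are straightforward to define. A plain-Python sketch over one column's values, purely to illustrate what each flag requests (the real job computes these inside the data plane, and `approx_count_distinct` would use an approximate algorithm rather than an exact set):

```python
import math
import statistics

def compute_column_stats(values, enabled):
    """Compute the requested statistics over one column's values."""
    non_null = [v for v in values if v is not None]
    stats = {}
    if enabled.get("value_count"):
        stats["value_count"] = len(values)
    if enabled.get("null_value_count"):
        stats["null_value_count"] = len(values) - len(non_null)
    if enabled.get("nan_value_count"):
        stats["nan_value_count"] = sum(
            1 for v in non_null if isinstance(v, float) and math.isnan(v)
        )
    if enabled.get("lower_bound") and non_null:
        stats["lower_bound"] = min(non_null)
    if enabled.get("upper_bound") and non_null:
        stats["upper_bound"] = max(non_null)
    if enabled.get("approx_count_distinct"):
        stats["approx_count_distinct"] = len(set(non_null))  # exact here
    if enabled.get("mean") and non_null:
        stats["mean"] = statistics.mean(non_null)
    if enabled.get("standard_deviation") and len(non_null) > 1:
        stats["standard_deviation"] = statistics.stdev(non_null)
    if enabled.get("completeness"):
        stats["completeness"] = len(non_null) / len(values) if values else 0.0
    return stats
```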

datasets_execute_dml

Executes a DML statement against a dataset. This job handles INSERT, UPDATE, DELETE, and MERGE operations on dataset tables.
Executor: Snowflake Data Plane
Example payload:
{
  "dataset_id": 1234,
  "nql": "INSERT INTO my_dataset SELECT * FROM source_table WHERE status = 'active'",
  "compiled_sql": "INSERT INTO company_56.dataset_1234 SELECT * FROM company_56.source_table WHERE status = 'active'",
  "nio_last_modified_at": "2024-01-15T10:30:00Z"
}
Example result:
{
  "type": "datasets_execute_dml",
  "affected_rows": 15000
}

Model operations

Jobs that run inference and train machine learning models.

model_inference_run

Runs inference using a large language model (LLM). This job sends prompts to models from Anthropic (Claude) or OpenAI and returns structured output. All inference runs within the data plane; no data is sent to external providers.
Executor: AWS Data Plane
Example payload:
{
  "model": "anthropic.claude-sonnet-4.5",
  "messages": [
    {
      "role": "system",
      "text": "You are a helpful assistant that extracts structured data."
    },
    {
      "role": "user",
      "text": "Extract the company name and revenue from: Acme Corp reported $1.5M in Q4 revenue."
    }
  ],
  "inference_config": {
    "output_format_schema": {
      "type": "object",
      "properties": {
        "company_name": { "type": "string" },
        "revenue": { "type": "number" }
      },
      "required": ["company_name", "revenue"]
    },
    "max_tokens": 1000,
    "temperature": 0.1
  }
}
Example result:
{
  "type": "model_inference_run",
  "structured_output": {
    "company_name": "Acme Corp",
    "revenue": 1500000
  },
  "usage": {
    "completion_tokens": 25,
    "prompt_tokens": 150,
    "total_tokens": 175
  }
}
Related: Model Inference Overview, Running Model Inference, Supported Models
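Callers often want to confirm that `structured_output` actually conforms to the `output_format_schema` they supplied. A minimal sketch checking required keys and primitive types; this is a simplified illustration, not a full JSON Schema validator:

```python
def validate_structured_output(output, schema):
    """Check a structured-output object against a simple object schema.

    Supports only `required` and primitive `type` keywords, which is
    enough for the example schema above.
    """
    type_map = {"string": str, "number": (int, float), "boolean": bool}
    for key in schema.get("required", []):
        if key not in output:
            return False  # a required property is missing
    for key, spec in schema.get("properties", {}).items():
        if key in output and spec["type"] in type_map:
            if not isinstance(output[key], type_map[spec["type"]]):
                return False  # wrong primitive type
    return True
```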

model_training_run

Executes a model fine-tuning job. This job trains a custom model using a base model and your training dataset. The fine-tuned model is stored in the Narrative model repository.
Executor: AWS Data Plane
Example payload:
{
  "company_id": 56,
  "dataset_id": 1234,
  "instance_type": "ml.g5.2xlarge",
  "base_model_id": "meta-llama/Llama-2-7b-hf",
  "base_model_repository": "huggingface",
  "trained_model_version": "1.0.0",
  "trained_model": {
    "name": "Custom Llama for Classification",
    "collaborators": {
      "owner_company_id": 56,
      "viewer_company_ids": []
    },
    "description": "Fine-tuned Llama model for product classification",
    "tags": ["classification", "products"]
  }
}
Example result:
{
  "type": "model_training_run",
  "result": {
    "model_id": "company-56-custom-llama-v1",
    "status": "completed",
    "metrics": {
      "training_loss": 0.15,
      "validation_loss": 0.18
    }
  }
}

models_train_classifier

Trains a classifier model using Snowflake’s built-in ML capabilities. This job uses Snowflake-native training to build classification models directly in the data plane.
Executor: Snowflake Data Plane
Example payload:
{
  "config": {
    "model_type": "classification",
    "target_column": "category",
    "feature_columns": ["description", "price", "brand"],
    "training_table": "company_56.training_data",
    "output_model_name": "product_classifier_v1"
  },
  "data_plane_id": "dp-snowflake-123",
  "tags": ["classification"]
}
Example result:
{
  "type": "models_train_classifier",
  "result": {
    "model_name": "product_classifier_v1",
    "accuracy": 0.92,
    "f1_score": 0.89,
    "feature_importance": {
      "description": 0.45,
      "price": 0.35,
      "brand": 0.20
    }
  }
}

models_deliver_model

Delivers a trained model to an external destination. This job exports a model from the Narrative repository to destinations like HuggingFace Hub.
Executor: AWS Data Plane
Example payload:
{
  "company_id": 56,
  "app_id": 789,
  "connection_id": "conn-123",
  "profile_id": "550e8400-e29b-41d4-a716-446655440000",
  "model_id": "company-56-custom-llama",
  "model_name": "Custom Llama Classifier",
  "model_version": "1.0.0",
  "quick_settings": {
    "repository_visibility": "private"
  }
}
Example result:
{
  "type": "models_deliver_model",
  "result": {
    "destination_url": "https://huggingface.co/company-56/custom-llama-classifier",
    "status": "completed",
    "model_files_uploaded": 12,
    "total_size_bytes": 14000000000
  }
}

System jobs

Administrative jobs that support platform operations.

materialized_views_collect_access_rules_billing_data

This is an internal system job. You may see it in job listings but it runs automatically as part of materialized view refreshes.
Collects billing data for access rules associated with a materialized view refresh. This job runs after a materialize-view job completes to track data consumption from different access rules and generate billing records.
Executor: Spark Data Plane
Example payload:
{
  "dataset_id": 1234,
  "nio_last_modified_at": "2024-01-15T10:30:00Z",
  "compiled_query": "SELECT _nio_access_rule_id, COUNT(*) as rows, _nio_price_per_row FROM company_56.dataset_1234 WHERE _nio_last_modified_at = '2024-01-15T10:30:00Z' GROUP BY _nio_access_rule_id, _nio_price_per_row",
  "refresh_job_id": "550e8400-e29b-41d4-a716-446655440000"
}
Example result:
{
  "type": "materialized_views_collect_access_rules_billing_data",
  "sources": [
    {
      "access_rule_id": 456,
      "rows": 50000,
      "price_per_row": { "micro_cents_usd": 100 }
    },
    {
      "access_rule_id": 789,
      "rows": 25000,
      "price_per_row": { "micro_cents_usd": 150 }
    }
  ]
}
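Each record in the result rolls up as rows times price per row, priced in micro-cents. A sketch of that aggregation, assuming "micro-cents" means millionths of a cent (so 100,000,000 micro-cents per USD); this illustrates the arithmetic only, not the actual billing pipeline:

```python
# Assumption: 100 cents per USD, 1,000,000 micro-cents per cent.
MICRO_CENTS_PER_USD = 100 * 1_000_000

def billing_total_usd(sources):
    """Sum rows x price_per_row across access-rule sources and convert
    the micro-cent total to US dollars."""
    total_micro_cents = sum(
        s["rows"] * s["price_per_row"]["micro_cents_usd"] for s in sources
    )
    return total_micro_cents / MICRO_CENTS_PER_USD
```

With the example result above, 50,000 rows at 100 micro-cents plus 25,000 rows at 150 micro-cents comes to 8,750,000 micro-cents, or about nine cents.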