The job queue coordinates work between the control plane and data planes. This reference describes the different types of jobs that flow through the queue.

Overview

Jobs are the unit of work exchanged between the control plane and data plane operators. When the control plane needs something executed in a data plane, it creates a job and enqueues it. Data plane operators poll the queue, claim jobs, execute them, and report results.
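The claim-and-report cycle described above can be sketched as a small dispatch loop. The `queue` and `executors` objects and their method names (`claim_next`, `report_success`, `report_failure`) are illustrative stand-ins, not the actual operator API:

```python
import time

def process_next(queue, executors):
    """Claim and execute a single job; returns True if a job was handled.

    `queue` and `executors` are hypothetical stand-ins for the real
    operator interfaces, used only to illustrate the cycle.
    """
    job = queue.claim_next()                  # atomically moves pending -> running
    if job is None:
        return False                          # queue empty; caller backs off
    handler = executors.get(job["type"])      # the `type` field drives routing
    if handler is None:
        queue.report_failure(job["id"], f"no executor for {job['type']}")
        return True
    try:
        result = handler(job["payload"])
        queue.report_success(job["id"], result)    # running -> completed
    except Exception as exc:
        queue.report_failure(job["id"], str(exc))  # running -> failed
    return True

def run_operator(queue, executors, poll_interval=5.0):
    """Long-running poll loop: sleep only when the queue is empty."""
    while True:
        if not process_next(queue, executors):
            time.sleep(poll_interval)
```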

Job states

Every job progresses through a lifecycle tracked by its state.
State                  Description
pending                Job is queued and waiting for execution
running                Job is currently being executed
completed              Job finished successfully
pending_cancellation   Job is marked for cancellation but still running
cancelled              Job was cancelled before completion
failed                 Job execution failed
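The lifecycle implies a set of legal state transitions. A sketch of a transition check, with the transition table inferred from the state descriptions above (the service may enforce different or additional rules):

```python
# Allowed transitions inferred from the state descriptions; illustrative only.
TRANSITIONS = {
    "pending": {"running", "cancelled"},
    "running": {"completed", "failed", "pending_cancellation"},
    "pending_cancellation": {"cancelled", "completed", "failed"},
    "completed": set(),   # terminal
    "cancelled": set(),   # terminal
    "failed": set(),      # terminal
}

def can_transition(current, target):
    """Return True if a job may move from `current` to `target`."""
    return target in TRANSITIONS.get(current, set())
```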

Core job structure

All jobs share common fields that identify and track them through the system.
Field           Description
id              Unique identifier for the job
type            Job type string that determines routing and execution
state           Current lifecycle state
company_id      Company that owns the job
data_plane_id   Target data plane for execution (if specified)
created_at      When the job was created
updated_at      When the job was last modified
completed_at    When the job finished (if applicable)
failures        List of failure records if the job encountered errors
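The field table maps naturally onto a record type. A minimal sketch in Python, with the concrete types assumed for illustration (the field names come from the table above):

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Job:
    """In-memory shape of the core job record; types are assumptions."""
    id: str
    type: str          # job type string that determines routing and execution
    state: str         # current lifecycle state
    company_id: int
    created_at: datetime
    updated_at: datetime
    data_plane_id: Optional[str] = None      # only set when a target is specified
    completed_at: Optional[datetime] = None  # set once the job finishes
    failures: list = field(default_factory=list)  # failure records, if any

    @property
    def is_terminal(self) -> bool:
        """True once the job can no longer change state."""
        return self.state in ("completed", "cancelled", "failed")
```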

Forecasting jobs

Jobs that estimate data volumes and costs before committing to operations.

forecast

Estimates data volumes and costs for a marketplace subscription before purchase. This job runs when you preview a subscription in the marketplace or use the forecasting API. It samples data to estimate costs quickly without scanning the full dataset.
Executor: Spark Data Plane
Example payload:
{
  "as_of": "2024-01-01T00:00:00Z",
  "details": {
    "type": "marketplace",
    "company_constraint": {
      "type": "inclusion",
      "company_ids": [56]
    },
    "data_rules": {
      "attributes": [
        {
          "attribute_id": 1,
          "fields": [
            { "field": "unique_id.type", "filter": null, "exported": true },
            { "field": "unique_id.value", "filter": "include_only_if_not_null_filter", "exported": true }
          ],
          "optional": false
        }
      ],
      "ingestion_timestamp_filter": {
        "recency": "P30D",
        "from": { "type": "inclusive", "value": "2024-01-01T00:00:00Z" }
      }
    },
    "pricing": { "micro_cents_usd": 1000000000 }
  },
  "dimensions": { "distinct_counts": [], "group_by": null },
  "sample_rate_denominator": 128,
  "schedule_unsampled_forecast": false
}
Example result:
{
  "type": "success",
  "cost": 5000000000,
  "rows": 1500000,
  "datasets": [
    { "dataset_id": 424, "rows": 1500000, "cost": 5000000000 }
  ],
  "estimator": { "type": "sample", "rate": 0.0078125 },
  "scan_cost": {
    "rows": 190001186647,
    "size": 6286551251334,
    "sources": [
      { "rows": 190001186647, "size": 6286551251334, "source": { "type": "dataset", "dataset_id": 424 } }
    ]
  }
}
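The sampling relationship is visible in the example above: a `sample_rate_denominator` of 128 yields the reported estimator rate of 0.0078125 (that is, 1/128), and counts observed in the sample are scaled back up by the denominator. A sketch of that arithmetic, as an illustration of the relationship rather than the actual estimator:

```python
def estimator_rate(sample_rate_denominator):
    """The sample rate reported in the result is 1/denominator."""
    return 1 / sample_rate_denominator

def scale_sampled_count(sampled_rows, sample_rate_denominator):
    """Scale a count observed in the sample up to a full-dataset estimate."""
    return sampled_rows * sample_rate_denominator

# With the example payload: a denominator of 128 gives the reported rate
# 0.0078125, and ~11,719 sampled rows scale to roughly the estimated 1.5M rows.
```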

forecast-internal

This is an internal job type used by the system. You may see it in job listings but cannot create it directly.
Internal variant of the forecast job used for unsampled follow-up forecasts and accuracy comparisons. This job runs automatically after a sampled forecast job completes when the system needs precise cost estimates. It uses the same input and output format as the forecast job.
Executor: Spark Data Plane

costs

Calculates scan costs for a subscription query without performing a full forecast. This job produces a quick cost estimate by analyzing the query plan without sampling data; it is faster than a full forecast but provides less detail.
Executor: Spark Data Plane
Example payload:
{
  "as_of": null,
  "details": {
    "type": "marketplace",
    "company_constraint": {
      "type": "inclusion",
      "company_ids": [56]
    },
    "data_rules": {
      "attributes": [
        {
          "attribute_id": 1,
          "fields": [
            { "field": "unique_id.type", "filter": null, "exported": true },
            { "field": "unique_id.value", "filter": "include_only_if_not_null_filter", "exported": true }
          ]
        }
      ]
    },
    "pricing": { "micro_cents_usd": 1000000000 }
  }
}
Example result:
{
  "type": "success",
  "cost": {
    "rows": 190001186647,
    "size": 6286551251334,
    "sources": [
      { "rows": 0, "size": 0, "source": { "type": "dataset", "dataset_id": 340 } },
      { "rows": 190001186647, "size": 6286551251334, "source": { "type": "dataset", "dataset_id": 424 } }
    ]
  }
}
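In the result above, the top-level `rows` and `size` of the cost object are the sums of the per-source values. A sketch of that roll-up:

```python
def total_scan_cost(sources):
    """Aggregate per-source scan costs into the top-level totals, mirroring
    how the cost object's `rows` and `size` relate to its `sources` list."""
    return {
        "rows": sum(s["rows"] for s in sources),
        "size": sum(s["size"] for s in sources),
        "sources": sources,
    }
```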

nql-forecast

Executes an NQL forecast query to estimate results without materializing data. This job runs when you preview an NQL query to see estimated row counts and structure before creating a materialized view.
Executor: Spark Data Plane
Example payload:
{
  "nql": "SELECT COUNT(*) FROM narrative.rosetta WHERE country = 'US'",
  "compiled_sql": "SELECT COUNT(*) FROM rosetta_table WHERE country = 'US'"
}
Example result (success):
{
  "type": "Success",
  "result": {
    "columns": ["count"],
    "rows": [[1500000]]
  }
}
Example result (failure):
{
  "type": "Failure",
  "msg": "Query execution timed out after 300 seconds"
}
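Consumers dispatch on the `type` discriminator to handle the two result variants shown above. A sketch of one way to consume them:

```python
def handle_nql_forecast_result(result):
    """Dispatch on the `type` discriminator of an nql-forecast result.

    Returns the row preview on success and raises on failure; purely an
    illustration of consuming the two result variants, not a real client.
    """
    if result["type"] == "Success":
        return result["result"]["rows"]
    if result["type"] == "Failure":
        raise RuntimeError(result["msg"])
    raise ValueError(f"unknown result type: {result['type']}")
```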

Materialization jobs

Jobs that execute queries and persist results to datasets.

materialize-view

Executes NQL to refresh a materialized view dataset. This is the core job type for dataset refreshes. It runs the compiled query against source data and writes results to the target dataset as an Iceberg table snapshot.
Executor: Spark Data Plane
Example payload:
{
  "nql": "SELECT unique_id FROM narrative.rosetta WHERE timestamp > _nio_start_ts",
  "compiled_select": "SELECT unique_id FROM rosetta_table WHERE timestamp > '2024-01-01'",
  "dataset_id": 1234,
  "stats_enabled": true,
  "contains_delta_syntax": true,
  "first_run": false,
  "merge": false,
  "partitions": [
    { "field": "timestamp", "fn": "day", "args": [] }
  ],
  "chunk_metadata": {
    "batch_id": "550e8400-e29b-41d4-a716-446655440000",
    "chunk_sequence": { "number": 1, "of": 5 }
  }
}
Example result:
{
  "dataset_id": 1234,
  "snapshot_id": 9876543210,
  "recalculation_id": "550e8400-e29b-41d4-a716-446655440001"
}
Related: Materialized Views, Query Processing

Dataset operations

Jobs that manage dataset tables and data within the data plane.

datasets_create_table

Creates a new table in the data plane for a dataset. This job runs when provisioning a new dataset. It creates the underlying Iceberg table in the company’s schema.
Executor: Snowflake Data Plane
Example payload:
{
  "dataset_id": 1234,
  "compiled_sql": "CREATE TABLE company_56.dataset_1234 (id STRING, name STRING, created_at TIMESTAMP_TZ) CLUSTER BY (created_at)"
}
Example result:
{
  "type": "datasets_create_table"
}

datasets_delete_table

Deletes a dataset’s table from the data plane. This job runs when deprovisioning a dataset. It drops the table and removes associated Iceberg metadata.
Executor: Snowflake Data Plane
Example payload:
{
  "external_id": "company_56.dataset_1234",
  "is_narrative_managed": true
}
Example result:
{
  "type": "datasets_delete_table"
}

datasets_truncate_table

Truncates all data from a dataset’s table while preserving the table structure. This job removes all rows from the dataset but keeps the schema intact for future use.
Executor: Snowflake Data Plane
Example payload:
{
  "dataset_id": 1234
}
Example result:
{
  "type": "datasets_truncate_table"
}

datasets_sample

Generates a sample of data from a dataset for preview purposes. This job retrieves a small subset of rows, including any Rosetta Stone normalized attributes, for display in the UI. The sample data is stored separately from the main dataset.
Executor: Snowflake Data Plane, AWS Data Plane
Example payload:
{
  "dataset_id": 1234
}
Example result:
{
  "type": "datasets_sample"
}
Related: Data Flow, Managing Datasets

datasets_calculate_column_stats

Calculates column-level statistics for a dataset. This job computes statistics like null counts, value ranges, distinct counts, and histograms. Stats are persisted to the dataset metrics store for use in query optimization and data profiling.
Executor: Snowflake Data Plane
Example payload:
{
  "dataset_id": 1234,
  "enabled_column_stats": {
    "user_id": {
      "data_type": "string",
      "enabled_column_stats": {
        "null_value_count": true,
        "value_count": true,
        "lower_bound": true,
        "upper_bound": true,
        "approx_count_distinct": true,
        "completeness": true
      }
    },
    "amount": {
      "data_type": "double",
      "enabled_column_stats": {
        "nan_value_count": true,
        "null_value_count": true,
        "value_count": true,
        "lower_bound": true,
        "upper_bound": true,
        "histogram": true,
        "mean": true,
        "standard_deviation": true,
        "completeness": true
      }
    }
  }
}
Example result:
{
  "type": "datasets_calculate_column_stats"
}
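The statistics named in the payload are straightforward to define. A plain-Python sketch over one column's values, purely to illustrate what each flag requests (the real job computes these inside the data plane, and `approx_count_distinct` would use an approximate algorithm rather than an exact set):

```python
import math
import statistics

def compute_column_stats(values, enabled):
    """Compute the requested statistics over one column's values."""
    non_null = [v for v in values if v is not None]
    stats = {}
    if enabled.get("value_count"):
        stats["value_count"] = len(values)
    if enabled.get("null_value_count"):
        stats["null_value_count"] = len(values) - len(non_null)
    if enabled.get("nan_value_count"):
        stats["nan_value_count"] = sum(
            1 for v in non_null if isinstance(v, float) and math.isnan(v)
        )
    if enabled.get("lower_bound") and non_null:
        stats["lower_bound"] = min(non_null)
    if enabled.get("upper_bound") and non_null:
        stats["upper_bound"] = max(non_null)
    if enabled.get("approx_count_distinct"):
        stats["approx_count_distinct"] = len(set(non_null))  # exact here
    if enabled.get("mean") and non_null:
        stats["mean"] = statistics.mean(non_null)
    if enabled.get("standard_deviation") and len(non_null) > 1:
        stats["standard_deviation"] = statistics.stdev(non_null)
    if enabled.get("completeness"):
        stats["completeness"] = len(non_null) / len(values) if values else 0.0
    return stats
```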

datasets_execute_dml

Executes a DML statement against a dataset. This job handles INSERT, UPDATE, DELETE, and MERGE operations on dataset tables.
Executor: Snowflake Data Plane
Example payload:
{
  "dataset_id": 1234,
  "nql": "INSERT INTO my_dataset SELECT * FROM source_table WHERE status = 'active'",
  "compiled_sql": "INSERT INTO company_56.dataset_1234 SELECT * FROM company_56.source_table WHERE status = 'active'",
  "nio_last_modified_at": "2024-01-15T10:30:00Z"
}
Example result:
{
  "type": "datasets_execute_dml",
  "affected_rows": 15000
}

Model operations

Jobs that run inference and train machine learning models.

model_inference_run

Runs inference using a large language model (LLM). This job sends prompts to models from Anthropic (Claude) or OpenAI and returns structured output. All inference runs within the data plane; no data is sent to external providers.
Executor: AWS Data Plane
Example payload:
{
  "model": "anthropic.claude-sonnet-4.5",
  "messages": [
    {
      "role": "system",
      "text": "You are a helpful assistant that extracts structured data."
    },
    {
      "role": "user",
      "text": "Extract the company name and revenue from: Acme Corp reported $1.5M in Q4 revenue."
    }
  ],
  "inference_config": {
    "output_format_schema": {
      "type": "object",
      "properties": {
        "company_name": { "type": "string" },
        "revenue": { "type": "number" }
      },
      "required": ["company_name", "revenue"]
    },
    "max_tokens": 1000,
    "temperature": 0.1
  }
}
Example result:
{
  "type": "model_inference_run",
  "structured_output": {
    "company_name": "Acme Corp",
    "revenue": 1500000
  },
  "usage": {
    "completion_tokens": 25,
    "prompt_tokens": 150,
    "total_tokens": 175
  }
}
Related: Model Inference Overview, Running Model Inference, Supported Models
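Callers often want to confirm that `structured_output` actually conforms to the `output_format_schema` they supplied. A minimal sketch checking required keys and primitive types; this is a simplified illustration, not a full JSON Schema validator:

```python
def validate_structured_output(output, schema):
    """Check a structured-output object against a simple object schema.

    Supports only `required` and primitive `type` keywords, which is
    enough for the example schema above.
    """
    type_map = {"string": str, "number": (int, float), "boolean": bool}
    for key in schema.get("required", []):
        if key not in output:
            return False  # a required property is missing
    for key, spec in schema.get("properties", {}).items():
        if key in output and spec["type"] in type_map:
            if not isinstance(output[key], type_map[spec["type"]]):
                return False  # wrong primitive type
    return True
```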

model_training_run

Executes a model fine-tuning job. This job trains a custom model using a base model and your training dataset. The fine-tuned model is stored in the Narrative model repository.
Executor: AWS Data Plane
Example payload:
{
  "company_id": 56,
  "dataset_id": 1234,
  "instance_type": "ml.g5.2xlarge",
  "base_model_id": "meta-llama/Llama-2-7b-hf",
  "base_model_repository": "huggingface",
  "trained_model_version": "1.0.0",
  "trained_model": {
    "name": "Custom Llama for Classification",
    "collaborators": {
      "owner_company_id": 56,
      "viewer_company_ids": []
    },
    "description": "Fine-tuned Llama model for product classification",
    "tags": ["classification", "products"]
  }
}
Example result:
{
  "type": "model_training_run",
  "result": {
    "model_id": "company-56-custom-llama-v1",
    "status": "completed",
    "metrics": {
      "training_loss": 0.15,
      "validation_loss": 0.18
    }
  }
}

models_train_classifier

Trains a classifier model using Snowflake’s built-in ML capabilities. This job uses Snowflake-native training to build classification models directly in the data plane.
Executor: Snowflake Data Plane
Example payload:
{
  "config": {
    "model_type": "classification",
    "target_column": "category",
    "feature_columns": ["description", "price", "brand"],
    "training_table": "company_56.training_data",
    "output_model_name": "product_classifier_v1"
  },
  "data_plane_id": "dp-snowflake-123",
  "tags": ["classification"]
}
Example result:
{
  "type": "models_train_classifier",
  "result": {
    "model_name": "product_classifier_v1",
    "accuracy": 0.92,
    "f1_score": 0.89,
    "feature_importance": {
      "description": 0.45,
      "price": 0.35,
      "brand": 0.20
    }
  }
}

models_deliver_model

Delivers a trained model to an external destination. This job exports a model from the Narrative repository to destinations like HuggingFace Hub.
Executor: AWS Data Plane
Example payload:
{
  "company_id": 56,
  "app_id": 789,
  "connection_id": "conn-123",
  "profile_id": "550e8400-e29b-41d4-a716-446655440000",
  "model_id": "company-56-custom-llama",
  "model_name": "Custom Llama Classifier",
  "model_version": "1.0.0",
  "quick_settings": {
    "repository_visibility": "private"
  }
}
Example result:
{
  "type": "models_deliver_model",
  "result": {
    "destination_url": "https://huggingface.co/company-56/custom-llama-classifier",
    "status": "completed",
    "model_files_uploaded": 12,
    "total_size_bytes": 14000000000
  }
}

System jobs

Administrative jobs that support platform operations.

materialized_views_collect_access_rules_billing_data

This is an internal system job. You may see it in job listings but it runs automatically as part of materialized view refreshes.
Collects billing data for access rules associated with a materialized view refresh. This job runs after a materialize-view job completes to track data consumption from different access rules and generate billing records.
Executor: Spark Data Plane
Example payload:
{
  "dataset_id": 1234,
  "nio_last_modified_at": "2024-01-15T10:30:00Z",
  "compiled_query": "SELECT _nio_access_rule_id, COUNT(*) as rows, _nio_price_per_row FROM company_56.dataset_1234 WHERE _nio_last_modified_at = '2024-01-15T10:30:00Z' GROUP BY _nio_access_rule_id, _nio_price_per_row",
  "refresh_job_id": "550e8400-e29b-41d4-a716-446655440000"
}
Example result:
{
  "type": "materialized_views_collect_access_rules_billing_data",
  "sources": [
    {
      "access_rule_id": 456,
      "rows": 50000,
      "price_per_row": { "micro_cents_usd": 100 }
    },
    {
      "access_rule_id": 789,
      "rows": 25000,
      "price_per_row": { "micro_cents_usd": 150 }
    }
  ]
}
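Each record in the result rolls up as rows times price per row, priced in micro-cents. A sketch of that aggregation, assuming "micro-cents" means millionths of a cent (so 100,000,000 micro-cents per USD); this illustrates the arithmetic only, not the actual billing pipeline:

```python
# Assumption: 100 cents per USD, 1,000,000 micro-cents per cent.
MICRO_CENTS_PER_USD = 100 * 1_000_000

def billing_total_usd(sources):
    """Sum rows x price_per_row across access-rule sources and convert
    the micro-cent total to US dollars."""
    total_micro_cents = sum(
        s["rows"] * s["price_per_row"]["micro_cents_usd"] for s in sources
    )
    return total_micro_cents / MICRO_CENTS_PER_USD
```

With the example result above, 50,000 rows at 100 micro-cents plus 25,000 rows at 150 micro-cents comes to 8,750,000 micro-cents, or about nine cents.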