A workflow is a declarative pipeline that executes multiple data operations in sequence. Instead of manually triggering each step and waiting for it to finish before starting the next, you define the full pipeline once and let the system handle orchestration.

Why workflows exist

Workflows solve problems that arise when data pipelines involve multiple dependent steps:
  • Manual orchestration is error-prone. If you need to create a materialized view, wait for it to finish, then create a second view that reads from the first, doing this manually introduces risk. You might trigger the second step before the first completes, or forget a step entirely.
  • Complex pipelines need coordination. As pipelines grow beyond two or three steps, tracking which step is running, what's finished, and what's next becomes increasingly difficult, especially when multiple team members are involved.
  • Repeatable processes should be automated. If you're running the same sequence of operations daily or weekly, defining it once as a workflow eliminates repetitive manual work and ensures consistency.

Workflows, tasks, and jobs

Understanding the relationship between these three concepts is key to working with the system effectively.

The three layers

Workflows define what should happen: a named, versioned sequence of operations. A workflow is a blueprint, not an execution. You create a workflow once and trigger it many times.

Tasks are the individual steps within a workflow. Each task calls a specific operation (like creating a materialized view or refreshing one) with specific parameters. Tasks execute sequentially in the order they're defined.

Jobs are how tasks actually execute. The workflow system is an orchestration layer built on top of the existing jobs system. When a task runs, it creates a job, the same kind of job that's created when you run a query or refresh a view manually.
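The three layers can be sketched as plain data structures. This is an illustrative model only; the names and fields below are assumptions, not the system's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Task:
    """One step in a workflow: an operation plus its parameters."""
    operation: str    # e.g. "CreateMaterializedViewIfNotExists"
    parameters: dict

@dataclass
class Workflow:
    """The blueprint: a named, ordered sequence of tasks.
    Created once, triggered many times."""
    name: str
    tasks: list  # executed sequentially, in the order defined

@dataclass
class Job:
    """One execution in the underlying jobs system.
    Each task run creates exactly one job."""
    job_id: str
    status: str = "pending"  # pending -> running -> succeeded / failed
    tags: tuple = ("workflow_enqueued",)
```

The key design point this model captures: a `Workflow` holds no execution state at all. State lives in jobs, which the orchestration layer creates on each run.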

How execution flows

When you trigger a workflow, here’s what happens:
  1. The system creates a run—a specific execution instance with its own run_id
  2. The first task starts and creates a job in the jobs system
  3. The orchestrator waits for that job to complete
  4. Once the job succeeds, the next task starts and creates its own job
  5. This continues until all tasks complete or one fails
Each task creates its own independent job. This means you can monitor execution at two levels:
  • Workflow level — check the overall run status to see if the pipeline succeeded or failed
  • Job level — inspect individual jobs for detailed timing, errors, and output
Jobs created by workflows are automatically tagged with workflow_enqueued. This makes it easy to distinguish workflow-generated jobs from manually triggered ones when browsing job history.
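The five steps above can be sketched as a single loop. The function names `submit_job` and `wait_for` stand in for the real jobs API, which this page does not specify; treat this as a mental model, not an implementation:

```python
import uuid

def trigger(workflow, submit_job, wait_for):
    """Sketch of the orchestrator: one run, one job per task, fail fast."""
    run_id = str(uuid.uuid4())  # 1. create a run with its own run_id
    for task in workflow["tasks"]:
        # 2./4. each task creates its own job, tagged for traceability
        job = submit_job(task, tags=["workflow_enqueued"])
        status = wait_for(job)  # 3. block until this job completes
        if status != "succeeded":
            # 5. fail fast: remaining tasks are never submitted
            return {"run_id": run_id, "status": "failed"}
    return {"run_id": run_id, "status": "succeeded"}
```

Because each iteration produces a real job, the two monitoring levels fall out naturally: the return value is the workflow-level status, and each submitted job carries its own detailed timing and error information.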

Task dependencies and data passing

Tasks in a workflow execute sequentially—each task waits for the previous one to complete before starting. This sequential model is simple but powerful: later tasks can reference data created by earlier ones. For example, a workflow might:
  1. Create a materialized view called active_users
  2. Create a second materialized view that queries company_data.active_users
The workflow guarantees that step 1 finishes (and the dataset exists) before step 2 begins. Without workflows, you’d need to poll the jobs API and build this coordination logic yourself.
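The two-step example above might look like this as a definition. The field names (`name`, `tasks`, `operation`, `parameters`, `query`) and the NQL text are illustrative assumptions, not the exact specification syntax:

```python
# Hypothetical two-task workflow: the second view reads from the first.
workflow = {
    "name": "daily_active_users",
    "tasks": [
        {
            "operation": "CreateMaterializedViewIfNotExists",
            "parameters": {
                "name": "active_users",
                "query": "SELECT user_id FROM events WHERE active = true",
            },
        },
        {
            # Runs only after task 1's job succeeds, so
            # company_data.active_users is guaranteed to exist.
            "operation": "CreateMaterializedViewIfNotExists",
            "parameters": {
                "name": "active_user_summary",
                "query": "SELECT count(*) AS n FROM company_data.active_users",
            },
        },
    ],
}
```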

Referencing earlier outputs

When a task creates a dataset (via CreateMaterializedViewIfNotExists), subsequent tasks can reference it using the fully qualified path company_data.<dataset_name>. The workflow’s sequential execution guarantees the dataset exists by the time later tasks need it.

Task output and workflow context

Beyond dataset references, tasks produce structured JSON output after execution — dataset IDs, snapshot IDs, row counts, and other metadata. This output enables more flexible data passing between tasks. The export mechanism captures task output into a workflow context ($context), an object that accumulates data as the workflow progresses. Variable expressions (${…}) then inject context values or previous task output into subsequent task parameters. For full syntax details, see the Specification Syntax reference.
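To make the mechanism concrete, here is a minimal sketch of `${…}` substitution against an accumulated context. The real syntax lives in the Specification Syntax reference; the task name `create_users` and the dotted-path lookup are assumptions for illustration:

```python
import re

def substitute(value, context):
    """Replace ${path.to.key} expressions with values from the context."""
    def lookup(match):
        node = context
        for key in match.group(1).split("."):  # walk the dotted path
            node = node[key]
        return str(node)
    return re.sub(r"\$\{([^}]+)\}", lookup, value)

# Context accumulated from a hypothetical earlier task's exported output:
context = {"create_users": {"dataset_id": "ds_123", "row_count": 42}}
sql = "INSERT INTO audit_log VALUES ('${create_users.dataset_id}', ${create_users.row_count})"
substitute(sql, context)
# "INSERT INTO audit_log VALUES ('ds_123', 42)"
```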

Two approaches to data passing

Workflows support two ways to pass data between tasks:
  • Dataset references — When a task creates a dataset, later tasks query it by name (company_data.<dataset_name>). This is the simplest approach and works well when tasks consume each other's data directly via NQL.
  • Output and context — When tasks need to pass metadata (dataset IDs, row counts, status information) rather than query data, use export and ${…} variable expressions. This is useful for logging, conditional DML operations, or any case where a downstream task needs a value produced by an upstream task.
Most workflows use dataset references. Output and context become valuable when you need to pass specific values, like inserting a dataset ID into a log table or using a row count in a subsequent operation.

Error handling

Workflows use fail-fast behavior. If any task fails:
  1. The workflow immediately stops execution
  2. Remaining tasks are skipped
  3. The run status becomes failed
This design prevents cascading failures—if step 1 fails to create a dataset, there’s no point in running step 2 which depends on it. There are no automatic retries. When a task fails, you need to investigate the cause, fix the underlying issue, and trigger a new run. You can inspect the failed task’s job for detailed error information.
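The resulting per-task statuses can be computed mechanically. A minimal sketch of the fail-fast rule (status names are assumptions based on this page's description):

```python
def fail_fast_statuses(task_results):
    """Given each task's success (True) or failure (False), return
    per-task statuses: stop at the first failure, skip the rest."""
    statuses = []
    failed = False
    for ok in task_results:
        if failed:
            statuses.append("skipped")
        elif ok:
            statuses.append("succeeded")
        else:
            statuses.append("failed")
            failed = True
    return statuses

# A three-task run where the second task fails:
fail_fast_statuses([True, False, True])
# ['succeeded', 'failed', 'skipped']
```

Note that the third task is skipped, not failed: it never ran, so its job was never created, and rerunning means triggering a whole new run.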

Supported operations

Workflows currently support three operations:
| Task | What it does |
| --- | --- |
| CreateMaterializedViewIfNotExists | Creates a new materialized view from an NQL query |
| RefreshMaterializedView | Refreshes an existing materialized view with the latest data |
| ExecuteDml | Executes a DML statement (INSERT, UPDATE, DELETE) |
These operations cover the most common pipeline patterns: building datasets, keeping them fresh, and modifying data. Direct ExecuteNQL (arbitrary SELECT queries) is not supported—use CreateMaterializedViewIfNotExists to execute a query and persist the results.
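A pipeline touching all three operations might be defined like this. As before, the field names and statements are illustrative assumptions, not the exact spec syntax:

```python
# Hypothetical three-task pipeline: build, refresh, then log via DML.
tasks = [
    {"operation": "CreateMaterializedViewIfNotExists",
     "parameters": {"name": "orders_summary",
                    "query": "SELECT order_id, total FROM orders"}},
    {"operation": "RefreshMaterializedView",
     "parameters": {"name": "orders_summary"}},
    {"operation": "ExecuteDml",
     "parameters": {"statement":
         "INSERT INTO refresh_log VALUES (CURRENT_TIMESTAMP)"}},
]
```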

Scheduling

Workflows can be triggered manually via the API or run automatically on a schedule. Scheduled workflows use cron expressions to define when they run—hourly, daily, weekly, or on any custom cadence. Scheduling is particularly valuable for refresh pipelines. Rather than remembering to refresh your materialized views every morning, a scheduled workflow handles it automatically and ensures the correct refresh order.
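For readers unfamiliar with cron, a standard expression has five fields: minute, hour, day-of-month, month, day-of-week. This toy matcher handles only `*` and plain numbers (real cron also supports ranges, lists, and steps), but it is enough to read common refresh schedules:

```python
def cron_matches(expr, minute, hour, dom, month, dow):
    """Toy cron check: does the given time match the five-field expression?
    Supports only '*' and single numeric values."""
    fields = expr.split()
    actual = [minute, hour, dom, month, dow]
    return all(f == "*" or int(f) == v for f, v in zip(fields, actual))

# "0 6 * * *" = every day at 06:00
cron_matches("0 6 * * *", minute=0, hour=6, dom=15, month=3, dow=4)   # True
cron_matches("0 6 * * *", minute=30, hour=6, dom=15, month=3, dow=4)  # False
# "0 9 * * 1" = every Monday at 09:00
```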

Current limitations

The workflow system is designed for sequential, deterministic pipelines. It intentionally keeps things simple:
  • Sequential execution only — tasks run one at a time, in order. There is no parallel task execution.
  • No conditional logic — every task in the workflow runs. Tasks can pass data via context, but there is no if/else branching based on output values.
  • No loops — you cannot iterate over a list of items or repeat tasks.
  • No automatic retries — failed tasks must be investigated and rerun manually.
These constraints keep the system predictable and easy to reason about. For most data pipeline use cases—creating datasets, refreshing views, running DML—sequential execution with fail-fast error handling is sufficient.