A workflow is a declarative pipeline that executes multiple data operations in sequence. Instead of manually triggering each step and waiting for it to finish before starting the next, you define the full pipeline once and let the system handle orchestration.

Why workflows exist

Workflows solve problems that arise when data pipelines involve multiple dependent steps:
  • Manual orchestration is error-prone. If you need to create a materialized view, wait for it to finish, then create a second view that reads from the first, doing this manually introduces risk. You might trigger the second step before the first completes, or forget a step entirely.
  • Complex pipelines need coordination. As pipelines grow beyond two or three steps, tracking which step is running, what's finished, and what's next becomes increasingly difficult, especially when multiple team members are involved.
  • Repeatable processes should be automated. If you're running the same sequence of operations daily or weekly, defining it once as a workflow eliminates repetitive manual work and ensures consistency.

Workflows, tasks, and jobs

Understanding the relationship between these three concepts is key to working with the system effectively.

The three layers

Workflows define what should happen: a named, versioned sequence of operations. A workflow is a blueprint, not an execution. You create a workflow once and trigger it many times.

Tasks are the individual steps within a workflow. Each task calls a specific operation (like creating a materialized view or refreshing one) with specific parameters. Tasks execute sequentially in the order they're defined.

Jobs are how tasks actually execute. The workflow system is an orchestration layer built on top of the existing jobs system. When a task runs, it creates a job, the same kind of job that's created when you run a query or refresh a view manually.
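The three layers can be sketched as plain data structures. This is an illustrative model only; the names and fields below are assumptions, not the system's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Task:
    """One step in a workflow: an operation plus its parameters."""
    operation: str    # e.g. "CreateMaterializedViewIfNotExists"
    parameters: dict

@dataclass
class Workflow:
    """The blueprint: a named, ordered sequence of tasks.
    Created once, triggered many times."""
    name: str
    tasks: list  # executed sequentially, in the order defined

@dataclass
class Job:
    """One execution in the underlying jobs system.
    Each task run creates exactly one job."""
    job_id: str
    status: str = "pending"  # pending -> running -> succeeded / failed
    tags: tuple = ("workflow_enqueued",)
```

The key design point this model captures: a `Workflow` holds no execution state at all. State lives in jobs, which the orchestration layer creates on each run.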

How execution flows

When you trigger a workflow, here’s what happens:
  1. The system creates a run—a specific execution instance with its own run_id
  2. The first task starts and creates a job in the jobs system
  3. The orchestrator waits for that job to complete
  4. Once the job succeeds, the next task starts and creates its own job
  5. This continues until all tasks complete or one fails
Each task creates its own independent job. This means you can monitor execution at two levels:
  • Workflow level — check the overall run status to see if the pipeline succeeded or failed
  • Job level — inspect individual jobs for detailed timing, errors, and output
Jobs created by workflows are automatically tagged with workflow_enqueued. This makes it easy to distinguish workflow-generated jobs from manually triggered ones when browsing job history.
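The five steps above can be sketched as a single loop. The function names `submit_job` and `wait_for` stand in for the real jobs API, which this page does not specify; treat this as a mental model, not an implementation:

```python
import uuid

def trigger(workflow, submit_job, wait_for):
    """Sketch of the orchestrator: one run, one job per task, fail fast."""
    run_id = str(uuid.uuid4())  # 1. create a run with its own run_id
    for task in workflow["tasks"]:
        # 2./4. each task creates its own job, tagged for traceability
        job = submit_job(task, tags=["workflow_enqueued"])
        status = wait_for(job)  # 3. block until this job completes
        if status != "succeeded":
            # 5. fail fast: remaining tasks are never submitted
            return {"run_id": run_id, "status": "failed"}
    return {"run_id": run_id, "status": "succeeded"}
```

Because each iteration produces a real job, the two monitoring levels fall out naturally: the return value is the workflow-level status, and each submitted job carries its own detailed timing and error information.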

Task dependencies and data passing

Tasks in a workflow execute sequentially—each task waits for the previous one to complete before starting. This sequential model is simple but powerful: later tasks can reference data created by earlier ones. For example, a workflow might:
  1. Create a materialized view called active_users
  2. Create a second materialized view that queries company_data.active_users
The workflow guarantees that step 1 finishes (and the dataset exists) before step 2 begins. Without workflows, you’d need to poll the jobs API and build this coordination logic yourself.
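The two-step example above might look like this as a definition. The field names (`name`, `tasks`, `operation`, `parameters`, `query`) and the NQL text are illustrative assumptions, not the exact specification syntax:

```python
# Hypothetical two-task workflow: the second view reads from the first.
workflow = {
    "name": "daily_active_users",
    "tasks": [
        {
            "operation": "CreateMaterializedViewIfNotExists",
            "parameters": {
                "name": "active_users",
                "query": "SELECT user_id FROM events WHERE active = true",
            },
        },
        {
            # Runs only after task 1's job succeeds, so
            # company_data.active_users is guaranteed to exist.
            "operation": "CreateMaterializedViewIfNotExists",
            "parameters": {
                "name": "active_user_summary",
                "query": "SELECT count(*) AS n FROM company_data.active_users",
            },
        },
    ],
}
```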

Referencing earlier outputs

When a task creates a dataset (via CreateMaterializedViewIfNotExists), subsequent tasks can reference it using the fully qualified path company_data.<dataset_name>. The workflow’s sequential execution guarantees the dataset exists by the time later tasks need it.

Task output and workflow context

Beyond dataset references, tasks produce structured JSON output after execution — dataset IDs, snapshot IDs, row counts, and other metadata. This output enables more flexible data passing between tasks. The export mechanism captures task output into a workflow context ($context), an object that accumulates data as the workflow progresses. Variable expressions (${…}) then inject context values or previous task output into subsequent task parameters. For full syntax details, see the Specification Syntax reference.
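To make the mechanism concrete, here is a minimal sketch of `${…}` substitution against an accumulated context. The real syntax lives in the Specification Syntax reference; the task name `create_users` and the dotted-path lookup are assumptions for illustration:

```python
import re

def substitute(value, context):
    """Replace ${path.to.key} expressions with values from the context."""
    def lookup(match):
        node = context
        for key in match.group(1).split("."):  # walk the dotted path
            node = node[key]
        return str(node)
    return re.sub(r"\$\{([^}]+)\}", lookup, value)

# Context accumulated from a hypothetical earlier task's exported output:
context = {"create_users": {"dataset_id": "ds_123", "row_count": 42}}
sql = "INSERT INTO audit_log VALUES ('${create_users.dataset_id}', ${create_users.row_count})"
substitute(sql, context)
# "INSERT INTO audit_log VALUES ('ds_123', 42)"
```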

Two approaches to data passing

Workflows support two ways to pass data between tasks:
  • Dataset references — When a task creates a dataset, later tasks query it by name (company_data.<dataset_name>). This is the simplest approach and works well when tasks consume each other's data directly via NQL.
  • Output and context — When tasks need to pass metadata (dataset IDs, row counts, status information) rather than query data, use export and ${…} variable expressions. This is useful for logging, conditional DML operations, or any case where a downstream task needs a value produced by an upstream task.
Most workflows use dataset references. Output and context become valuable when you need to pass specific values, like inserting a dataset ID into a log table or using a row count in a subsequent operation.

Error handling

Workflows use fail-fast behavior. If any task fails:
  1. The workflow immediately stops execution
  2. Remaining tasks are skipped
  3. The run status becomes failed
This design prevents cascading failures—if step 1 fails to create a dataset, there’s no point in running step 2 which depends on it. There are no automatic retries. When a task fails, you need to investigate the cause, fix the underlying issue, and trigger a new run. You can inspect the failed task’s job for detailed error information.
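The resulting per-task statuses can be computed mechanically. A minimal sketch of the fail-fast rule (status names are assumptions based on this page's description):

```python
def fail_fast_statuses(task_results):
    """Given each task's success (True) or failure (False), return
    per-task statuses: stop at the first failure, skip the rest."""
    statuses = []
    failed = False
    for ok in task_results:
        if failed:
            statuses.append("skipped")
        elif ok:
            statuses.append("succeeded")
        else:
            statuses.append("failed")
            failed = True
    return statuses

# A three-task run where the second task fails:
fail_fast_statuses([True, False, True])
# ['succeeded', 'failed', 'skipped']
```

Note that the third task is skipped, not failed: it never ran, so its job was never created, and rerunning means triggering a whole new run.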

Supported operations

Workflows currently support three operations:
| Task | What it does |
| --- | --- |
| CreateMaterializedViewIfNotExists | Creates a new materialized view from an NQL query |
| RefreshMaterializedView | Refreshes an existing materialized view with the latest data |
| ExecuteDml | Executes a DML statement (INSERT, UPDATE, DELETE) |
These operations cover the most common pipeline patterns: building datasets, keeping them fresh, and modifying data. Direct ExecuteNQL (arbitrary SELECT queries) is not supported—use CreateMaterializedViewIfNotExists to execute a query and persist the results.
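A pipeline touching all three operations might be defined like this. As before, the field names and statements are illustrative assumptions, not the exact spec syntax:

```python
# Hypothetical three-task pipeline: build, refresh, then log via DML.
tasks = [
    {"operation": "CreateMaterializedViewIfNotExists",
     "parameters": {"name": "orders_summary",
                    "query": "SELECT order_id, total FROM orders"}},
    {"operation": "RefreshMaterializedView",
     "parameters": {"name": "orders_summary"}},
    {"operation": "ExecuteDml",
     "parameters": {"statement":
         "INSERT INTO refresh_log VALUES (CURRENT_TIMESTAMP)"}},
]
```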

Scheduling

Workflows can be triggered manually via the API or run automatically on a schedule. Scheduled workflows use cron expressions to define when they run—hourly, daily, weekly, or on any custom cadence. Scheduling is particularly valuable for refresh pipelines. Rather than remembering to refresh your materialized views every morning, a scheduled workflow handles it automatically and ensures the correct refresh order.
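For readers unfamiliar with cron, a standard expression has five fields: minute, hour, day-of-month, month, day-of-week. This toy matcher handles only `*` and plain numbers (real cron also supports ranges, lists, and steps), but it is enough to read common refresh schedules:

```python
def cron_matches(expr, minute, hour, dom, month, dow):
    """Toy cron check: does the given time match the five-field expression?
    Supports only '*' and single numeric values."""
    fields = expr.split()
    actual = [minute, hour, dom, month, dow]
    return all(f == "*" or int(f) == v for f, v in zip(fields, actual))

# "0 6 * * *" = every day at 06:00
cron_matches("0 6 * * *", minute=0, hour=6, dom=15, month=3, dow=4)   # True
cron_matches("0 6 * * *", minute=30, hour=6, dom=15, month=3, dow=4)  # False
# "0 9 * * 1" = every Monday at 09:00
```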

Current limitations

The workflow system is designed for sequential, deterministic pipelines. It intentionally keeps things simple:
  • Sequential execution only — tasks run one at a time, in order. There is no parallel task execution.
  • No conditional logic — every task in the workflow runs. Tasks can pass data via context, but there is no if/else branching based on output values.
  • No loops — you cannot iterate over a list of items or repeat tasks.
  • No automatic retries — failed tasks must be investigated and rerun manually.
These constraints keep the system predictable and easy to reason about. For most data pipeline use cases—creating datasets, refreshing views, running DML—sequential execution with fail-fast error handling is sufficient.