Why workflows exist
Workflows solve problems that arise when data pipelines involve multiple dependent steps:

- Manual orchestration is error-prone. If you need to create a materialized view, wait for it to finish, then create a second view that reads from the first, doing this manually introduces risk. You might trigger the second step before the first completes, or forget a step entirely.
- Complex pipelines need coordination. As pipelines grow beyond two or three steps, tracking which step is running, what’s finished, and what’s next becomes increasingly difficult, especially when multiple team members are involved.
- Repeatable processes should be automated. If you’re running the same sequence of operations daily or weekly, defining it once as a workflow eliminates repetitive manual work and ensures consistency.

Workflows, tasks, and jobs
Understanding the relationship between these three concepts is key to working with the system effectively.

The three layers
Workflows define what should happen: a named, versioned sequence of operations. A workflow is a blueprint, not an execution. You create a workflow once and trigger it many times.

Tasks are the individual steps within a workflow. Each task calls a specific operation (like creating a materialized view or refreshing one) with specific parameters. Tasks execute sequentially in the order they’re defined.

Jobs are how tasks actually execute. The workflow system is an orchestration layer built on top of the existing jobs system. When a task runs, it creates a job, the same kind of job that’s created when you run a query or refresh a view manually.

How execution flows
When you trigger a workflow, here’s what happens:

- The system creates a run, a specific execution instance with its own run_id
- The first task starts and creates a job in the jobs system
- The orchestrator waits for that job to complete
- Once the job succeeds, the next task starts and creates its own job
- This continues until all tasks complete or one fails
Because every task runs as a job, you can monitor execution at two levels:

- Workflow level — check the overall run status to see if the pipeline succeeded or failed
- Job level — inspect individual jobs for detailed timing, errors, and output
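This execution loop can be sketched in a few lines of Python. The function and field names below are illustrative, not the system's actual API; `run_job` stands in for the jobs system.

```python
import uuid

def run_workflow(tasks, run_job):
    """Execute tasks sequentially, creating one job per task.

    `run_job` is a stand-in for the jobs system: it receives a task
    and returns True on success, False on failure (a simplification
    of the real job lifecycle).
    """
    run_id = str(uuid.uuid4())       # each trigger gets its own run
    results = []
    for task in tasks:
        job_id = str(uuid.uuid4())   # the task creates a job
        ok = run_job(task)           # orchestrator waits for the job to finish
        results.append({"task": task["name"], "job_id": job_id,
                        "status": "succeeded" if ok else "failed"})
        if not ok:                   # fail-fast: stop at the first failure
            break
    succeeded = len(results) == len(tasks) and all(
        r["status"] == "succeeded" for r in results)
    return {"run_id": run_id,
            "status": "succeeded" if succeeded else "failed",
            "tasks": results}
```

The run-level status rolls up the per-task job results, mirroring the two levels of visibility described above.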
Jobs created by workflows are automatically tagged with workflow_enqueued. This makes it easy to distinguish workflow-generated jobs from manually triggered ones when browsing job history.

Task dependencies and data passing
Tasks in a workflow execute sequentially: each task waits for the previous one to complete before starting. This sequential model is simple but powerful, because later tasks can reference data created by earlier ones. For example, a workflow might:

- Create a materialized view called active_users
- Create a second materialized view that queries company_data.active_users
Referencing earlier outputs
When a task creates a dataset (via CreateMaterializedViewIfNotExists), subsequent tasks can reference it using the fully qualified path company_data.<dataset_name>. The workflow’s sequential execution guarantees the dataset exists by the time later tasks need it.
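A workflow using this pattern might look like the following sketch. The spec is written as a Python dict; the field names and NQL queries are illustrative assumptions, not the system's exact schema.

```python
# Hypothetical workflow spec: field names ("tasks", "operation",
# "parameters") and the NQL queries are illustrative only.
workflow = {
    "name": "daily_active_users_pipeline",
    "tasks": [
        {
            "name": "create_active_users",
            "operation": "CreateMaterializedViewIfNotExists",
            "parameters": {
                "name": "active_users",
                "query": "SELECT user_id FROM company_data.events WHERE active",
            },
        },
        {
            # Runs only after the first task's job succeeds, so
            # company_data.active_users is guaranteed to exist here.
            "name": "create_active_user_summary",
            "operation": "CreateMaterializedViewIfNotExists",
            "parameters": {
                "name": "active_user_summary",
                "query": "SELECT COUNT(*) AS n FROM company_data.active_users",
            },
        },
    ],
}
```

The second task simply names the first task's dataset in its query; no explicit dependency declaration is needed because ordering is implied by position.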
Task output and workflow context
Beyond dataset references, tasks produce structured JSON output after execution — dataset IDs, snapshot IDs, row counts, and other metadata. This output enables more flexible data passing between tasks. The export mechanism captures task output into a workflow context ($context), an object that accumulates data as the workflow progresses. Variable expressions (${…}) then inject context values or previous task output into subsequent task parameters.
For full syntax details, see the Specification Syntax reference.
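To make the mechanism concrete, here is a minimal sketch of how ${…} expressions could be resolved against an accumulated context. The dot-separated path syntax is a simplification of the real expression syntax documented in the Specification Syntax reference.

```python
import re

def interpolate(params, context):
    """Replace ${...} expressions in string parameters with values
    from the workflow context. Paths are dot-separated keys here,
    a simplification of the real expression syntax."""
    def lookup(path):
        value = context
        for key in path.split("."):
            value = value[key]   # walk nested dicts: a.b.c -> context[a][b][c]
        return str(value)
    def render(s):
        return re.sub(r"\$\{([^}]+)\}", lambda m: lookup(m.group(1)), s)
    return {k: render(v) if isinstance(v, str) else v
            for k, v in params.items()}
```

A downstream task's parameters would pass through a function like this before the task's job is created, so exported values from earlier tasks land in later statements.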
Two approaches to data passing
Workflows support two ways to pass data between tasks:

Dataset references — When a task creates a dataset, later tasks query it by name (company_data.<dataset_name>). This is the simplest approach and works well when tasks consume each other’s data directly via NQL.
Output and context — When tasks need to pass metadata (dataset IDs, row counts, status information) rather than query data, use export and ${…} variable expressions. This is useful for logging, conditional DML operations, or any case where a downstream task needs a value produced by an upstream task.
Most workflows use dataset references. Output and context become valuable when you need to pass specific values — like inserting a dataset ID into a log table or using a row count in a subsequent operation.
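The dataset-ID-into-a-log-table case might look like this sketch. The `export` field, the `$output` shorthand, and the `${context.…}` paths are assumptions about the syntax, shown only to convey the pattern; see the Specification Syntax reference for the real forms.

```python
# Illustrative spec: "export", "$output", and "${context...}" are
# assumed syntax, not verified against the actual system.
workflow = {
    "name": "refresh_and_log",
    "tasks": [
        {
            "name": "refresh_view",
            "operation": "RefreshMaterializedView",
            "parameters": {"name": "active_users"},
            # Capture this task's structured output into the context.
            "export": {"refreshed": "$output"},
        },
        {
            "name": "log_refresh",
            "operation": "ExecuteDml",
            "parameters": {
                # Inject values the previous task exported.
                "statement": (
                    "INSERT INTO company_data.refresh_log (dataset_id, rows) "
                    "VALUES ('${context.refreshed.dataset_id}', "
                    "${context.refreshed.row_count})"
                ),
            },
        },
    ],
}
```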
Error handling
Workflows use fail-fast behavior. If any task fails:

- The workflow immediately stops execution
- Remaining tasks are skipped
- The run status becomes failed
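The resulting per-task statuses can be sketched as follows; the status names mirror the behavior described above, though the exact labels the system reports are an assumption.

```python
def run_with_fail_fast(task_names, failing):
    """Return per-task statuses under fail-fast semantics: tasks after
    the first failure are marked skipped and never start."""
    statuses, failed = {}, False
    for name in task_names:
        if failed:
            statuses[name] = "skipped"     # never started
        elif name == failing:
            statuses[name] = "failed"
            failed = True                  # stop running further tasks
        else:
            statuses[name] = "succeeded"
    run_status = "failed" if failed else "succeeded"
    return statuses, run_status
```

A three-task workflow whose second task fails would therefore leave the third task skipped and the overall run marked failed.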
Supported operations
Workflows currently support three operations:

| Task | What it does |
|---|---|
| CreateMaterializedViewIfNotExists | Creates a new materialized view from an NQL query |
| RefreshMaterializedView | Refreshes an existing materialized view with the latest data |
| ExecuteDml | Executes a DML statement (INSERT, UPDATE, DELETE) |
ExecuteNQL (arbitrary SELECT queries) is not supported—use CreateMaterializedViewIfNotExists to execute a query and persist the results.
Scheduling
Workflows can be triggered manually via the API or run automatically on a schedule. Scheduled workflows use cron expressions to define when they run: hourly, daily, weekly, or on any custom cadence.

Scheduling is particularly valuable for refresh pipelines. Rather than remembering to refresh your materialized views every morning, a scheduled workflow handles it automatically and ensures the correct refresh order.

Current limitations
The workflow system is designed for sequential, deterministic pipelines. It intentionally keeps things simple:

- Sequential execution only — tasks run one at a time, in order. There is no parallel task execution.
- No conditional logic — every task in the workflow runs. Tasks can pass data via context, but there is no if/else branching based on output values.
- No loops — you cannot iterate over a list of items or repeat tasks.
- No automatic retries — failed tasks must be investigated and rerun manually.

