Each entry in a workflow’s do block calls one of the supported tasks. This reference documents every available task, its parameters, output schema, and usage examples. For how tasks fit into the overall workflow specification, see Workflow Specification Syntax.

Supported tasks

CreateMaterializedViewIfNotExists

Task that creates a materialized view if it does not already exist.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| nql | string | Yes | An NQL CREATE MATERIALIZED VIEW statement. |
| computePoolId | string | No | The compute pool ID to use for running the task. When omitted, the resolution depends on whether the task operates on an existing dataset. If it does (e.g. RefreshMaterializedView, ExecuteDml, CreateDatasetSample), the dataset’s default compute pool is used; if the dataset has no default, the dataplane’s default compute pool is used. If it does not (e.g. CreateMaterializedViewIfNotExists, where the dataset is being created, or RunModelInference, which is not tied to a dataset), the dataplane’s default compute pool is used directly. |

Output:

| Field | Type | Always present | Description |
| --- | --- | --- | --- |
| datasetId | integer | No | The ID of the created or existing dataset. |
| created | boolean | No | Whether the materialized view was newly created by this task. |
| snapshotId | integer or null | No | The Iceberg snapshot ID of the initial refresh. Non-null only when created is true. |
| recalculationId | string or null | No | The recalculation ID, if applicable. Non-null only when created is true. |
| rowStats | object or null | No | Row-level statistics produced by a refresh. Platform behavior: Snowflake dataplanes populate this object with real counts; AWS dataplanes return null, because row-level statistics are not yet produced for materialized-view refreshes on AWS. |

Example:
document:
  dsl: 1.0.0
  namespace: analytics
  name: create-active-users-view
  version: 1.0.0
do:
  - createView:
      call: CreateMaterializedViewIfNotExists
      with:
        nql: CREATE MATERIALIZED VIEW active_users AS SELECT user_id, email, last_login FROM company_data.users WHERE is_active = true
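
The computePoolId fallback described in the parameter tables can be sketched in a few lines of Python. This is only an illustration of the documented order, not the platform's implementation; the function name and its arguments are hypothetical.

```python
from typing import Optional


def resolve_compute_pool(
    explicit_pool_id: Optional[str],
    dataset_default_pool_id: Optional[str],
    dataplane_default_pool_id: str,
    operates_on_existing_dataset: bool,
) -> str:
    """Return the compute pool a task would run on under the documented fallback order.

    All names here are hypothetical; they only mirror the prose in the tables.
    """
    # A computePoolId passed in the task's `with` block is used as given.
    if explicit_pool_id is not None:
        return explicit_pool_id
    # Tasks tied to an existing dataset try the dataset's default pool first.
    if operates_on_existing_dataset and dataset_default_pool_id is not None:
        return dataset_default_pool_id
    # Everything else falls back to the dataplane's default pool.
    return dataplane_default_pool_id
```

For CreateMaterializedViewIfNotExists the last branch is the one that applies when computePoolId is omitted, because the dataset does not exist yet.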

RefreshMaterializedView

Task that triggers a refresh of an existing materialized view.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| datasetId | integer | No | The numeric ID of an existing dataset. |
| datasetName | string | No | The name of a dataset. Must contain only alphanumeric characters and underscores, with a maximum length of 256 characters. |
| computePoolId | string | No | The compute pool ID to use for running the task. When omitted, the resolution depends on whether the task operates on an existing dataset. If it does (e.g. RefreshMaterializedView, ExecuteDml, CreateDatasetSample), the dataset’s default compute pool is used; if the dataset has no default, the dataplane’s default compute pool is used. If it does not (e.g. CreateMaterializedViewIfNotExists, where the dataset is being created, or RunModelInference, which is not tied to a dataset), the dataplane’s default compute pool is used directly. |

Output:

| Field | Type | Always present | Description |
| --- | --- | --- | --- |
| datasetId | integer | No | The ID of the refreshed dataset. |
| snapshotId | integer | No | The new Iceberg snapshot ID after the refresh. |
| recalculationId | string or null | No | The recalculation ID, if applicable. |
| rowStats | object or null | No | Row-level statistics produced by a refresh. Platform behavior: Snowflake dataplanes populate this object with real counts; AWS dataplanes return null, because row-level statistics are not yet produced for materialized-view refreshes on AWS. |

Example:
document:
  dsl: 1.0.0
  namespace: analytics
  name: refresh-active-users
  version: 1.0.0
do:
  - refreshView:
      call: RefreshMaterializedView
      with:
        datasetName: active_users
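
The datasetName constraint (alphanumeric characters and underscores, at most 256 characters) is easy to check before submitting a workflow. A minimal sketch, assuming "alphanumeric" means ASCII letters and digits and that an empty name is invalid; neither assumption is stated in this reference, and the helper name is hypothetical.

```python
import re

# ASCII letters, digits, and underscores only, 1 to 256 characters long.
# The lower bound of 1 and the ASCII restriction are assumptions.
DATASET_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,256}")


def is_valid_dataset_name(name: str) -> bool:
    """Check a candidate datasetName against the documented character and length rule."""
    return DATASET_NAME_PATTERN.fullmatch(name) is not None
```

With this check, active_users passes, while a name containing a hyphen or one longer than 256 characters is rejected.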

ExecuteDml

Task that executes a DML statement on a dataset.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| nql | string | Yes | An NQL DML statement. Supports INSERT, UPDATE, and DELETE. |
| computePoolId | string | No | The compute pool ID to use for running the task. When omitted, the resolution depends on whether the task operates on an existing dataset. If it does (e.g. RefreshMaterializedView, ExecuteDml, CreateDatasetSample), the dataset’s default compute pool is used; if the dataset has no default, the dataplane’s default compute pool is used. If it does not (e.g. CreateMaterializedViewIfNotExists, where the dataset is being created, or RunModelInference, which is not tied to a dataset), the dataplane’s default compute pool is used directly. |

Output:

| Field | Type | Always present | Description |
| --- | --- | --- | --- |
| affectedRows | integer | Yes | Total rows affected by the DML statement (insert + update + delete). |
| insertedRows | integer | Yes | Rows inserted by the DML statement. |
| updatedRows | integer | Yes | Rows updated by the DML statement. |
| deletedRows | integer | Yes | Rows deleted by the DML statement. |

Example:
document:
  dsl: 1.0.0
  namespace: etl
  name: insert-audit-record
  version: 1.0.0
do:
  - insertAudit:
      call: ExecuteDml
      with:
        nql: INSERT INTO company_data.audit_log (action, timestamp) VALUES ('manual_run', CURRENT_TIMESTAMP)
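
The output table defines affectedRows as the sum of the three per-operation counts. A small sketch of a consistency check, assuming the task output has already been obtained as a plain Python dict (how a workflow exposes task output to other code is not covered here, and the function name is hypothetical):

```python
def dml_counts_are_consistent(output: dict) -> bool:
    """Return True when affectedRows equals insertedRows + updatedRows + deletedRows."""
    expected = output["insertedRows"] + output["updatedRows"] + output["deletedRows"]
    return output["affectedRows"] == expected
```

For the audit-record example above, a single successful INSERT would be expected to report one inserted row and one affected row.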

RunModelInference

Task that runs a model inference job.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | enum (anthropic.claude-haiku-4.5, anthropic.claude-sonnet-4.5, anthropic.claude-opus-4.5, openai.gpt-oss-120b, openai.gpt-4.1, openai.o4-mini) | Yes | The Narrative model ID to use for inference. |
| messages | array | Yes | A list of messages to send to the model. |
| inferenceConfig | object | Yes | Configuration for the model inference. |
| computePoolId | string | No | The compute pool ID to use for running the task. When omitted, the resolution depends on whether the task operates on an existing dataset. If it does (e.g. RefreshMaterializedView, ExecuteDml, CreateDatasetSample), the dataset’s default compute pool is used; if the dataset has no default, the dataplane’s default compute pool is used. If it does not (e.g. CreateMaterializedViewIfNotExists, where the dataset is being created, or RunModelInference, which is not tied to a dataset), the dataplane’s default compute pool is used directly. |

Output:

| Field | Type | Always present | Description |
| --- | --- | --- | --- |
| structuredOutput | object | No | The structured output from the model, conforming to the provided outputFormatSchema. |
| usage | object | No | Token usage information. |

Example:
document:
  dsl: 1.0.0
  namespace: ml
  name: classify-records
  version: 1.0.0
do:
  - classify:
      call: RunModelInference
      with:
        model: anthropic.claude-sonnet-4.5
        messages:
          - role: user
            text: 'Classify the following record as spam or not spam: ...'
        inferenceConfig:
          outputFormatSchema:
            type: object
            properties:
              classification:
                type: string
                enum:
                  - spam
                  - not_spam
            required:
              - classification
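
Because structuredOutput is described as conforming to outputFormatSchema, a consumer may still want a defensive check before acting on it. The sketch below validates only the small schema subset used in the example above (required properties, string type, enum); it is not a full JSON Schema validator, and the function and constant names are hypothetical.

```python
from typing import List

# Mirrors the outputFormatSchema from the example workflow above.
CLASSIFICATION_SCHEMA = {
    "type": "object",
    "properties": {
        "classification": {"type": "string", "enum": ["spam", "not_spam"]},
    },
    "required": ["classification"],
}


def validate_structured_output(output: dict, schema: dict) -> List[str]:
    """Return a list of problems; an empty list means the output fits the schema subset."""
    problems = []
    # Every required property must be present.
    for name in schema.get("required", []):
        if name not in output:
            problems.append(f"missing required property: {name}")
    # Present properties are checked for string type, then for enum membership.
    for name, rules in schema.get("properties", {}).items():
        if name not in output:
            continue
        value = output[name]
        if rules.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} is not a string")
        elif "enum" in rules and value not in rules["enum"]:
            problems.append(f"{name} is not one of {rules['enum']}")
    return problems
```

An output of {"classification": "spam"} yields no problems, while an unexpected label or a missing classification key is reported.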

LabelConnectedComponents

Task that runs bipartite label propagation for cross-system customer identity resolution. Finds connected components in a customer identity graph by linking customer IDs across platforms via shared identifiers.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| edgeDataset | string | Yes | | The name of a dataset. Must contain only alphanumeric characters and underscores, with a maximum length of 256 characters. |
| outputDataset | string | Yes | | The name of a dataset. Must contain only alphanumeric characters and underscores, with a maximum length of 256 characters. |
| maxDegreeThreshold | integer | No | 100 | Maximum number of connections a single vertex can have before it is excluded as a “supernode.” Prevents a single overly-connected identifier from incorrectly merging thousands of unrelated customers. |
| maxComponentSize | integer | No | 100 | Maximum number of members allowed in a single resolved component. Prevents runaway merges that would create implausibly large identity groups. |
| maxIterations | integer | No | 10 | Upper bound on how many times the label propagation loop can run before stopping, even if not fully converged. Safety valve against infinite loops. |
| convergenceThreshold | number | No | 0.000001 | Stop label propagation when the fraction of vertices that changed label in an iteration drops below this value. Must be in the range [0, 1]. |
| sourceIdCol | string | Yes | | Column name in the edge table containing the customer ID. |
| sourceSystemCol | string | Yes | | Column name identifying which platform the customer ID came from. |
| bridgeKeyCol | string | Yes | | Column name for the shared identifier value. |
| bridgeKeyTypeCol | string | Yes | | Column name for the type/category of the shared identifier. |
| firstPartySources | array | No | [] | Ordered list of first-party platform identifiers. Order determines priority when selecting the representative component ID. |
| thirdPartySources | array | No | [] | List of third-party platform identifiers. |
| computePoolId | string | No | | The compute pool ID to use for running the task. When omitted, the resolution depends on whether the task operates on an existing dataset. If it does (e.g. RefreshMaterializedView, ExecuteDml, CreateDatasetSample), the dataset’s default compute pool is used; if the dataset has no default, the dataplane’s default compute pool is used. If it does not (e.g. CreateMaterializedViewIfNotExists, where the dataset is being created, or RunModelInference, which is not tied to a dataset), the dataplane’s default compute pool is used directly. |

Output:

| Field | Type | Always present | Description |
| --- | --- | --- | --- |
| datasetId | integer | Yes | The ID of the dataset. |

Example:
document:
  dsl: 1.0.0
  namespace: identity
  name: resolve-connected-components
  version: 1.0.0
do:
  - resolveIdentities:
      call: LabelConnectedComponents
      with:
        edgeDataset: edge_table
        outputDataset: connected_components_result
        maxDegreeThreshold: 500
        maxComponentSize: 10000
        maxIterations: 50
        sourceIdCol: customer_id
        sourceSystemCol: source_system
        bridgeKeyCol: bridge_key
        bridgeKeyTypeCol: bridge_key_type
        firstPartySources:
          - AFTERPAY
          - CASHAPP
          - SQUARE
        thirdPartySources:
          - EXPERIAN
          - ACXIOM
        computePoolId: 11111111-1111-1111-1111-111111111111
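
To make the roles of maxDegreeThreshold, maxIterations, and convergenceThreshold concrete, here is a simplified in-memory min-label propagation over a bipartite customer/bridge-key graph. It is a sketch under several assumptions, not the task's actual algorithm: the supernode filter is applied to bridge keys only, a key is dropped when its degree exceeds the threshold, the representative is simply the smallest customer tuple (the real task prioritizes firstPartySources), and maxComponentSize is not modeled. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Customer = Tuple[str, str]  # (source system, customer ID)


def label_components(
    edges: Iterable[Tuple[str, str, str, str]],
    max_degree_threshold: int = 100,
    max_iterations: int = 10,
    convergence_threshold: float = 0.000001,
) -> Dict[Customer, Customer]:
    """Map each customer to the smallest customer tuple reachable via shared bridge keys.

    Each edge is (source system, customer ID, bridge key type, bridge key value).
    """
    customers_by_bridge = defaultdict(set)
    for system, customer_id, key_type, key in edges:
        customers_by_bridge[(key_type, key)].add((system, customer_id))

    # Every customer starts with itself as its label.
    labels = {c: c for members in customers_by_bridge.values() for c in members}

    # Drop "supernode" bridge keys shared by more customers than the threshold allows.
    kept = [m for m in customers_by_bridge.values() if len(m) <= max_degree_threshold]

    for _ in range(max_iterations):
        new_labels = dict(labels)
        for members in kept:
            # Customers sharing a bridge key adopt the smallest label among them.
            smallest = min(labels[c] for c in members)
            for c in members:
                if smallest < new_labels[c]:
                    new_labels[c] = smallest
        changed = sum(1 for c in labels if new_labels[c] != labels[c])
        labels = new_labels
        # Stop once the fraction of changed vertices drops below the threshold.
        if not labels or changed / len(labels) < convergence_threshold:
            break
    return labels
```

Lowering max_degree_threshold makes the filter stricter: a bridge key shared by more customers than the threshold no longer links any of them.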

CreateRosettaStoneMappingsIfNotExist

Task that creates Rosetta Stone attribute mappings for a dataset.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| datasetId | integer | No | | The numeric ID of an existing dataset. |
| datasetName | string | No | | The name of a dataset. Must contain only alphanumeric characters and underscores, with a maximum length of 256 characters. |
| mappings | array | Yes | | A list of mapping definitions to create. |
| allowPartial | boolean | No | true | When true, individual mapping failures don’t prevent other valid mappings from being created. When false, any single failure causes the entire operation to fail. |

Output:

| Field | Type | Always present | Description |
| --- | --- | --- | --- |
| createdMappings | array | Yes | Mappings that were successfully created. |
| failedMappings | array | Yes | Mappings that failed to create. |
| conflictMappings | array | Yes | Mappings skipped because an identical mapping already exists. |

Example:
document:
  dsl: 1.0.0
  namespace: etl
  name: map-identity-seed
  version: 1.0.0
do:
  - mapToRosettaStone:
      call: CreateRosettaStoneMappingsIfNotExist
      with:
        datasetName: identity_seed
        mappings:
          - attributeId: 92
            mapping:
              type: object_mapping
              propertyMappings:
                - path: value
                  expression: SHA2(NORMALIZE_EMAIL(email), 256)
                - path: type
                  expression: '''sha256_email'''
          - attributeId: 50
            mapping:
              type: value_mapping
              expression: country_code

CreateDatasetSample

Task that generates a sample for a dataset.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| datasetId | integer | No | The numeric ID of an existing dataset. |
| datasetName | string | No | The name of a dataset. Must contain only alphanumeric characters and underscores, with a maximum length of 256 characters. |
| computePoolId | string | No | The compute pool ID to use for running the task. When omitted, the resolution depends on whether the task operates on an existing dataset. If it does (e.g. RefreshMaterializedView, ExecuteDml, CreateDatasetSample), the dataset’s default compute pool is used; if the dataset has no default, the dataplane’s default compute pool is used. If it does not (e.g. CreateMaterializedViewIfNotExists, where the dataset is being created, or RunModelInference, which is not tied to a dataset), the dataplane’s default compute pool is used directly. |

Output:

| Field | Type | Always present | Description |
| --- | --- | --- | --- |
| datasetId | integer | Yes | The ID of the dataset whose sample was generated. |
| rowCount | integer | Yes | The number of rows captured in the sample. |

Example:
document:
  dsl: 1.0.0
  namespace: analytics
  name: create-dataset-sample-after-refresh
  version: 1.0.0
do:
  - refreshView:
      call: RefreshMaterializedView
      with:
        datasetName: active_users
  - createDatasetSample:
      call: CreateDatasetSample
      with:
        datasetName: active_users

RecalculateStatistics

Task that triggers a recalculation of a dataset’s column statistics and waits for it to complete.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| datasetId | integer | No | The numeric ID of an existing dataset. |
| datasetName | string | No | The name of a dataset. Must contain only alphanumeric characters and underscores, with a maximum length of 256 characters. |
| computePoolId | string | No | The compute pool ID to use for running the task. When omitted, the resolution depends on whether the task operates on an existing dataset. If it does (e.g. RefreshMaterializedView, ExecuteDml, CreateDatasetSample), the dataset’s default compute pool is used; if the dataset has no default, the dataplane’s default compute pool is used. If it does not (e.g. CreateMaterializedViewIfNotExists, where the dataset is being created, or RunModelInference, which is not tied to a dataset), the dataplane’s default compute pool is used directly. |

Output:

| Field | Type | Always present | Description |
| --- | --- | --- | --- |
| totalRows | integer | Yes | The total number of rows the statistics were calculated over. |
| columnCount | integer | Yes | The number of columns the statistics were calculated for. |

Example:
document:
  dsl: 1.0.0
  namespace: analytics
  name: recalculate-statistics-after-load
  version: 1.0.0
do:
  - loadData:
      call: ExecuteDml
      with:
        nql: INSERT INTO company_data.active_users (user_id, email) SELECT user_id, email FROM company_data.users WHERE is_active = true
  - recalculateStats:
      call: RecalculateStatistics
      with:
        datasetName: active_users

Related pages

- Workflow Specification Syntax: Full specification format for document, schedule, and task blocks
- Automating Multi-Step Pipelines: Step-by-step guide to creating and running workflows
- Materialized Views: How materialized views work
- Workflows API: REST API endpoints for managing workflows