Task Reference - Narrative I/O Knowledge Base

Each step in a workflow’s `do` block calls one of the supported tasks. This reference documents every available task, its parameters, output schema, and usage examples. For how tasks fit into the overall workflow specification, see Workflow Specification Syntax.

Supported tasks

CreateMaterializedViewIfNotExists

Task that creates a materialized view if it does not already exist. Parameters:

Parameter	Type	Required	Description
`nql`	string	Yes	An NQL `CREATE MATERIALIZED VIEW` statement.
`computePoolId`	string	No	The compute pool ID to use for running the task. When omitted, resolution depends on whether the task operates on an existing dataset. If it does (e.g. `RefreshMaterializedView`, `ExecuteDml`, `CreateDatasetSample`), the dataset’s default compute pool is used, falling back to the dataplane’s default compute pool when the dataset has none. If it does not (e.g. `CreateMaterializedViewIfNotExists`, where the dataset is being created, or `RunModelInference`, which is not tied to a dataset), the dataplane’s default compute pool is used directly.
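
The `computePoolId` fallback rule can be sketched in a few lines of Python. This is only an illustration of the rule as described in the table, not the platform's implementation; the function name, its arguments, and the pool ID strings are hypothetical.

```python
from typing import Optional

# Tasks the reference names as operating on an existing dataset.
DATASET_BOUND_TASKS = {"RefreshMaterializedView", "ExecuteDml", "CreateDatasetSample"}


def resolve_compute_pool(
    task: str,
    explicit_pool_id: Optional[str],
    dataset_default_pool_id: Optional[str],
    dataplane_default_pool_id: str,
) -> str:
    """Return the compute pool a task would run on under the documented rule."""
    # A computePoolId given in the task's `with` block is used as-is.
    if explicit_pool_id is not None:
        return explicit_pool_id
    # Dataset-bound tasks try the dataset's default pool first.
    if task in DATASET_BOUND_TASKS and dataset_default_pool_id is not None:
        return dataset_default_pool_id
    # Everything else falls back to the dataplane's default pool.
    return dataplane_default_pool_id
```

For instance, `resolve_compute_pool("RunModelInference", None, "ds-pool", "dp-pool")` returns `"dp-pool"`, because that task is not tied to a dataset.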

Output:

Field	Type	Always present	Description
`datasetId`	integer	No	The ID of the created or existing dataset.
`created`	boolean	No	Whether the materialized view was newly created by this task.
`snapshotId`	integer or null	No	The Iceberg snapshot ID of the initial refresh. Non-null only when `created` is `true`.
`recalculationId`	string or null	No	The recalculation ID, if applicable. Non-null only when `created` is `true`.
`rowStats`	object or null	No	Row-level statistics produced by a refresh. Platform behavior: Snowflake dataplanes populate this object with real counts; AWS dataplanes return `null`, because row-level statistics are not yet produced for materialized-view refreshes on AWS.
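
Because no output field is marked as always present, and several are null unless `created` is `true`, code that consumes this output should read it defensively. The following is a minimal sketch assuming the output arrives as a plain dictionary keyed by the field names above; the helper name and the summary strings are hypothetical.

```python
from typing import Any, Dict


def describe_create_result(output: Dict[str, Any]) -> str:
    """Summarise a CreateMaterializedViewIfNotExists output object."""
    # Fields are not guaranteed to be present, so use .get() throughout.
    dataset_id = output.get("datasetId")
    if not output.get("created"):
        # snapshotId and recalculationId are non-null only for new views.
        return f"dataset {dataset_id}: already existed, nothing refreshed"
    snapshot_id = output.get("snapshotId")
    row_stats = output.get("rowStats")
    if row_stats is None:
        # Expected on AWS dataplanes, which do not produce row statistics yet.
        stats = "row stats unavailable"
    else:
        stats = f"row stats {row_stats}"
    return f"dataset {dataset_id}: created at snapshot {snapshot_id}, {stats}"
```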

Example:

document:
  dsl: 1.0.0
  namespace: analytics
  name: create-active-users-view
  version: 1.0.0
do:
  - createView:
      call: CreateMaterializedViewIfNotExists
      with:
        nql: CREATE MATERIALIZED VIEW active_users AS SELECT user_id, email, last_login FROM company_data.users WHERE is_active = true

RefreshMaterializedView

Task that triggers a refresh of an existing materialized view. Parameters:

Parameter	Type	Required	Description
`datasetId`	integer	No	The numeric ID of an existing dataset.
`datasetName`	string	No	The name of a dataset. Must contain only alphanumeric characters and underscores, with a maximum length of 256 characters.
`computePoolId`	string	No	The compute pool ID to use for running the task. When omitted, resolution depends on whether the task operates on an existing dataset. If it does (e.g. `RefreshMaterializedView`, `ExecuteDml`, `CreateDatasetSample`), the dataset’s default compute pool is used, falling back to the dataplane’s default compute pool when the dataset has none. If it does not (e.g. `CreateMaterializedViewIfNotExists`, where the dataset is being created, or `RunModelInference`, which is not tied to a dataset), the dataplane’s default compute pool is used directly.
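
The `datasetName` constraint can be checked before a workflow is submitted. This sketch assumes that "alphanumeric" means ASCII letters and digits and that empty names are rejected; neither point is stated in this reference, and the helper name is hypothetical.

```python
import re

# ASCII letters, digits, and underscores; between 1 and 256 characters.
# Assumption: non-ASCII letters and the empty string are not allowed.
_DATASET_NAME = re.compile(r"[A-Za-z0-9_]{1,256}")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if `name` satisfies the documented datasetName rule."""
    return _DATASET_NAME.fullmatch(name) is not None
```

With this check, `active_users` passes while `active-users` is rejected because of the hyphen.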

Output:

Field	Type	Always present	Description
`datasetId`	integer	No	The ID of the refreshed dataset.
`snapshotId`	integer	No	The new Iceberg snapshot ID after the refresh.
`recalculationId`	string or null	No	The recalculation ID, if applicable.
`rowStats`	object or null	No	Row-level statistics produced by a refresh. Platform behavior: Snowflake dataplanes populate this object with real counts; AWS dataplanes return `null`, because row-level statistics are not yet produced for materialized-view refreshes on AWS.

Example:

document:
  dsl: 1.0.0
  namespace: analytics
  name: refresh-active-users
  version: 1.0.0
do:
  - refreshView:
      call: RefreshMaterializedView
      with:
        datasetName: active_users

ExecuteDml

Task that executes a DML statement on a dataset. Parameters:

Parameter	Type	Required	Description
`nql`	string	Yes	An NQL DML statement. Supports `INSERT`, `UPDATE`, and `DELETE`.
`computePoolId`	string	No	The compute pool ID to use for running the task. When omitted, resolution depends on whether the task operates on an existing dataset. If it does (e.g. `RefreshMaterializedView`, `ExecuteDml`, `CreateDatasetSample`), the dataset’s default compute pool is used, falling back to the dataplane’s default compute pool when the dataset has none. If it does not (e.g. `CreateMaterializedViewIfNotExists`, where the dataset is being created, or `RunModelInference`, which is not tied to a dataset), the dataplane’s default compute pool is used directly.

Output:

Field	Type	Always present	Description
`affectedRows`	integer	Yes	Total rows affected by the DML statement (insert + update + delete).
`insertedRows`	integer	Yes	Rows inserted by the DML statement.
`updatedRows`	integer	Yes	Rows updated by the DML statement.
`deletedRows`	integer	Yes	Rows deleted by the DML statement.
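
The four counters are related: `affectedRows` is documented as the sum of the other three. A downstream step could assert that relationship as a sanity check. This is a small sketch assuming the output is available as a dictionary; the helper name is hypothetical.

```python
from typing import Dict


def dml_counts_are_consistent(output: Dict[str, int]) -> bool:
    """True when affectedRows equals inserted + updated + deleted rows."""
    parts = output["insertedRows"] + output["updatedRows"] + output["deletedRows"]
    return output["affectedRows"] == parts
```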

Example:

document:
  dsl: 1.0.0
  namespace: etl
  name: insert-audit-record
  version: 1.0.0
do:
  - insertAudit:
      call: ExecuteDml
      with:
        nql: INSERT INTO company_data.audit_log (action, timestamp) VALUES ('manual_run', CURRENT_TIMESTAMP)

RunModelInference

Task that runs a model inference job. Parameters:

Parameter	Type	Required	Description
`model`	enum (`anthropic.claude-haiku-4.5`, `anthropic.claude-sonnet-4.5`, `anthropic.claude-opus-4.5`, `openai.gpt-oss-120b`, `openai.gpt-4.1`, `openai.o4-mini`)	Yes	The Narrative model ID to use for inference.
`messages`	array	Yes	A list of messages to send to the model.
`inferenceConfig`	object	Yes	Configuration for the model inference.
`computePoolId`	string	No	The compute pool ID to use for running the task. When omitted, resolution depends on whether the task operates on an existing dataset. If it does (e.g. `RefreshMaterializedView`, `ExecuteDml`, `CreateDatasetSample`), the dataset’s default compute pool is used, falling back to the dataplane’s default compute pool when the dataset has none. If it does not (e.g. `CreateMaterializedViewIfNotExists`, where the dataset is being created, or `RunModelInference`, which is not tied to a dataset), the dataplane’s default compute pool is used directly.

Output:

Field	Type	Always present	Description
`structuredOutput`	object	No	The structured output from the model, conforming to the `outputFormatSchema` provided in `inferenceConfig`.
`usage`	object	No	Token usage information.

Example:

document:
  dsl: 1.0.0
  namespace: ml
  name: classify-records
  version: 1.0.0
do:
  - classify:
      call: RunModelInference
      with:
        model: anthropic.claude-sonnet-4.5
        messages:
          - role: user
            text: 'Classify the following record as spam or not spam: ...'
        inferenceConfig:
          outputFormatSchema:
            type: object
            properties:
              classification:
                type: string
                enum:
                  - spam
                  - not_spam
            required:
              - classification
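
Since `structuredOutput` is not marked as always present, a consumer may want to verify it against the schema it asked for. The sketch below hand-checks only the keywords used in the example above (`type`, `properties`, `enum`, `required`); it is not a full JSON Schema validator, and the function name is hypothetical.

```python
from typing import Any, Dict, List

# The outputFormatSchema from the example workflow, as a Python dict.
SCHEMA = {
    "type": "object",
    "properties": {
        "classification": {"type": "string", "enum": ["spam", "not_spam"]},
    },
    "required": ["classification"],
}


def validation_errors(value: Any, schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the value conforms."""
    if not isinstance(value, dict):
        return ["expected an object"]
    errors = []
    for key in schema.get("required", []):
        if key not in value:
            errors.append(f"missing required property: {key}")
    for key, rules in schema.get("properties", {}).items():
        if key not in value:
            continue
        if rules.get("type") == "string" and not isinstance(value[key], str):
            errors.append(f"{key}: expected a string")
        elif "enum" in rules and value[key] not in rules["enum"]:
            errors.append(f"{key}: {value[key]!r} is not one of {rules['enum']}")
    return errors
```

A result such as `{"classification": "spam"}` yields no errors, while a missing or out-of-enum value is reported by name.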

LabelConnectedComponents

Task that runs bipartite label propagation for cross-system customer identity resolution. Finds connected components in a customer identity graph by linking customer IDs across platforms via shared identifiers. Parameters:

Parameter	Type	Required	Default	Description
`edgeDataset`	string	Yes	—	The name of a dataset. Must contain only alphanumeric characters and underscores, with a maximum length of 256 characters.
`outputDataset`	string	Yes	—	The name of a dataset. Must contain only alphanumeric characters and underscores, with a maximum length of 256 characters.
`maxDegreeThreshold`	integer	No	`100`	Maximum number of connections a single vertex can have before it is excluded as a “supernode.” Prevents a single overly-connected identifier from incorrectly merging thousands of unrelated customers.
`maxComponentSize`	integer	No	`100`	Maximum number of members allowed in a single resolved component. Prevents runaway merges that would create implausibly large identity groups.
`maxIterations`	integer	No	`10`	Upper bound on how many times the label propagation loop can run before stopping, even if not fully converged. Safety valve against infinite loops.
`convergenceThreshold`	number	No	`0.000001`	Stop label propagation when the fraction of vertices that changed label in an iteration drops below this value. Must be in the range `[0, 1]`.
`sourceIdCol`	string	Yes	—	Column name in the edge table containing the customer ID.
`sourceSystemCol`	string	Yes	—	Column name identifying which platform the customer ID came from.
`bridgeKeyCol`	string	Yes	—	Column name for the shared identifier value.
`bridgeKeyTypeCol`	string	Yes	—	Column name for the type/category of the shared identifier.
`firstPartySources`	array	No	`[]`	Ordered list of first-party platform identifiers. Order determines priority when selecting the representative component ID.
`thirdPartySources`	array	No	`[]`	List of third-party platform identifiers.
`computePoolId`	string	No	—	The compute pool ID to use for running the task. When omitted, resolution depends on whether the task operates on an existing dataset. If it does (e.g. `RefreshMaterializedView`, `ExecuteDml`, `CreateDatasetSample`), the dataset’s default compute pool is used, falling back to the dataplane’s default compute pool when the dataset has none. If it does not (e.g. `CreateMaterializedViewIfNotExists`, where the dataset is being created, or `RunModelInference`, which is not tied to a dataset), the dataplane’s default compute pool is used directly.
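
To make the roles of `maxDegreeThreshold`, `maxIterations`, and `convergenceThreshold` concrete, here is a small in-memory sketch of label propagation on a bipartite graph of customers and bridge keys. It illustrates the general technique only and is not the task's actual implementation. Assumptions: supernode exclusion applies to both sides of the graph, each vertex adopts the smallest label among itself and its neighbours, and `maxComponentSize` is left out because this reference does not say how oversized components are handled.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

Vertex = Tuple[str, Hashable]  # ("customer", id) or ("bridge", key)


def label_components(
    edges: Iterable[Tuple[Hashable, Hashable]],
    max_degree_threshold: int = 100,
    max_iterations: int = 10,
    convergence_threshold: float = 0.000001,
) -> Dict[Hashable, int]:
    """Map each customer ID to an integer component label."""
    neighbours: Dict[Vertex, Set[Vertex]] = defaultdict(set)
    for customer_id, bridge_key in edges:
        customer, bridge = ("customer", customer_id), ("bridge", bridge_key)
        neighbours[customer].add(bridge)
        neighbours[bridge].add(customer)

    # Exclude supernodes: vertices with more connections than the threshold.
    supernodes = {v for v, ns in neighbours.items() if len(ns) > max_degree_threshold}
    graph = {v: ns - supernodes for v, ns in neighbours.items() if v not in supernodes}

    # Every vertex starts in its own component.
    labels = {v: i for i, v in enumerate(graph)}
    for _ in range(max_iterations):
        # Synchronous update: adopt the smallest label visible this round.
        updated = {
            v: min([labels[v]] + [labels[n] for n in ns]) for v, ns in graph.items()
        }
        changed = sum(1 for v in graph if updated[v] != labels[v])
        labels = updated
        # Stop once the fraction of changed vertices drops below the threshold.
        if not graph or changed / len(graph) < convergence_threshold:
            break

    return {v[1]: label for v, label in labels.items() if v[0] == "customer"}
```

In this sketch a customer ID would typically be a `(source system, ID)` pair and a bridge key a `(type, value)` pair, mirroring the four column parameters. A low `maxIterations` can leave a long chain of links only partly merged, which is why the example workflow below raises it.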

Output:

Field	Type	Always present	Description
`datasetId`	integer	Yes	The ID of the dataset.

Example:

document:
  dsl: 1.0.0
  namespace: identity
  name: resolve-connected-components
  version: 1.0.0
do:
  - resolveIdentities:
      call: LabelConnectedComponents
      with:
        edgeDataset: edge_table
        outputDataset: connected_components_result
        maxDegreeThreshold: 500
        maxComponentSize: 10000
        maxIterations: 50
        sourceIdCol: customer_id
        sourceSystemCol: source_system
        bridgeKeyCol: bridge_key
        bridgeKeyTypeCol: bridge_key_type
        firstPartySources:
          - AFTERPAY
          - CASHAPP
          - SQUARE
        thirdPartySources:
          - EXPERIAN
          - ACXIOM
        computePoolId: 11111111-1111-1111-1111-111111111111
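
The order of `firstPartySources` determines which member supplies the representative component ID. One way to picture that priority is shown below. The tie-breaking rule (alphabetical by system, then by ID) and the fallback for components with no first-party member are assumptions made for the sketch, and the customer IDs are invented.

```python
from typing import List, Sequence, Tuple

Member = Tuple[str, str]  # (source system, customer ID)


def pick_representative(members: Sequence[Member], first_party_sources: List[str]) -> Member:
    """Choose the member whose source system ranks earliest in the list."""
    priority = {system: rank for rank, system in enumerate(first_party_sources)}
    unranked = len(first_party_sources)  # systems not listed sort last
    return min(members, key=lambda m: (priority.get(m[0], unranked), m[0], m[1]))
```

With the source order from the example workflow, a component containing `SQUARE`, `EXPERIAN`, and `CASHAPP` members would be represented by the `CASHAPP` member, since `CASHAPP` precedes `SQUARE` in the list and no `AFTERPAY` member is present.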

CreateRosettaStoneMappingsIfNotExist

Task that creates Rosetta Stone attribute mappings for a dataset. Parameters:

Parameter	Type	Required	Default	Description
`datasetId`	integer	No	—	The numeric ID of an existing dataset.
`datasetName`	string	No	—	The name of a dataset. Must contain only alphanumeric characters and underscores, with a maximum length of 256 characters.
`mappings`	array	Yes	—	A list of mapping definitions to create.
`allowPartial`	boolean	No	`true`	When `true`, individual mapping failures don’t prevent other valid mappings from being created. When `false`, any single failure causes the entire operation to fail.
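
The effect of `allowPartial` on the three output lists can be sketched as follows. The `check` callback is a hypothetical stand-in for whatever the platform does to validate and create one mapping. Treating an already-existing mapping as a skip rather than a failure, and creating nothing when `allowPartial` is `false` and a mapping fails, are assumptions based on the descriptions in this section.

```python
from typing import Any, Callable, Dict, List

Mapping = Dict[str, Any]


def create_mappings(
    mappings: List[Mapping],
    check: Callable[[Mapping], str],
    allow_partial: bool = True,
) -> Dict[str, List[Mapping]]:
    """`check` returns "ok", "conflict", or "invalid" for one mapping (hypothetical)."""
    outcomes = [(m, check(m)) for m in mappings]
    failed = [m for m, outcome in outcomes if outcome == "invalid"]
    if failed and not allow_partial:
        # With allowPartial false, a single failure fails the whole operation.
        raise ValueError(f"{len(failed)} mapping(s) failed; nothing was created")
    return {
        "createdMappings": [m for m, outcome in outcomes if outcome == "ok"],
        "failedMappings": failed,
        "conflictMappings": [m for m, outcome in outcomes if outcome == "conflict"],
    }
```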

Output:

Field	Type	Always present	Description
`createdMappings`	array	Yes	Mappings that were successfully created.
`failedMappings`	array	Yes	Mappings that failed to create.
`conflictMappings`	array	Yes	Mappings skipped because an identical mapping already exists.

Example:

document:
  dsl: 1.0.0
  namespace: etl
  name: map-identity-seed
  version: 1.0.0
do:
  - mapToRosettaStone:
      call: CreateRosettaStoneMappingsIfNotExist
      with:
        datasetName: identity_seed
        mappings:
          - attributeId: 92
            mapping:
              type: object_mapping
              propertyMappings:
                - path: value
                  expression: SHA2(NORMALIZE_EMAIL(email), 256)
                - path: type
                  expression: '''sha256_email'''
          - attributeId: 50
            mapping:
              type: value_mapping
              expression: country_code

CreateDatasetSample

Task that generates a sample for a dataset. Parameters:

Parameter	Type	Required	Description
`datasetId`	integer	No	The numeric ID of an existing dataset.
`datasetName`	string	No	The name of a dataset. Must contain only alphanumeric characters and underscores, with a maximum length of 256 characters.
`computePoolId`	string	No	The compute pool ID to use for running the task. When omitted, resolution depends on whether the task operates on an existing dataset. If it does (e.g. `RefreshMaterializedView`, `ExecuteDml`, `CreateDatasetSample`), the dataset’s default compute pool is used, falling back to the dataplane’s default compute pool when the dataset has none. If it does not (e.g. `CreateMaterializedViewIfNotExists`, where the dataset is being created, or `RunModelInference`, which is not tied to a dataset), the dataplane’s default compute pool is used directly.

Output:

Field	Type	Always present	Description
`datasetId`	integer	Yes	The ID of the dataset whose sample was generated.
`rowCount`	integer	Yes	The number of rows captured in the sample.

Example:

document:
  dsl: 1.0.0
  namespace: analytics
  name: create-dataset-sample-after-refresh
  version: 1.0.0
do:
  - refreshView:
      call: RefreshMaterializedView
      with:
        datasetName: active_users
  - createDatasetSample:
      call: CreateDatasetSample
      with:
        datasetName: active_users

RecalculateStatistics

Task that triggers a recalculation of a dataset’s column statistics and waits for it to complete. Parameters:

Parameter	Type	Required	Description
`datasetId`	integer	No	The numeric ID of an existing dataset.
`datasetName`	string	No	The name of a dataset. Must contain only alphanumeric characters and underscores, with a maximum length of 256 characters.
`computePoolId`	string	No	The compute pool ID to use for running the task. When omitted, resolution depends on whether the task operates on an existing dataset. If it does (e.g. `RefreshMaterializedView`, `ExecuteDml`, `CreateDatasetSample`), the dataset’s default compute pool is used, falling back to the dataplane’s default compute pool when the dataset has none. If it does not (e.g. `CreateMaterializedViewIfNotExists`, where the dataset is being created, or `RunModelInference`, which is not tied to a dataset), the dataplane’s default compute pool is used directly.

Output:

Field	Type	Always present	Description
`totalRows`	integer	Yes	The total number of rows the statistics were calculated over.
`columnCount`	integer	Yes	The number of columns the statistics were calculated for.

Example:

document:
  dsl: 1.0.0
  namespace: analytics
  name: recalculate-statistics-after-load
  version: 1.0.0
do:
  - loadData:
      call: ExecuteDml
      with:
        nql: INSERT INTO company_data.active_users (user_id, email) SELECT user_id, email FROM company_data.users WHERE is_active = true
  - recalculateStats:
      call: RecalculateStatistics
      with:
        datasetName: active_users

Related content

Workflow Specification Syntax: Full specification format for document, schedule, and task blocks.

Automating Multi-Step Pipelines: Step-by-step guide to creating and running workflows.

Materialized Views: How materialized views work.

Workflows API: REST API endpoints for managing workflows.