This cookbook demonstrates how to use the AI_COMPLETE function to enrich records with AI-generated structured attributes directly within NQL. The entire inference pipeline runs inside your Snowflake data plane—your data never leaves your infrastructure.
AI_COMPLETE is available on Snowflake data planes only.

What you will build

A materialized view that takes raw customer support tickets and enriches each record with:
  • A standardized issue category
  • A priority level
  • A sentiment score
  • Extracted product mentions
The same pattern applies to any enrichment task: product classification, lead scoring, content tagging, entity extraction, and more.

Prerequisites

Step 1: Define the output schema

Before writing NQL, design the JSON Schema that describes the structured attributes you want the model to produce. This schema constrains the model’s output so you get predictable, typed results.
{
  "type": "json",
  "schema": {
    "type": "object",
    "properties": {
      "category": {
        "type": "string",
        "description": "The primary issue category",
        "enum": [
          "billing",
          "technical",
          "account_access",
          "feature_request",
          "general_inquiry"
        ]
      },
      "priority": {
        "type": "string",
        "description": "Urgency level based on the issue description",
        "enum": ["low", "medium", "high", "critical"]
      },
      "sentiment": {
        "type": "number",
        "description": "Customer sentiment from -1 (very negative) to 1 (very positive)"
      },
      "product_mentioned": {
        "type": "string",
        "description": "The specific product or feature referenced, or 'none' if not applicable"
      }
    },
    "required": ["category", "priority", "sentiment", "product_mentioned"],
    "additionalProperties": false
  }
}
Key schema design decisions:
  • enum on category and priority restricts output to known values, preventing free-form drift
  • number for sentiment enables numeric filtering and aggregation downstream
  • additionalProperties: false prevents the model from adding unexpected fields
  • description on each field guides the model’s interpretation of what each field means
Keep schemas focused on the fields your downstream queries will actually use. Every additional field increases token usage and inference cost.
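The constraints this schema encodes can be restated outside NQL. The following Python sketch is illustrative only (it is not part of the pipeline, and the sample record is made up); it checks a candidate model output against the enum, required, and additionalProperties rules:

```python
import json

# Constraints from the schema above, restated in plain Python.
CATEGORIES = {"billing", "technical", "account_access", "feature_request", "general_inquiry"}
PRIORITIES = {"low", "medium", "high", "critical"}
REQUIRED = {"category", "priority", "sentiment", "product_mentioned"}

def conforms(output: dict) -> bool:
    """True if `output` satisfies the enum, required, and
    additionalProperties rules encoded by the schema."""
    if set(output) != REQUIRED:  # required fields + additionalProperties: false
        return False
    return (
        output["category"] in CATEGORIES
        and output["priority"] in PRIORITIES
        and isinstance(output["sentiment"], (int, float))
        and isinstance(output["product_mentioned"], str)
    )

sample = json.loads('{"category": "billing", "priority": "high", '
                    '"sentiment": -0.6, "product_mentioned": "none"}')
print(conforms(sample))  # prints: True
```

A record with an out-of-enum category (say, "refunds") or an extra field fails the same checks, which is exactly the drift the schema prevents.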

Step 2: Build prompt strings in a CTE

AI_COMPLETE requires its prompt parameter to be a column reference, not a string literal. Use a Common Table Expression (CTE) to assemble prompts from your source columns.
WITH prompts AS (
  SELECT
    company_data."support_tickets".ticket_id,
    company_data."support_tickets".customer_id,
    company_data."support_tickets".created_at,
    'Analyze the following customer support ticket and extract structured attributes.'
      || ' Ticket subject: ' || company_data."support_tickets".subject
      || ' Ticket body: ' || company_data."support_tickets".body
    AS prompt_text
  FROM company_data."support_tickets"
  WHERE company_data."support_tickets".created_at > CURRENT_TIMESTAMP - INTERVAL '7' DAY
)
SELECT prompts.ticket_id, prompts.prompt_text
FROM prompts
Tips for building effective prompts:
  • Prefix each field with a label (e.g., ' Ticket subject: ') so the model can distinguish between data elements
  • Filter rows before calling AI_COMPLETE to avoid unnecessary inference costs
  • Use COALESCE to handle nulls: COALESCE(company_data."support_tickets".subject, 'No subject')
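For clarity, here is the same assembly logic sketched in Python (illustrative only; the field names and fallback strings mirror the NQL above):

```python
def build_prompt(subject, body):
    """Mirror the NQL concatenation: a task instruction followed by
    labeled fields, with COALESCE-style fallbacks for nulls."""
    subject = subject if subject is not None else "No subject"
    body = body if body is not None else "No body"
    return (
        "Analyze the following customer support ticket and extract structured attributes."
        " Ticket subject: " + subject +
        " Ticket body: " + body
    )

print(build_prompt("Login fails on mobile", None))
```

Note that without the null fallback, a single NULL column would make the whole concatenated prompt NULL, which is why the COALESCE wrapping matters in the NQL version.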

Step 3: Call AI_COMPLETE

Pass the prompt column to AI_COMPLETE along with your schema. The response is a JSON string that you parse in the next step.
WITH prompts AS (
  SELECT
    company_data."support_tickets".ticket_id,
    company_data."support_tickets".customer_id,
    company_data."support_tickets".created_at,
    'Analyze the following customer support ticket and extract structured attributes.'
      || ' Ticket subject: ' || company_data."support_tickets".subject
      || ' Ticket body: ' || company_data."support_tickets".body
    AS prompt_text
  FROM company_data."support_tickets"
  WHERE company_data."support_tickets".created_at > CURRENT_TIMESTAMP - INTERVAL '7' DAY
),
enriched AS (
  SELECT
    prompts.ticket_id,
    prompts.customer_id,
    prompts.created_at,
    AI_COMPLETE(
      'openai-gpt-5',
      prompts.prompt_text,
      '{"temperature": 0}',
      '{"type": "json", "schema": {"type": "object", "properties": {"category": {"type": "string", "description": "The primary issue category", "enum": ["billing", "technical", "account_access", "feature_request", "general_inquiry"]}, "priority": {"type": "string", "description": "Urgency level", "enum": ["low", "medium", "high", "critical"]}, "sentiment": {"type": "number", "description": "Sentiment from -1 (very negative) to 1 (very positive)"}, "product_mentioned": {"type": "string", "description": "Product or feature referenced, or none"}}, "required": ["category", "priority", "sentiment", "product_mentioned"], "additionalProperties": false}}',
      TRUE
    ) AS ai_response
  FROM prompts
)
SELECT enriched.ticket_id, enriched.ai_response
FROM enriched
Parameter choices:
  • model ('openai-gpt-5'): choose the model that best fits your accuracy and cost requirements
  • model_parameters ('{"temperature": 0}'): temperature 0 produces deterministic, consistent classifications
  • show_details (TRUE): returns the full structured_output array for reliable JSON parsing

Step 4: Parse the response into columns

Use PARSE_JSON and path notation to extract individual fields from the AI response into typed columns.
SELECT
  enriched.ticket_id,
  enriched.customer_id,
  enriched.created_at,
  CAST(
    PARSE_JSON(enriched.ai_response)['structured_output'][0]['raw_message']['category'] AS STRING
  ) AS category,
  CAST(
    PARSE_JSON(enriched.ai_response)['structured_output'][0]['raw_message']['priority'] AS STRING
  ) AS priority,
  CAST(
    PARSE_JSON(enriched.ai_response)['structured_output'][0]['raw_message']['sentiment'] AS DOUBLE
  ) AS sentiment,
  CAST(
    PARSE_JSON(enriched.ai_response)['structured_output'][0]['raw_message']['product_mentioned'] AS STRING
  ) AS product_mentioned
FROM enriched
The path ['structured_output'][0]['raw_message'] navigates to the model’s schema-validated output. Each field within raw_message corresponds to a property in your response_format schema.
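As a mental model, the envelope can be navigated like any JSON document. This Python sketch assumes only the structured_output[0].raw_message path described above (real responses carry additional detail fields, and the sample values are made up):

```python
import json

# Minimal sketch of the AI_COMPLETE response shape with show_details = TRUE.
ai_response = json.dumps({
    "structured_output": [
        {"raw_message": {
            "category": "technical",
            "priority": "medium",
            "sentiment": -0.2,
            "product_mentioned": "mobile app",
        }}
    ]
})

# Equivalent of PARSE_JSON(ai_response)['structured_output'][0]['raw_message'][...]
raw = json.loads(ai_response)["structured_output"][0]["raw_message"]
print(raw["category"], raw["sentiment"])  # prints: technical -0.2
```

The CAST calls in the NQL then convert each extracted value to a concrete column type (STRING, DOUBLE), just as a consumer here would coerce the parsed values.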

Complete example

Here is the full query wrapped as a materialized view that refreshes weekly:
CREATE MATERIALIZED VIEW "support_tickets_enriched"
DISPLAY_NAME = 'Support Tickets - AI Enriched'
DESCRIPTION = 'Tickets enriched with AI-classified category, priority, sentiment, and product mentions'
TAGS = ('ai_enrichment', 'support', 'classification')
WRITE_MODE = 'overwrite'
REFRESH_SCHEDULE = '@weekly'
AS
WITH prompts AS (
  SELECT
    company_data."support_tickets".ticket_id,
    company_data."support_tickets".customer_id,
    company_data."support_tickets".created_at,
    'Analyze the following customer support ticket and extract structured attributes.'
      || ' Ticket subject: ' || COALESCE(company_data."support_tickets".subject, 'No subject')
      || ' Ticket body: ' || COALESCE(company_data."support_tickets".body, 'No body')
    AS prompt_text
  FROM company_data."support_tickets"
  WHERE company_data."support_tickets".created_at > CURRENT_TIMESTAMP - INTERVAL '30' DAY
),
ai_results AS (
  SELECT
    prompts.ticket_id,
    prompts.customer_id,
    prompts.created_at,
    AI_COMPLETE(
      'openai-gpt-5',
      prompts.prompt_text,
      '{"temperature": 0}',
      '{"type": "json", "schema": {"type": "object", "properties": {"category": {"type": "string", "description": "The primary issue category", "enum": ["billing", "technical", "account_access", "feature_request", "general_inquiry"]}, "priority": {"type": "string", "description": "Urgency level", "enum": ["low", "medium", "high", "critical"]}, "sentiment": {"type": "number", "description": "Sentiment from -1 (very negative) to 1 (very positive)"}, "product_mentioned": {"type": "string", "description": "Product or feature referenced, or none"}}, "required": ["category", "priority", "sentiment", "product_mentioned"], "additionalProperties": false}}',
      TRUE
    ) AS ai_response
  FROM prompts
)
SELECT
  ai_results.ticket_id,
  ai_results.customer_id,
  ai_results.created_at,
  CAST(
    PARSE_JSON(ai_results.ai_response)['structured_output'][0]['raw_message']['category'] AS STRING
  ) AS category,
  CAST(
    PARSE_JSON(ai_results.ai_response)['structured_output'][0]['raw_message']['priority'] AS STRING
  ) AS priority,
  CAST(
    PARSE_JSON(ai_results.ai_response)['structured_output'][0]['raw_message']['sentiment'] AS DOUBLE
  ) AS sentiment,
  CAST(
    PARSE_JSON(ai_results.ai_response)['structured_output'][0]['raw_message']['product_mentioned'] AS STRING
  ) AS product_mentioned
FROM ai_results

Adapting this pattern

The CTE → AI_COMPLETE → PARSE_JSON pattern works for any enrichment task. Here are variations for common scenarios.

Product classification

Classify products into a taxonomy from their name and description:
WITH product_prompts AS (
  SELECT
    company_data."products".product_id,
    'Classify this product into a category.'
      || ' Brand: ' || COALESCE(company_data."products".brand_name, 'Unknown')
      || ' Product name: ' || company_data."products".product_name
      || ' Description: ' || COALESCE(company_data."products".description, 'No description')
    AS prompt_text
  FROM company_data."products"
),
classified AS (
  SELECT
    product_prompts.product_id,
    AI_COMPLETE(
      'openai-gpt-5',
      product_prompts.prompt_text,
      '{"temperature": 0}',
      '{"type": "json", "schema": {"type": "object", "properties": {"primary_category": {"type": "string", "description": "Top-level product category"}, "sub_category": {"type": "string", "description": "More specific product category"}, "confidence": {"type": "number", "description": "Classification confidence from 0 to 1"}}, "required": ["primary_category", "sub_category", "confidence"], "additionalProperties": false}}',
      TRUE
    ) AS ai_response
  FROM product_prompts
)
SELECT
  classified.product_id,
  CAST(
    PARSE_JSON(classified.ai_response)['structured_output'][0]['raw_message']['primary_category'] AS STRING
  ) AS primary_category,
  CAST(
    PARSE_JSON(classified.ai_response)['structured_output'][0]['raw_message']['sub_category'] AS STRING
  ) AS sub_category,
  CAST(
    PARSE_JSON(classified.ai_response)['structured_output'][0]['raw_message']['confidence'] AS DOUBLE
  ) AS confidence
FROM classified

Entity extraction

Extract structured entities from free-text fields:
WITH text_prompts AS (
  SELECT
    company_data."articles".article_id,
    'Extract key entities from this text.'
      || ' Text: ' || company_data."articles".body
    AS prompt_text
  FROM company_data."articles"
),
extracted AS (
  SELECT
    text_prompts.article_id,
    AI_COMPLETE(
      'openai-gpt-5',
      text_prompts.prompt_text,
      '{"temperature": 0}',
      '{"type": "json", "schema": {"type": "object", "properties": {"people": {"type": "array", "items": {"type": "string"}, "description": "Names of people mentioned"}, "organizations": {"type": "array", "items": {"type": "string"}, "description": "Organization names mentioned"}, "topics": {"type": "array", "items": {"type": "string"}, "description": "Key topics discussed"}}, "required": ["people", "organizations", "topics"], "additionalProperties": false}}',
      TRUE
    ) AS ai_response
  FROM text_prompts
)
SELECT
  extracted.article_id,
  PARSE_JSON(extracted.ai_response)['structured_output'][0]['raw_message']['people'] AS people,
  PARSE_JSON(extracted.ai_response)['structured_output'][0]['raw_message']['organizations'] AS organizations,
  PARSE_JSON(extracted.ai_response)['structured_output'][0]['raw_message']['topics'] AS topics
FROM extracted
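Because people, organizations, and topics come back as JSON arrays rather than scalar columns, downstream consumers typically parse them before use. A Python sketch of that step, using made-up row data:

```python
import json

# A hypothetical row from the query above: array columns arrive as JSON strings.
row = {
    "article_id": 42,
    "people": '["Ada Lovelace", "Alan Turing"]',
    "organizations": '["Acme Corp"]',
    "topics": '["computing history"]',
}

# Parse each JSON array column into a Python list, e.g. to explode
# into one (article_id, entity) row per person.
entities = {k: json.loads(row[k]) for k in ("people", "organizations", "topics")}
pairs = [(row["article_id"], person) for person in entities["people"]]
print(pairs)  # prints: [(42, 'Ada Lovelace'), (42, 'Alan Turing')]
```

The same explode step can of course be done in SQL downstream; the point is that array-typed schema fields stay arrays until something flattens them.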

Best practices

Prompt design:
  • Label each data element clearly (e.g., ' Brand: ', ' Description: ')
  • Include a brief instruction at the start of the prompt describing the task
  • Use COALESCE to handle nullable columns and avoid sending nulls to the model
Schema design:
  • Use enum to constrain categorical outputs to known values
  • Set additionalProperties: false to prevent unexpected fields
  • Add description to each property to guide the model
  • Use number types for scores you want to filter or aggregate
Performance and cost:
  • Filter rows with WHERE before the AI_COMPLETE call to reduce inference volume
  • Use FETCH NEXT N ROWS ONLY to limit batch sizes during development and testing
  • Set temperature to 0 for deterministic, reproducible results
  • Keep schemas focused—fewer fields means fewer tokens per response

Related pages

  • AI_COMPLETE Reference: function syntax and parameter details
  • Model Inference: how AI inference works within your data plane
  • Structured Output: designing JSON schemas for AI responses
  • Creating Materialized Views: automating queries with scheduled materialized views