This cookbook demonstrates how to use the AI_COMPLETE function to enrich records with AI-generated structured attributes directly within NQL. The entire inference pipeline runs inside your Snowflake data plane—your data never leaves your infrastructure.
AI_COMPLETE is available on Snowflake data planes only.

What you will build

A materialized view that takes raw customer support tickets and enriches each record with:
  • A standardized issue category
  • A priority level
  • A sentiment score
  • Extracted product mentions
The same pattern applies to any enrichment task: product classification, lead scoring, content tagging, entity extraction, and more.

Prerequisites

Step 1: Define the output schema

Before writing NQL, design the JSON Schema that describes the structured attributes you want the model to produce. This schema constrains the model’s output so you get predictable, typed results.
{
  "type": "json",
  "schema": {
    "type": "object",
    "properties": {
      "category": {
        "type": "string",
        "description": "The primary issue category",
        "enum": [
          "billing",
          "technical",
          "account_access",
          "feature_request",
          "general_inquiry"
        ]
      },
      "priority": {
        "type": "string",
        "description": "Urgency level based on the issue description",
        "enum": ["low", "medium", "high", "critical"]
      },
      "sentiment": {
        "type": "number",
        "description": "Customer sentiment from -1 (very negative) to 1 (very positive)"
      },
      "product_mentioned": {
        "type": "string",
        "description": "The specific product or feature referenced, or 'none' if not applicable"
      }
    },
    "required": ["category", "priority", "sentiment", "product_mentioned"],
    "additionalProperties": false
  }
}
Key schema design decisions:
  • enum on category and priority restricts output to known values, preventing free-form drift
  • number for sentiment enables numeric filtering and aggregation downstream
  • additionalProperties: false prevents the model from adding unexpected fields
  • description on each field guides the model’s interpretation of what each field means
Keep schemas focused on the fields your downstream queries will actually use. Every additional field increases token usage and inference cost.
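The constraints this schema encodes can be restated outside NQL. The following Python sketch is illustrative only (it is not part of the pipeline, and the sample record is made up); it checks a candidate model output against the enum, required, and additionalProperties rules:

```python
import json

# Constraints from the schema above, restated in plain Python.
CATEGORIES = {"billing", "technical", "account_access", "feature_request", "general_inquiry"}
PRIORITIES = {"low", "medium", "high", "critical"}
REQUIRED = {"category", "priority", "sentiment", "product_mentioned"}

def conforms(output: dict) -> bool:
    """True if `output` satisfies the enum, required, and
    additionalProperties rules encoded by the schema."""
    if set(output) != REQUIRED:  # required fields + additionalProperties: false
        return False
    return (
        output["category"] in CATEGORIES
        and output["priority"] in PRIORITIES
        and isinstance(output["sentiment"], (int, float))
        and isinstance(output["product_mentioned"], str)
    )

sample = json.loads('{"category": "billing", "priority": "high", '
                    '"sentiment": -0.6, "product_mentioned": "none"}')
print(conforms(sample))  # prints: True
```

A record with an out-of-enum category (say, "refunds") or an extra field fails the same checks, which is exactly the drift the schema prevents.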

Step 2: Build prompt strings in a CTE

AI_COMPLETE requires its prompt parameter to be a column reference, not a string literal. Use a Common Table Expression (CTE) to assemble prompts from your source columns.
WITH prompts AS (
  SELECT
    company_data."support_tickets".ticket_id,
    company_data."support_tickets".customer_id,
    company_data."support_tickets".created_at,
    'Analyze the following customer support ticket and extract structured attributes.'
      || ' Ticket subject: ' || company_data."support_tickets".subject
      || ' Ticket body: ' || company_data."support_tickets".body
    AS prompt_text
  FROM company_data."support_tickets"
  WHERE company_data."support_tickets".created_at > CURRENT_TIMESTAMP - INTERVAL '7' DAY
)
SELECT prompts.ticket_id, prompts.prompt_text
FROM prompts
Tips for building effective prompts:
  • Prefix each field with a label (e.g., ' Ticket subject: ') so the model can distinguish between data elements
  • Filter rows before calling AI_COMPLETE to avoid unnecessary inference costs
  • Use COALESCE to handle nulls: COALESCE(company_data."support_tickets".subject, 'No subject')
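For clarity, here is the same assembly logic sketched in Python (illustrative only; the field names and fallback strings mirror the NQL above):

```python
def build_prompt(subject, body):
    """Mirror the NQL concatenation: a task instruction followed by
    labeled fields, with COALESCE-style fallbacks for nulls."""
    subject = subject if subject is not None else "No subject"
    body = body if body is not None else "No body"
    return (
        "Analyze the following customer support ticket and extract structured attributes."
        " Ticket subject: " + subject +
        " Ticket body: " + body
    )

print(build_prompt("Login fails on mobile", None))
```

Note that without the null fallback, a single NULL column would make the whole concatenated prompt NULL, which is why the COALESCE wrapping matters in the NQL version.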

Step 3: Call AI_COMPLETE

Pass the prompt column to AI_COMPLETE along with your schema. The response is a JSON string that you parse in the next step.
WITH prompts AS (
  SELECT
    company_data."support_tickets".ticket_id,
    company_data."support_tickets".customer_id,
    company_data."support_tickets".created_at,
    'Analyze the following customer support ticket and extract structured attributes.'
      || ' Ticket subject: ' || company_data."support_tickets".subject
      || ' Ticket body: ' || company_data."support_tickets".body
    AS prompt_text
  FROM company_data."support_tickets"
  WHERE company_data."support_tickets".created_at > CURRENT_TIMESTAMP - INTERVAL '7' DAY
),
enriched AS (
  SELECT
    prompts.ticket_id,
    prompts.customer_id,
    prompts.created_at,
    AI_COMPLETE(
      'openai-gpt-5',
      prompts.prompt_text,
      '{"temperature": 0}',
      '{"type": "json", "schema": {"type": "object", "properties": {"category": {"type": "string", "description": "The primary issue category", "enum": ["billing", "technical", "account_access", "feature_request", "general_inquiry"]}, "priority": {"type": "string", "description": "Urgency level", "enum": ["low", "medium", "high", "critical"]}, "sentiment": {"type": "number", "description": "Sentiment from -1 (very negative) to 1 (very positive)"}, "product_mentioned": {"type": "string", "description": "Product or feature referenced, or none"}}, "required": ["category", "priority", "sentiment", "product_mentioned"], "additionalProperties": false}}',
      TRUE
    ) AS ai_response
  FROM prompts
)
SELECT enriched.ticket_id, enriched.ai_response
FROM enriched
Parameter choices:
  • model ('openai-gpt-5'): choose the model that best fits your accuracy and cost requirements
  • model_parameters ('{"temperature": 0}'): temperature 0 produces deterministic, consistent classifications
  • show_details (TRUE): returns the full structured_output array for reliable JSON parsing

Step 4: Parse the response into columns

Use PARSE_JSON and path notation to extract individual fields from the AI response into typed columns.
SELECT
  enriched.ticket_id,
  enriched.customer_id,
  enriched.created_at,
  CAST(
    PARSE_JSON(enriched.ai_response)['structured_output'][0]['raw_message']['category'] AS STRING
  ) AS category,
  CAST(
    PARSE_JSON(enriched.ai_response)['structured_output'][0]['raw_message']['priority'] AS STRING
  ) AS priority,
  CAST(
    PARSE_JSON(enriched.ai_response)['structured_output'][0]['raw_message']['sentiment'] AS DOUBLE
  ) AS sentiment,
  CAST(
    PARSE_JSON(enriched.ai_response)['structured_output'][0]['raw_message']['product_mentioned'] AS STRING
  ) AS product_mentioned
FROM enriched
The path ['structured_output'][0]['raw_message'] navigates to the model’s schema-validated output. Each field within raw_message corresponds to a property in your response_format schema.
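As a mental model, the envelope can be navigated like any JSON document. This Python sketch assumes only the structured_output[0].raw_message path described above (real responses carry additional detail fields, and the sample values are made up):

```python
import json

# Minimal sketch of the AI_COMPLETE response shape with show_details = TRUE.
ai_response = json.dumps({
    "structured_output": [
        {"raw_message": {
            "category": "technical",
            "priority": "medium",
            "sentiment": -0.2,
            "product_mentioned": "mobile app",
        }}
    ]
})

# Equivalent of PARSE_JSON(ai_response)['structured_output'][0]['raw_message'][...]
raw = json.loads(ai_response)["structured_output"][0]["raw_message"]
print(raw["category"], raw["sentiment"])  # prints: technical -0.2
```

The CAST calls in the NQL then convert each extracted value to a concrete column type (STRING, DOUBLE), just as a consumer here would coerce the parsed values.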

Complete example

Here is the full query wrapped as a materialized view that refreshes weekly:
CREATE MATERIALIZED VIEW "support_tickets_enriched"
DISPLAY_NAME = 'Support Tickets - AI Enriched'
DESCRIPTION = 'Tickets enriched with AI-classified category, priority, sentiment, and product mentions'
TAGS = ('ai_enrichment', 'support', 'classification')
WRITE_MODE = 'overwrite'
REFRESH_SCHEDULE = '@weekly'
AS
WITH prompts AS (
  SELECT
    company_data."support_tickets".ticket_id,
    company_data."support_tickets".customer_id,
    company_data."support_tickets".created_at,
    'Analyze the following customer support ticket and extract structured attributes.'
      || ' Ticket subject: ' || COALESCE(company_data."support_tickets".subject, 'No subject')
      || ' Ticket body: ' || COALESCE(company_data."support_tickets".body, 'No body')
    AS prompt_text
  FROM company_data."support_tickets"
  WHERE company_data."support_tickets".created_at > CURRENT_TIMESTAMP - INTERVAL '30' DAY
),
ai_results AS (
  SELECT
    prompts.ticket_id,
    prompts.customer_id,
    prompts.created_at,
    AI_COMPLETE(
      'openai-gpt-5',
      prompts.prompt_text,
      '{"temperature": 0}',
      '{"type": "json", "schema": {"type": "object", "properties": {"category": {"type": "string", "description": "The primary issue category", "enum": ["billing", "technical", "account_access", "feature_request", "general_inquiry"]}, "priority": {"type": "string", "description": "Urgency level", "enum": ["low", "medium", "high", "critical"]}, "sentiment": {"type": "number", "description": "Sentiment from -1 (very negative) to 1 (very positive)"}, "product_mentioned": {"type": "string", "description": "Product or feature referenced, or none"}}, "required": ["category", "priority", "sentiment", "product_mentioned"], "additionalProperties": false}}',
      TRUE
    ) AS ai_response
  FROM prompts
)
SELECT
  ai_results.ticket_id,
  ai_results.customer_id,
  ai_results.created_at,
  CAST(
    PARSE_JSON(ai_results.ai_response)['structured_output'][0]['raw_message']['category'] AS STRING
  ) AS category,
  CAST(
    PARSE_JSON(ai_results.ai_response)['structured_output'][0]['raw_message']['priority'] AS STRING
  ) AS priority,
  CAST(
    PARSE_JSON(ai_results.ai_response)['structured_output'][0]['raw_message']['sentiment'] AS DOUBLE
  ) AS sentiment,
  CAST(
    PARSE_JSON(ai_results.ai_response)['structured_output'][0]['raw_message']['product_mentioned'] AS STRING
  ) AS product_mentioned
FROM ai_results

Adapting this pattern

The CTE → AI_COMPLETE → PARSE_JSON pattern works for any enrichment task. Here are variations for common scenarios.

Product classification

Classify products into a taxonomy from their name and description:
WITH product_prompts AS (
  SELECT
    company_data."products".product_id,
    'Classify this product into a category.'
      || ' Brand: ' || COALESCE(company_data."products".brand_name, 'Unknown')
      || ' Product name: ' || company_data."products".product_name
      || ' Description: ' || COALESCE(company_data."products".description, 'No description')
    AS prompt_text
  FROM company_data."products"
),
classified AS (
  SELECT
    product_prompts.product_id,
    AI_COMPLETE(
      'openai-gpt-5',
      product_prompts.prompt_text,
      '{"temperature": 0}',
      '{"type": "json", "schema": {"type": "object", "properties": {"primary_category": {"type": "string", "description": "Top-level product category"}, "sub_category": {"type": "string", "description": "More specific product category"}, "confidence": {"type": "number", "description": "Classification confidence from 0 to 1"}}, "required": ["primary_category", "sub_category", "confidence"], "additionalProperties": false}}',
      TRUE
    ) AS ai_response
  FROM product_prompts
)
SELECT
  classified.product_id,
  CAST(
    PARSE_JSON(classified.ai_response)['structured_output'][0]['raw_message']['primary_category'] AS STRING
  ) AS primary_category,
  CAST(
    PARSE_JSON(classified.ai_response)['structured_output'][0]['raw_message']['sub_category'] AS STRING
  ) AS sub_category,
  CAST(
    PARSE_JSON(classified.ai_response)['structured_output'][0]['raw_message']['confidence'] AS DOUBLE
  ) AS confidence
FROM classified

Entity extraction

Extract structured entities from free-text fields:
WITH text_prompts AS (
  SELECT
    company_data."articles".article_id,
    'Extract key entities from this text.'
      || ' Text: ' || company_data."articles".body
    AS prompt_text
  FROM company_data."articles"
),
extracted AS (
  SELECT
    text_prompts.article_id,
    AI_COMPLETE(
      'openai-gpt-5',
      text_prompts.prompt_text,
      '{"temperature": 0}',
      '{"type": "json", "schema": {"type": "object", "properties": {"people": {"type": "array", "items": {"type": "string"}, "description": "Names of people mentioned"}, "organizations": {"type": "array", "items": {"type": "string"}, "description": "Organization names mentioned"}, "topics": {"type": "array", "items": {"type": "string"}, "description": "Key topics discussed"}}, "required": ["people", "organizations", "topics"], "additionalProperties": false}}',
      TRUE
    ) AS ai_response
  FROM text_prompts
)
SELECT
  extracted.article_id,
  PARSE_JSON(extracted.ai_response)['structured_output'][0]['raw_message']['people'] AS people,
  PARSE_JSON(extracted.ai_response)['structured_output'][0]['raw_message']['organizations'] AS organizations,
  PARSE_JSON(extracted.ai_response)['structured_output'][0]['raw_message']['topics'] AS topics
FROM extracted
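Because people, organizations, and topics come back as JSON arrays rather than scalar columns, downstream consumers typically parse them before use. A Python sketch of that step, using made-up row data:

```python
import json

# A hypothetical row from the query above: array columns arrive as JSON strings.
row = {
    "article_id": 42,
    "people": '["Ada Lovelace", "Alan Turing"]',
    "organizations": '["Acme Corp"]',
    "topics": '["computing history"]',
}

# Parse each JSON array column into a Python list, e.g. to explode
# into one (article_id, entity) row per person.
entities = {k: json.loads(row[k]) for k in ("people", "organizations", "topics")}
pairs = [(row["article_id"], person) for person in entities["people"]]
print(pairs)  # prints: [(42, 'Ada Lovelace'), (42, 'Alan Turing')]
```

The same explode step can of course be done in SQL downstream; the point is that array-typed schema fields stay arrays until something flattens them.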

Best practices

Prompt design:
  • Label each data element clearly (e.g., ' Brand: ', ' Description: ')
  • Include a brief instruction at the start of the prompt describing the task
  • Use COALESCE to handle nullable columns and avoid sending nulls to the model
Schema design:
  • Use enum to constrain categorical outputs to known values
  • Set additionalProperties: false to prevent unexpected fields
  • Add description to each property to guide the model
  • Use number types for scores you want to filter or aggregate
Performance and cost:
  • Filter rows with WHERE before the AI_COMPLETE call to reduce inference volume
  • Use FETCH NEXT N ROWS ONLY to limit batch sizes during development and testing
  • Set temperature to 0 for deterministic, reproducible results
  • Keep schemas focused—fewer fields means fewer tokens per response

Related pages

  • AI_COMPLETE Reference: function syntax and parameter details
  • Model Inference: how AI inference works within your data plane
  • Structured Output: designing JSON schemas for AI responses
  • Creating Materialized Views: automating queries with scheduled materialized views