Model Inference enables you to run large language model (LLM) operations directly within your data plane. Unlike traditional AI APIs that require sending data to external services, Model Inference keeps your data secure within your own infrastructure while still leveraging powerful AI capabilities.

Why Model Inference matters

When working with sensitive data, sending it to external AI services creates compliance and security risks. Model Inference solves this by hosting models within your data plane infrastructure:
| Traditional AI APIs | Model Inference |
|---|---|
| Data sent to external servers | Data stays in your infrastructure |
| Subject to third-party data policies | You control data residency |
| Network latency to external services | Local execution within the data plane |
| Compliance complexity | Simplified compliance posture |
This architecture enables AI-powered features—like auto-generating dataset descriptions or translating natural language to technical formats—without compromising your data governance.

How Model Inference works

Model Inference operates through Narrative’s job queue, following the same pattern as other asynchronous operations:
  1. Submit request: Your application sends an inference request specifying the model, messages, and output schema
  2. Job creation: The control plane creates an inference job and queues it
  3. Local execution: The data plane operator picks up the job and runs inference locally
  4. Structured response: Results are returned in a predictable format defined by your JSON Schema
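The request in step 1 bundles the model, the conversation, and the output schema into a single payload. A minimal sketch of building that payload (field names here are illustrative, not the exact Narrative API shape):

```javascript
// Hypothetical helper that assembles an inference request. The control
// plane would queue this as a job; the data plane operator runs it locally.
function buildInferenceRequest(model, messages, schema) {
  return {
    model,                          // which hosted model to run
    messages,                       // conversation context (see below)
    inference_config: {
      output_format_schema: schema, // constrains the response shape
    },
  };
}

const request = buildInferenceRequest(
  'claude-sonnet',
  [{ role: 'user', text: 'Summarize this dataset.' }],
  { type: 'object', properties: { summary: { type: 'string' } } }
);
```

Because the job is asynchronous, the caller polls or subscribes for completion rather than blocking on the response.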

Key capabilities

Supported models

Model Inference supports models from multiple providers, all hosted within your data plane:
| Provider | Models |
|---|---|
| Anthropic | Claude Haiku 4.5, Claude Sonnet 4.5, Claude Opus 4.5 |
| OpenAI | GPT-4.1, o4-mini, GPT-oss-120b |
For detailed model specifications, see the official documentation from Anthropic and OpenAI.

Structured output

Every inference request includes a JSON Schema that defines the expected response format. The model is constrained to return valid JSON matching your schema, making responses predictable and easy to parse programmatically.
```javascript
const inferenceConfig = {
  output_format_schema: {
    type: 'object',
    properties: {
      summary: { type: 'string' },
      confidence: { type: 'number', minimum: 0, maximum: 1 },
      categories: {
        type: 'array',
        items: { type: 'string' }
      }
    },
    required: ['summary', 'confidence']
  }
};
```
This guarantees you receive a response with exactly the fields you expect, in the types you specify.
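Even with constrained output, defensive code often double-checks the response before using it. A minimal sketch of verifying the `required` fields from the schema above (a real integration might use a full JSON Schema validator; this check is illustrative only):

```javascript
// Returns true if every field listed in the schema's `required` array
// is present on the response object.
function hasRequiredFields(response, schema) {
  return (schema.required || []).every((field) => field in response);
}

const schema = {
  type: 'object',
  properties: {
    summary: { type: 'string' },
    confidence: { type: 'number', minimum: 0, maximum: 1 },
  },
  required: ['summary', 'confidence'],
};

const ok = hasRequiredFields({ summary: 'Sales data', confidence: 0.92 }, schema);      // true
const missing = hasRequiredFields({ summary: 'Sales data' }, schema);                   // false
```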

Conversation context

Inference requests support multi-turn conversations through the messages array. Each message has a role (system, user, or assistant) and text content:
```javascript
const messages = [
  { role: 'system', text: 'You are a data classification assistant.' },
  { role: 'user', text: 'Classify this dataset based on its columns...' }
];
```

Configuration options

Fine-tune model behavior with inference configuration parameters:
| Parameter | Description | Default |
|---|---|---|
| `max_tokens` | Maximum tokens in the response | Model default |
| `temperature` | Randomness (0 = deterministic, 1 = creative) | Model default |
| `top_p` | Nucleus sampling parameter | Model default |
| `stop_sequences` | Strings that stop generation | None |
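Putting the parameters from the table together in one configuration object (the values here are examples, not recommended defaults):

```javascript
// Illustrative inference configuration using the parameters above.
const config = {
  max_tokens: 512,          // cap the response length
  temperature: 0,           // deterministic output for repeatable results
  top_p: 0.9,               // nucleus sampling cutoff
  stop_sequences: ['\n\n'], // stop generating at the first blank line
};
```

Low temperature is usually the right choice for structured-output tasks, where consistency matters more than variety.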

Common use cases

Model Inference powers AI features throughout the Narrative platform:
  • Dataset descriptions: Automatically generate human-readable descriptions from dataset metadata and samples
  • Schedule translation: Convert natural language schedules (“every weekday at 9am”) to CRON expressions
  • Data classification: Categorize records based on content analysis
  • Schema suggestions: Recommend Rosetta Stone attribute mappings based on column names and sample data
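The schedule-translation case combines the pieces above: a system prompt sets the task, the user turn carries the natural-language schedule, and the schema constrains the answer to a single field. A hypothetical request sketch (the payload shape is illustrative):

```javascript
// Sketch of a schedule-translation inference request.
const request = {
  messages: [
    { role: 'system', text: 'Translate schedules to CRON expressions.' },
    { role: 'user', text: 'every weekday at 9am' },
  ],
  inference_config: {
    output_format_schema: {
      type: 'object',
      properties: { cron: { type: 'string' } },
      required: ['cron'],
    },
  },
};
// A successful response might look like: { cron: '0 9 * * 1-5' }
```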

Data privacy

Because inference runs within your data plane:
  • No external API calls: Data is never sent to Anthropic, OpenAI, or any external service
  • Your infrastructure: Models run on compute resources within your data plane
  • Compliance-friendly: Simplifies GDPR, CCPA, and other regulatory requirements
  • Audit trail: All inference jobs are logged through the standard job system
For more details, see Data Privacy in Model Inference.