Model Inference enables you to run large language model (LLM) operations directly within your data plane. Unlike traditional AI APIs that require sending data to external services, Model Inference keeps your data secure within your own infrastructure while still leveraging powerful AI capabilities.

Why Model Inference matters

When working with sensitive data, sending it to external AI services creates compliance and security risks. Model Inference solves this by hosting models within your data plane infrastructure:
| Traditional AI APIs | Model Inference |
|---|---|
| Data sent to external servers | Data stays in your infrastructure |
| Subject to third-party data policies | You control data residency |
| Network latency to external services | Local execution within the data plane |
| Compliance complexity | Simplified compliance posture |
This architecture enables AI-powered features—like auto-generating dataset descriptions or translating natural language to technical formats—without compromising your data governance.

How Model Inference works

Model Inference operates through Narrative’s job queue, following the same pattern as other asynchronous operations:
  1. Submit request: Your application sends an inference request specifying the model, messages, and output schema
  2. Job creation: The control plane creates an inference job and queues it
  3. Local execution: The data plane operator picks up the job and runs inference locally
  4. Structured response: Results are returned in a predictable format defined by your JSON Schema
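The request in step 1 bundles the model, the conversation, and the output schema into a single payload. A minimal sketch of building that payload (field names here are illustrative, not the exact Narrative API shape):

```javascript
// Hypothetical helper that assembles an inference request. The control
// plane would queue this as a job; the data plane operator runs it locally.
function buildInferenceRequest(model, messages, schema) {
  return {
    model,                          // which hosted model to run
    messages,                       // conversation context (see below)
    inference_config: {
      output_format_schema: schema, // constrains the response shape
    },
  };
}

const request = buildInferenceRequest(
  'claude-sonnet',
  [{ role: 'user', text: 'Summarize this dataset.' }],
  { type: 'object', properties: { summary: { type: 'string' } } }
);
```

Because the job is asynchronous, the caller polls or subscribes for completion rather than blocking on the response.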

Key capabilities

Supported models

Model Inference supports models from multiple providers, all hosted within your data plane:
| Provider | Models |
|---|---|
| Anthropic | Claude Haiku 4.5, Claude Sonnet 4.5, Claude Opus 4.5 |
| OpenAI | GPT-4.1, o4-mini, GPT-oss-120b |
For detailed model specifications, see the official documentation from Anthropic and OpenAI.

Structured output

Every inference request includes a JSON Schema that defines the expected response format. The model is constrained to return valid JSON matching your schema, making responses predictable and easy to parse programmatically.
```javascript
const inferenceConfig = {
  output_format_schema: {
    type: 'object',
    properties: {
      summary: { type: 'string' },
      confidence: { type: 'number', minimum: 0, maximum: 1 },
      categories: {
        type: 'array',
        items: { type: 'string' }
      }
    },
    required: ['summary', 'confidence']
  }
};
```
This guarantees you receive a response with exactly the fields you expect, in the types you specify.
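Even with constrained output, defensive code often double-checks the response before using it. A minimal sketch of verifying the `required` fields from the schema above (a real integration might use a full JSON Schema validator; this check is illustrative only):

```javascript
// Returns true if every field listed in the schema's `required` array
// is present on the response object.
function hasRequiredFields(response, schema) {
  return (schema.required || []).every((field) => field in response);
}

const schema = {
  type: 'object',
  properties: {
    summary: { type: 'string' },
    confidence: { type: 'number', minimum: 0, maximum: 1 },
  },
  required: ['summary', 'confidence'],
};

const ok = hasRequiredFields({ summary: 'Sales data', confidence: 0.92 }, schema);      // true
const missing = hasRequiredFields({ summary: 'Sales data' }, schema);                   // false
```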

Conversation context

Inference requests support multi-turn conversations through the messages array. Each message has a role (system, user, or assistant) and text content:
```javascript
const messages = [
  { role: 'system', text: 'You are a data classification assistant.' },
  { role: 'user', text: 'Classify this dataset based on its columns...' }
];
```

Configuration options

Fine-tune model behavior with inference configuration parameters:
| Parameter | Description | Default |
|---|---|---|
| `max_tokens` | Maximum tokens in the response | Model default |
| `temperature` | Randomness (0 = deterministic, 1 = creative) | Model default |
| `top_p` | Nucleus sampling parameter | Model default |
| `stop_sequences` | Strings that stop generation | None |
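Putting the parameters from the table together in one configuration object (the values here are examples, not recommended defaults):

```javascript
// Illustrative inference configuration using the parameters above.
const config = {
  max_tokens: 512,          // cap the response length
  temperature: 0,           // deterministic output for repeatable results
  top_p: 0.9,               // nucleus sampling cutoff
  stop_sequences: ['\n\n'], // stop generating at the first blank line
};
```

Low temperature is usually the right choice for structured-output tasks, where consistency matters more than variety.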

Common use cases

Model Inference powers AI features throughout the Narrative platform:
  • Dataset descriptions: Automatically generate human-readable descriptions from dataset metadata and samples
  • Schedule translation: Convert natural language schedules (“every weekday at 9am”) to CRON expressions
  • Data classification: Categorize records based on content analysis
  • Schema suggestions: Recommend Rosetta Stone attribute mappings based on column names and sample data
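The schedule-translation case combines the pieces above: a system prompt sets the task, the user turn carries the natural-language schedule, and the schema constrains the answer to a single field. A hypothetical request sketch (the payload shape is illustrative):

```javascript
// Sketch of a schedule-translation inference request.
const request = {
  messages: [
    { role: 'system', text: 'Translate schedules to CRON expressions.' },
    { role: 'user', text: 'every weekday at 9am' },
  ],
  inference_config: {
    output_format_schema: {
      type: 'object',
      properties: { cron: { type: 'string' } },
      required: ['cron'],
    },
  },
};
// A successful response might look like: { cron: '0 9 * * 1-5' }
```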

Data privacy

Because inference runs within your data plane:
  • No external API calls: Data is never sent to Anthropic, OpenAI, or any external service
  • Your infrastructure: Models run on compute resources within your data plane
  • Compliance-friendly: Simplifies GDPR, CCPA, and other regulatory requirements
  • Audit trail: All inference jobs are logged through the standard job system
For more details, see Data Privacy in Model Inference.