## Methods
### runModelInference

Submits a model inference request and returns a job that can be polled for results.

**Parameters**

| Name | Type | Required | Description |
|---|---|---|---|
| request | ModelInferenceRunRequest | Yes | The inference request configuration |

**Returns**

`Promise<ModelInferenceRunJob>`: a job object that can be polled for completion.

**Example**
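A minimal end-to-end sketch. The request fields follow the `ModelInferenceRunRequest` shape documented below; the `client` object here is a self-contained stand-in (the real SDK client's construction is not shown in this reference), and the `data_plane_id` value is a placeholder.

```typescript
// Job shape from the ModelInferenceRunJob reference below.
type JobState = 'pending' | 'running' | 'completed' | 'failed';

interface Job {
  id: string;
  type: 'model_inference';
  state: JobState;
}

// Stand-in client so this sketch runs on its own; swap in the real SDK client.
const client = {
  async runModelInference(_request: object): Promise<Job> {
    return { id: 'job-123', type: 'model_inference', state: 'pending' };
  },
};

async function main(): Promise<Job> {
  const job = await client.runModelInference({
    data_plane_id: 'dp-example', // placeholder; use your own data plane ID
    model: 'anthropic.claude-sonnet-4.5',
    messages: [
      { role: 'system', text: 'Extract the requested fields as JSON.' },
      { role: 'user', text: 'Order #8812 shipped on 2024-05-01.' },
    ],
    inference_config: {
      output_format_schema: {
        type: 'object',
        properties: {
          order_id: { type: 'string' },
          shipped_on: { type: 'string' },
        },
        required: ['order_id', 'shipped_on'],
      },
      max_tokens: 256,
    },
    tags: ['docs-example'],
  });
  // The returned job starts out pending; poll it until it reaches a terminal state.
  return job;
}
```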
## Types
### InferenceModel

Model identifiers supported by the Narrative model inference API.

| Model | Provider | Use Case |
|---|---|---|
| anthropic.claude-haiku-4.5 | Anthropic | Fast, cost-effective tasks |
| anthropic.claude-sonnet-4.5 | Anthropic | Balanced performance and capability |
| anthropic.claude-opus-4.5 | Anthropic | Complex reasoning and analysis |
| openai.gpt-oss-120b | OpenAI | Open-source large model |
| openai.gpt-4.1 | OpenAI | Advanced reasoning |
| openai.o4-mini | OpenAI | Fast, efficient responses |
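Reconstructed from the table above, the identifiers can be expressed as a string-literal union; the SDK's actual exported type name and members may differ, so treat this as a sketch.

```typescript
// Hypothetical spelling of the InferenceModel union, built from the table above.
type InferenceModel =
  | 'anthropic.claude-haiku-4.5'
  | 'anthropic.claude-sonnet-4.5'
  | 'anthropic.claude-opus-4.5'
  | 'openai.gpt-oss-120b'
  | 'openai.gpt-4.1'
  | 'openai.o4-mini';

// A literal union catches typos in model identifiers at compile time.
const model: InferenceModel = 'anthropic.claude-sonnet-4.5';
```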
### MessageRole

The role of a message in the conversation.

| Role | Description |
|---|---|
| system | Sets the model’s behavior and context |
| user | Input from the user or application |
| assistant | Previous model responses (for multi-turn conversations) |
### InferenceMessage

A message in the inference conversation.

| Property | Type | Required | Description |
|---|---|---|---|
| role | MessageRole | Yes | The role of the message sender |
| text | string | Yes | The message content |
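A sketch of a multi-turn message array using the roles above. The local type declarations mirror the documented shapes; the conversation content is illustrative.

```typescript
// Local mirrors of the documented MessageRole and InferenceMessage shapes.
type MessageRole = 'system' | 'user' | 'assistant';

interface InferenceMessage {
  role: MessageRole;
  text: string;
}

// A multi-turn conversation: the assistant message replays a prior model
// response so the model has context for the follow-up user turn.
const messages: InferenceMessage[] = [
  { role: 'system', text: 'You are a concise support-ticket classifier.' },
  { role: 'user', text: 'Categorize: "refund not received".' },
  { role: 'assistant', text: '{"category": "billing"}' },
  { role: 'user', text: 'Categorize: "app crashes on login".' },
];
```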
### InferenceConfig

Configuration parameters for the inference request.

| Property | Type | Required | Description |
|---|---|---|---|
| output_format_schema | Record<string, unknown> | Yes | JSON Schema defining the expected output format |
| max_tokens | number | No | Maximum number of tokens to generate |
| temperature | number | No | Sampling temperature (0-1). Lower = more deterministic |
| top_p | number | No | Nucleus sampling parameter (0-1) |
| stop_sequences | string[] | No | Sequences that will stop generation |
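A sketch of a config value asking for strictly structured output. The schema content is illustrative; any valid JSON Schema should work as `output_format_schema`.

```typescript
// An InferenceConfig requesting a small, strictly-typed JSON object.
const inferenceConfig = {
  output_format_schema: {
    type: 'object',
    properties: {
      sentiment: { type: 'string', enum: ['positive', 'neutral', 'negative'] },
      confidence: { type: 'number', minimum: 0, maximum: 1 },
    },
    required: ['sentiment', 'confidence'],
    additionalProperties: false,
  } as Record<string, unknown>,
  max_tokens: 128,
  temperature: 0.2, // low temperature: more deterministic extraction
  stop_sequences: ['\n\n'],
};
```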
### ModelInferenceRunRequest

The complete request body for running a model inference job.

| Property | Type | Required | Description |
|---|---|---|---|
| data_plane_id | string | Yes | The data plane ID where inference will execute |
| model | InferenceModel | Yes | The model to use for inference |
| messages | InferenceMessage[] | Yes | The conversation messages |
| inference_config | InferenceConfig | Yes | Configuration for the inference |
| tags | string[] | No | Optional tags for organizing and filtering jobs |
### InferenceUsage

Token usage metrics from the inference response.

| Property | Type | Description |
|---|---|---|
| total_tokens | number | Total tokens used (prompt + completion) |
| prompt_tokens | number | Tokens in the input messages |
| completion_tokens | number | Tokens in the generated response |
### ModelInferenceRunResult

The result from a completed model inference job.

| Property | Type | Description |
|---|---|---|
| usage | InferenceUsage | Token usage metrics |
| structured_output | T | The model’s response, typed according to your schema |
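Since `structured_output` is typed as `T`, a caller-supplied interface can describe the expected shape. The generic declaration below is a local sketch assuming the SDK exposes such a generic result type; the values are illustrative.

```typescript
// Local mirrors of the documented result shapes.
interface InferenceUsage {
  total_tokens: number;
  prompt_tokens: number;
  completion_tokens: number;
}

interface ModelInferenceRunResult<T> {
  usage: InferenceUsage;
  structured_output: T;
}

// Caller-supplied shape matching the output_format_schema they submitted.
interface Sentiment {
  sentiment: 'positive' | 'neutral' | 'negative';
  confidence: number;
}

const result: ModelInferenceRunResult<Sentiment> = {
  usage: { total_tokens: 140, prompt_tokens: 120, completion_tokens: 20 },
  structured_output: { sentiment: 'positive', confidence: 0.93 },
};

// total_tokens is documented as prompt + completion.
const usageConsistent =
  result.usage.total_tokens ===
  result.usage.prompt_tokens + result.usage.completion_tokens;
```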
### ModelInferenceRunJob

The job object returned when submitting an inference request. Extends the base job type with inference-specific result typing.

| Property | Type | Description |
|---|---|---|
| id | string | Unique job identifier |
| type | 'model_inference' | The job type |
| state | 'pending' \| 'running' \| 'completed' \| 'failed' | Current job state |
| result | ModelInferenceRunResult | Present when job completes successfully |
| failures | object[] | Present when job fails |
| created_at | string | ISO timestamp of job creation |
| updated_at | string | ISO timestamp of last update |
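A sketch of polling against the documented states. The `getJob` parameter is a hypothetical fetch-by-id function; substitute whatever mechanism your SDK provides for re-reading a job.

```typescript
// Job states from the table above.
type JobState = 'pending' | 'running' | 'completed' | 'failed';

interface Job {
  id: string;
  state: JobState;
}

const sleep = (ms: number) => new Promise<void>((res) => setTimeout(res, ms));

// Re-fetch the job until it reaches a terminal state ('completed' or 'failed').
async function pollUntilDone(
  getJob: (id: string) => Promise<Job>,
  id: string,
  intervalMs = 2000,
): Promise<Job> {
  for (;;) {
    const job = await getJob(id);
    if (job.state === 'completed' || job.state === 'failed') return job;
    await sleep(intervalMs);
  }
}
```

A fixed interval keeps the sketch short; production code would typically add a timeout and backoff.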
## Error handling

Model inference jobs can fail for several reasons:

| Error | Cause | Solution |
|---|---|---|
| Invalid schema | JSON Schema is malformed | Validate the schema before submission |
| Model unavailable | Requested model not available in the data plane | Check the supported models list |
| Token limit exceeded | Response would exceed max_tokens | Increase max_tokens or simplify the request |
| Invalid data plane | Data plane ID not found or no access | Verify the data plane ID and permissions |
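A sketch of inspecting a terminal job for failures. The reference only documents `failures` as `object[]`, so the helper below reports only the count and does not assume any per-failure fields.

```typescript
// Job states and failure list from the ModelInferenceRunJob reference.
type JobState = 'pending' | 'running' | 'completed' | 'failed';

interface Job {
  state: JobState;
  failures?: object[]; // present when the job fails; element shape undocumented
}

function describeOutcome(job: Job): string {
  if (job.state === 'completed') return 'ok';
  if (job.state === 'failed') {
    const n = job.failures?.length ?? 0;
    return `failed with ${n} recorded failure(s)`;
  }
  return 'still in progress';
}
```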

