The Model Inference API enables you to run LLM inference within your data plane. This reference documents all methods, types, and interfaces available in the TypeScript SDK.
Methods
runModelInference
Submits a model inference request and returns a job that can be polled for results.
async runModelInference(request: ModelInferenceRunRequest): Promise<ModelInferenceRunJob>
Parameters:
| Name | Type | Required | Description |
| --- | --- | --- | --- |
| request | ModelInferenceRunRequest | Yes | The inference request configuration |
Returns: Promise<ModelInferenceRunJob> - A job object that can be polled for completion.
Example:
import { NarrativeApi } from '@narrative.io/data-collaboration-sdk-ts';

const api = new NarrativeApi({
  apiKey: process.env.NARRATIVE_API_KEY,
});

const job = await api.runModelInference({
  data_plane_id: 'dp_abc123',
  model: 'anthropic.claude-sonnet-4.5',
  messages: [
    {
      role: 'system',
      content: [{ type: 'text', text: 'You are a helpful assistant.' }]
    },
    {
      role: 'user',
      content: [{ type: 'text', text: 'Summarize this data in one sentence.' }]
    }
  ],
  inference_config: {
    output_format_schema: {
      type: 'object',
      properties: {
        summary: { type: 'string' }
      },
      required: ['summary']
    },
    max_tokens: 500,
    temperature: 0.7
  }
});

console.log('Job ID:', job.id);
Types
InferenceModel
Model identifiers supported by the Narrative model inference API.
type InferenceModel =
  | 'anthropic.claude-haiku-4.5'
  | 'anthropic.claude-sonnet-4.5'
  | 'anthropic.claude-sonnet-4.6'
  | 'anthropic.claude-opus-4.5'
  | 'anthropic.claude-opus-4.6'
  | 'openai.gpt-oss-120b'
  | 'openai.gpt-4.1'
  | 'openai.o4-mini';
| Model | Provider | Use Case |
| --- | --- | --- |
| anthropic.claude-haiku-4.5 | Anthropic | Fast, cost-effective tasks |
| anthropic.claude-sonnet-4.5 | Anthropic | Balanced performance and capability |
| anthropic.claude-sonnet-4.6 | Anthropic | Latest balanced model with improved reasoning |
| anthropic.claude-opus-4.5 | Anthropic | Complex reasoning and analysis |
| anthropic.claude-opus-4.6 | Anthropic | Latest most capable model for complex reasoning |
| openai.gpt-oss-120b | OpenAI | Open-source large model |
| openai.gpt-4.1 | OpenAI | Advanced reasoning |
| openai.o4-mini | OpenAI | Fast, efficient responses |
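Model identifiers often arrive as plain strings (for example from configuration), so it can help to check them against the union before building a request. The following is a minimal sketch; the `INFERENCE_MODELS` list and the `isInferenceModel` helper are hypothetical names that mirror the union above and are not part of the SDK.

```typescript
// Runtime mirror of the InferenceModel union documented above.
// Assumption: this list is maintained by hand alongside the SDK type.
const INFERENCE_MODELS = [
  'anthropic.claude-haiku-4.5',
  'anthropic.claude-sonnet-4.5',
  'anthropic.claude-sonnet-4.6',
  'anthropic.claude-opus-4.5',
  'anthropic.claude-opus-4.6',
  'openai.gpt-oss-120b',
  'openai.gpt-4.1',
  'openai.o4-mini',
] as const;

type InferenceModel = (typeof INFERENCE_MODELS)[number];

// Hypothetical helper (not an SDK export): narrows an arbitrary string
// to InferenceModel when it matches one of the documented identifiers.
function isInferenceModel(value: string): value is InferenceModel {
  return (INFERENCE_MODELS as readonly string[]).indexOf(value) !== -1;
}
```

A caller could reject unknown identifiers early with `if (!isInferenceModel(name)) { ... }` instead of waiting for the job to fail.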
MessageRole
The role of a message in the conversation.
type MessageRole = 'user' | 'assistant' | 'system';

| Role | Description |
| --- | --- |
| system | Sets the model’s behavior and context |
| user | Input from the user or application |
| assistant | Previous model responses (for multi-turn conversations) |
InferenceMessage
A message in the inference conversation. The content field is an ordered array of
content blocks, most commonly text blocks. Agent-loop flows additionally emit
tool_use blocks (the model requesting a tool call) and tool_result blocks (the
response to that call); see ContentBlock below.
interface InferenceMessage {
  role: MessageRole;
  content: ContentBlock[];
}

type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; tool_use_id: string; name: string; arguments: Record<string, unknown> }
  | { type: 'tool_result'; tool_use_id: string; content: ContentBlock[]; is_error: boolean };
| Property | Type | Required | Description |
| --- | --- | --- | --- |
| role | MessageRole | Yes | The role of the message sender |
| content | ContentBlock[] | Yes | Ordered content blocks making up the message |
Example:
const messages: InferenceMessage[] = [
  {
    role: 'system',
    content: [{ type: 'text', text: 'You are a data classification expert.' }]
  },
  {
    role: 'user',
    content: [{ type: 'text', text: 'Classify the following record: {...}' }]
  }
];
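Because content is a union of block types, code that reads a message usually has to switch on type. The sketch below gathers the text from a list of blocks, descending into the nested content of tool_result blocks. `collectText` is a hypothetical helper written for illustration, not an SDK function; the `ContentBlock` type is copied from the definition above.

```typescript
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; tool_use_id: string; name: string; arguments: Record<string, unknown> }
  | { type: 'tool_result'; tool_use_id: string; content: ContentBlock[]; is_error: boolean };

// Hypothetical helper: returns the text of every text block, in order,
// including text nested inside tool_result blocks.
function collectText(blocks: ContentBlock[]): string[] {
  const out: string[] = [];
  for (const block of blocks) {
    switch (block.type) {
      case 'text':
        out.push(block.text);
        break;
      case 'tool_result':
        // tool_result carries its own ContentBlock[]; recurse into it.
        out.push(...collectText(block.content));
        break;
      case 'tool_use':
        // tool_use carries call arguments, not text; nothing to collect.
        break;
    }
  }
  return out;
}
```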
The legacy { role, text: string } shape is still accepted on requests for backwards
compatibility: the API canonicalizes it into a single text content block.
Responses always use the content-block shape. Migrate to the content-block shape
the next time you edit the calling code.
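For callers migrating existing code, the conversion described above can be done on the client before sending. This is a minimal sketch of that mapping, not the API's own implementation; `LegacyMessage`, `BlockMessage`, and `toContentBlockShape` are hypothetical names, and the content type is simplified to text blocks only.

```typescript
type MessageRole = 'user' | 'assistant' | 'system';
type TextBlock = { type: 'text'; text: string };

// Hypothetical types for illustration: the legacy and content-block shapes.
interface LegacyMessage { role: MessageRole; text: string }
interface BlockMessage { role: MessageRole; content: TextBlock[] }

// Converts a legacy message into the content-block shape.
// Messages already in the content-block shape are returned unchanged.
function toContentBlockShape(message: LegacyMessage | BlockMessage): BlockMessage {
  if ('content' in message) {
    return message;
  }
  return {
    role: message.role,
    content: [{ type: 'text', text: message.text }],
  };
}
```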
InferenceConfig
Configuration parameters for the inference request.
interface InferenceConfig {
  output_format_schema: Record<string, unknown>;
  max_tokens?: number;
  temperature?: number;
  top_p?: number;
  stop_sequences?: string[];
}
| Property | Type | Required | Description |
| --- | --- | --- | --- |
| output_format_schema | Record<string, unknown> | Yes | JSON Schema defining the expected output format |
| max_tokens | number | No | Maximum number of tokens to generate |
| temperature | number | No | Sampling temperature (0-1). Lower = more deterministic |
| top_p | number | No | Nucleus sampling parameter (0-1) |
| stop_sequences | string[] | No | Sequences that will stop generation |
Example:
const config: InferenceConfig = {
  output_format_schema: {
    type: 'object',
    properties: {
      category: {
        type: 'string',
        enum: ['retail', 'finance', 'healthcare', 'technology']
      },
      confidence: {
        type: 'number',
        minimum: 0,
        maximum: 1
      },
      reasoning: {
        type: 'string'
      }
    },
    required: ['category', 'confidence']
  },
  max_tokens: 1000,
  temperature: 0.3
};
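The documented ranges for temperature and top_p (0-1) can be checked locally before a job is submitted. The sketch below is one way to do that; `checkInferenceConfig` is a hypothetical helper, and the rule that max_tokens must be a positive integer is an assumption rather than something this reference states.

```typescript
interface InferenceConfig {
  output_format_schema: Record<string, unknown>;
  max_tokens?: number;
  temperature?: number;
  top_p?: number;
  stop_sequences?: string[];
}

// Hypothetical pre-flight check: returns a list of problems, empty when none are found.
function checkInferenceConfig(config: InferenceConfig): string[] {
  const problems: string[] = [];
  const inUnitRange = (n: number): boolean => n >= 0 && n <= 1;

  // Ranges taken from the property table above.
  if (config.temperature !== undefined && !inUnitRange(config.temperature)) {
    problems.push('temperature must be between 0 and 1');
  }
  if (config.top_p !== undefined && !inUnitRange(config.top_p)) {
    problems.push('top_p must be between 0 and 1');
  }
  // Assumption: a token limit only makes sense as a positive whole number.
  if (
    config.max_tokens !== undefined &&
    (config.max_tokens <= 0 || Math.floor(config.max_tokens) !== config.max_tokens)
  ) {
    problems.push('max_tokens must be a positive integer');
  }
  return problems;
}
```

This does not validate output_format_schema itself; a JSON Schema validator would be needed for that.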
ModelInferenceRunRequest
The complete request body for running a model inference job.
interface ModelInferenceRunRequest {
  data_plane_id: string;
  model: InferenceModel;
  messages: InferenceMessage[];
  inference_config: InferenceConfig;
  tags?: string[];
}
| Property | Type | Required | Description |
| --- | --- | --- | --- |
| data_plane_id | string | Yes | The data plane ID where inference will execute |
| model | InferenceModel | Yes | The model to use for inference |
| messages | InferenceMessage[] | Yes | The conversation messages |
| inference_config | InferenceConfig | Yes | Configuration for the inference |
| tags | string[] | No | Optional tags for organizing and filtering jobs |
InferenceUsage
Token usage metrics from the inference response.
interface InferenceUsage {
  total_tokens: number;
  prompt_tokens: number;
  completion_tokens: number;
}
| Property | Type | Description |
| --- | --- | --- |
| total_tokens | number | Total tokens used (prompt + completion) |
| prompt_tokens | number | Tokens in the input messages |
| completion_tokens | number | Tokens in the generated response |
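When several jobs run as a batch, their usage objects can be added field by field to get an overall token count. A minimal sketch, with `sumUsage` as a hypothetical helper that is not part of the SDK:

```typescript
interface InferenceUsage {
  total_tokens: number;
  prompt_tokens: number;
  completion_tokens: number;
}

// Hypothetical helper: adds up usage metrics across several results.
function sumUsage(usages: InferenceUsage[]): InferenceUsage {
  const zero: InferenceUsage = { total_tokens: 0, prompt_tokens: 0, completion_tokens: 0 };
  return usages.reduce(
    (acc, u) => ({
      total_tokens: acc.total_tokens + u.total_tokens,
      prompt_tokens: acc.prompt_tokens + u.prompt_tokens,
      completion_tokens: acc.completion_tokens + u.completion_tokens,
    }),
    zero,
  );
}
```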
ModelInferenceRunResult
The result from a completed model inference job.
interface ModelInferenceRunResult<T = unknown> {
  usage: InferenceUsage;
  structured_output: T;
}
| Property | Type | Description |
| --- | --- | --- |
| usage | InferenceUsage | Token usage metrics |
| structured_output | T | The model’s response, typed according to your schema |
Example with typed output:
interface ClassificationResult {
  category: string;
  confidence: number;
  reasoning?: string;
}

// After job completion
const result = job.result as ModelInferenceRunResult<ClassificationResult>;
console.log('Category:', result.structured_output.category);
console.log('Confidence:', result.structured_output.confidence);
console.log('Tokens used:', result.usage.total_tokens);
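The `as` cast above only informs the compiler; it performs no check at run time. Where that matters, a type guard can confirm the shape before the output is used. The following sketch assumes the `ClassificationResult` interface from the example; `isClassificationResult` is a hypothetical helper, not an SDK export.

```typescript
interface ClassificationResult {
  category: string;
  confidence: number;
  reasoning?: string;
}

// Hypothetical runtime guard for the structured_output of a classification job.
function isClassificationResult(value: unknown): value is ClassificationResult {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const v = value as Record<string, unknown>;
  return (
    typeof v.category === 'string' &&
    typeof v.confidence === 'number' &&
    (v.reasoning === undefined || typeof v.reasoning === 'string')
  );
}
```

With the result typed as `ModelInferenceRunResult<unknown>`, `isClassificationResult(result.structured_output)` narrows the output inside the guarded branch.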
ModelInferenceRunJob
The job object returned when submitting an inference request. Extends the base job type with inference-specific result typing.
interface ModelInferenceRunJob extends Job {
  type: 'model_inference';
  result?: ModelInferenceRunResult;
}
| Property | Type | Description |
| --- | --- | --- |
| id | string | Unique job identifier |
| type | 'model_inference' | The job type |
| state | 'pending' \| 'running' \| 'completed' \| 'failed' | Current job state |
| result | ModelInferenceRunResult | Present when job completes successfully |
| failures | object[] | Present when job fails |
| created_at | string | ISO timestamp of job creation |
| updated_at | string | ISO timestamp of last update |
Error handling
Model Inference jobs can fail for several reasons:
| Error | Cause | Solution |
| --- | --- | --- |
| Invalid schema | JSON Schema is malformed | Validate schema before submission |
| Model unavailable | Requested model not available in data plane | Check supported models |
| Token limit exceeded | Response would exceed max_tokens | Increase max_tokens or simplify request |
| Invalid data plane | Data plane ID not found or no access | Verify data plane ID and permissions |
Example error handling:
const job = await api.runModelInference(request);

// Poll for completion. waitForJob is not defined in this reference;
// supply your own polling helper.
const completedJob = await waitForJob(job.id);

if (completedJob.state === 'failed') {
  console.error('Inference failed:', completedJob.failures);
  // Handle specific failure types
} else {
  // result is optional on the job type, so guard the access
  console.log('Result:', completedJob.result?.structured_output);
}
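The example above relies on a waitForJob helper that this reference does not define. One possible shape for it is sketched below. It assumes the caller passes in a `fetchJob` function that retrieves a job by ID, because the SDK method for fetching a job is not shown here; the second parameter, the `JobLike` type, the poll interval, and the attempt limit are all illustrative choices.

```typescript
type JobState = 'pending' | 'running' | 'completed' | 'failed';

// Minimal job shape needed for polling; real jobs carry more fields.
interface JobLike {
  id: string;
  state: JobState;
}

// Hypothetical polling helper. fetchJob is supplied by the caller
// (assumption: some way to retrieve a job by ID exists in your client).
async function waitForJob<J extends JobLike>(
  jobId: string,
  fetchJob: (id: string) => Promise<J>,
  intervalMs: number = 2000,
  maxAttempts: number = 60,
): Promise<J> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJob(jobId);
    // 'completed' and 'failed' are the terminal states in the job table above.
    if (job.state === 'completed' || job.state === 'failed') {
      return job;
    }
    // Wait before the next poll.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} polls`);
}
```

A fixed interval keeps the sketch short; a production helper might back off between polls and accept a cancellation signal.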
Related content
Running Model Inference: Step-by-step guide to submitting inference requests
Structured Output: Working with JSON Schema for typed responses
Tracking Jobs: Monitor inference job status
Supported Models: Available models and specifications