## Methods
### runModelInference

Submits a model inference request and returns a job that can be polled for results.

**Parameters**

| Name | Type | Required | Description |
|---|---|---|---|
| request | ModelInferenceRunRequest | Yes | The inference request configuration |

**Returns**

`Promise<ModelInferenceRunJob>`: a job object that can be polled for completion.

**Example**
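A minimal end-to-end sketch. The request fields follow the `ModelInferenceRunRequest` shape documented below; the `client` object here is a self-contained stand-in (the real SDK client's construction is not shown in this reference), and the `data_plane_id` value is a placeholder.

```typescript
// Job shape from the ModelInferenceRunJob reference below.
type JobState = 'pending' | 'running' | 'completed' | 'failed';

interface Job {
  id: string;
  type: 'model_inference';
  state: JobState;
}

// Stand-in client so this sketch runs on its own; swap in the real SDK client.
const client = {
  async runModelInference(_request: object): Promise<Job> {
    return { id: 'job-123', type: 'model_inference', state: 'pending' };
  },
};

async function main(): Promise<Job> {
  const job = await client.runModelInference({
    data_plane_id: 'dp-example', // placeholder; use your own data plane ID
    model: 'anthropic.claude-sonnet-4.5',
    messages: [
      { role: 'system', text: 'Extract the requested fields as JSON.' },
      { role: 'user', text: 'Order #8812 shipped on 2024-05-01.' },
    ],
    inference_config: {
      output_format_schema: {
        type: 'object',
        properties: {
          order_id: { type: 'string' },
          shipped_on: { type: 'string' },
        },
        required: ['order_id', 'shipped_on'],
      },
      max_tokens: 256,
    },
    tags: ['docs-example'],
  });
  // The returned job starts out pending; poll it until it reaches a terminal state.
  return job;
}
```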
## Types
### InferenceModel

Model identifiers supported by the Narrative model inference API.

| Model | Provider | Use Case |
|---|---|---|
| anthropic.claude-haiku-4.5 | Anthropic | Fast, cost-effective tasks |
| anthropic.claude-sonnet-4.5 | Anthropic | Balanced performance and capability |
| anthropic.claude-opus-4.5 | Anthropic | Complex reasoning and analysis |
| openai.gpt-oss-120b | OpenAI | Open-source large model |
| openai.gpt-4.1 | OpenAI | Advanced reasoning |
| openai.o4-mini | OpenAI | Fast, efficient responses |
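Reconstructed from the table above, the identifiers can be expressed as a string-literal union; the SDK's actual exported type name and members may differ, so treat this as a sketch.

```typescript
// Hypothetical spelling of the InferenceModel union, built from the table above.
type InferenceModel =
  | 'anthropic.claude-haiku-4.5'
  | 'anthropic.claude-sonnet-4.5'
  | 'anthropic.claude-opus-4.5'
  | 'openai.gpt-oss-120b'
  | 'openai.gpt-4.1'
  | 'openai.o4-mini';

// A literal union catches typos in model identifiers at compile time.
const model: InferenceModel = 'anthropic.claude-sonnet-4.5';
```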
### MessageRole

The role of a message in the conversation.

| Role | Description |
|---|---|
| system | Sets the model’s behavior and context |
| user | Input from the user or application |
| assistant | Previous model responses (for multi-turn conversations) |
### InferenceMessage

A message in the inference conversation.

| Property | Type | Required | Description |
|---|---|---|---|
| role | MessageRole | Yes | The role of the message sender |
| text | string | Yes | The message content |
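A sketch of a multi-turn message array using the roles above. The local type declarations mirror the documented shapes; the conversation content is illustrative.

```typescript
// Local mirrors of the documented MessageRole and InferenceMessage shapes.
type MessageRole = 'system' | 'user' | 'assistant';

interface InferenceMessage {
  role: MessageRole;
  text: string;
}

// A multi-turn conversation: the assistant message replays a prior model
// response so the model has context for the follow-up user turn.
const messages: InferenceMessage[] = [
  { role: 'system', text: 'You are a concise support-ticket classifier.' },
  { role: 'user', text: 'Categorize: "refund not received".' },
  { role: 'assistant', text: '{"category": "billing"}' },
  { role: 'user', text: 'Categorize: "app crashes on login".' },
];
```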
### InferenceConfig

Configuration parameters for the inference request.

| Property | Type | Required | Description |
|---|---|---|---|
| output_format_schema | Record<string, unknown> | Yes | JSON Schema defining the expected output format |
| max_tokens | number | No | Maximum number of tokens to generate |
| temperature | number | No | Sampling temperature (0-1). Lower = more deterministic |
| top_p | number | No | Nucleus sampling parameter (0-1) |
| stop_sequences | string[] | No | Sequences that will stop generation |
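A sketch of a config value asking for strictly structured output. The schema content is illustrative; any valid JSON Schema should work as `output_format_schema`.

```typescript
// An InferenceConfig requesting a small, strictly-typed JSON object.
const inferenceConfig = {
  output_format_schema: {
    type: 'object',
    properties: {
      sentiment: { type: 'string', enum: ['positive', 'neutral', 'negative'] },
      confidence: { type: 'number', minimum: 0, maximum: 1 },
    },
    required: ['sentiment', 'confidence'],
    additionalProperties: false,
  } as Record<string, unknown>,
  max_tokens: 128,
  temperature: 0.2, // low temperature: more deterministic extraction
  stop_sequences: ['\n\n'],
};
```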
### ModelInferenceRunRequest

The complete request body for running a model inference job.

| Property | Type | Required | Description |
|---|---|---|---|
| data_plane_id | string | Yes | The data plane ID where inference will execute |
| model | InferenceModel | Yes | The model to use for inference |
| messages | InferenceMessage[] | Yes | The conversation messages |
| inference_config | InferenceConfig | Yes | Configuration for the inference |
| tags | string[] | No | Optional tags for organizing and filtering jobs |
### InferenceUsage

Token usage metrics from the inference response.

| Property | Type | Description |
|---|---|---|
| total_tokens | number | Total tokens used (prompt + completion) |
| prompt_tokens | number | Tokens in the input messages |
| completion_tokens | number | Tokens in the generated response |
### ModelInferenceRunResult

The result from a completed model inference job.

| Property | Type | Description |
|---|---|---|
| usage | InferenceUsage | Token usage metrics |
| structured_output | T | The model’s response, typed according to your schema |
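Since `structured_output` is typed as `T`, a caller-supplied interface can describe the expected shape. The generic declaration below is a local sketch assuming the SDK exposes such a generic result type; the values are illustrative.

```typescript
// Local mirrors of the documented result shapes.
interface InferenceUsage {
  total_tokens: number;
  prompt_tokens: number;
  completion_tokens: number;
}

interface ModelInferenceRunResult<T> {
  usage: InferenceUsage;
  structured_output: T;
}

// Caller-supplied shape matching the output_format_schema they submitted.
interface Sentiment {
  sentiment: 'positive' | 'neutral' | 'negative';
  confidence: number;
}

const result: ModelInferenceRunResult<Sentiment> = {
  usage: { total_tokens: 140, prompt_tokens: 120, completion_tokens: 20 },
  structured_output: { sentiment: 'positive', confidence: 0.93 },
};

// total_tokens is documented as prompt + completion.
const usageConsistent =
  result.usage.total_tokens ===
  result.usage.prompt_tokens + result.usage.completion_tokens;
```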
### ModelInferenceRunJob

The job object returned when submitting an inference request. Extends the base job type with inference-specific result typing.

| Property | Type | Description |
|---|---|---|
| id | string | Unique job identifier |
| type | 'model_inference' | The job type |
| state | 'pending' \| 'running' \| 'completed' \| 'failed' | Current job state |
| result | ModelInferenceRunResult | Present when job completes successfully |
| failures | object[] | Present when job fails |
| created_at | string | ISO timestamp of job creation |
| updated_at | string | ISO timestamp of last update |
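A sketch of polling against the documented states. The `getJob` parameter is a hypothetical fetch-by-id function; substitute whatever mechanism your SDK provides for re-reading a job.

```typescript
// Job states from the table above.
type JobState = 'pending' | 'running' | 'completed' | 'failed';

interface Job {
  id: string;
  state: JobState;
}

const sleep = (ms: number) => new Promise<void>((res) => setTimeout(res, ms));

// Re-fetch the job until it reaches a terminal state ('completed' or 'failed').
async function pollUntilDone(
  getJob: (id: string) => Promise<Job>,
  id: string,
  intervalMs = 2000,
): Promise<Job> {
  for (;;) {
    const job = await getJob(id);
    if (job.state === 'completed' || job.state === 'failed') return job;
    await sleep(intervalMs);
  }
}
```

A fixed interval keeps the sketch short; production code would typically add a timeout and backoff.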
## Error handling

Model inference jobs can fail for several reasons:

| Error | Cause | Solution |
|---|---|---|
| Invalid schema | JSON Schema is malformed | Validate the schema before submission |
| Model unavailable | Requested model not available in the data plane | Check the supported models list |
| Token limit exceeded | Response would exceed max_tokens | Increase max_tokens or simplify the request |
| Invalid data plane | Data plane ID not found or no access | Verify the data plane ID and permissions |
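A sketch of inspecting a terminal job for failures. The reference only documents `failures` as `object[]`, so the helper below reports only the count and does not assume any per-failure fields.

```typescript
// Job states and failure list from the ModelInferenceRunJob reference.
type JobState = 'pending' | 'running' | 'completed' | 'failed';

interface Job {
  state: JobState;
  failures?: object[]; // present when the job fails; element shape undocumented
}

function describeOutcome(job: Job): string {
  if (job.state === 'completed') return 'ok';
  if (job.state === 'failed') {
    const n = job.failures?.length ?? 0;
    return `failed with ${n} recorded failure(s)`;
  }
  return 'still in progress';
}
```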

