## Prerequisites
- SDK installed and configured (see Authentication)
- A data plane ID where inference will run
- An API key with appropriate permissions
## Basic inference request
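The exact client types are not shown on this page, so the sketch below is illustrative: the `InferenceMessage` and `InferenceRequest` shapes, the field names, and the default model are assumptions, not the SDK's real API.

```typescript
// Hypothetical request shape; the real SDK's types may differ.
interface InferenceMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface InferenceRequest {
  dataPlaneId: string; // where inference will run (see Prerequisites)
  model: string;       // e.g. "anthropic.claude-sonnet-4.5"
  messages: InferenceMessage[];
}

// Build a basic request with a system prompt and a user message.
function buildInferenceRequest(
  dataPlaneId: string,
  systemPrompt: string,
  userMessage: string,
): InferenceRequest {
  return {
    dataPlaneId,
    model: "anthropic.claude-sonnet-4.5",
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userMessage },
    ],
  };
}

const request = buildInferenceRequest(
  "dp-example-id", // placeholder data plane ID
  "You are a concise technical assistant.",
  "Summarize the attached log output.",
);
```

Swap `buildInferenceRequest` for the SDK's own request builder or submit call; the point is the message ordering, with the system prompt first.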
Submit a simple inference request with a system prompt and user message.

## Tracking job completion
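Polling logic is largely SDK-agnostic. In the sketch below, `fetchStatus` stands in for whatever status call the SDK exposes, and the status names are assumptions for illustration.

```typescript
type JobStatus = "queued" | "running" | "succeeded" | "failed"; // assumed status names

interface JobSnapshot {
  status: JobStatus;
  result?: string;
}

// Poll an asynchronous job until it reaches a terminal state.
// `fetchStatus` stands in for the SDK's status call; `delayMs` and
// `maxAttempts` bound how long we wait overall.
async function pollUntilComplete(
  fetchStatus: (jobId: string) => Promise<JobSnapshot>,
  jobId: string,
  delayMs = 1000,
  maxAttempts = 30,
): Promise<JobSnapshot> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const snapshot = await fetchStatus(jobId);
    if (snapshot.status === "succeeded" || snapshot.status === "failed") {
      return snapshot;
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`Job ${jobId} did not complete within ${maxAttempts} attempts`);
}
```

Injecting the fetcher keeps the loop testable and lets you reuse it for any job type the SDK returns.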
Inference jobs are asynchronous. Poll for completion using the job ID.

## Configuring inference parameters
Fine-tune the model’s behavior with configuration options.

| Parameter | Effect | Typical Values |
|---|---|---|
| `temperature` | Controls randomness; lower is more deterministic | 0.0-0.3 for factual, 0.7-1.0 for creative |
| `max_tokens` | Limits response length | 100-4000 depending on task |
| `top_p` | Nucleus sampling threshold | 0.9-1.0 for most cases |
| `stop_sequences` | Strings that end generation | `["\n\n", "END"]` |
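The parameters above might be grouped into a config object along these lines. The snake_case field names mirror the table, but the real SDK's option names may differ.

```typescript
// Hypothetical inference configuration mirroring the table above.
interface InferenceConfig {
  temperature: number;      // lower = more deterministic
  max_tokens: number;       // hard cap on response length
  top_p: number;            // nucleus sampling threshold
  stop_sequences: string[]; // strings that end generation
}

// A factual-extraction profile: low temperature, tight token budget.
const factualConfig: InferenceConfig = {
  temperature: 0.2,
  max_tokens: 500,
  top_p: 0.95,
  stop_sequences: ["\n\n", "END"],
};

// A creative profile: higher temperature, larger budget.
const creativeConfig: InferenceConfig = {
  temperature: 0.9,
  max_tokens: 2000,
  top_p: 1.0,
  stop_sequences: [],
};
```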
## Multi-turn conversations
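Multi-turn context is just an ordered message array. The helper below is a hypothetical sketch of appending the next user turn without mutating the existing history.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Append a new user turn to an existing transcript so the model sees
// the full conversation, not just the latest message.
function withNextTurn(history: ChatMessage[], userMessage: string): ChatMessage[] {
  return [...history, { role: "user", content: userMessage }];
}

const history: ChatMessage[] = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is a data plane?" },
  { role: "assistant", content: "A data plane is where inference workloads run." },
];

// Messages for the next request include every prior turn.
const nextRequestMessages = withNextTurn(history, "How do I pick one?");
```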
Include previous messages for context-aware responses.

## Choosing a model
Select the model based on your task requirements.

| Model | Best For |
|---|---|
| `anthropic.claude-haiku-4.5` | Fast, simple tasks (classification, extraction) |
| `anthropic.claude-sonnet-4.5` | Balanced tasks (summarization, analysis) |
| `anthropic.claude-opus-4.5` | Complex reasoning (multi-step analysis, nuanced decisions) |
| `openai.gpt-4.1` | Advanced reasoning tasks |
| `openai.o4-mini` | Fast, cost-effective tasks |
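If requests are routed programmatically, the table can be encoded as a lookup. The `modelFor` helper and its task categories below are illustrative, not an SDK feature.

```typescript
type TaskKind =
  | "classification"
  | "extraction"
  | "summarization"
  | "analysis"
  | "complex-reasoning";

// Map a task kind to a model ID following the table above.
function modelFor(task: TaskKind): string {
  switch (task) {
    case "classification":
    case "extraction":
      return "anthropic.claude-haiku-4.5"; // fast, simple tasks
    case "summarization":
    case "analysis":
      return "anthropic.claude-sonnet-4.5"; // balanced tasks
    case "complex-reasoning":
      return "anthropic.claude-opus-4.5"; // multi-step, nuanced work
  }
}
```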
## Typing responses
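A minimal sketch of the generics approach, assuming the raw response carries its structured output as a JSON string; `RawInferenceResponse` and `parseStructuredOutput` are hypothetical names, not the SDK's.

```typescript
// Hypothetical raw response carrying a JSON string payload.
interface RawInferenceResponse {
  output: string;
}

interface SentimentResult {
  sentiment: "positive" | "negative" | "neutral";
  confidence: number;
}

// Parse the model's JSON output into a caller-supplied type T.
// A generic like this only asserts the shape at compile time; pair it
// with runtime validation (e.g. a JSON Schema check) in real code.
function parseStructuredOutput<T>(response: RawInferenceResponse): T {
  return JSON.parse(response.output) as T;
}

const raw: RawInferenceResponse = {
  output: '{"sentiment": "positive", "confidence": 0.93}',
};
const result = parseStructuredOutput<SentimentResult>(raw);
```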
Use TypeScript generics to type the structured output.

## Error handling
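Error classes and codes vary by SDK, so the taxonomy below is assumed for illustration. The point is to separate transient failures, which are worth retrying, from caller mistakes, which need a fix before resubmitting.

```typescript
// Hypothetical error taxonomy; the SDK's real error classes may differ.
class InferenceError extends Error {
  constructor(
    public readonly code: "RATE_LIMITED" | "TIMEOUT" | "INVALID_REQUEST" | "UNAUTHORIZED",
    message: string,
  ) {
    super(message);
  }
}

// Decide how to react to a failed call: transient errors can be
// retried, while auth and validation errors need action from the caller.
function handleInferenceError(
  error: unknown,
): "retry" | "fix-request" | "fix-credentials" | "rethrow" {
  if (!(error instanceof InferenceError)) return "rethrow";
  switch (error.code) {
    case "RATE_LIMITED":
    case "TIMEOUT":
      return "retry";
    case "INVALID_REQUEST":
      return "fix-request";
    case "UNAUTHORIZED":
      return "fix-credentials";
  }
}
```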
Handle common inference errors.

## Best practices
| Practice | Description |
|---|---|
| Use specific schemas | Define precise JSON Schema to get consistent outputs |
| Choose appropriate models | Use smaller models for simple tasks to save cost and time |
| Set reasonable `max_tokens` | Avoid unnecessarily large values that increase latency |
| Include system prompts | Guide model behavior with clear instructions |
| Handle failures gracefully | Implement retries for transient errors |
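The "handle failures gracefully" row can be sketched as a generic retry wrapper with exponential backoff; `withRetries` below is illustrative, not part of the SDK.

```typescript
// Retry a call prone to transient failures, with exponential backoff.
// `shouldRetry` lets the caller decide which errors are transient.
async function withRetries<T>(
  call: () => Promise<T>,
  shouldRetry: (error: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (error) {
      lastError = error;
      if (!shouldRetry(error) || attempt === maxAttempts - 1) break;
      // Exponential backoff: 250 ms, 500 ms, 1000 ms, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```

Pair it with an error classifier so only transient errors (rate limits, timeouts) are retried, and cap `maxAttempts` to keep worst-case latency bounded.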

