The Agent Conversations API lets you ask a language model a question, have it call tools to gather more information, and get back a structured answer — all through a few HTTP calls. The model can use server-side tools (anything exposed by a Model Context Protocol server, like the Narrative docs search) or client-side tools (where it asks you a question and waits for your reply). This page is a complete reference: the moving pieces, the parameters, three end-to-end examples, and the most common things that go wrong.

What it does

A typical request goes like this:
  1. You create a conversation with a system prompt, a model choice, and a list of tools the model is allowed to use.
  2. You start a run by sending the model a user message.
  3. The model decides whether to answer directly or call one of the tools.
    • If it picks a server-side tool, the platform calls it for you and feeds the result back into the next round of reasoning. This can repeat several times.
    • If it picks a client-side tool, the run pauses and asks you to provide the answer. You start a new run with the answer; the model continues from there.
  4. Eventually the model produces a final answer matching the structured schema you provided. The run reaches completed and you fetch the result.
The platform keeps the full conversation in a database, so you can resume a paused run later, inspect every message the model saw, and start follow-up runs on the same conversation.
The shape mirrors the OpenAI Assistants API: a thread is a conversation, a run is one user-initiated turn, a step is one inference iteration inside a run. If you already know that mental model, the only new ideas here are tool aliases and optimistic-concurrency versioning (covered below).

API endpoints

All endpoints live under /agents and require a Bearer token with agent_conversations read/write permission.
| Method | Path | Purpose |
| --- | --- | --- |
| POST | /agents/conversations | Create an empty conversation, pinning its model + tool catalog |
| GET | /agents/conversations/{id} | Read conversation metadata (current version, defaults) |
| GET | /agents/conversations/{id}/messages?since=N | Page through messages with sequence_no > N |
| POST | /agents/conversations/{id}/runs | Start a new run: either a fresh user message, or tool outputs resuming a paused run |
| GET | /agents/runs/{id} | Read a run’s current status, results, and pending tool calls |
The first request (POST /agents/conversations) returns the conversation id you need for all the others. The fourth (POST .../runs) returns the run id you can poll with the fifth.

Core concepts in one paragraph each

Conversation

The long-lived container. The system_prompt is pinned for life — once set at creation time, it applies to every run that follows and cannot be changed. Everything else in defaults (model, tools, max iterations, temperature, output schema, etc.) is exactly that — a default: it applies unless a particular run explicitly replaces it. The conversation also carries a monotonic version counter that every successful run bumps. The counter doubles as both a delta cursor (for GET .../messages?since=N) and a compare-and-swap token (more on that under Troubleshooting).

Run

One user-initiated turn of conversation. Each run kicks off a workflow on the platform that performs zero or more inference rounds plus tool calls. A run is asynchronous: the POST returns immediately with status: "pending", and you poll GET /agents/runs/{id} until it reaches a terminal state (completed, requires_action, or failed). A run may override the conversation’s defaults for that single turn via config_override — pick a different model, raise max_iterations for a hard question, swap in a different tool catalog, tighten temperature for a deterministic answer, etc. The only thing a run cannot change is the system_prompt (that’s permanently set on the conversation). Two run-only settings have no conversation-level counterpart: tool_choice (which biases the model toward a particular tool on the first iteration) and the run’s payload itself.

Server-side tool (MCP)

A tool the platform calls on your behalf. You declare it under mcp_servers with a URL and a list of tool descriptors. When the model calls one of these tools, the platform makes the HTTP request, gets the result, and feeds it back into the next inference round — all inside the same run.

Client-side tool

A tool you answer. You declare it under client_tools. When the model calls one of these, the run terminates with status: "requires_action". You read the model’s question from pending_tool_calls, decide on an answer, and start a new run with payload.kind: "tool_outputs" carrying your reply.

Tool aliases

Both server-side and client-side tools have an alias (1–8 letters/digits, must start with a letter). The model sees tool names in the form {alias}-{tool_name} — e.g. a search tool under the docs alias becomes docs-search. The alias is how the platform routes a tool call back to its definition. It’s also how it distinguishes “this is for the server to handle” from “this is for the client to handle” without you having to flag each tool individually.
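The naming and routing rule above can be sketched in a few lines. This is only an illustration of the rule as described on this page, not the platform's implementation; the helper names wire_name and route are hypothetical.

```python
def wire_name(alias: str, tool_name: str) -> str:
    """Name the model sees for a tool: '{alias}-{tool_name}'."""
    return f"{alias}-{tool_name}"


def route(wire: str, defaults: dict) -> str:
    """Return 'server' or 'client' depending on which list declares the alias."""
    # Aliases contain only letters and digits, so the first '-' is the separator
    # even when the tool name itself contains dashes.
    alias, _, _ = wire.partition("-")
    if any(s["alias"] == alias for s in defaults.get("mcp_servers", [])):
        return "server"
    if any(c["alias"] == alias for c in defaults.get("client_tools", [])):
        return "client"
    raise KeyError(f"no tool group declares alias {alias!r}")


defaults = {
    "mcp_servers": [{"alias": "docs", "url": "https://docs.narrative.io/mcp", "tools": []}],
    "client_tools": [{"alias": "user", "tools": []}],
}
print(wire_name("docs", "search"))              # docs-search
print(route("user-confirm_booking", defaults))  # client
```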

Tool choice

Per-run policy that nudges the model toward (or away from) using tools on the first iteration. Three options:
  • {"kind": "auto"} — model decides. The default.
  • {"kind": "any"} — model must call some tool (no plain-text answer allowed).
  • {"kind": "specific_tool", "name": "user-confirm_booking"} — model must call exactly this tool. Useful for forcing a clarifying question or running an action you know is needed.
The choice only applies to iteration 1 of the current run. From iteration 2 onward inside that same run the model is back on {"kind": "auto"}. There is no carry-over to subsequent runs — every new run picks its own tool_choice afresh. That means you can absolutely re-impose a forced tool call on a follow-up run (for example, push the model toward a specific client-side tool every time you need to ask the user something), so long as you set tool_choice on the run’s request body.
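Read as a rule, the paragraph above says the forced choice is in effect for iteration 1 only. The sketch below restates that rule; effective_tool_choice is a hypothetical helper for reasoning about the behaviour, not something the API exposes.

```python
AUTO = {"kind": "auto"}


def effective_tool_choice(run_tool_choice, iteration):
    """Tool choice in force for a 1-based iteration inside a single run."""
    if iteration < 1:
        raise ValueError("iterations are counted from 1")
    # The run's own tool_choice applies to the first iteration only;
    # omitting it is the same as asking for "auto".
    if iteration == 1 and run_tool_choice is not None:
        return run_tool_choice
    return AUTO


forced = {"kind": "specific_tool", "name": "user-confirm_booking"}
print(effective_tool_choice(forced, 1))  # the forced tool
print(effective_tool_choice(forced, 2))  # {'kind': 'auto'}
```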

Configuration parameters

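When you create a conversation, you pass a defaults object that pins how every run on that conversation behaves. Each field can be overridden per-run via config_override, except where noted. The table also lists system_prompt for completeness; in the examples below it is sent at the top level of the request, next to defaults.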
| Field | Required | What it means | Typical value |
| --- | --- | --- | --- |
| model | yes | Which language model to use. The platform exposes a fixed set of identifiers. | "anthropic.claude-opus-4.6" |
| data_plane_id | yes | Which compute environment runs the inference. Each company has at least one. | UUID from your platform admin |
| execution_cluster | yes | Which job-executor pool routes the inference job to AWS Bedrock or Snowflake Cortex. Inference itself runs in the external model service, not on a platform cluster; the executor only dispatches the HTTP call. Use "shared" for almost every case; the value only matters if your company runs a dedicated executor pool with isolation requirements. | "shared" |
| max_iterations | no (default 3) | How many inference rounds the model is allowed before the platform forces a failed run. Each iteration costs one model call. | 3 for trivial questions, 8–15 for multi-step research |
| max_tokens | no (default 2048) | Cap on the model’s reply per iteration. Doesn’t include the prompt. | 1024–4096 |
| temperature | no (default 0.0) | How creative the model is allowed to be. 0.0 is deterministic; 1.0 is creative. | 0.0 for factual answers, 0.7 for brainstorming |
| output_format_schema | no | A JSON Schema describing the structure of the final answer. The platform extracts the text field from the model’s reply. Supports a subset of Draft 2020-12 (see the note below the table). | See examples below |
| mcp_servers | no (default []) | List of MCP servers + their tools that the model may call. | See Example 2 |
| client_tools | no (default []) | List of tools that pause the run with requires_action. | See Example 3 |
| system_prompt | no | A “stage direction” prepended to every run. Pinned at creation time, cannot be overridden per-run. | "You are a helpful assistant. Answer concisely." |
mcp_servers and client_tools are wholesale-replaced, not merged, when overridden in a run. If you set them in config_override, you replace the whole list. This keeps tool namespacing predictable.
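One way to picture "replaced, not merged" is a shallow dictionary update: any key present in config_override wins outright, lists included. The sketch below assumes that reading; effective_config is a hypothetical helper, and the platform may apply extra validation that is not modelled here.

```python
from typing import Optional


def effective_config(defaults: dict, config_override: Optional[dict] = None) -> dict:
    """Shallow merge: every overridden field replaces the default as a whole."""
    merged = dict(defaults)
    merged.update(config_override or {})
    return merged


defaults = {
    "model": "anthropic.claude-opus-4.6",
    "max_iterations": 3,
    "mcp_servers": [{"alias": "docs"}, {"alias": "wiki"}],  # "wiki" is a made-up alias
}
override = {"max_iterations": 8, "mcp_servers": [{"alias": "docs"}]}

config = effective_config(defaults, override)
print(config["mcp_servers"])  # [{'alias': 'docs'}]: the "wiki" entry is gone, not merged
```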
output_format_schema is a JSON Schema subset. The platform accepts the same subset of Draft 2020-12 that Bedrock structured-output accepts. Common features that do not work: union types (oneOf, anyOf, allOf, not), schema composition ($ref, $defs), conditional shape (if/then/else), pattern regexes on strings, and format validators (date-time, email, etc.). Supported: type, properties, required, enum, additionalProperties: false, minimum/maximum on numbers, minLength/maxLength on strings, minItems/maxItems on arrays. See JSON Schema Reference for the full list with examples.
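Before sending a schema it can help to scan it locally for the keywords listed above as unsupported. The following is a rough pre-flight check under that list only; unsupported_keywords is a hypothetical helper, the platform's own validation is authoritative, and recursing into items is an assumption about how array element schemas are declared.

```python
UNSUPPORTED = {"oneOf", "anyOf", "allOf", "not", "$ref", "$defs",
               "if", "then", "else", "pattern", "format"}


def unsupported_keywords(schema, path="$"):
    """Return the paths of keywords this page lists as not working."""
    found = []
    if not isinstance(schema, dict):
        return found
    for key, value in schema.items():
        here = f"{path}.{key}"
        if key in UNSUPPORTED:
            found.append(here)
        elif key == "properties" and isinstance(value, dict):
            # Property names are data, not keywords: only inspect their schemas.
            for prop, sub in value.items():
                found.extend(unsupported_keywords(sub, f"{here}.{prop}"))
        elif key == "items":
            found.extend(unsupported_keywords(value, here))
    return found


schema = {
    "type": "object",
    "properties": {
        "when": {"type": "string", "format": "date-time"},
        "format": {"type": "string"},  # a property that happens to be called "format"
        "tags": {"type": "array", "items": {"anyOf": [{"type": "string"}]}},
    },
}
print(unsupported_keywords(schema))
# ['$.properties.when.format', '$.properties.tags.items.anyOf']
```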

Run payload shapes

POST /agents/conversations/{id}/runs accepts one of two payload kinds:
// A fresh user message
{
  "payload": { "kind": "user_message", "text": "What is 2 + 2?" }
}

// The user/system answering a tool call from a previously paused run
{
  "payload": {
    "kind": "tool_outputs",
    "outputs": [
      {
        "tool_use_id": "tooluse_xyz",
        "content": "User confirmed the proposed slot.",
        "is_error": false
      }
    ]
  }
}
The full request body also includes:
  • client_op_id — a UUID you generate, unique per (conversation, request). Used as an idempotency key — re-sending the same client_op_id returns the original run unchanged. This lets you retry safely across network blips.
  • expected_version — the conversation’s current version, which you got from the most recent GET /agents/conversations/{id}. The platform rejects the run if the version has moved on since you read it (see Version conflicts).
  • tool_choice — optional, per-run only.
  • config_override — optional, sparse — only the fields you want to change.
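Putting the four fields together, a request body could be assembled as sketched below. build_run_request is a hypothetical helper; the only behaviour it encodes is what this page states, namely that client_op_id is a caller-generated UUID and that a retry must reuse it.

```python
import uuid


def build_run_request(expected_version, payload, *, client_op_id=None,
                      tool_choice=None, config_override=None):
    """Assemble the body for POST /agents/conversations/{id}/runs."""
    body = {
        # Generate the idempotency key once; pass it back in when retrying.
        "client_op_id": client_op_id or str(uuid.uuid4()),
        "expected_version": expected_version,
        "payload": payload,
    }
    if tool_choice is not None:
        body["tool_choice"] = tool_choice
    if config_override:
        body["config_override"] = config_override  # sparse: only changed fields
    return body


first = build_run_request(0, {"kind": "user_message", "text": "What is 2 + 2?"})
# A retry after a network blip keeps the same client_op_id, so the body is identical.
retry = build_run_request(0, first["payload"], client_op_id=first["client_op_id"])
print(first == retry)  # True
```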

Run status lifecycle

Every run starts at pending and eventually reaches one of three terminal states:
pending  →  running  →  completed         ← model produced a final answer
                    ↘   requires_action   ← model called a client-side tool
                    ↘   failed            ← platform error or non-recoverable model error
Poll GET /agents/runs/{id} every few seconds (start with 2–4 seconds; back off if you don’t care about latency). When you see one of the terminal states you can stop polling.
  • completed runs populate final_text: the string the model produced for the text field of your output schema.
  • requires_action runs populate pending_tool_calls: an array of client-side tool calls waiting for you to answer. See Example 3 for the resume flow.
  • failed runs populate error.type (an opaque incident code) and error.message (a human-readable detail). The response also includes error.title and error.docs_url pointing at the relevant error catalog page.
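A polling loop along these lines might look like the sketch below. It takes the HTTP call as a function argument (fetch_run, hypothetical) so it stays independent of any particular HTTP client; the 1.5x back-off factor and the 300-second timeout are illustrative choices, not values from the platform.

```python
import time

TERMINAL_STATES = {"completed", "requires_action", "failed"}


def poll_run(fetch_run, run_id, *, interval=2.0, max_interval=30.0,
             timeout=300.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the run reaches a terminal state, backing off between polls."""
    deadline = clock() + timeout
    while True:
        run = fetch_run(run_id)  # expected to return the parsed GET /agents/runs/{id} body
        if run["status"] in TERMINAL_STATES:
            return run
        if clock() >= deadline:
            raise TimeoutError(f"run {run_id} still {run['status']} after {timeout}s")
        sleep(interval)
        interval = min(interval * 1.5, max_interval)


# Dry run with a fake fetcher and a recording sleep, so nothing touches the network.
statuses = iter(["pending", "running", "completed"])
delays = []
run = poll_run(lambda rid: {"id": rid, "status": next(statuses)}, "run-1",
               sleep=delays.append)
print(run["status"], delays)  # completed [2.0, 3.0]
```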
One error vocabulary for both surfaces. The agent API exposes errors in two physical shapes: HTTP 4xx/5xx with an RFC 7807 body for synchronous failures (bad request body, conversation not found, version conflict, etc.), and HTTP 200 with status: "failed" and an error object for failures that happen inside the workflow after the run has been accepted (max iterations exceeded, MCP server unreachable, invalid effective config, etc.). The two shapes carry the same caller-facing fields:
| RFC 7807 (synchronous) | RunErrorDto on a failed run (asynchronous) | Meaning |
| --- | --- | --- |
| type (URL) | error.docs_url | Stable URL to the docs page for this failure class |
| title | error.title | Short, caller-facing summary |
| status (HTTP) | n/a (the call itself returned 200) | |
| detail | error.message | Request-specific detail string |
| instance (path) | n/a | |
| log_id | n/a; correlate via the run id | |
| n/a | error.type (incident code) | Internal stable tag for log dashboards |
Both shapes link into the same error catalog, which means a single playbook covers both. Whether your client got a 409 on POST .../runs or a 200 with error.type: "AgentLoopMaxIterationsExceeded" on GET /agents/runs/{id}, the docs URL in the response is the canonical “how do I recover” entry point. The workflow side itself never knows about caller-facing presentation; it only emits opaque incident codes ("AgentLoopMaxIterationsExceeded", "UnknownTool", etc.). The translation to title + docs_url happens at the API boundary against a single catalog kept in sync with the docs pages.
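Because the two shapes share their caller-facing fields, a client can fold them into one structure before deciding how to recover. The sketch below follows the field mapping described in this section; normalize_error and the output key names are hypothetical, and the sample titles and detail strings are invented for illustration.

```python
def normalize_error(http_status, body):
    """Map either error shape onto the same caller-facing keys (None if no error)."""
    if http_status >= 400:  # synchronous failure: RFC 7807 body
        return {
            "docs_url": body.get("type"),
            "title": body.get("title"),
            "detail": body.get("detail"),
            "incident_code": None,
        }
    if body.get("status") == "failed":  # asynchronous failure on an accepted run
        err = body.get("error") or {}
        return {
            "docs_url": err.get("docs_url"),
            "title": err.get("title"),
            "detail": err.get("message"),
            "incident_code": err.get("type"),
        }
    return None


sync = normalize_error(409, {"type": "/errors/version-conflict",
                             "title": "Version conflict", "status": 409,
                             "detail": "illustrative detail"})
failed = normalize_error(200, {"status": "failed", "error": {
    "type": "AgentLoopMaxIterationsExceeded", "title": "Max iterations exceeded",
    "message": "illustrative detail", "docs_url": "/errors/max-iterations-exceeded"}})
print(sync["docs_url"], failed["docs_url"])
```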

Example 1 — hello world (no tools)

The simplest possible run: one user message, one assistant reply, no tools.

Create the conversation

curl -X POST "$API/agents/conversations" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "hello world",
    "system_prompt": "You are a helpful assistant. Answer concisely.",
    "defaults": {
      "model": "anthropic.claude-opus-4.6",
      "data_plane_id": "f79cbdae-4848-47ca-95e8-69588364d185",
      "execution_cluster": "shared",
      "max_iterations": 3,
      "max_tokens": 1024,
      "temperature": 0.0,
      "output_format_schema": {
        "type": "object",
        "additionalProperties": false,
        "required": ["text"],
        "properties": { "text": { "type": "string" } }
      }
    }
  }'
Response (abbreviated):
{ "id": "<conv-uuid>", "version": 0, "name": "hello world", ... }

Start the run

curl -X POST "$API/agents/conversations/<conv-uuid>/runs" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "client_op_id": "'"$(uuidgen | tr A-Z a-z)"'",
    "expected_version": 0,
    "payload": { "kind": "user_message", "text": "What is 2 + 2?" }
  }'
Response:
{ "id": "<run-uuid>", "status": "pending", "started_at": "...", ... }

Poll until terminal

curl "$API/agents/runs/<run-uuid>" -H "Authorization: Bearer $TOKEN"
After a few seconds:
{
  "id": "<run-uuid>",
  "status": "completed",
  "iterations_used": 1,
  "usage": { "completion_tokens": 8, "prompt_tokens": 169, "total_tokens": 177 },
  "final_text": "4",
  "error": null,
  ...
}

Read the message stream

curl "$API/agents/conversations/<conv-uuid>/messages?since=0" \
  -H "Authorization: Bearer $TOKEN"
{
  "current_version": 2,
  "messages": [
    { "sequence_no": 1, "role": "user",      "content_blocks": [{"type": "text", "text": "What is 2 + 2?"}], ... },
    { "sequence_no": 2, "role": "assistant", "content_blocks": [{"type": "text", "text": "4"}],              ... }
  ]
}
That’s the whole loop: one user turn (sequence 1) → one assistant turn (sequence 2) → current_version: 2. Future runs on this conversation start with expected_version: 2.
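Since the version counter is both the delta cursor and the value for expected_version, a client can keep one small local cache and reuse the number for both purposes. The sketch below assumes the response shape shown above; apply_delta and the cache layout are hypothetical.

```python
def apply_delta(cache, response):
    """Fold a GET .../messages?since=N response into a local cache."""
    known = {m["sequence_no"] for m in cache["messages"]}
    for message in sorted(response["messages"], key=lambda m: m["sequence_no"]):
        if message["sequence_no"] not in known:  # tolerate overlapping pages
            cache["messages"].append(message)
            known.add(message["sequence_no"])
    cache["version"] = response["current_version"]
    return cache


cache = {"version": 0, "messages": []}
response = {
    "current_version": 2,
    "messages": [
        {"sequence_no": 1, "role": "user",
         "content_blocks": [{"type": "text", "text": "What is 2 + 2?"}]},
        {"sequence_no": 2, "role": "assistant",
         "content_blocks": [{"type": "text", "text": "4"}]},
    ],
}
apply_delta(cache, response)
# cache["version"] is now both the next ?since= value and the next expected_version.
print(cache["version"], len(cache["messages"]))  # 2 2
```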

Example 2 — searching docs via an MCP server

Now the model has tools. It searches the Narrative docs for “rate limiting”, makes a few attempts to locate the right page, and produces a grounded summary.

Create the conversation with mcp_servers

curl -X POST "$API/agents/conversations" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "docs search",
    "system_prompt": "You are an AI assistant that answers questions about the Narrative Data Marketplace. Use the available tools to look things up before answering. Once you have enough information, produce a concise final answer that matches the provided output schema.",
    "defaults": {
      "model": "anthropic.claude-opus-4.6",
      "data_plane_id": "f79cbdae-4848-47ca-95e8-69588364d185",
      "execution_cluster": "shared",
      "max_iterations": 8,
      "max_tokens": 2048,
      "temperature": 0.0,
      "output_format_schema": {
        "type": "object",
        "additionalProperties": false,
        "required": ["text"],
        "properties": { "text": { "type": "string" } }
      },
      "mcp_servers": [{
        "alias": "docs",
        "url": "https://docs.narrative.io/mcp",
        "tools": [
          {
            "name": "search_narrative_i_o_knowledge_base",
            "description": "Search the Narrative.io documentation knowledge base.",
            "input_schema": {
              "type": "object",
              "additionalProperties": false,
              "required": ["query"],
              "properties": { "query": { "type": "string" } }
            }
          },
          {
            "name": "query_docs_filesystem_narrative_i_o_knowledge_base",
            "description": "Read full doc pages via shell-like commands. Use after search surfaces a path.",
            "input_schema": {
              "type": "object",
              "additionalProperties": false,
              "required": ["command"],
              "properties": { "command": { "type": "string" } }
            }
          }
        ]
      }]
    }
  }'
Key points:
  • alias: "docs" — the model sees these tools as docs-search_narrative_i_o_knowledge_base and docs-query_docs_filesystem_narrative_i_o_knowledge_base.
  • max_iterations: 8 gives the model room to search, read, refine, and then answer.

Start the run

Same shape as Example 1 — just a user message. The model decides what tools to call.
{
  "client_op_id": "...",
  "expected_version": 0,
  "payload": { "kind": "user_message", "text": "Search the Narrative docs for information about rate limiting and summarize what you find in 2-3 sentences." },
  "tool_choice": { "kind": "auto" }
}

What you see while polling

status cycles pending → running → running → …; each running you see corresponds roughly to one inference iteration. Be patient: MCP roundtrips can take 30–60 seconds total for multi-step research like this. Final state:
{
  "status": "completed",
  "iterations_used": 6,
  "usage": { "completion_tokens": 510, "prompt_tokens": 14842, "total_tokens": 15352 },
  "submitted_inference_job_ids": [
    "<job-1>", "<job-2>", "<job-3>", "<job-4>", "<job-5>", "<job-6>"
  ],
  "final_text": "According to the Narrative.io documentation, the API returns a 429 ...",
  ...
}
The message stream now has 12 entries — the user message, plus alternating assistant (with tool_use blocks) and tool (with tool_result blocks) turns, and a final assistant turn carrying the answer.
Prompt tokens grow with each iteration because each round re-sends the full conversation history to the model. Six iterations on a conversation with several search hits can easily reach 15,000 prompt tokens. Tighter system prompts and shorter tool descriptions help.
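The growth is roughly arithmetic: if each round re-sends everything before it, the total is the sum of an increasing series. The numbers below are made up for illustration (they are not measurements from the platform), and the sketch assumes usage.prompt_tokens is a total across iterations.

```python
def total_prompt_tokens(first_prompt, growth_per_iteration, iterations):
    """Sum prompt tokens when every iteration re-sends the whole history."""
    return sum(first_prompt + i * growth_per_iteration for i in range(iterations))


# Hypothetical: a 500-token first prompt, with each tool round adding 800 tokens of history.
print(total_prompt_tokens(500, 800, 6))  # 15000
# Halving what each round adds (shorter tool results) cuts the total sharply.
print(total_prompt_tokens(500, 400, 6))  # 9000
```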

Example 3 — asking the user a question (client-side tool)

Sometimes the model needs information only the caller has. Configure a client_tools entry, force the model to use it with tool_choice, and the run will pause at requires_action until you answer.

Create the conversation with client_tools

curl -X POST "$API/agents/conversations" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "booking flow",
    "system_prompt": "You are a booking assistant. Use the available tool to propose a meeting slot to the user.",
    "defaults": {
      "model": "anthropic.claude-opus-4.6",
      "data_plane_id": "f79cbdae-4848-47ca-95e8-69588364d185",
      "execution_cluster": "shared",
      "max_iterations": 5,
      "output_format_schema": {
        "type": "object",
        "additionalProperties": false,
        "required": ["text"],
        "properties": { "text": { "type": "string" } }
      },
      "client_tools": [{
        "alias": "user",
        "tools": [{
          "name": "confirm_booking",
          "description": "Ask the user to confirm a proposed booking slot.",
          "input_schema": {
            "type": "object",
            "additionalProperties": false,
            "required": ["proposed_slot"],
            "properties": { "proposed_slot": { "type": "string" } }
          }
        }]
      }]
    }
  }'

Start the run and force the tool call

{
  "client_op_id": "...",
  "expected_version": 0,
  "payload": { "kind": "user_message", "text": "Find me a meeting time for tomorrow afternoon." },
  "tool_choice": { "kind": "specific_tool", "name": "user-confirm_booking" }
}

Poll until requires_action

{
  "status": "requires_action",
  "iterations_used": 1,
  "pending_tool_calls": [
    {
      "tool_use_id": "tooluse_DWXPKZ50JDGib5GmShyUgJ",
      "name": "user-confirm_booking",
      "arguments": { "proposed_slot": "tomorrow afternoon" }
    }
  ],
  "final_text": null
}
The run is paused. pending_tool_calls[].tool_use_id is the handle you’ll need to resume.

Resume with tool_outputs

Re-read the conversation to get the new version (it has advanced because the assistant turn is now persisted):
curl "$API/agents/conversations/<conv-uuid>" -H "Authorization: Bearer $TOKEN"
# → { "version": 2, ... }
Now start a follow-up run with the answer:
{
  "client_op_id": "...",
  "expected_version": 2,
  "payload": {
    "kind": "tool_outputs",
    "outputs": [{
      "tool_use_id": "tooluse_DWXPKZ50JDGib5GmShyUgJ",
      "content": "User confirmed the proposed slot.",
      "is_error": false
    }]
  }
}
The model picks up where it left off, sees the tool result, and produces a final answer:
{
  "status": "completed",
  "iterations_used": 1,
  "final_text": "Great news! Your meeting has been confirmed for tomorrow afternoon. ..."
}
You can answer with is_error: true to tell the model the tool failed (e.g. the user declined). The model decides whether to try another approach, ask a different question, or fall through to a final answer.
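The resume payload can be built mechanically from pending_tool_calls. In the sketch below, build_tool_outputs is a hypothetical helper and answers is a caller-side mapping from tool_use_id to a (content, is_error) pair; neither is part of the API.

```python
def build_tool_outputs(pending_tool_calls, answers):
    """Build a tool_outputs payload with one entry per pending call."""
    outputs = []
    for call in pending_tool_calls:
        content, is_error = answers[call["tool_use_id"]]  # KeyError if an answer is missing
        outputs.append({
            "tool_use_id": call["tool_use_id"],
            "content": content,
            "is_error": is_error,
        })
    return {"kind": "tool_outputs", "outputs": outputs}


pending = [{
    "tool_use_id": "tooluse_DWXPKZ50JDGib5GmShyUgJ",
    "name": "user-confirm_booking",
    "arguments": {"proposed_slot": "tomorrow afternoon"},
}]
declined = build_tool_outputs(
    pending, {"tooluse_DWXPKZ50JDGib5GmShyUgJ": ("User declined the slot.", True)})
print(declined["outputs"][0]["is_error"])  # True
```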

Troubleshooting

Version conflicts (409)

You get a 409 from POST .../runs with type pointing at /errors/version-conflict. What happened: between the moment you read version: N and the moment you posted a run with expected_version: N, something else added messages to the conversation (a previous run’s finalize, or a concurrent caller). The platform refuses to start a run that would conflict at finalize time. How to recover:
# 1. Refetch the conversation to get the new version
curl "$API/agents/conversations/<conv-uuid>" -H "Authorization: Bearer $TOKEN"

# 2. Refetch the messages since your last known version, so you know what changed
curl "$API/agents/conversations/<conv-uuid>/messages?since=<your-old-version>" \
  -H "Authorization: Bearer $TOKEN"

# 3. Re-post the same run body, but with the fresh expected_version
If you keep hitting this without obvious cause, check whether a previous run that you thought was completed is actually requires_action — the latest assistant turn might be a tool-call prompt that needs your reply, not a finished answer.
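The recovery steps can be wrapped in a small retry helper. The sketch below injects the two HTTP calls as functions (post_run and get_version, both hypothetical) and re-posts the same body with only expected_version refreshed. It deliberately skips step 2; a real client would normally look at the new messages before deciding that re-posting is still the right thing to do.

```python
def post_run_with_retry(post_run, get_version, body, *, max_attempts=3):
    """Re-post a run after a 409, refreshing only expected_version."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt_body = dict(body)
    for _ in range(max_attempts):
        status, response = post_run(attempt_body)  # -> (http_status, parsed_body)
        if status != 409:
            return status, response
        # Same client_op_id and payload, fresh version from GET /agents/conversations/{id}.
        attempt_body = {**attempt_body, "expected_version": get_version()}
    return status, response


# Dry run against a fake server whose conversation is at version 2.
seen = []

def fake_post(body):
    seen.append(body["expected_version"])
    if body["expected_version"] != 2:
        return 409, {"type": "/errors/version-conflict"}
    return 200, {"id": "run-1", "status": "pending"}  # success code is an assumption

status, response = post_run_with_retry(
    fake_post, lambda: 2,
    {"client_op_id": "op-1", "expected_version": 0,
     "payload": {"kind": "user_message", "text": "hi"}})
print(status, seen)  # 200 [0, 2]
```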

“Bad request” with a tool-alias message (400)

The defaults.mcp_servers[].alias or defaults.client_tools[].alias you sent isn’t in the allowed shape. Aliases must:
  • be 1–8 characters total,
  • start with a letter,
  • contain only ASCII letters and digits (no underscores, no dashes, no other punctuation),
  • be unique across mcp_servers and client_tools for the conversation.
Full rules and examples: /errors/invalid-tool-alias.
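These rules are easy to check locally before creating a conversation. The sketch below encodes the four rules listed above; alias_problems is a hypothetical helper, and treating uniqueness as case-sensitive is an assumption that this page does not settle.

```python
import re

# 1 to 8 ASCII letters/digits, starting with a letter.
ALIAS_RE = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")


def alias_problems(defaults):
    """Return human-readable problems with the aliases in a defaults object."""
    problems = []
    seen = set()
    groups = defaults.get("mcp_servers", []) + defaults.get("client_tools", [])
    for group in groups:
        alias = group.get("alias", "")
        if not ALIAS_RE.fullmatch(alias):
            problems.append(f"{alias!r}: must be 1-8 ASCII letters/digits, starting with a letter")
        elif alias in seen:
            problems.append(f"{alias!r}: duplicate alias")
        seen.add(alias)
    return problems


bad = {
    "mcp_servers": [{"alias": "docs"}, {"alias": "my_docs"}],
    "client_tools": [{"alias": "docs"}, {"alias": "1user"}],
}
for problem in alias_problems(bad):
    print(problem)
```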

“Tool wire name too long” (400)

The combined {alias}-{tool_name} exceeds 64 characters (an underlying Bedrock limit). Shorten the alias or the underlying tool name. See /errors/tool-name-too-long.
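A quick local check for this limit, assuming the 64 counts characters of the combined name; too_long_wire_names is a hypothetical helper.

```python
MAX_WIRE_NAME = 64


def too_long_wire_names(defaults):
    """List (wire_name, length) pairs that exceed the 64-character limit."""
    offenders = []
    groups = defaults.get("mcp_servers", []) + defaults.get("client_tools", [])
    for group in groups:
        for tool in group.get("tools", []):
            wire = f"{group['alias']}-{tool['name']}"
            if len(wire) > MAX_WIRE_NAME:
                offenders.append((wire, len(wire)))
    return offenders


example2 = {"mcp_servers": [{"alias": "docs", "tools": [
    {"name": "search_narrative_i_o_knowledge_base"},
    {"name": "query_docs_filesystem_narrative_i_o_knowledge_base"},
]}]}
print(too_long_wire_names(example2))  # []: both names from Example 2 fit
```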

requires_action and you don’t know what to answer

pending_tool_calls[] lists every tool call awaiting your reply, with the tool name and the arguments the model produced. When you post tool_outputs, every entry in pending_tool_calls must have a matching outputs[] entry — no missing, no extras. If you have nothing useful to say for a particular call (e.g. the user dismissed the prompt), still post an entry with is_error: true and a short reason. Common mistakes around tool outputs:
  • Unknown tool_use_id — you sent an id the latest assistant turn never produced. Almost always a typo or a stale resume payload.
  • Not a client tool call — you sent a tool_use_id whose alias resolves to a server-side tool. The platform already answered that one; only client-side ids belong in tool_outputs.
  • Incomplete tool outputs — you missed one of the pending ids, or sent extras the model didn’t ask for.
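The "no missing, no extras" rule can be verified before posting. The sketch below compares ids only; diff_tool_outputs is a hypothetical helper, and the second tool_use_id values are invented for the demonstration.

```python
def diff_tool_outputs(pending_tool_calls, outputs):
    """Return (missing_ids, extra_ids) for a tool_outputs payload."""
    pending_ids = [call["tool_use_id"] for call in pending_tool_calls]
    output_ids = [out["tool_use_id"] for out in outputs]
    missing = [i for i in pending_ids if i not in output_ids]
    extra = [i for i in output_ids if i not in pending_ids]
    return missing, extra


pending = [{"tool_use_id": "tooluse_a"}, {"tool_use_id": "tooluse_b"}]
outputs = [
    {"tool_use_id": "tooluse_a", "content": "ok", "is_error": False},
    {"tool_use_id": "tooluse_stale", "content": "ok", "is_error": False},
]
print(diff_tool_outputs(pending, outputs))  # (['tooluse_b'], ['tooluse_stale'])
```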

Run failed with AgentLoopMaxIterationsExceeded

The model used up its max_iterations budget without producing a final answer. Either:
  • raise max_iterations in defaults (or config_override per run),
  • tighten the system prompt so the model is steered toward an answer sooner,
  • inspect the message stream — repeated identical tool calls suggest the model is stuck in a loop because the tool isn’t giving it useful new information.
Details: /errors/max-iterations-exceeded.
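Spotting a loop in the message stream can be automated. The sketch below counts identical tool calls across assistant turns. It assumes tool_use blocks carry name and input fields; this page only shows text blocks, so adjust the field names to whatever the stream actually returns. repeated_tool_calls is a hypothetical helper.

```python
import json
from collections import Counter


def repeated_tool_calls(messages, threshold=2):
    """Return {(tool name, canonical JSON of its input): count} for repeated calls."""
    counts = Counter()
    for message in messages:
        if message.get("role") != "assistant":
            continue
        for block in message.get("content_blocks", []):
            if block.get("type") == "tool_use":
                # sort_keys makes logically equal inputs compare equal.
                key = (block.get("name"), json.dumps(block.get("input"), sort_keys=True))
                counts[key] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


call = {"type": "tool_use", "name": "docs-search_narrative_i_o_knowledge_base",
        "input": {"query": "rate limiting"}}
messages = [
    {"role": "user", "content_blocks": [{"type": "text", "text": "..."}]},
    {"role": "assistant", "content_blocks": [call]},
    {"role": "tool", "content_blocks": [{"type": "tool_result"}]},
    {"role": "assistant", "content_blocks": [call]},
]
print(repeated_tool_calls(messages))
```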

Run failed with AgentLoopSchemaDecodeFailed

The model produced a final answer that didn’t match the output_format_schema. Usually means the model returned prose instead of a JSON object, or missed the required text field. Fixes:
  • be explicit in the system prompt: “respond with a JSON object matching this schema; no other text.”
  • consider widening max_tokens — sometimes the model is truncated mid-JSON.
  • inspect the last assistant turn to see what the model actually said.
Details: /errors/schema-decode-failed.
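When inspecting the last assistant turn, a small local check can tell prose, truncated JSON, and a wrong field set apart. The sketch below covers only required and additionalProperties: false, which is enough for the schemas used on this page; check_reply is a hypothetical helper and does not reproduce the platform's decoder.

```python
import json


def check_reply(raw, schema):
    """Return a list of reasons why raw text would not fit a simple object schema."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(value, dict):
        return ["top-level value is not a JSON object"]
    problems = [f"missing required field {key!r}"
                for key in schema.get("required", []) if key not in value]
    if schema.get("additionalProperties") is False:
        allowed = set(schema.get("properties", {}))
        problems += [f"unexpected field {key!r}" for key in value if key not in allowed]
    return problems


schema = {"type": "object", "additionalProperties": False,
          "required": ["text"], "properties": {"text": {"type": "string"}}}
print(check_reply('{"text": "4"}', schema))    # []
print(check_reply('{"answer": "4"}', schema))  # missing 'text', unexpected 'answer'
print(check_reply('{"text": "4', schema))      # truncated mid-JSON
```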

Anything else

Every other failure class has its own page. Start at the error catalog and use the page name that matches the error.docs_url or type in the response.

Where to go next