The Agent Conversations API lets you ask a language model a question, have it call tools to gather more information, and get back a structured answer, all through a few HTTP calls. The model can use server-side tools (anything exposed by a Model Context Protocol server, like the Narrative docs search) or client-side tools (where it asks you a question and waits for your reply). This page is a complete reference: the moving pieces, the parameters, three end-to-end examples, and the most common things that go wrong.
What it does
A typical request goes like this:
- You create a conversation with a system prompt, a model choice, and a list of tools the model is allowed to use.
- You start a run by sending the model a user message.
- The model decides whether to answer directly or call one of the tools.
- If it picks a server-side tool, the platform calls it for you and feeds the result back into the next round of reasoning. This can repeat several times.
- If it picks a client-side tool, the run pauses and asks you to provide the answer. You start a new run with the answer; the model continues from there.
- Eventually the model produces a final answer matching the structured schema you provided. The run reaches completed and you fetch the result.
The shape mirrors the OpenAI Assistants API: a
thread is a conversation, a run is one user-initiated turn, a step is one inference
iteration inside a run. If you already know that mental model, the only new ideas here are
tool aliases and optimistic-concurrency versioning (covered below).
API endpoints
All endpoints live under /agents and require a Bearer token with
agent_conversations read/write permission.
| Method | Path | Purpose |
|---|---|---|
| POST | /agents/conversations | Create an empty conversation, pinning its model + tool catalog |
| GET | /agents/conversations/{id} | Read conversation metadata (current version, defaults) |
| GET | /agents/conversations/{id}/messages?since=N | Page through messages with sequence_no > N |
| POST | /agents/conversations/{id}/runs | Start a new run: either a fresh user message, or tool outputs resuming a paused run |
| GET | /agents/runs/{id} | Read a run’s current status, results, and pending tool calls |
The first (POST /agents/conversations) returns the conversation id you need for
all the others. The fourth (POST .../runs) returns the run id you can poll with the fifth.
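As a rough illustration of how the five endpoints share one prefix and one auth scheme, the sketch below builds (but does not send) requests with Python's standard library. The host in BASE_URL is a placeholder and the helper names are hypothetical; only the paths and the Bearer scheme come from the table above.

```python
import json
import urllib.request

# Assumption: placeholder host. Substitute your platform's API base URL.
BASE_URL = "https://api.example.com"


def agents_request(method, path, token, body=None):
    """Build (without sending) a request to an endpoint under /agents."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + "/agents" + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def messages_path(conversation_id, since):
    """Path for paging messages with sequence_no greater than `since`."""
    return f"/conversations/{conversation_id}/messages?since={since}"
```

Passing the result to urllib.request.urlopen would perform the call; keeping construction separate makes the request easy to inspect or log first.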
Core concepts in one paragraph each
Conversation
The long-lived container. The system_prompt is pinned for life: once set at creation
time, it applies to every run that follows and cannot be changed. Everything else in defaults
(model, tools, max iterations, temperature, output schema, etc.) is exactly that — a
default: it applies unless a particular run explicitly replaces it. The conversation also
carries a monotonic version counter that every successful run bumps. The counter doubles as
both a delta cursor (for GET .../messages?since=N) and a compare-and-swap token (more on
that under Troubleshooting).
Run
One user-initiated turn of conversation. Each run kicks off a workflow on the platform that performs zero or more inference rounds plus tool calls. A run is asynchronous: the POST
returns immediately with status: "pending", and you poll GET /agents/runs/{id} until it
reaches a terminal state (completed, requires_action, or failed).
A run may override the conversation’s defaults for that single turn via config_override —
pick a different model, raise max_iterations for a hard question, swap in a different tool
catalog, tighten temperature for a deterministic answer, etc. The only thing a run cannot
change is the system_prompt (that’s permanently set on the conversation). Two run-only
settings have no conversation-level counterpart: tool_choice (which biases the model toward
a particular tool on the first iteration) and the run’s payload itself.
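The override semantics can be pictured as a shallow merge. The sketch below is a client-side illustration only, not the platform's implementation: it assumes list-valued fields such as mcp_servers and client_tools are replaced as a whole when overridden, and it raises on a system_prompt override (how the platform itself reacts to that is not specified here).

```python
def effective_config(defaults, config_override=None):
    """Illustrative shallow merge of conversation defaults and a per-run override."""
    override = dict(config_override or {})
    if "system_prompt" in override:
        # Assumption: the sketch rejects this; the platform's exact response is not documented here.
        raise ValueError("system_prompt is pinned on the conversation")
    merged = dict(defaults)
    # Shallow update: any field present in the override replaces the default entirely,
    # so overriding mcp_servers swaps the whole list rather than merging entries.
    merged.update(override)
    return merged
```

A run that only raises max_iterations therefore keeps every other default untouched.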
Server-side tool (MCP)
A tool the platform calls on your behalf. You declare it under mcp_servers with a URL and a
list of tool descriptors. When the model calls one of these tools, the platform makes the
HTTP request, gets the result, and feeds it back into the next inference round — all inside
the same run.
Client-side tool
A tool you answer. You declare it under client_tools. When the model calls one of these,
the run terminates with status: "requires_action". You read the model’s question from
pending_tool_calls, decide on an answer, and start a new run with payload.kind: "tool_outputs" carrying your reply.
Tool aliases
Both server-side and client-side tools have an alias (1–8 letters/digits, must start with a
letter). The model sees tool names in the form {alias}-{tool_name} — e.g. a search tool
under the docs alias becomes docs-search. The alias is how the platform routes a tool call
back to its definition. It’s also how it distinguishes “this is for the server to handle” from
“this is for the client to handle” without you having to flag each tool individually.
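Because an alias can only contain letters and digits, the first dash in a wire name always separates the alias from the tool name. The sketch below shows one way such routing could work; the function name and return shape are hypothetical, and this is not the platform's actual router.

```python
def route_tool_call(wire_name, mcp_aliases, client_aliases):
    """Split '{alias}-{tool_name}' and decide which side should handle the call."""
    alias, sep, tool_name = wire_name.partition("-")
    if not sep or not tool_name:
        raise ValueError(f"not an aliased tool name: {wire_name!r}")
    if alias in mcp_aliases:
        return ("server", alias, tool_name)
    if alias in client_aliases:
        return ("client", alias, tool_name)
    raise LookupError(f"unknown alias: {alias!r}")
```

For example, with docs declared under mcp_servers and user under client_tools, docs-search routes to the server side and user-confirm_booking to the client side.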
Tool choice
Per-run policy that nudges the model toward (or away from) using tools on the first iteration. Three options:
- {"kind": "auto"}: the model decides. The default.
- {"kind": "any"}: the model must call some tool (no plain-text answer allowed).
- {"kind": "specific_tool", "name": "user-confirm_booking"}: the model must call exactly this tool. Useful for forcing a clarifying question or running an action you know is needed.
A run that omits tool_choice uses {"kind": "auto"}. There is no carry-over to subsequent
runs: every new run picks its own tool_choice afresh. You can therefore re-impose a forced
tool call on a follow-up run (for example, push the model toward a specific client-side tool
every time you need to ask the user something), as long as you set tool_choice on that run’s
request body.
Configuration parameters
When you create a conversation, you pass a defaults object that pins how every run on that
conversation behaves. Each field can be overridden per-run via config_override, except where
noted.
| Field | Required | What it means | Typical value |
|---|---|---|---|
| model | yes | Which language model to use. The platform exposes a fixed set of identifiers. | "anthropic.claude-opus-4.6" |
| data_plane_id | yes | Which compute environment runs the inference. Each company has at least one. | UUID from your platform admin |
| execution_cluster | yes | Which job-executor pool routes the inference job to AWS Bedrock or Snowflake Cortex. Inference itself runs in the external model service, not on a platform cluster; the executor only dispatches the HTTP call. Use "shared" for almost every case; the value only matters if your company runs a dedicated executor pool with isolation requirements. | "shared" |
| max_iterations | no (default 3) | How many inference rounds the model is allowed before the platform forces a failed run. Each iteration costs one model call. | 3 for trivial questions, 8–15 for multi-step research |
| max_tokens | no (default 2048) | Cap on the model’s reply per iteration. Doesn’t include the prompt. | 1024–4096 |
| temperature | no (default 0.0) | How creative the model is allowed to be. 0.0 is deterministic; 1.0 is creative. | 0.0 for factual answers, 0.7 for brainstorming |
| output_format_schema | no | A JSON Schema describing the structure of the final answer. The platform extracts the text field from the model’s reply. Supports a subset of Draft 2020-12; see the note below the table. | See examples below |
| mcp_servers | no (default []) | List of MCP servers + their tools that the model may call. | See Example 2 |
| client_tools | no (default []) | List of tools that pause the run with requires_action. | See Example 3 |
| system_prompt | no | A “stage direction” prepended to every run. Pinned at creation time, cannot be overridden per-run. | "You are a helpful assistant. Answer concisely." |
mcp_servers and client_tools are wholesale-replaced, not merged, when overridden in a
run. If you set them in config_override, you replace the whole list. This keeps tool
namespacing predictable.
Run payload shapes
POST /agents/conversations/{id}/runs accepts one of two payload kinds: a fresh user message, or tool outputs that resume a paused run (payload.kind: "tool_outputs"). The request body also carries:
- client_op_id: a UUID you generate, unique per (conversation, request). Used as an idempotency key: re-sending the same client_op_id returns the original run unchanged. This lets you retry safely across network blips.
- expected_version: the conversation’s current version, which you got from the most recent GET /agents/conversations/{id}. The platform rejects the run if the version has moved on since you read it (see Version conflicts).
- tool_choice: optional, per-run only.
- config_override: optional and sparse; include only the fields you want to change.
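A minimal sketch of assembling that request body follows. The field names client_op_id, expected_version, tool_choice, config_override, and payload come from this page; the helper itself is hypothetical, and the exact fields inside payload are left to the caller because they are not spelled out here.

```python
import uuid


def build_run_request(expected_version, payload, tool_choice=None,
                      config_override=None, client_op_id=None):
    """Assemble a run request body; optional fields are omitted when unset."""
    body = {
        # Generate the idempotency key once and reuse this body on network retries.
        "client_op_id": client_op_id or str(uuid.uuid4()),
        "expected_version": expected_version,
        "payload": payload,
    }
    if tool_choice is not None:
        body["tool_choice"] = tool_choice
    if config_override:
        body["config_override"] = config_override
    return body
```

Building the body once and re-sending the same object after a timeout keeps client_op_id stable, which is what makes the retry safe.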
Run status lifecycle
Every run starts at pending and eventually reaches one of three terminal states: completed,
requires_action, or failed. Poll GET /agents/runs/{id} every few seconds (start with 2–4
seconds; back off if you don’t care about latency). When you see one of the terminal states
you can stop polling.
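The polling advice can be sketched as a small loop with backoff. Here fetch_run is a hypothetical callable that performs GET /agents/runs/{id} and returns the parsed JSON; the backoff factor, delay cap, and timeout are illustrative choices, not platform requirements.

```python
import time

TERMINAL_STATES = {"completed", "requires_action", "failed"}


def poll_run(fetch_run, run_id, initial_delay=2.0, max_delay=30.0,
             timeout=300.0, sleep=time.sleep):
    """Poll until the run reaches a terminal state, backing off between reads."""
    delay = initial_delay
    waited = 0.0
    while True:
        run = fetch_run(run_id)
        if run["status"] in TERMINAL_STATES:
            return run
        if waited >= timeout:
            raise TimeoutError(f"run {run_id} still {run['status']} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 1.5, max_delay)  # assumption: gentle multiplicative backoff
```

The sleep parameter exists so the loop can be exercised in tests without real waiting.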
completed runs populate final_text — the string the model produced for the text field of
your output schema.
requires_action runs populate pending_tool_calls — an array of client-side tool calls
waiting for you to answer. See Example 3 for the resume flow.
failed runs populate error.type (an opaque incident code) and error.message (a
human-readable detail). The response also includes error.title and error.docs_url pointing
at the relevant error catalog page.
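One way to act on a terminal run is to dispatch on its status. The sketch below assumes the run is a parsed JSON object with the fields named above (status, final_text, pending_tool_calls, error); the exception class and return tuples are hypothetical conveniences.

```python
class RunFailed(Exception):
    """Raised for a run whose status is 'failed'; keeps the error object for inspection."""

    def __init__(self, error):
        super().__init__(f"{error.get('type')}: {error.get('message')}")
        self.error = error


def handle_terminal_run(run):
    """Return ('answer', text) or ('needs_reply', calls); raise RunFailed on failure."""
    status = run["status"]
    if status == "completed":
        return ("answer", run["final_text"])
    if status == "requires_action":
        return ("needs_reply", run["pending_tool_calls"])
    if status == "failed":
        raise RunFailed(run["error"])
    raise ValueError(f"run is not in a terminal state: {status}")
```

A caller would typically log error.docs_url from the raised exception, since that URL points at the recovery guidance for the failure class.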
One error vocabulary for both surfaces. The agent API exposes errors in two physical shapes: HTTP 4xx/5xx with an RFC 7807 body for synchronous failures (bad request body, conversation not found, version conflict, etc.), and HTTP 200 with status: "failed" and an error object for failures that happen inside the workflow after the run has been accepted (max iterations exceeded, MCP server unreachable, invalid effective config, etc.).

The two shapes carry the same caller-facing fields:

| RFC 7807 (synchronous) | RunErrorDto on a failed run (asynchronous) | Meaning |
|---|---|---|
| type (URL) | error.docs_url | Stable URL to the docs page for this failure class |
| title | error.title | Short, caller-facing summary |
| status (HTTP) | n/a (the call itself returned 200) | - |
| detail | error.message | Request-specific detail string |
| instance (path) | n/a | - |
| log_id | n/a; correlate via the run id | - |
| - | error.type (incident code) | Internal stable tag for log dashboards |

Both shapes link into the same error catalog, which means a single playbook covers both. Whether your client got a 409 on POST .../runs or a 200 with error.type: "AgentLoopMaxIterationsExceeded" on GET /agents/runs/{id}, the docs URL in the response is the canonical “how do I recover” entry point.

The workflow side itself never knows about caller-facing presentation; it only emits opaque incident codes ("AgentLoopMaxIterationsExceeded", "UnknownTool", etc.). The translation to title + docs_url happens at the API boundary against a single catalog kept in sync with the docs pages.

Example 1: hello world (no tools)
The simplest possible run: one user message, one assistant reply, no tools.
Create the conversation
Start the run
Poll until terminal
Read the message stream
Once the run completes, the conversation reports current_version: 2. Future runs on this conversation start with expected_version: 2.
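For orientation, here is a sketch of what the two request bodies in this example might look like. Field names under defaults follow the configuration table; placing system_prompt inside defaults is inferred from that table, the data_plane_id is a placeholder, and the user-message payload shape (the kind value and the text field) is a guess, since only the tool_outputs kind is named on this page.

```python
import uuid

# Body for POST /agents/conversations (values are illustrative).
create_body = {
    "defaults": {
        "system_prompt": "You are a helpful assistant. Answer concisely.",
        "model": "anthropic.claude-opus-4.6",
        "data_plane_id": "00000000-0000-0000-0000-000000000000",  # placeholder UUID
        "execution_cluster": "shared",
        "max_iterations": 3,
        "temperature": 0.0,
        # Minimal schema with the required text field the platform extracts.
        "output_format_schema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    }
}


def hello_world_run_body(current_version):
    """Body for POST .../runs; current_version comes from GET /agents/conversations/{id}."""
    return {
        "client_op_id": str(uuid.uuid4()),
        "expected_version": current_version,
        # Assumption: hypothetical payload shape for a fresh user message.
        "payload": {"kind": "user_message", "text": "Say hello in one sentence."},
    }
```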
Example 2: searching docs via an MCP server
Now the model has tools. It searches the Narrative docs for “rate limiting”, makes a few attempts to locate the right page, and produces a grounded summary.
Create the conversation with mcp_servers
- alias: "docs" means the model sees these tools as docs-search_narrative_i_o_knowledge_base and docs-query_docs_filesystem_narrative_i_o_knowledge_base.
- max_iterations: 8 gives the model room to search, read, refine, then answer.
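A sketch of what such an mcp_servers entry could look like is shown below. The alias and the two tool names come from this example; the url value is a placeholder, and the tools and name keys are an assumed descriptor shape, since the page only says each server has a URL and a list of tool descriptors.

```python
docs_server = {
    "alias": "docs",
    "url": "https://mcp.example.com/docs",  # placeholder, not the real server URL
    # Assumption: hypothetical descriptor shape with a single "name" key.
    "tools": [
        {"name": "search_narrative_i_o_knowledge_base"},
        {"name": "query_docs_filesystem_narrative_i_o_knowledge_base"},
    ],
}


def wire_names(server):
    """Names as the model sees them: '{alias}-{tool_name}'."""
    return [f"{server['alias']}-{tool['name']}" for tool in server["tools"]]
```

Checking wire_names(docs_server) before creating the conversation is a cheap way to confirm every combined name stays within the 64-character limit.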
Start the run
Same shape as Example 1: just a user message. The model decides what tools to call.
What you see while polling
status cycles pending → running → running → … — each running you see corresponds
roughly to one inference iteration. Be patient: MCP roundtrips can take 30–60 seconds total
for multi-step research like this.
In the final state, the message stream contains the intermediate assistant (with
tool_use blocks) and tool (with tool_result blocks) turns, and a final assistant turn
carrying the answer.
Example 3: asking the user a question (client-side tool)
Sometimes the model needs information only the caller has. Configure a client_tools entry,
force the model to use it with tool_choice, and the run will pause at requires_action
until you answer.
Create the conversation with client_tools
Start the run and force the tool call
Poll until requires_action
pending_tool_calls[].tool_use_id is the handle you’ll need to resume.
Resume with tool_outputs
Re-read the conversation to get the new version (it has advanced because the assistant turn is
now persisted), then post the tool outputs with that version as expected_version.
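The resume payload can be derived directly from pending_tool_calls. In the sketch below, tool_use_id, is_error, outputs, and the tool_outputs kind come from this page; the content key that carries your reply is a hypothetical name, and the helper is illustrative rather than an official client.

```python
def build_tool_outputs(pending_tool_calls, answers):
    """Build a tool_outputs payload with exactly one entry per pending call.

    `answers` maps tool_use_id to the reply text; calls without an answer
    are reported back to the model as failed.
    """
    outputs = []
    for call in pending_tool_calls:
        tool_use_id = call["tool_use_id"]
        if tool_use_id in answers:
            entry = {"tool_use_id": tool_use_id, "content": answers[tool_use_id], "is_error": False}
        else:
            # Assumption: a short reason string is enough for a declined or missing answer.
            entry = {"tool_use_id": tool_use_id, "content": "no answer provided", "is_error": True}
        outputs.append(entry)
    return {"kind": "tool_outputs", "outputs": outputs}
```

Iterating over the pending calls, rather than over the answers, guarantees there are no missing entries and no extras.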
You can answer with is_error: true to tell the model the tool failed (e.g. the user
declined). The model decides whether to try another approach, ask a different question, or
fall through to a final answer.
Troubleshooting
Version conflicts (409)
You get a 409 from POST .../runs with error.type pointing at
/errors/version-conflict.
What happened: between the moment you read version: N and the moment you posted a run
with expected_version: N, something else added messages to the conversation (a previous
run’s finalize, or a concurrent caller). The platform refuses to start a run that would
conflict at finalize time.
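How to recover: re-read the conversation with GET /agents/conversations/{id} to get its
current version, fetch the messages you missed with GET .../messages?since=N, and retry the
run with the new expected_version. Before retrying, check whether what you assumed was
completed is actually requires_action: the latest assistant turn might be a tool-call
prompt that needs your reply, not a finished answer.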
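A compare-and-swap retry can be sketched as a short loop. Both callables are hypothetical: get_version reads the current version from GET /agents/conversations/{id}, and post_run sends the run with the given expected_version and returns an (http_status, body) pair. The attempt limit is an arbitrary choice.

```python
def start_run_with_retry(get_version, post_run, max_attempts=3):
    """Re-read the version and retry while the platform answers 409."""
    for _ in range(max_attempts):
        version = get_version()
        status, body = post_run(version)
        if status != 409:
            return status, body
    raise RuntimeError(f"conversation kept changing; gave up after {max_attempts} attempts")
```

In a real client, the point between the re-read and the retry is where you would inspect the new messages and decide whether the run you intended to start still makes sense.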
“Bad request” with a tool-alias message (400)
The defaults.mcp_servers[].alias or defaults.client_tools[].alias you sent isn’t in the
allowed shape. Aliases must:
- be 1–8 characters total,
- start with a letter,
- contain only ASCII letters and digits (no underscores, no dashes, no other punctuation),
- be unique across mcp_servers and client_tools for the conversation.
See /errors/invalid-tool-alias.
“Tool wire name too long” (400)
The combined {alias}-{tool_name} exceeds 64 characters (an underlying Bedrock limit).
Shorten the alias or the underlying tool name. See
/errors/tool-name-too-long.
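Both 400s can be caught before the request leaves your client. The sketch below checks the alias rules and the 64-character wire-name limit described in this section; the function name and the (alias, tool_names) input shape are hypothetical.

```python
import re

ALIAS_RE = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")  # 1-8 chars, starts with a letter
MAX_WIRE_NAME = 64


def find_alias_problems(entries):
    """entries: (alias, [tool_name, ...]) pairs from mcp_servers and client_tools combined."""
    problems = []
    seen = set()
    for alias, tool_names in entries:
        if not ALIAS_RE.fullmatch(alias):
            problems.append(f"invalid alias: {alias!r}")
        if alias in seen:
            problems.append(f"duplicate alias: {alias!r}")
        seen.add(alias)
        for name in tool_names:
            wire = f"{alias}-{name}"
            if len(wire) > MAX_WIRE_NAME:
                problems.append(f"wire name too long ({len(wire)} chars): {wire}")
    return problems
```

An empty list means the configuration passes these particular checks; the platform may still apply others.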
requires_action and you don’t know what to answer
pending_tool_calls[] lists every tool call awaiting your reply, with the tool name and the
arguments the model produced. When you post tool_outputs, every entry in
pending_tool_calls must have a matching outputs[] entry — no missing, no extras. If you
have nothing useful to say for a particular call (e.g. the user dismissed the prompt), still
post an entry with is_error: true and a short reason.
Common mistakes around tool outputs:
- Unknown tool_use_id: you sent an id the latest assistant turn never produced. Almost always a typo or a stale resume payload.
- Not a client tool call: you sent a tool_use_id whose alias resolves to a server-side tool. The platform already answered that one; only client-side ids belong in tool_outputs.
- Incomplete tool outputs: you missed one of the pending ids, or sent extras the model didn’t ask for.
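These three mistakes can be detected locally with set arithmetic before posting. In the sketch below, the inputs are plain lists of ids that you collect yourself (pending client-side ids, ids of server-side calls from the latest assistant turn, and the ids in your outputs); the function and its result keys are hypothetical.

```python
def check_tool_outputs(pending_ids, server_side_ids, output_ids):
    """Classify mismatches between what was asked and what is about to be sent."""
    pending = set(pending_ids)
    server = set(server_side_ids)
    sent = set(output_ids)
    return {
        "missing": sorted(pending - sent),              # pending calls with no output
        "not_client_tool_call": sorted(sent & server),  # already answered by the platform
        "unknown": sorted(sent - pending - server),     # ids the assistant turn never produced
    }
```

A payload is safe to send only when all three lists are empty.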
Run failed with AgentLoopMaxIterationsExceeded
The model used up its max_iterations budget without producing a final answer. Either:
- raise max_iterations in defaults (or config_override per run),
- tighten the system prompt so the model is steered toward an answer sooner, or
- inspect the message stream: repeated identical tool calls suggest the model is stuck in a loop because the tool isn’t giving it useful new information.
See /errors/max-iterations-exceeded.
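Spotting a loop in the message stream can be automated. The sketch below assumes a message shape this page does not specify: each message has a role and a content list of blocks, and tool_use blocks carry name and input keys. Treat those key names as hypothetical and adjust them to the actual response.

```python
import json
from collections import Counter


def repeated_tool_calls(messages, threshold=2):
    """Return {(tool_name, canonical_input_json): count} for calls seen `threshold`+ times."""
    counts = Counter()
    for message in messages:
        content = message.get("content")
        if message.get("role") != "assistant" or not isinstance(content, list):
            continue
        for block in content:
            if block.get("type") == "tool_use":
                # Sort keys so argument order does not hide identical calls.
                key = (block["name"], json.dumps(block.get("input", {}), sort_keys=True))
                counts[key] += 1
    return {key: n for key, n in counts.items() if n >= threshold}
```

A non-empty result suggests raising max_iterations alone is unlikely to help, because the model is repeating the same request.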
Run failed with AgentLoopSchemaDecodeFailed
The model produced a final answer that didn’t match the output_format_schema. Usually means
the model returned prose instead of a JSON object, or missed the required text field.
Fixes:
- be explicit in the system prompt: “respond with a JSON object matching this schema; no other text.”
- consider widening max_tokens, since sometimes the model is truncated mid-JSON.
- inspect the last assistant turn to see what the model actually said.
See /errors/schema-decode-failed.
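When inspecting the last assistant turn, a quick local check can tell the two usual causes apart. The sketch below takes the raw text the model produced and only checks for a JSON object with a string text field; it is not a full JSON Schema validator, and the function name is hypothetical.

```python
import json


def diagnose_final_answer(raw):
    """Return None if `raw` looks like a valid answer, else a short diagnosis string."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"not valid JSON (possibly prose or truncated output): {exc.msg}"
    if not isinstance(value, dict):
        return "valid JSON but not an object"
    if not isinstance(value.get("text"), str):
        return "missing required string field 'text'"
    return None
```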
Anything else
Every other failure class has its own page. Start at the error catalog and use the page name that matches the error.docs_url or type in the response.
Where to go next
- Full error catalog: every failure mode the API can return, with cause and fix.
- Concepts: Model Context Protocol, the open standard behind mcp_servers. Helpful if you’re wiring up a tool the platform doesn’t ship by default.

