What it does
A typical request goes like this:
- You create a conversation with a system prompt, a model choice, and a list of tools the model is allowed to use.
- You start a run by sending the model a user message.
- The model decides whether to answer directly or call one of the tools.
- If it picks an MCP-resolved tool, the platform calls it for you and feeds the result back into the next round of reasoning. This can repeat several times.
- If it picks a caller-declared tool, the run pauses and asks you to provide the answer. You start a new run with the answer; the model continues from there.
- Eventually the model produces a final answer matching the structured schema you provided.
The run reaches completed and you fetch the result.
The shape mirrors the OpenAI Assistants API: a
thread is a conversation, a run is one user-initiated turn, a step is one inference
iteration inside a run. If you already know that mental model, the only new ideas here are
tool aliases and optimistic-concurrency versioning (covered below).
API endpoints
All endpoints live under /agents and require a Bearer token with
agent_conversations read/write permission.
Scope: per-user, not per-company. Conversations and runs are keyed on the bearer
token’s
(company_id, user_id) pair. Peers in the same company cannot see each other’s
conversations, runs, or messages — every endpoint returns 404 for cross-user access,
identical to cross-company access. The API does not distinguish “doesn’t exist” from
“owned by another user”.

| Method | Path | Purpose |
|---|---|---|
| POST | /agents/conversations | Create an empty conversation, pinning its model + tool catalog |
| GET | /agents/conversations?page=N&per_page=K | List the calling user’s conversations (paginated, newest first) |
| GET | /agents/conversations/{id} | Read conversation metadata (current version, defaults) |
| GET | /agents/conversations/{id}/messages?since=N | Page through messages with sequence_no > N |
| POST | /agents/conversations/{id}/runs | Start a new run: either a fresh user message, or tool outputs resuming a paused run |
| GET | /agents/runs/{id} | Read a run’s current status, results, and pending tool calls |
POST /agents/conversations returns the conversation id you need for
all the others. POST .../runs returns the run id you can poll with GET /agents/runs/{id}.
Listing your conversations
GET /agents/conversations returns the calling user’s conversations in created_at
descending order, wrapped in the standard pagination envelope:
per_page is 50. Results are filtered by both company_id and user_id —
peers in the same company do not see each other’s conversations (see the scope
callout above).
Core concepts in one paragraph each
Conversation
The long-lived container. The system_prompt is pinned for life: once set at creation
time, it applies to every run that follows and cannot be changed. Everything else in defaults
(model, tools, max iterations, temperature, output schema, etc.) is exactly that — a
default: it applies unless a particular run explicitly replaces it. The conversation also
carries a monotonic version counter that every successful run bumps. The counter doubles as
both a delta cursor (for GET .../messages?since=N) and a compare-and-swap token (more on
that under Troubleshooting).
Run
One user-initiated turn of conversation. Each run kicks off a workflow on the platform that performs zero or more inference rounds plus tool calls. A run is asynchronous: the POST
returns immediately with status: "pending", and you poll GET /agents/runs/{id} until it
reaches a terminal state (completed, requires_action, or failed).
A run may override the conversation’s defaults for that single turn via config_override —
pick a different model, raise max_iterations for a hard question, swap in a different tool
catalog, tighten temperature for a deterministic answer, etc. The only thing a run cannot
change is the system_prompt (that’s permanently set on the conversation). Two run-only
settings have no conversation-level counterpart: tool_choice (which biases the model toward
a particular tool on the first iteration) and the run’s payload itself.
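
The relationship between conversation defaults and a run's config_override can be sketched as a shallow merge. This is a minimal illustration, not the platform's implementation: the function name effective_config is hypothetical, and raising an error on a system_prompt override is an assumption (the text only says the prompt cannot be changed).

```python
from typing import Any, Optional


def effective_config(
    defaults: dict[str, Any],
    config_override: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Apply a sparse per-run override on top of the conversation defaults."""
    override = dict(config_override or {})
    if "system_prompt" in override:
        # Assumption: this sketch rejects the attempt. How the platform reacts
        # to a system_prompt override is not specified in this document.
        raise ValueError("system_prompt is pinned on the conversation")
    # Shallow merge: a key present in the override replaces the default whole,
    # so list fields such as mcp_servers and tools are swapped, never merged.
    return {**defaults, **override}


defaults = {"system_prompt": "Be brief.", "max_iterations": 3, "temperature": 0.0}
one_turn = effective_config(defaults, {"max_iterations": 8})
```

Because the merge returns a new dict, the defaults stay untouched and the next run starts from them again.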
Tools
A run’s tool catalog comes from two sources, listed side by side in the conversation defaults:
- mcp_servers[]: each entry has an alias (1–8 letters/digits, must start with a letter), a URL, and an optional description. You do not declare individual tools; the platform discovers each server’s catalog at run start via the JSON-RPC tools/list method and stitches the wire names as {alias}-{tool_name} before handing them to the model. When the model emits a call to {alias}-{tool_name}, the workflow makes the tools/call request, gets the result, and feeds it back into the next inference round inside the same run.
- tools[]: caller-declared tools you answer yourself. No alias; the model sees the bare name. When the model calls one of these, the run terminates with status: "requires_action"; you read the question from pending_tool_calls, decide on an answer, and start a new run with payload.kind: "tool_outputs" carrying your reply.
Discovery is per-run, not cached. Every run re-fetches
tools/list from each registered
MCP server. There is no inter-run cache — if the server adds, removes, or renames a tool,
the next run picks it up automatically. The trade-off: each new run pays one HTTP roundtrip
per registered server before the first inference. For 1–3 typical servers this is in the
noise (10–100ms total); if you ever register a slow-discovering server, expect it to delay
every run’s first iteration. Discovery failures terminate the run with
MCP Discovery Failed.

Narrative-owned MCP servers are authenticated automatically. When an
mcp_servers[].url matches the Data Collaboration MCP Server
(https://mcp.narrative.io/mcp in prod), the platform mints a Default-scoped API token
for the conversation’s user and company and attaches it as
Authorization: Bearer ... on every tools/list and tools/call request to that
server. The token is created server-side, lives only for the duration of the request
chain, and is never persisted to agent_runs.effective_config, Temporal event history,
or the GET /agents/runs/{id} echo. Public or third-party MCP servers (any URL outside
the allowlist) continue to be called without auth; the existing behavior is unchanged.

Tool input schemas are normalized at discovery time. MCP servers can declare
inputSchema shapes that Bedrock Converse and Anthropic Messages don’t accept verbatim
— oneOf, $ref, format validators, additionalProperties: true, and similar JSON
Schema features. The platform runs every discovered tool’s schema through a converter
that translates oneOf to anyOf, inlines non-recursive $ref, replaces recursive
$ref with {} to break cycles, forces additionalProperties: false, and lifts
unsupported keywords (format, pattern, range validators) into the property’s
description. Tools that previously had to be dropped wholesale because they carried
$ref now load successfully; nothing is rejected at the discovery layer. The converter
emits strict: false because Bedrock caps strict-tool counts per request — the model
still gets the schema, and each MCP server validates arguments on its own side at call
time.

MCP-resolved wire names always carry the {alias}-
prefix; caller-declared tool names must not contain a dash. This is what lets the workflow
classify a hallucinated or unknown call without ambiguity: dash + unknown alias →
AgentLoopUnknownToolAlias; dash-free + unknown name → same error. The validation rule “no
dash in tools[].name” is the cornerstone.
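
The dash rule above can be sketched as a small classifier. The function names are hypothetical, and the sketch simplifies one thing: for a known alias it does not check that the tool itself appears in that server's discovered catalog.

```python
def wire_name(alias: str, tool_name: str) -> str:
    """Build the name the model sees for an MCP-resolved tool."""
    return f"{alias}-{tool_name}"


def classify_tool_call(name: str, mcp_aliases: set[str], caller_tools: set[str]) -> str:
    """Return "mcp", "caller", or the incident code for an unknown call."""
    # Aliases contain only letters and digits, so the first dash always marks
    # the end of the alias even when the MCP tool name has dashes of its own.
    alias, dash, _tool = name.partition("-")
    if dash:
        return "mcp" if alias in mcp_aliases else "AgentLoopUnknownToolAlias"
    # Dash-free names can only be caller-declared tools.
    return "caller" if name in caller_tools else "AgentLoopUnknownToolAlias"
```

For example, with mcp_aliases={"docs"} and caller_tools={"confirm_booking"}, the name docs-search_kb classifies as "mcp" and confirm_booking as "caller".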
Tool choice
Per-run policy that nudges the model toward (or away from) using tools on the first iteration. Three options:
- {"kind": "auto"}: model decides. The default.
- {"kind": "any"}: model must call some tool (no plain-text answer allowed).
- {"kind": "specific_tool", "name": "confirm_booking"}: model must call exactly this caller-declared tool. To pin an MCP-resolved tool instead, include the explicit mcp_alias: {"kind": "specific_tool", "mcp_alias": "docs", "name": "search_kb"}. The platform validates the target against the catalog synchronously and returns 400 with Unknown Tool Choice Name or Unknown Tool Choice MCP Alias on a miss.
tool_choice applies to the first iteration only; after that the run falls back to {"kind": "auto"}. There is no carry-over to subsequent
runs — every new run picks its own tool_choice afresh. That means you can absolutely
re-impose a forced tool call on a follow-up run (for example, push the model toward a specific
caller-declared tool every time you need to ask the user something), so long as you set
tool_choice on the run’s request body.
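
A client can mirror the synchronous tool_choice check before posting, which avoids a round trip that would end in a 400. The sketch below is an approximation under stated assumptions: check_tool_choice is a hypothetical helper, and for MCP targets it verifies only the alias, since the per-server tool list is discovered by the platform at run start.

```python
from typing import Optional


def check_tool_choice(choice: dict, mcp_aliases: set, caller_tools: set) -> Optional[str]:
    """Return None when the choice looks valid, else the expected 400 title."""
    if choice.get("kind") != "specific_tool":
        # "auto" and "any" name no target, so there is nothing to look up.
        return None
    alias = choice.get("mcp_alias")
    if alias is not None:
        return None if alias in mcp_aliases else "Unknown Tool Choice MCP Alias"
    # Without mcp_alias the name must refer to a caller-declared tool.
    return None if choice.get("name") in caller_tools else "Unknown Tool Choice Name"
```

A return value of None means the request is worth sending; any string is the error title the platform would be expected to report.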
Configuration parameters
When you create a conversation, you pass a defaults object that pins how every run on that
conversation behaves. Each field can be overridden per-run via config_override, except where
noted.

| Field | Required | What it means | Typical value |
|---|---|---|---|
| model | yes | Which language model to use. The platform exposes a fixed set of identifiers. | "anthropic.claude-opus-4.6" |
| data_plane_id | yes | Which compute environment runs the inference. Each company has at least one. | UUID from your platform admin |
| execution_cluster | yes | Which job-executor pool routes the inference job to AWS Bedrock or Snowflake Cortex. Inference itself runs in the external model service, not on a platform cluster; the executor only dispatches the HTTP call. Use "shared" for almost every case; the value only matters if your company runs a dedicated executor pool with isolation requirements. | "shared" |
| max_iterations | no (default 3) | How many inference rounds the model is allowed before the platform forces a failed run. Each iteration costs one model call. | 3 for trivial questions, 8–15 for multi-step research |
| max_tokens | no (default 2048) | Cap on the model’s reply per iteration. Doesn’t include the prompt. | 1024–4096 |
| temperature | no (default 0.0) | How creative the model is allowed to be. 0.0 is deterministic; 1.0 is creative. | 0.0 for factual answers, 0.7 for brainstorming |
| output_format_schema | no | A JSON Schema describing the structure of the final answer. When omitted, the run is in text mode and the answer comes back as final_text. When provided, the run is in structured mode and the answer comes back as final_structured_output, conforming verbatim to your schema. The two response fields are mutually exclusive. Supports a subset of Draft 2020-12; see the note below the table. | See examples below |
| mcp_servers | no (default []) | List of MCP servers the model may call. Each entry is {alias, url, description?}; tools are discovered per-run via tools/list. | See Example 2 |
| tools | no (default []) | List of caller-declared tools that pause the run with requires_action. No alias; each entry’s name must be dash-free. | See Example 3 |
| system_prompt | no | A “stage direction” prepended to every run. Pinned at creation time, cannot be overridden per-run. | "You are a helpful assistant. Answer concisely." |

mcp_servers and tools are wholesale-replaced, not merged, when overridden in a run.
If you set them in config_override, you replace the whole list. This keeps tool namespacing
predictable.

Run payload shapes
POST /agents/conversations/{id}/runs accepts one of two payload kinds:
The request body also carries these fields:
- client_op_id: a UUID you generate, unique per (conversation, request). Used as an idempotency key: re-sending the same client_op_id returns the original run unchanged. This lets you retry safely across network blips.
- expected_version: the conversation’s current version, which you got from the most recent GET /agents/conversations/{id}. The platform rejects the run if the version has moved on since you read it (see Version conflicts).
- tool_choice: optional, per-run only.
- config_override: optional, sparse; only the fields you want to change.
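
A run request body built from the fields above might be assembled like this. It is a hedged sketch: build_run_request is a hypothetical helper, and the user-message payload shape ({"kind": "user_message", "text": ...}) is an assumption, since this page only confirms the "tool_outputs" kind by name.

```python
import uuid
from typing import Any, Optional


def build_run_request(
    payload: dict[str, Any],
    expected_version: int,
    client_op_id: Optional[str] = None,
    tool_choice: Optional[dict[str, Any]] = None,
    config_override: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble the JSON body for POST /agents/conversations/{id}/runs."""
    body: dict[str, Any] = {
        # Generate the idempotency key once and reuse it on every retry.
        "client_op_id": client_op_id or str(uuid.uuid4()),
        "expected_version": expected_version,
        "payload": payload,
    }
    # Optional fields are left out entirely rather than sent as null.
    if tool_choice is not None:
        body["tool_choice"] = tool_choice
    if config_override:
        body["config_override"] = config_override
    return body


# Assumed payload shape for a fresh user message (not confirmed by this page).
first = build_run_request({"kind": "user_message", "text": "Hello"}, expected_version=0)
# A retry after a network blip must carry the same client_op_id.
retry = build_run_request(first["payload"], 0, client_op_id=first["client_op_id"])
```

Keeping the body construction separate from the HTTP call makes it easy to persist the client_op_id before sending, so a crashed client can still retry idempotently.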
Run status lifecycle
Every run starts at pending and eventually reaches one of three terminal states: completed, requires_action, or failed.
Poll GET /agents/runs/{id} every few seconds (start with 2–4 seconds; back off if you don’t
care about latency). When you see one of the terminal states you can stop polling.
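
The polling advice above can be sketched as a loop with a growing delay. The HTTP call is injected as fetch_run so the sketch stays independent of any client library; poll_run, the backoff factor, and the poll cap are illustrative choices, not platform requirements.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "requires_action", "failed"}


def poll_run(
    fetch_run: Callable[[], dict],
    initial_delay: float = 2.0,
    max_delay: float = 30.0,
    backoff: float = 1.5,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_run until the run reaches a terminal status, then return it."""
    delay = initial_delay
    for _ in range(max_polls):
        run = fetch_run()  # expected to perform GET /agents/runs/{id}
        if run["status"] in TERMINAL_STATES:
            return run
        sleep(delay)
        # Grow the wait between polls, capped at max_delay.
        delay = min(delay * backoff, max_delay)
    raise TimeoutError(f"run still not terminal after {max_polls} polls")
```

Note that requires_action counts as terminal here: the loop stops and hands the run back so the caller can answer the pending tool calls.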
completed runs populate exactly one of two fields, depending on whether you supplied an
output_format_schema:
- No schema (text mode) → final_text carries the model’s reply as a plain string; final_structured_output is null.
- Schema supplied (structured mode) → final_structured_output carries the parsed JSON object conforming verbatim to your schema; final_text is null. Your schema does not need a top-level text field; whatever shape you declare is what you get back.
requires_action runs populate pending_tool_calls — an array of caller-declared tool
calls waiting for you to answer. See Example 3 for the resume flow.
failed runs populate error.type (an opaque incident code) and error.message (a
human-readable detail). The response also includes error.title and error.docs_url pointing
at the relevant error catalog page.
A run that was deliberately cancelled also lands in failed, but carries the reserved
error.type: "AgentLoopCancelled" so you can tell an intentional stop from a genuine error.
The run’s in-flight inference job is cancelled too, so the run and its data-plane job end up
consistent. See /errors/cancelled.
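
The status-dependent fields described above lend themselves to a single dispatch function on the client side. This is a sketch only: run_outcome and its return labels are hypothetical, while the field names (final_text, final_structured_output, pending_tool_calls, error.type) come from this page.

```python
from typing import Any


def run_outcome(run: dict[str, Any]) -> tuple[str, Any]:
    """Reduce a run response to a (label, value) pair the caller can act on."""
    status = run["status"]
    if status == "completed":
        # Exactly one of the two result fields is populated.
        if run.get("final_structured_output") is not None:
            return ("structured", run["final_structured_output"])
        return ("text", run.get("final_text"))
    if status == "requires_action":
        return ("needs_tool_outputs", run["pending_tool_calls"])
    if status == "failed":
        error = run["error"]
        # A deliberate cancellation is reported as failed with a reserved code.
        label = "cancelled" if error["type"] == "AgentLoopCancelled" else "error"
        return (label, error)
    # pending or running: nothing to read yet.
    return ("in_progress", None)
```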
One error vocabulary for both surfaces. The agent API exposes errors in two physical shapes:
HTTP 4xx/5xx with an RFC 7807 body for
synchronous failures (bad request body, conversation not found, version conflict, etc.), and
HTTP 200 with status: "failed" and an error object for failures that happen inside the
workflow after the run has been accepted (max iterations exceeded, MCP server unreachable,
invalid effective config, etc.).

The two shapes carry the same caller-facing fields:

| RFC 7807 (synchronous) | RunErrorDto on a failed run (asynchronous) | Meaning |
|---|---|---|
| type (URL) | error.docs_url | Stable URL to the docs page for this failure class |
| title | error.title | Short, caller-facing summary |
| status (HTTP) | n/a (the call itself returned 200) | n/a |
| detail | error.message | Request-specific detail string |
| instance (path) | n/a | n/a |
| log_id | n/a; correlate via the run id | n/a |
| n/a | error.type (incident code) | Internal stable tag for log dashboards |

Both shapes link into the same error catalog,
which means a single playbook covers both. Whether your client got a 409 on
POST .../runs or a
200 with error.type: "AgentLoopMaxIterationsExceeded" on GET /agents/runs/{id}, the docs URL
in the response is the canonical “how do I recover” entry point.

The workflow side itself never knows about caller-facing presentation; it only emits opaque
incident codes ("AgentLoopMaxIterationsExceeded", "UnknownTool", etc.). The translation to
title + docs_url happens at the API boundary against a single catalog kept in sync with the
docs pages.

The live view
Every run response (POST .../runs and GET /agents/runs/{id}) carries a live object. Unlike
the status-dependent fields, live is orthogonal to status and is populated on every
read — it reflects the conversation’s current, still-mutating state, so a run you polled
minutes ago can surface newer values on the next read without a separate
GET /agents/conversations/{id} call.
live holds two fields:
- current_name: the conversation’s display name, or null if it has none yet. This is the same value as name on GET /agents/conversations/{id}; it’s mirrored onto the run so a client that’s already polling a run sees title changes for free.
- messages: the run’s produced turns so far, streamed while the run is still in flight so you can show tool execution live. See Streaming tool execution below.
Streaming tool execution
While a run is non-terminal (pending / running), live.messages carries the turns the
loop has produced so far — each tool-use assistant turn appears before the platform executes the
call (so you can render “calling docs-search_kb…”), and the matching tool_result turn appears
once it returns. This lets a chat UI animate a multi-iteration tool loop from the same run poll
you’re already doing, with no extra endpoint.
Each entry has a turn_index (per-run ordering, independent of sequence_no), a role, and the
same content_blocks shape as committed messages.
The handoff to the committed log is lossless:
- Render committed messages (GET .../messages) ++ live.messages while the run is non-terminal.
- Once the run reaches a terminal status, live.messages is empty; the same turns are now committed and authoritative via GET .../messages?since=…. Drop your live tail and keep the committed rows.
- De-dupe is exact: the live tool_use / tool_result blocks carry the same tool_use_ids that land in the committed content_blocks.
Streaming never touches the conversation version or the committed message stream (live turns aren’t messages; they carry no
sequence_no and don’t advance your expected_version). A run with no server-side tool calls (a
direct answer) simply shows an empty live.messages and you get the answer via final_text /
final_structured_output at completion.
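
The committed-plus-live rendering rule can be sketched as follows. Two things here are assumptions rather than documented facts: the helper names, and the idea that each tool block exposes its id under a tool_use_id key inside content_blocks.

```python
TERMINAL_STATES = {"completed", "requires_action", "failed"}


def _tool_keys(message: dict) -> set:
    """Collect (role, tool_use_id) pairs from a message's content blocks."""
    return {
        (message["role"], block["tool_use_id"])
        for block in message.get("content_blocks", [])
        if "tool_use_id" in block  # assumed key name for the id
    }


def render_transcript(committed: list, run: dict) -> list:
    """Return committed rows plus the live tail while the run is in flight."""
    if run["status"] in TERMINAL_STATES:
        # Live turns are now committed; the committed rows are authoritative.
        return list(committed)
    seen = set()
    for message in committed:
        seen |= _tool_keys(message)
    live = sorted(run.get("live", {}).get("messages", []), key=lambda m: m["turn_index"])
    # Drop a live turn only when all of its tool ids are already committed.
    tail = [m for m in live if not (_tool_keys(m) and _tool_keys(m) <= seen)]
    return list(committed) + tail
```

The role is part of the de-dupe key because a tool_use turn and its matching tool_result turn share the same tool_use_id.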
Automatic titling
When you start the first run on a conversation that has no name, the platform kicks off a small asynchronous job that generates a short title from your first user message and writes it to the conversation’s name. Because it’s asynchronous and independent of your run, the timing
is best-effort:
- current_name is usually null on the first read right after the run is created, then becomes the generated title a beat later; keep reading the run (or the conversation) and it appears.
- A name you set explicitly at conversation-creation time is never overwritten; auto titling only fills an empty name.
- Titling never touches the conversation version or the message stream: it’s not a message, doesn’t advance your expected_version, and never shows up in GET .../messages.
live is a growth point. It accumulates state that changes after a run was written, so a
single run poll can stand in for several reads (current_name, messages, more over time). Treat
it as “render whatever keys are present” rather than assuming a fixed set.

Example 1 — hello world (no tools)
The simplest possible run: one user message, one assistant reply, no tools.

Create the conversation
No output_format_schema here, so this is text mode. The final answer comes back as
final_text. See Example 4 for the structured-mode flow.
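
A text-mode creation request along these lines might be assembled as shown below. Treat it as a sketch: build_conversation_request is a hypothetical helper, the placement of name as a top-level body field is an assumption, and the data_plane_id is a placeholder rather than a real identifier.

```python
from typing import Any, Optional


def build_conversation_request(
    system_prompt: str,
    model: str,
    data_plane_id: str,
    execution_cluster: str = "shared",
    name: Optional[str] = None,
    **optional_defaults: Any,
) -> dict[str, Any]:
    """Assemble the JSON body for POST /agents/conversations."""
    defaults = {
        "system_prompt": system_prompt,
        "model": model,
        "data_plane_id": data_plane_id,
        "execution_cluster": execution_cluster,
        # e.g. max_iterations, max_tokens, temperature, mcp_servers, tools
        **optional_defaults,
    }
    body: dict[str, Any] = {"defaults": defaults}
    if name is not None:
        # Assumption: an explicit display name travels next to defaults.
        body["name"] = name
    return body


body = build_conversation_request(
    system_prompt="You are a helpful assistant. Answer concisely.",
    model="anthropic.claude-opus-4.6",
    data_plane_id="00000000-0000-0000-0000-000000000000",  # placeholder UUID
    max_iterations=3,
)
```

Leaving output_format_schema out of the defaults is what keeps the conversation in text mode.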
Response (abbreviated):
Start the run
Poll until terminal
Read the message stream
The conversation is now at current_version: 2. Future runs on this conversation start with expected_version: 2.
Example 2 — searching docs via an MCP server
Now the model has tools. It searches the Narrative docs for “rate limiting”, makes a few attempts to locate the right page, and produces a grounded summary.

Create the conversation with mcp_servers
- alias: "docs". The platform discovers the server’s tools at run start, then prefixes each with the alias before showing them to the model. An MCP-side tool named search_narrative_i_o_knowledge_base becomes docs-search_narrative_i_o_knowledge_base on the wire.
- No tools[] here; discovery is automatic. To inspect what the server exposes, call its tools/list directly (e.g. curl https://docs.narrative.io/mcp ...).
- max_iterations: 8 gives the model room to search, read, refine, then answer.
Start the run
Same shape as Example 1: just a user message. The model decides what tools to call.

What you see while polling
status cycles pending → running → running → … — each running you see corresponds
roughly to one inference iteration. Be patient: MCP roundtrips can take 30–60 seconds total
for multi-step research like this.
Final state:
The message stream then shows alternating assistant (with
tool_use blocks) and tool (with tool_result blocks) turns, and a final assistant turn
carrying the answer.
Example 3 — asking the user a question (caller-declared tool)
Sometimes the model needs information only the caller has. Configure a tools[] entry,
force the model to use it with tool_choice, and the run will pause at requires_action
until you answer.
Create the conversation with tools[]
Start the run and force the tool call
Poll until requires_action
pending_tool_calls[].tool_use_id is the handle you’ll need to resume.
Resume with tool_outputs
Re-read the conversation to get the new version (it has advanced because the assistant turn is
now persisted):
You can answer with
is_error: true to tell the model the tool failed (e.g. the user
declined). The model decides whether to try another approach, ask a different question, or
fall through to a final answer.

Example 4 — structured output
When you need the model’s answer as a typed object rather than free text, supply an output_format_schema describing the shape you want. The model is grammar-constrained to
produce JSON matching it, and the parsed value comes back on final_structured_output.
This example records a login event using a discriminated union (anyOf) of login /
logout variants. Note that the schema has no top-level text field — you declare
whatever shape your application needs.
Create the conversation with a structured schema
Start the run
Final state
final_text is null because the run is in structured mode; the answer is on
final_structured_output, conforming verbatim to your schema. The two fields are mutually
exclusive — text-mode runs (Examples 1–3) populate final_text; structured-mode runs
populate final_structured_output.
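
A discriminated-union schema of the kind this example describes might look like the sketch below. The property names (event, user, method) are illustrative assumptions, and variant_of is a small hand-rolled check for demonstration, not a full JSON Schema validator.

```python
# Hypothetical schema: two object variants told apart by the "event" field.
LOGIN_EVENT_SCHEMA = {
    "anyOf": [
        {
            "type": "object",
            "properties": {
                "event": {"type": "string", "enum": ["login"]},
                "user": {"type": "string"},
                "method": {"type": "string"},
            },
            "required": ["event", "user", "method"],
            "additionalProperties": False,
        },
        {
            "type": "object",
            "properties": {
                "event": {"type": "string", "enum": ["logout"]},
                "user": {"type": "string"},
            },
            "required": ["event", "user"],
            "additionalProperties": False,
        },
    ]
}


def variant_of(output: dict) -> str:
    """Return the discriminator of the variant that final_structured_output fits."""
    for variant in LOGIN_EVENT_SCHEMA["anyOf"]:
        props = variant["properties"]
        keys = set(output)
        # Required keys present, no undeclared keys, discriminator matches.
        if (
            set(variant["required"]) <= keys <= set(props)
            and output.get("event") in props["event"]["enum"]
        ):
            return output["event"]
    raise ValueError("output matches no variant of the schema")
```

The schema would be passed as output_format_schema in the conversation defaults; note that it has no top-level text field.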
Troubleshooting
Version conflicts (409)
You get a 409 from POST .../runs with error.type pointing at
/errors/version-conflict.
What happened: between the moment you read version: N and the moment you posted a run
with expected_version: N, something else added messages to the conversation (a previous
run’s finalize, or a concurrent caller). The platform refuses to start a run that would
conflict at finalize time.
How to recover:
Before retrying, check whether a run you assumed was completed is actually requires_action: the latest assistant turn might be a tool-call
prompt that needs your reply, not a finished answer.
“Bad request” with a tool-alias message (400)
The defaults.mcp_servers[].alias you sent isn’t in the allowed shape. MCP aliases must:
- be 1–8 characters total,
- start with a letter,
- contain only ASCII letters and digits (no underscores, no dashes, no other punctuation),
- be unique across mcp_servers for the conversation.
Caller-declared tools (tools[]) have no alias; their name must be non-empty and
dash-free. See
/errors/invalid-caller-tool-name.
Full alias rules and examples: /errors/invalid-tool-alias.
“Tool wire name too long” (400)
The combined {alias}-{tool_name} exceeds 64 characters (an underlying Bedrock limit).
Shorten the alias or the underlying tool name. See
/errors/tool-name-too-long.
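
Both 400s above can be caught client-side before a request is sent. The sketch below encodes the alias rules and the 64-character wire-name limit as stated on this page; the function names are hypothetical.

```python
import re

# 1-8 characters, ASCII letters and digits only, starting with a letter.
ALIAS_RE = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")
MAX_WIRE_NAME_LENGTH = 64


def invalid_aliases(aliases: list[str]) -> list[str]:
    """Return the aliases that break the shape or uniqueness rules, in order."""
    bad = []
    seen = set()
    for alias in aliases:
        if not ALIAS_RE.fullmatch(alias) or alias in seen:
            bad.append(alias)
        seen.add(alias)
    return bad


def wire_name_too_long(alias: str, tool_name: str) -> bool:
    """True when {alias}-{tool_name} exceeds the 64-character limit."""
    return len(f"{alias}-{tool_name}") > MAX_WIRE_NAME_LENGTH
```

Since the alias takes at most 8 characters and the dash one more, keeping MCP tool names at 55 characters or fewer always stays within the limit.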
requires_action and you don’t know what to answer
pending_tool_calls[] lists every tool call awaiting your reply, with the tool name and the
arguments the model produced. When you post tool_outputs, every entry in
pending_tool_calls must have a matching outputs[] entry — no missing, no extras. If you
have nothing useful to say for a particular call (e.g. the user dismissed the prompt), still
post an entry with is_error: true and a short reason.
Common mistakes around tool outputs:
- Unknown tool_use_id: you sent an id the latest assistant turn never produced. Almost always a typo or a stale resume payload.
- Not a client tool call: you sent a tool_use_id whose name has the {alias}-{tool} shape (an MCP-resolved call). The platform already answered those; only dash-free names belong in tool_outputs.
- Incomplete tool outputs: you missed one of the pending ids, or sent extras the model didn’t ask for.
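
Building the resume payload from pending_tool_calls, rather than by hand, avoids all three mistakes. In this sketch the helper name and the content field are assumptions; kind, outputs, tool_use_id, and is_error are the names used on this page.

```python
from typing import Any


def build_tool_outputs(pending_tool_calls: list[dict], answers: dict[str, Any]) -> dict:
    """Produce a tool_outputs payload with exactly one entry per pending call."""
    pending_ids = [call["tool_use_id"] for call in pending_tool_calls]
    unknown = sorted(set(answers) - set(pending_ids))
    if unknown:
        # Extras the model never asked for would be rejected.
        raise ValueError(f"unknown tool_use_id(s): {unknown}")
    outputs = []
    for tool_use_id in pending_ids:
        if tool_use_id in answers:
            outputs.append(
                {"tool_use_id": tool_use_id, "content": answers[tool_use_id], "is_error": False}
            )
        else:
            # Nothing useful to say: still answer, flagged as an error.
            outputs.append(
                {"tool_use_id": tool_use_id, "content": "no answer provided", "is_error": True}
            )
    return {"kind": "tool_outputs", "outputs": outputs}
```

The returned dict is meant to be used as the payload of the follow-up run request.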
Run failed with AgentLoopMaxIterationsExceeded
The model used up its max_iterations budget without producing a final answer. Either:
- raise max_iterations in defaults (or config_override per run),
- tighten the system prompt so the model is steered toward an answer sooner,
- inspect the message stream; repeated identical tool calls suggest the model is stuck in a loop because the tool isn’t giving it useful new information.
Details: /errors/max-iterations-exceeded.
Run failed with AgentLoopSchemaDecodeFailed
In text mode (no output_format_schema) this is almost always a truncation issue: the
model’s reply got cut off before the platform could read it. Raise max_tokens.
In structured mode this means the model produced JSON that didn’t match the shape your
output_format_schema declared, or returned prose instead of JSON. Fixes:
- be explicit in the system prompt: “respond with a JSON object matching this schema; no other text.”
- consider widening max_tokens; sometimes the model is truncated mid-JSON.
- simplify the schema. Bedrock’s structured-output sampler enforces a subset of Draft 2020-12; features outside that subset can silently fall through to free-text generation. See the warning under Configuration parameters.
- inspect the last assistant turn to see what the model actually said.
Details: /errors/schema-decode-failed.
Run failed with AgentLoopCancelled
Not an error — the run was deliberately cancelled before it finished. The run lands in failed
with this reserved code (so it’s distinguishable from a genuine failure), and its in-flight
inference job is cancelled too. Any partial work is preserved on the run row and the message
stream. If you didn’t expect the cancellation, find out who issued it (operations tooling, a
worker drain on deploy, a client abort); the conversation is intact, so start a fresh run to
continue.
Details: /errors/cancelled.
Anything else
Every other failure class has its own page. Start at the error catalog and use the page name that matches the error.docs_url or type in the response.
Where to go next
- Full error catalog — every failure mode the API can return, with cause and fix.
- Concepts: Model Context Protocol, the open standard behind mcp_servers. Helpful if you’re wiring up a tool the platform doesn’t ship by default.

