The Agent Conversations API lets you ask a language model a question, have it call tools to gather more information, and get back a structured answer — all through a few HTTP calls. The model can use MCP-resolved tools (anything exposed by a Model Context Protocol server, like the Narrative docs search) or caller-declared tools (where it asks you a question and waits for your reply). This page is a complete reference: the moving pieces, the parameters, three end-to-end examples, and the most common things that go wrong.

What it does

A typical request goes like this:
  1. You create a conversation with a system prompt, a model choice, and a list of tools the model is allowed to use.
  2. You start a run by sending the model a user message.
  3. The model decides whether to answer directly or call one of the tools.
    • If it picks an MCP-resolved tool, the platform calls it for you and feeds the result back into the next round of reasoning. This can repeat several times.
    • If it picks a caller-declared tool, the run pauses and asks you to provide the answer. You start a new run with the answer; the model continues from there.
  4. Eventually the model produces a final answer matching the structured schema you provided. The run reaches completed and you fetch the result.
The platform keeps the full conversation in a database, so you can resume a paused run later, inspect every message the model saw, and start follow-up runs on the same conversation.
The shape mirrors the OpenAI Assistants API: a thread is a conversation, a run is one user-initiated turn, a step is one inference iteration inside a run. If you already know that mental model, the only new ideas here are tool aliases and optimistic-concurrency versioning (covered below).

API endpoints

All endpoints live under /agents and require a Bearer token with agent_conversations read/write permission.
Scope: per-user, not per-company. Conversations and runs are keyed on the bearer token’s (company_id, user_id) pair. Peers in the same company cannot see each other’s conversations, runs, or messages — every endpoint returns 404 for cross-user access, identical to cross-company access. The API does not distinguish “doesn’t exist” from “owned by another user”.
| Method | Path | Purpose |
|---|---|---|
| POST | /agents/conversations | Create an empty conversation, pinning its model + tool catalog |
| GET | /agents/conversations?page=N&per_page=K | List the calling user’s conversations (paginated, newest first) |
| GET | /agents/conversations/{id} | Read conversation metadata (current version, defaults) |
| GET | /agents/conversations/{id}/messages?since=N | Page through messages with sequence_no > N |
| POST | /agents/conversations/{id}/runs | Start a new run: either a fresh user message, or tool outputs resuming a paused run |
| GET | /agents/runs/{id} | Read a run’s current status, results, and pending tool calls |
The first request (POST /agents/conversations) returns the conversation id you need for all the others. POST .../runs returns the run id, which you then poll with GET /agents/runs/{id}.

Listing your conversations

GET /agents/conversations returns the calling user’s conversations in created_at descending order, wrapped in the standard pagination envelope:
curl "$API/agents/conversations?page=1&per_page=20" \
  -H "Authorization: Bearer $TOKEN"
{
  "prev_page": null,
  "current_page": 1,
  "next_page": 2,
  "total_records": 47,
  "total_pages": 3,
  "records": [
    { "id": "...", "name": "rate-limiting q&a", "version": 4, "...": "..." }
  ]
}
Default per_page is 50. Results are filtered by both company_id and user_id — peers in the same company do not see each other’s conversations (see the scope callout above).
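Walking the pagination envelope can be sketched as follows. This is a minimal illustration, not an official client: fetch_page is a hypothetical callable standing in for the GET /agents/conversations?page=N request, and the fake pages below only mimic the documented envelope fields.

```python
from typing import Callable, Iterator


def iter_conversations(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every conversation record, following next_page until it is null."""
    page = 1
    while page is not None:
        envelope = fetch_page(page)
        yield from envelope["records"]
        page = envelope["next_page"]  # None on the last page


# Stand-in for the HTTP call: two fake pages in the documented envelope shape.
_FAKE_PAGES = {
    1: {"current_page": 1, "next_page": 2, "records": [{"id": "a"}, {"id": "b"}]},
    2: {"current_page": 2, "next_page": None, "records": [{"id": "c"}]},
}

ids = [rec["id"] for rec in iter_conversations(_FAKE_PAGES.__getitem__)]
print(ids)  # ['a', 'b', 'c']
```

Following next_page rather than counting up to total_pages keeps the loop correct even if conversations are created while you are paging.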

Core concepts in one paragraph each

Conversation

The long-lived container. The system_prompt is pinned for life — once set at creation time, it applies to every run that follows and cannot be changed. Everything else in defaults (model, tools, max iterations, temperature, output schema, etc.) is exactly that — a default: it applies unless a particular run explicitly replaces it. The conversation also carries a monotonic version counter that every successful run bumps. The counter doubles as both a delta cursor (for GET .../messages?since=N) and a compare-and-swap token (more on that under Troubleshooting).

Run

One user-initiated turn of conversation. Each run kicks off a workflow on the platform that performs zero or more inference rounds plus tool calls. A run is asynchronous: the POST returns immediately with status: "pending", and you poll GET /agents/runs/{id} until it reaches a terminal state (completed, requires_action, or failed). A run may override the conversation’s defaults for that single turn via config_override — pick a different model, raise max_iterations for a hard question, swap in a different tool catalog, tighten temperature for a deterministic answer, etc. The only thing a run cannot change is the system_prompt (that’s permanently set on the conversation). Two run-only settings have no conversation-level counterpart: tool_choice (which biases the model toward a particular tool on the first iteration) and the run’s payload itself.

Tools

A run’s tool catalog comes from two sources, listed side by side in the conversation defaults:
  • mcp_servers[] — each entry has an alias (1–8 letters/digits, must start with a letter), a URL, and an optional description. You do not declare individual tools — the platform discovers each server’s catalog at run start via the JSON-RPC tools/list method and stitches the wire names as {alias}-{tool_name} before handing them to the model. When the model emits a call to {alias}-{tool_name}, the workflow makes the tools/call request, gets the result, and feeds it back into the next inference round inside the same run.
  • tools[] — caller-declared tools you answer yourself. No alias — the model sees the bare name. When the model calls one of these, the run terminates with status: "requires_action"; you read the question from pending_tool_calls, decide on an answer, and start a new run with payload.kind: "tool_outputs" carrying your reply.
Discovery is per-run, not cached. Every run re-fetches tools/list from each registered MCP server. There is no inter-run cache — if the server adds, removes, or renames a tool, the next run picks it up automatically. The trade-off: each new run pays one HTTP roundtrip per registered server before the first inference. For 1–3 typical servers this is in the noise (10–100ms total); if you ever register a slow-discovering server, expect it to delay every run’s first iteration. Discovery failures terminate the run with MCP Discovery Failed.
Narrative-owned MCP servers are authenticated automatically. When an mcp_servers[].url matches the Data Collaboration MCP Server (https://mcp.narrative.io/mcp in prod), the platform mints a Default-scoped API token for the conversation’s user and company and attaches it as Authorization: Bearer ... on every tools/list and tools/call request to that server. The token is created server-side, lives only for the duration of the request chain, and is never persisted to agent_runs.effective_config, Temporal event history, or the GET /agents/runs/{id} echo. Public or third-party MCP servers (any URL outside the allowlist) continue to be called without auth — the existing behavior is unchanged.
Tool input schemas are normalized at discovery time. MCP servers can declare inputSchema shapes that Bedrock Converse and Anthropic Messages don’t accept verbatim — oneOf, $ref, format validators, additionalProperties: true, and similar JSON Schema features. The platform runs every discovered tool’s schema through a converter that translates oneOf to anyOf, inlines non-recursive $ref, replaces recursive $ref with {} to break cycles, forces additionalProperties: false, and lifts unsupported keywords (format, pattern, range validators) into the property’s description. Tools that previously had to be dropped wholesale because they carried $ref now load successfully; nothing is rejected at the discovery layer. The converter emits strict: false because Bedrock caps strict-tool counts per request — the model still gets the schema, and each MCP server validates arguments on its own side at call time.
The dash is the routing discriminator. MCP wire names always carry the {alias}- prefix; caller-declared tool names must not contain a dash. This is what lets the workflow classify a hallucinated or unknown call without ambiguity: dash + unknown alias → AgentLoopUnknownToolAlias; dash-free + unknown name → same error. The validation rule “no dash in tools[].name” is the cornerstone.
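The dash rule can be mirrored on the client side, for example to label tool calls in a UI. The sketch below is an assumption-laden illustration of the routing idea only; classify_tool_call is a hypothetical helper, and it does not check whether the MCP server actually exposes the named tool.

```python
def classify_tool_call(wire_name, mcp_aliases, caller_tools):
    """Route a model-emitted tool name using the dash rule described above."""
    if "-" in wire_name:
        # Aliases never contain a dash, so the first dash ends the alias.
        alias, tool_name = wire_name.split("-", 1)
        kind = "mcp" if alias in mcp_aliases else "unknown"
        return (kind, alias, tool_name)
    # Dash-free names can only be caller-declared tools.
    kind = "caller" if wire_name in caller_tools else "unknown"
    return (kind, None, wire_name)


print(classify_tool_call("docs-search_kb", {"docs"}, {"confirm_booking"}))
# ('mcp', 'docs', 'search_kb')
```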

Tool choice

Per-run policy that nudges the model toward (or away from) using tools on the first iteration. Three options:
  • {"kind": "auto"} — model decides. The default.
  • {"kind": "any"} — model must call some tool (no plain-text answer allowed).
  • {"kind": "specific_tool", "name": "confirm_booking"} — model must call exactly this caller-declared tool. To pin an MCP-resolved tool instead, include the explicit mcp_alias: {"kind": "specific_tool", "mcp_alias": "docs", "name": "search_kb"}. The platform validates the target against the catalog synchronously and returns 400 with Unknown Tool Choice Name or Unknown Tool Choice MCP Alias on a miss.
The choice only applies to iteration 1 of the current run. From iteration 2 onward inside that same run the model is back on {"kind": "auto"}. There is no carry-over to subsequent runs — every new run picks its own tool_choice afresh. That means you can absolutely re-impose a forced tool call on a follow-up run (for example, push the model toward a specific caller-declared tool every time you need to ask the user something), so long as you set tool_choice on the run’s request body.
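A small builder makes the three tool_choice shapes hard to get wrong. This is a sketch: make_tool_choice is a hypothetical helper name, and only the JSON shapes it returns come from the options listed above. The platform still validates the target against the catalog on its side.

```python
def make_tool_choice(kind="auto", name=None, mcp_alias=None):
    """Build a tool_choice object in one of the three documented shapes."""
    if kind in ("auto", "any"):
        if name is not None or mcp_alias is not None:
            raise ValueError(f"kind={kind!r} takes no name or mcp_alias")
        return {"kind": kind}
    if kind == "specific_tool":
        if not name:
            raise ValueError("specific_tool requires a tool name")
        choice = {"kind": "specific_tool", "name": name}
        if mcp_alias is not None:
            # Pinning an MCP-resolved tool needs the explicit alias.
            choice["mcp_alias"] = mcp_alias
        return choice
    raise ValueError(f"unknown tool_choice kind: {kind!r}")


print(make_tool_choice("specific_tool", name="search_kb", mcp_alias="docs"))
```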

Configuration parameters

When you create a conversation, you pass a defaults object that pins how every run on that conversation behaves. Each field can be overridden per-run via config_override, except where noted.
| Field | Required | What it means | Typical value |
|---|---|---|---|
| model | yes | Which language model to use. The platform exposes a fixed set of identifiers. | "anthropic.claude-opus-4.6" |
| data_plane_id | yes | Which compute environment runs the inference. Each company has at least one. | UUID from your platform admin |
| execution_cluster | yes | Which job-executor pool routes the inference job to AWS Bedrock or Snowflake Cortex. Inference itself runs in the external model service, not on a platform cluster; the executor only dispatches the HTTP call. Use "shared" for almost every case; the value only matters if your company runs a dedicated executor pool with isolation requirements. | "shared" |
| max_iterations | no (default 3) | How many inference rounds the model is allowed before the platform forces a failed run. Each iteration costs one model call. | 3 for trivial questions, 8–15 for multi-step research |
| max_tokens | no (default 2048) | Cap on the model’s reply per iteration. Doesn’t include the prompt. | 1024–4096 |
| temperature | no (default 0.0) | How creative the model is allowed to be. 0.0 is deterministic; 1.0 is creative. | 0.0 for factual answers, 0.7 for brainstorming |
| output_format_schema | no | A JSON Schema describing the structure of the final answer. When omitted, the run is in text mode and the answer comes back as final_text. When provided, the run is in structured mode and the answer comes back as final_structured_output, conforming verbatim to your schema. The two response fields are mutually exclusive. Supports a subset of Draft 2020-12; see the note below the table. | See examples below |
| mcp_servers | no (default []) | List of MCP servers the model may call. Each entry is {alias, url, description?}; tools are discovered per-run via tools/list. | See Example 2 |
| tools | no (default []) | List of caller-declared tools that pause the run with requires_action. No alias; each entry’s name must be dash-free. | See Example 3 |
| system_prompt | no | A “stage direction” prepended to every run. Pinned at creation time, cannot be overridden per-run. | "You are a helpful assistant. Answer concisely." |
mcp_servers and tools are wholesale-replaced, not merged, when overridden in a run. If you set them in config_override, you replace the whole list. This keeps tool namespacing predictable.
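The override semantics amount to a shallow merge: each key in config_override replaces the conversation default outright, lists included. The sketch below illustrates that reading on the client side; effective_config is a hypothetical helper, and rejecting system_prompt locally is an assumption about how a client might guard the pinned field, not a description of the platform’s own validation.

```python
from typing import Optional


def effective_config(defaults: dict, config_override: Optional[dict]) -> dict:
    """Compute the config one run would see: override keys replace defaults wholesale."""
    override = dict(config_override or {})
    if "system_prompt" in override:
        # Assumed client-side guard: the prompt is pinned on the conversation.
        raise ValueError("system_prompt cannot be overridden per run")
    # Shallow merge: lists such as mcp_servers are replaced, never concatenated.
    return {**defaults, **override}


defaults = {
    "model": "anthropic.claude-opus-4.6",
    "max_iterations": 3,
    "mcp_servers": [{"alias": "docs", "url": "https://docs.narrative.io/mcp"}],
}
run_cfg = effective_config(defaults, {"max_iterations": 8, "mcp_servers": []})
print(run_cfg["max_iterations"], run_cfg["mcp_servers"])  # 8 []
```

Note that overriding mcp_servers with an empty list removes every MCP tool for that run; to add a server you resend the full list.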
output_format_schema is a JSON Schema subset. The platform accepts the same subset of Draft 2020-12 that Bedrock structured-output accepts. Common features that do not work: the composition keywords oneOf, allOf, and not; schema reuse ($ref, $defs); conditional shape (if/then/else); pattern regexes on strings; and format validators (date-time, email, etc.). Supported: type, properties, required, enum, const, anyOf (used for the discriminated union in Example 4), additionalProperties: false, minimum/maximum on numbers, minLength/maxLength on strings, minItems/maxItems on arrays. See JSON Schema Reference for the full list with examples.

Run payload shapes

POST /agents/conversations/{id}/runs accepts one of two payload kinds:
// A fresh user message
{
  "payload": { "kind": "user_message", "text": "What is 2 + 2?" }
}

// The user/system answering a tool call from a previously paused run
{
  "payload": {
    "kind": "tool_outputs",
    "outputs": [
      {
        "tool_use_id": "tooluse_xyz",
        "content": "User confirmed the proposed slot.",
        "is_error": false
      }
    ]
  }
}
The full request body also includes:
  • client_op_id — a UUID you generate, unique per (conversation, request). Used as an idempotency key — re-sending the same client_op_id returns the original run unchanged. This lets you retry safely across network blips.
  • expected_version — the conversation’s current version, which you got from the most recent GET /agents/conversations/{id}. The platform rejects the run if the version has moved on since you read it (see Version conflicts).
  • tool_choice — optional, per-run only.
  • config_override — optional, sparse — only the fields you want to change.
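Assembling the full request body might look like the sketch below. build_run_request is a hypothetical helper; the field names come from the list above, and the choice of a random version-4 UUID for client_op_id is an assumption (any UUID you generate should do).

```python
import uuid


def build_run_request(expected_version, payload, tool_choice=None,
                      config_override=None, client_op_id=None):
    """Assemble the body for POST /agents/conversations/{id}/runs."""
    body = {
        # Reuse the same client_op_id when retrying so the call stays idempotent.
        "client_op_id": client_op_id or str(uuid.uuid4()),
        "expected_version": expected_version,
        "payload": payload,
    }
    if tool_choice is not None:
        body["tool_choice"] = tool_choice
    if config_override:
        body["config_override"] = config_override  # sparse: changed fields only
    return body


first = build_run_request(0, {"kind": "user_message", "text": "What is 2 + 2?"})
# A retry after a network blip sends the identical idempotency key.
retry = build_run_request(0, first["payload"], client_op_id=first["client_op_id"])
```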

Run status lifecycle

Every run starts at pending and eventually reaches one of three terminal states:
pending  →  running  →  completed         ← model produced a final answer
                    ↘   requires_action   ← model called a caller-declared tool
                    ↘   failed            ← platform error, non-recoverable model error,
                                            or a deliberate cancellation
Poll GET /agents/runs/{id} every few seconds (start with 2–4 seconds; back off if you don’t care about latency). When you see one of the terminal states you can stop polling. completed runs populate exactly one of two fields, depending on whether you supplied an output_format_schema:
  • No schema (text mode): final_text carries the model’s reply as a plain string; final_structured_output is null.
  • Schema supplied (structured mode): final_structured_output carries the parsed JSON object conforming verbatim to your schema; final_text is null. Your schema does not need a top-level text field; whatever shape you declare is what you get back.
requires_action runs populate pending_tool_calls, an array of caller-declared tool calls waiting for you to answer. See Example 3 for the resume flow.
failed runs populate error.type (an opaque incident code) and error.message (a human-readable detail). The response also includes error.title and error.docs_url pointing at the relevant error catalog page.
A run that was deliberately cancelled also lands in failed, but carries the reserved error.type: "AgentLoopCancelled" so you can tell an intentional stop from a genuine error. The run’s in-flight inference job is cancelled too, so the run and its data-plane job end up consistent. See /errors/cancelled.
One error vocabulary for both surfaces. The agent API exposes errors in two physical shapes: HTTP 4xx/5xx with an RFC 7807 body for synchronous failures (bad request body, conversation not found, version conflict, etc.), and HTTP 200 with status: "failed" and an error object for failures that happen inside the workflow after the run has been accepted (max iterations exceeded, MCP server unreachable, invalid effective config, etc.). The two shapes carry the same caller-facing fields:
| RFC 7807 (synchronous) | RunErrorDto on a failed run (asynchronous) | Meaning |
|---|---|---|
| type (URL) | error.docs_url | Stable URL to the docs page for this failure class |
| title | error.title | Short, caller-facing summary |
| status (HTTP) | n/a (the call itself returned 200) | |
| detail | error.message | Request-specific detail string |
| instance (path) | n/a | |
| log_id | n/a; correlate via the run id | |
| | error.type (incident code) | Internal stable tag for log dashboards |
Both shapes link into the same error catalog, which means a single playbook covers both. Whether your client got a 409 on POST .../runs or a 200 with error.type: "AgentLoopMaxIterationsExceeded" on GET /agents/runs/{id}, the docs URL in the response is the canonical “how do I recover” entry point.
The workflow side itself never knows about caller-facing presentation; it only emits opaque incident codes ("AgentLoopMaxIterationsExceeded", "UnknownTool", etc.). The translation to title + docs_url happens at the API boundary against a single catalog kept in sync with the docs pages.
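Because the two shapes share their caller-facing fields, a client can fold them into one record. The sketch below assumes the field mapping described in this section; normalize_error and the keys of the dict it returns are hypothetical names chosen for illustration.

```python
def normalize_error(http_status, body):
    """Map either error shape onto one dict, or return None if nothing failed."""
    if http_status >= 400:
        # Synchronous failure: RFC 7807 problem document.
        return {
            "docs_url": body.get("type"),
            "title": body.get("title"),
            "detail": body.get("detail"),
            "incident_code": None,  # not carried by the RFC 7807 shape
        }
    if body.get("status") == "failed":
        # Asynchronous failure: HTTP 200 with an error object on the run.
        err = body.get("error") or {}
        return {
            "docs_url": err.get("docs_url"),
            "title": err.get("title"),
            "detail": err.get("message"),
            "incident_code": err.get("type"),
        }
    return None


failed_run = {"status": "failed", "error": {
    "type": "AgentLoopMaxIterationsExceeded", "title": "t",
    "message": "m", "docs_url": "u"}}
print(normalize_error(200, failed_run)["incident_code"])
# AgentLoopMaxIterationsExceeded
```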

The live view

Every run response (POST .../runs and GET /agents/runs/{id}) carries a live object. Unlike the status-dependent fields, live is orthogonal to status and is populated on every read — it reflects the conversation’s current, still-mutating state, so a run you polled minutes ago can surface newer values on the next read without a separate GET /agents/conversations/{id} call.
{
  "id": "<run-uuid>",
  "status": "running",
  "live": {
    "current_name": "Tomorrow Afternoon Meeting",
    "messages": [
      { "turn_index": 0, "role": "assistant", "content_blocks": [{ "type": "tool_use", "tool_use_id": "tooluse_abc", "name": "docs-search_kb", "arguments": { "query": "rate limiting" } }] },
      { "turn_index": 1, "role": "tool",      "content_blocks": [{ "type": "tool_result", "tool_use_id": "tooluse_abc", "content": [{ "type": "text", "text": "..." }], "is_error": false }] }
    ]
  }
}
live holds two fields:
  • current_name — the conversation’s display name, or null if it has none yet. This is the same value as name on GET /agents/conversations/{id}; it’s mirrored onto the run so a client that’s already polling a run sees title changes for free.
  • messages — the run’s produced turns so far, streamed while the run is still in flight so you can show tool execution live. See Streaming tool execution below.

Streaming tool execution

While a run is non-terminal (pending / running), live.messages carries the turns the loop has produced so far — each tool-use assistant turn appears before the platform executes the call (so you can render “calling docs-search_kb…”), and the matching tool_result turn appears once it returns. This lets a chat UI animate a multi-iteration tool loop from the same run poll you’re already doing, with no extra endpoint. Each entry has a turn_index (per-run ordering, independent of sequence_no), a role, and the same content_blocks shape as committed messages. The handoff to the committed log is lossless:
  • Render committed messages (GET .../messages) ++ live.messages while the run is non-terminal.
  • Once the run reaches a terminal status, live.messages is empty — the same turns are now committed and authoritative via GET .../messages?since=…. Drop your live tail and keep the committed rows.
  • De-dupe is exact: the live tool_use / tool_result blocks carry the same tool_use_ids that land in the committed content_blocks.
Live streaming is best-effort: it never blocks or fails the run, and it never touches the conversation version or the committed message stream (live turns aren’t messages — they carry no sequence_no and don’t advance your expected_version). A run with no server-side tool calls (a direct answer) simply shows an empty live.messages and you get the answer via final_text / final_structured_output at completion.
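The render rule (committed messages followed by the live tail, de-duped on tool_use_id) could be implemented along these lines. It is a sketch that assumes the message and block shapes shown in this section; render_transcript is a hypothetical helper.

```python
NON_TERMINAL = {"pending", "running"}


def _block_keys(message):
    """Collect (block type, tool_use_id) pairs for de-duplication."""
    return {(b["type"], b["tool_use_id"])
            for b in message.get("content_blocks", []) if "tool_use_id" in b}


def render_transcript(committed, run):
    """Return committed messages plus any live turns not yet committed."""
    if run["status"] not in NON_TERMINAL:
        return list(committed)  # terminal: the committed log is authoritative
    live = (run.get("live") or {}).get("messages") or []
    seen = set()
    for message in committed:
        seen |= _block_keys(message)
    tail = [m for m in sorted(live, key=lambda m: m["turn_index"])
            if not (_block_keys(m) & seen)]
    return list(committed) + tail


committed = [{"sequence_no": 1, "role": "user",
              "content_blocks": [{"type": "text", "text": "hi"}]}]
run = {"status": "running", "live": {"messages": [
    {"turn_index": 0, "role": "assistant", "content_blocks": [
        {"type": "tool_use", "tool_use_id": "tooluse_abc", "name": "docs-search_kb"}]},
]}}
print([m["role"] for m in render_transcript(committed, run)])  # ['user', 'assistant']
```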

Automatic titling

When you start the first run on a conversation that has no name, the platform kicks off a small asynchronous job that generates a short title from your first user message and writes it to the conversation’s name. Because it’s asynchronous and independent of your run, the timing is best-effort:
  • current_name is usually null on the first read right after the run is created, then becomes the generated title a beat later — keep reading the run (or the conversation) and it appears.
  • A name you set explicitly at conversation-creation time is never overwritten — auto titling only fills an empty name.
  • Titling never touches the conversation version or the message stream: it’s not a message, doesn’t advance your expected_version, and never shows up in GET .../messages.
live is a growth point. It accumulates state that changes after a run was written, so a single run poll can stand in for several reads (current_name, messages, more over time). Treat it as “render whatever keys are present” rather than assuming a fixed set.

Example 1 — hello world (no tools)

The simplest possible run: one user message, one assistant reply, no tools.

Create the conversation

curl -X POST "$API/agents/conversations" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "hello world",
    "system_prompt": "You are a helpful assistant. Answer concisely.",
    "defaults": {
      "model": "anthropic.claude-opus-4.6",
      "data_plane_id": "f79cbdae-4848-47ca-95e8-69588364d185",
      "execution_cluster": "shared",
      "max_iterations": 3,
      "max_tokens": 1024,
      "temperature": 0.0
    }
  }'
No output_format_schema here — this is text mode. The final answer comes back as final_text. See Example 4 for the structured-mode flow. Response (abbreviated):
{ "id": "<conv-uuid>", "version": 0, "name": "hello world", ... }

Start the run

curl -X POST "$API/agents/conversations/<conv-uuid>/runs" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "client_op_id": "'"$(uuidgen | tr A-Z a-z)"'",
    "expected_version": 0,
    "payload": { "kind": "user_message", "text": "What is 2 + 2?" }
  }'
Response:
{ "id": "<run-uuid>", "status": "pending", "started_at": "...", ... }

Poll until terminal

curl "$API/agents/runs/<run-uuid>" -H "Authorization: Bearer $TOKEN"
After a few seconds:
{
  "id": "<run-uuid>",
  "status": "completed",
  "iterations_used": 1,
  "usage": { "completion_tokens": 8, "prompt_tokens": 169, "total_tokens": 177 },
  "final_text": "4",
  "final_structured_output": null,
  "error": null,
  ...
}

Read the message stream

curl "$API/agents/conversations/<conv-uuid>/messages?since=0" \
  -H "Authorization: Bearer $TOKEN"
{
  "current_version": 2,
  "messages": [
    { "sequence_no": 1, "role": "user",      "content_blocks": [{"type": "text", "text": "What is 2 + 2?"}], ... },
    { "sequence_no": 2, "role": "assistant", "content_blocks": [{"type": "text", "text": "4"}],              ... }
  ]
}
That’s the whole loop: one user turn (sequence 1) → one assistant turn (sequence 2) → current_version: 2. Future runs on this conversation start with expected_version: 2.

Example 2 — searching docs via an MCP server

Now the model has tools. It searches the Narrative docs for “rate limiting”, makes a few attempts to locate the right page, and produces a grounded summary.

Create the conversation with mcp_servers

curl -X POST "$API/agents/conversations" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "docs search",
    "system_prompt": "You are an AI assistant that answers questions about the Narrative Data Marketplace. Use the available tools to look things up before answering. Once you have enough information, produce a concise final answer that matches the provided output schema.",
    "defaults": {
      "model": "anthropic.claude-opus-4.6",
      "data_plane_id": "f79cbdae-4848-47ca-95e8-69588364d185",
      "execution_cluster": "shared",
      "max_iterations": 8,
      "max_tokens": 2048,
      "temperature": 0.0,
      "mcp_servers": [{
        "alias": "docs",
        "url": "https://docs.narrative.io/mcp",
        "description": "Narrative.io documentation MCP server"
      }]
    }
  }'
Key points:
  • alias: "docs" — the platform discovers the server’s tools at run start, then prefixes each with the alias before showing them to the model. An MCP-side tool named search_narrative_i_o_knowledge_base becomes docs-search_narrative_i_o_knowledge_base on the wire.
  • No tools[] here — discovery is automatic. To inspect what the server exposes, call its tools/list directly (e.g. curl https://docs.narrative.io/mcp ...).
  • max_iterations: 8 — gives the model room to: search, read, refine, then answer.

Start the run

Same shape as Example 1 — just a user message. The model decides what tools to call.
{
  "client_op_id": "...",
  "expected_version": 0,
  "payload": { "kind": "user_message", "text": "Search the Narrative docs for information about rate limiting and summarize what you find in 2-3 sentences." },
  "tool_choice": { "kind": "auto" }
}

What you see while polling

status cycles through pending → running → running → … (each running you see corresponds roughly to one inference iteration). Be patient: MCP roundtrips can take 30–60 seconds total for multi-step research like this. Final state:
{
  "status": "completed",
  "iterations_used": 6,
  "usage": { "completion_tokens": 510, "prompt_tokens": 14842, "total_tokens": 15352 },
  "submitted_inference_job_ids": [
    "<job-1>", "<job-2>", "<job-3>", "<job-4>", "<job-5>", "<job-6>"
  ],
  "final_text": "According to the Narrative.io documentation, the API returns a 429 ...",
  ...
}
The message stream now has 12 entries — the user message, plus alternating assistant (with tool_use blocks) and tool (with tool_result blocks) turns, and a final assistant turn carrying the answer.
Prompt tokens grow with each iteration because each round re-sends the full conversation history to the model. Six iterations on a conversation with several search hits can easily reach 15,000 prompt tokens. Tighter system prompts and shorter tool descriptions help.

Example 3 — asking the user a question (caller-declared tool)

Sometimes the model needs information only the caller has. Configure a tools[] entry, force the model to use it with tool_choice, and the run will pause at requires_action until you answer.

Create the conversation with tools[]

curl -X POST "$API/agents/conversations" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "booking flow",
    "system_prompt": "You are a booking assistant. Use the available tool to propose a meeting slot to the user.",
    "defaults": {
      "model": "anthropic.claude-opus-4.6",
      "data_plane_id": "f79cbdae-4848-47ca-95e8-69588364d185",
      "execution_cluster": "shared",
      "max_iterations": 5,
      "tools": [{
        "name": "confirm_booking",
        "description": "Ask the user to confirm a proposed booking slot.",
        "input_schema": {
          "type": "object",
          "additionalProperties": false,
          "required": ["proposed_slot"],
          "properties": { "proposed_slot": { "type": "string" } }
        }
      }]
    }
  }'

Start the run and force the tool call

{
  "client_op_id": "...",
  "expected_version": 0,
  "payload": { "kind": "user_message", "text": "Find me a meeting time for tomorrow afternoon." },
  "tool_choice": { "kind": "specific_tool", "name": "confirm_booking" }
}

Poll until requires_action

{
  "status": "requires_action",
  "iterations_used": 1,
  "pending_tool_calls": [
    {
      "tool_use_id": "tooluse_DWXPKZ50JDGib5GmShyUgJ",
      "name": "confirm_booking",
      "arguments": { "proposed_slot": "tomorrow afternoon" }
    }
  ],
  "final_text": null,
  "final_structured_output": null
}
The run is paused. pending_tool_calls[].tool_use_id is the handle you’ll need to resume.

Resume with tool_outputs

Re-read the conversation to get the new version (it has advanced because the assistant turn is now persisted):
curl "$API/agents/conversations/<conv-uuid>" -H "Authorization: Bearer $TOKEN"
# → { "version": 2, ... }
Now start a follow-up run with the answer:
{
  "client_op_id": "...",
  "expected_version": 2,
  "payload": {
    "kind": "tool_outputs",
    "outputs": [{
      "tool_use_id": "tooluse_DWXPKZ50JDGib5GmShyUgJ",
      "content": "User confirmed the proposed slot.",
      "is_error": false
    }]
  }
}
The model picks up where it left off, sees the tool result, and produces a final answer:
{
  "status": "completed",
  "iterations_used": 1,
  "final_text": "Great news! Your meeting has been confirmed for tomorrow afternoon. ...",
  "final_structured_output": null
}
You can answer with is_error: true to tell the model the tool failed (e.g. the user declined). The model decides whether to try another approach, ask a different question, or fall through to a final answer.
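Building the resume payload from pending_tool_calls can be sketched as below. build_tool_outputs is a hypothetical helper, and the placeholder text used for unanswered calls is an assumption; the helper answers every pending call exactly once and marks the ones you have no answer for with is_error: true.

```python
def build_tool_outputs(pending_tool_calls, answers):
    """Build a tool_outputs payload with one entry per pending call.

    answers maps tool_use_id -> reply text; calls without an answer are
    reported to the model as errors instead of being left out.
    """
    pending_ids = [call["tool_use_id"] for call in pending_tool_calls]
    extras = set(answers) - set(pending_ids)
    if extras:
        raise ValueError(f"answers for calls that are not pending: {sorted(extras)}")
    outputs = []
    for tool_use_id in pending_ids:
        answered = tool_use_id in answers
        outputs.append({
            "tool_use_id": tool_use_id,
            "content": answers[tool_use_id] if answered else "No answer provided.",
            "is_error": not answered,
        })
    return {"kind": "tool_outputs", "outputs": outputs}


pending = [{"tool_use_id": "tooluse_DWXPKZ50JDGib5GmShyUgJ",
            "name": "confirm_booking",
            "arguments": {"proposed_slot": "tomorrow afternoon"}}]
payload = build_tool_outputs(
    pending, {"tooluse_DWXPKZ50JDGib5GmShyUgJ": "User confirmed the proposed slot."})
print(payload["outputs"][0]["is_error"])  # False
```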

Example 4 — structured output

When you need the model’s answer as a typed object rather than free text, supply an output_format_schema describing the shape you want. The model is grammar-constrained to produce JSON matching it, and the parsed value comes back on final_structured_output. This example records a login event using a discriminated union (anyOf) of login / logout variants. Note that the schema has no top-level text field — you declare whatever shape your application needs.

Create the conversation with a structured schema

curl -X POST "$API/agents/conversations" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "event logger",
    "system_prompt": "You record events as structured data. Respond only via the structured output schema.",
    "defaults": {
      "model": "anthropic.claude-opus-4.6",
      "data_plane_id": "f79cbdae-4848-47ca-95e8-69588364d185",
      "execution_cluster": "shared",
      "max_iterations": 3,
      "max_tokens": 1024,
      "temperature": 0.0,
      "output_format_schema": {
        "type": "object",
        "additionalProperties": false,
        "required": ["event"],
        "properties": {
          "event": {
            "anyOf": [
              {
                "type": "object",
                "title": "Login",
                "additionalProperties": false,
                "required": ["type", "user_id", "session_duration_seconds"],
                "properties": {
                  "type": { "const": "login" },
                  "user_id": { "type": "string" },
                  "session_duration_seconds": { "type": "integer" }
                }
              },
              {
                "type": "object",
                "title": "Logout",
                "additionalProperties": false,
                "required": ["type", "user_id"],
                "properties": {
                  "type": { "const": "logout" },
                  "user_id": { "type": "string" }
                }
              }
            ]
          }
        }
      }
    }
  }'

Start the run

{
  "client_op_id": "...",
  "expected_version": 0,
  "payload": {
    "kind": "user_message",
    "text": "User u-123 logged in and stayed for 1800 seconds. Record the structured event."
  }
}

Final state

{
  "status": "completed",
  "iterations_used": 1,
  "final_text": null,
  "final_structured_output": {
    "event": {
      "type": "login",
      "user_id": "u-123",
      "session_duration_seconds": 1800
    }
  }
}
final_text is null because the run is in structured mode; the answer is on final_structured_output, conforming verbatim to your schema. The two fields are mutually exclusive — text-mode runs (Examples 1–3) populate final_text; structured-mode runs populate final_structured_output.
Your schema can be anything Bedrock’s structured-output sampler accepts; see the supported subset in the output_format_schema note below the table in Configuration parameters. Common shapes that work well: a single typed object, a discriminated union via anyOf with const discriminators (as above), an enum-tagged sum type, an array of typed records.

Troubleshooting

Version conflicts (409)

You get a 409 from POST .../runs with error.type pointing at /errors/version-conflict.
What happened: between the moment you read version: N and the moment you posted a run with expected_version: N, something else added messages to the conversation (a previous run’s finalize, or a concurrent caller). The platform refuses to start a run that would conflict at finalize time.
How to recover:
# 1. Refetch the conversation to get the new version
curl "$API/agents/conversations/<conv-uuid>" -H "Authorization: Bearer $TOKEN"

# 2. Refetch the messages since your last known version, so you know what changed
curl "$API/agents/conversations/<conv-uuid>/messages?since=<your-old-version>" \
  -H "Authorization: Bearer $TOKEN"

# 3. Re-post the same run body, but with the fresh expected_version
If you keep hitting this without obvious cause, check whether a previous run that you thought was completed is actually requires_action — the latest assistant turn might be a tool-call prompt that needs your reply, not a finished answer.
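The recovery steps can be wrapped in a small retry loop. The sketch below is one possible shape, not an official client: VersionConflict, start_run_with_retry, get_version, and post_run are all hypothetical names, and the two callables stand in for whatever HTTP layer you use (get_version refetches the conversation and returns its version; post_run posts the run and raises VersionConflict on a 409).

```python
class VersionConflict(Exception):
    """Hypothetical: raised by your HTTP layer when POST .../runs returns 409."""


def start_run_with_retry(get_version, post_run, body, max_attempts=3):
    """Re-post the same run body with a fresh expected_version after each 409.

    get_version() -> int   : refetch the conversation, return its current version
    post_run(body) -> dict : post the run; raise VersionConflict on a 409
    """
    for attempt in range(max_attempts):
        # Only expected_version changes; client_op_id and payload stay the same.
        body = {**body, "expected_version": get_version()}
        try:
            return post_run(body)
        except VersionConflict:
            # A real client would also fetch the messages added since the old
            # version here, in case the latest assistant turn is a tool-call
            # prompt that needs a reply rather than a new user message.
            if attempt == max_attempts - 1:
                raise
```

Capping the attempts keeps a persistent conflict visible instead of looping forever; the cap of 3 is an arbitrary choice for the sketch.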

“Bad request” with a tool-alias message (400)

The defaults.mcp_servers[].alias you sent isn’t in the allowed shape. MCP aliases must:
  • be 1–8 characters total,
  • start with a letter,
  • contain only ASCII letters and digits (no underscores, no dashes, no other punctuation),
  • be unique across mcp_servers for the conversation.
Caller-declared tools (under tools[]) have no alias — their name must be non-empty and dash-free. See /errors/invalid-caller-tool-name. Full alias rules and examples: /errors/invalid-tool-alias.
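The rules above are easy to check client-side before creating the conversation. The following Python sketch mirrors them as written on this page; alias_problems and caller_tool_name_ok are hypothetical helpers, and treating uniqueness as exact, case-sensitive string equality is an assumption. The platform remains the authority on what it accepts.

```python
import re

# 1-8 characters, first a letter, the rest ASCII letters or digits.
ALIAS_RE = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")


def alias_problems(aliases):
    """Return human-readable problems for a list of MCP aliases (hypothetical helper)."""
    problems = []
    seen = set()
    for alias in aliases:
        if not ALIAS_RE.fullmatch(alias):
            problems.append(
                f"{alias!r}: must be 1-8 ASCII letters/digits, starting with a letter"
            )
        if alias in seen:  # assumption: uniqueness is case-sensitive
            problems.append(f"{alias!r}: duplicate alias")
        seen.add(alias)
    return problems


def caller_tool_name_ok(name: str) -> bool:
    """Caller-declared tool names must be non-empty and dash-free."""
    return bool(name) and "-" not in name


print(alias_problems(["docs", "my_docs", "docs"]))
```

Using fullmatch rather than match with a trailing `$` avoids accepting an alias that ends in a newline.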

“Tool wire name too long” (400)

The combined {alias}-{tool_name} exceeds 64 characters (an underlying Bedrock limit). Shorten the alias or the underlying tool name. See /errors/tool-name-too-long.
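A pre-flight length check might look like the sketch below. wire_name is a hypothetical helper; it assumes the limit counts characters of the joined string and that names are ASCII, which this page does not state explicitly.

```python
MAX_WIRE_NAME = 64  # limit quoted above for the combined {alias}-{tool_name}


def wire_name(alias: str, tool_name: str) -> str:
    """Join alias and tool name, raising if the result exceeds the limit (hypothetical helper)."""
    name = f"{alias}-{tool_name}"
    if len(name) > MAX_WIRE_NAME:
        raise ValueError(
            f"{name!r} is {len(name)} characters; the limit is {MAX_WIRE_NAME}"
        )
    return name


print(wire_name("docs", "search"))
```

With an 8-character alias and the joining dash, the underlying tool name has at most 55 characters to work with.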

requires_action and you don’t know what to answer

pending_tool_calls[] lists every tool call awaiting your reply, with the tool name and the arguments the model produced. When you post tool_outputs, every entry in pending_tool_calls must have a matching outputs[] entry — no missing, no extras. If you have nothing useful to say for a particular call (e.g. the user dismissed the prompt), still post an entry with is_error: true and a short reason.
Common mistakes around tool outputs:
  • Unknown tool_use_id — you sent an id the latest assistant turn never produced. Almost always a typo or a stale resume payload.
  • Not a client tool call — you sent a tool_use_id whose name has the {alias}-{tool} shape (an MCP-resolved call). The platform already answered those; only dash-free names belong in tool_outputs.
  • Incomplete tool outputs — you missed one of the pending ids, or sent extras the model didn’t ask for.
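One way to avoid the missing-or-extra mistakes is to derive outputs[] directly from pending_tool_calls. The sketch below is illustrative only: build_tool_outputs is a hypothetical helper, and the entry shapes are assumptions (a tool_use_id key on each pending call, and a content key carrying the reply text). Check the request schema for the real field names.

```python
def build_tool_outputs(pending_tool_calls, answers, fallback="user dismissed the prompt"):
    """Build one outputs[] entry per pending call (hypothetical helper).

    pending_tool_calls : list of dicts, each assumed to carry "tool_use_id"
    answers            : dict mapping tool_use_id -> reply text
    """
    pending_ids = {call["tool_use_id"] for call in pending_tool_calls}
    extras = set(answers) - pending_ids
    if extras:
        # Would be rejected as an unknown tool_use_id / incomplete tool outputs.
        raise ValueError(f"answers for ids that are not pending: {sorted(extras)}")

    outputs = []
    for call in pending_tool_calls:
        tool_use_id = call["tool_use_id"]
        if tool_use_id in answers:
            outputs.append(
                {"tool_use_id": tool_use_id, "content": answers[tool_use_id], "is_error": False}
            )
        else:
            # Nothing useful to say: still reply, flagged as an error with a short reason.
            outputs.append({"tool_use_id": tool_use_id, "content": fallback, "is_error": True})
    return outputs
```

Iterating over the pending calls, rather than over your answers, guarantees exactly one entry per pending id.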

Run failed with AgentLoopMaxIterationsExceeded

The model used up its max_iterations budget without producing a final answer. Try one or more of the following:
  • raise max_iterations in defaults (or config_override per run),
  • tighten the system prompt so the model is steered toward an answer sooner,
  • inspect the message stream — repeated identical tool calls suggest the model is stuck in a loop because the tool isn’t giving it useful new information.
Details: /errors/max-iterations-exceeded.
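Spotting repeated identical tool calls by eye gets tedious on long runs. The sketch below counts them, assuming you have already extracted (tool name, arguments) pairs from the message stream; the message format itself is not covered here, and repeated_tool_calls is a hypothetical helper.

```python
import json
from collections import Counter


def repeated_tool_calls(calls, threshold=2):
    """Return (name, arguments, count) for calls seen at least `threshold` times.

    calls: iterable of (tool_name, arguments_dict) pairs, extracted by the caller.
    """
    # Serialize arguments with sorted keys so key order does not hide duplicates.
    counts = Counter((name, json.dumps(args, sort_keys=True)) for name, args in calls)
    return [
        (name, json.loads(args), n)
        for (name, args), n in counts.items()
        if n >= threshold
    ]


calls = [
    ("docs-search", {"q": "rate limits"}),
    ("docs-search", {"q": "rate limits"}),
    ("docs-search", {"q": "quotas"}),
]
print(repeated_tool_calls(calls))
```

A non-empty result suggests the tool is not returning anything new for that query, which is usually a prompt or tool-description problem rather than a budget problem.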

Run failed with AgentLoopSchemaDecodeFailed

In text mode (no output_format_schema) this is almost always a truncation issue: the model’s reply got cut off before the platform could read it. Raise max_tokens.
In structured mode this means the model produced JSON that didn’t match the shape your output_format_schema declared, or returned prose instead of JSON. Fixes:
  • be explicit in the system prompt: “respond with a JSON object matching this schema; no other text.”
  • raise max_tokens; sometimes the model is truncated mid-JSON.
  • simplify the schema. Bedrock’s structured-output sampler enforces a subset of Draft 2020-12; features outside that subset can silently fall through to free-text generation. See the warning under Configuration fields.
  • inspect the last assistant turn to see what the model actually said.
Details: /errors/schema-decode-failed.
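When inspecting the last assistant turn, a rough first triage is whether the text parses as JSON at all. The heuristic below is a sketch under that assumption; classify_reply is a hypothetical helper, and a reply that parses can still fail your schema.

```python
import json


def classify_reply(text: str) -> str:
    """Roughly classify the last assistant text (heuristic, hypothetical helper)."""
    try:
        json.loads(text)
        return "json"  # parses; compare it against your schema next
    except json.JSONDecodeError:
        pass
    if text.lstrip().startswith(("{", "[")):
        # Looks like JSON that was cut off or malformed: try raising max_tokens.
        return "truncated-or-malformed-json"
    return "prose"  # the model ignored the format: tighten the system prompt


print(classify_reply('{"event": {"type": "login", "user_id": "u-1'))
```

The three outcomes map onto the fixes listed above: schema mismatch, truncation, and prose instead of JSON.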

Run failed with AgentLoopCancelled

Not an error — the run was deliberately cancelled before it finished. The run lands in failed with this reserved code (so it’s distinguishable from a genuine failure), and its in-flight inference job is cancelled too. Any partial work is preserved on the run row and the message stream. If you didn’t expect the cancellation, find out who issued it (operations tooling, a worker drain on deploy, a client abort); the conversation is intact, so start a fresh run to continue. Details: /errors/cancelled.

Anything else

Every other failure class has its own page. Start at the error catalog and use the page name that matches the error.docs_url or type in the response.

Where to go next