Prerequisites
- SDK installed and configured (see Authentication)
- A data plane ID where inference will run
- An API key with appropriate permissions
Basic inference request
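As a minimal sketch, assuming a request consists of a data plane ID, a model ID, and a list of role-tagged messages, the payload could be assembled as follows. The `InferenceRequest` and `Message` shapes, the `buildBasicRequest` helper, and the `dp-example` ID are hypothetical names used for illustration only; the SDK's actual field names may differ.

```typescript
// Hypothetical request shape; not taken from the SDK.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

interface InferenceRequest {
  dataPlaneId: string;
  model: string;
  messages: Message[];
}

// Build a request holding exactly one system prompt and one user message.
function buildBasicRequest(
  dataPlaneId: string,
  model: string,
  systemPrompt: string,
  userMessage: string,
): InferenceRequest {
  return {
    dataPlaneId,
    model,
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userMessage },
    ],
  };
}

const basicRequest = buildBasicRequest(
  "dp-example", // placeholder data plane ID
  "anthropic.claude-sonnet-4.5",
  "You are a concise assistant.",
  "Summarize the release notes in two sentences.",
);
console.log(JSON.stringify(basicRequest, null, 2));
```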
Submit a simple inference request with a system prompt and user message.
Tracking job completion
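Polling by job ID can be sketched as a loop that stops on a terminal status or after a fixed number of attempts. The status names, the `JobSnapshot` shape, and the `fetchJob` callback are assumptions; `fetchJob` stands in for whatever SDK call returns the current state of a job.

```typescript
// Hypothetical status values; the real API may use different names.
type JobStatus = "pending" | "running" | "succeeded" | "failed";

interface JobSnapshot {
  id: string;
  status: JobStatus;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the job reaches a terminal status, or give up after maxAttempts polls.
async function waitForJob(
  jobId: string,
  fetchJob: (id: string) => Promise<JobSnapshot>,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<JobSnapshot> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJob(jobId);
    if (job.status === "succeeded" || job.status === "failed") {
      return job;
    }
    await sleep(intervalMs);
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} polls`);
}
```

Passing the fetch function in as a parameter keeps the loop independent of any particular client and makes it easy to exercise with a stub.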
Inference jobs are asynchronous. Poll for completion using the job ID.
Configuring inference parameters
Fine-tune the model’s behavior with configuration options:
| Parameter | Effect | Typical Values |
|---|---|---|
| temperature | Controls randomness. Lower = more deterministic | 0.0-0.3 for factual, 0.7-1.0 for creative |
| max_tokens | Limits response length | 100-4000 depending on task |
| top_p | Nucleus sampling threshold | 0.9-1.0 for most cases |
| stop_sequences | Strings that end generation | ["\n\n", "END"] |
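The typical values in the table can be captured as presets and sanity-checked before a request is sent. This is a sketch only: the parameter names come from the table, but the `InferenceParams` interface, the two presets, and the `checkParams` helper are illustrative assumptions, and the bounds checked here are generic (non-negative temperature, positive integer token limit, `top_p` in the interval (0, 1]) rather than limits documented for this API.

```typescript
// Parameter names follow the table; the interface itself is an assumption.
interface InferenceParams {
  temperature?: number;
  max_tokens?: number;
  top_p?: number;
  stop_sequences?: string[];
}

// Low temperature for factual tasks.
const factualParams: InferenceParams = {
  temperature: 0.2,
  max_tokens: 500,
  top_p: 0.9,
  stop_sequences: ["\n\n"],
};

// Higher temperature for creative tasks.
const creativeParams: InferenceParams = {
  temperature: 0.9,
  max_tokens: 2000,
  top_p: 1.0,
};

// Return a list of problems; an empty list means the values look sane.
function checkParams(params: InferenceParams): string[] {
  const problems: string[] = [];
  const { temperature, max_tokens, top_p } = params;
  if (temperature !== undefined && temperature < 0) {
    problems.push("temperature must be non-negative");
  }
  if (max_tokens !== undefined && (!Number.isInteger(max_tokens) || max_tokens <= 0)) {
    problems.push("max_tokens must be a positive integer");
  }
  if (top_p !== undefined && (top_p <= 0 || top_p > 1)) {
    problems.push("top_p must be greater than 0 and at most 1");
  }
  return problems;
}

console.log(checkParams(factualParams));
console.log(checkParams({ top_p: 1.5 }));
```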
Multi-turn conversations
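One way to carry context between turns is to keep the full message list and append the model's previous reply before adding the next user message. The `ChatMessage` shape and the `withNextTurn` helper below are hypothetical; they only illustrate the bookkeeping, not the SDK call.

```typescript
// Hypothetical message shape; not taken from the SDK.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Return a new message list that includes the previous reply and the next question.
// The input array is left unchanged.
function withNextTurn(
  previous: ChatMessage[],
  assistantReply: string,
  nextUserMessage: string,
): ChatMessage[] {
  return [
    ...previous,
    { role: "assistant", content: assistantReply },
    { role: "user", content: nextUserMessage },
  ];
}

const firstTurn: ChatMessage[] = [
  { role: "system", content: "You are a helpful geography tutor." },
  { role: "user", content: "What is the capital of France?" },
];

// After the first job completes, fold its reply into the next request.
const secondTurn = withNextTurn(firstTurn, "Paris.", "Roughly how many people live there?");
console.log(secondTurn.map((m) => m.role).join(" -> "));
```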
Include previous messages for context-aware responses.
Choosing a model
Select the model based on your task requirements:
| Model | Best For |
|---|---|
| anthropic.claude-haiku-4.5 | Fast, simple tasks (classification, extraction) |
| anthropic.claude-sonnet-4.5 | Balanced tasks (summarization, analysis) |
| anthropic.claude-opus-4.5 | Complex reasoning (multi-step analysis, nuanced decisions) |
| openai.gpt-4.1 | Advanced reasoning tasks |
| openai.o4-mini | Fast, cost-effective tasks |
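If model choice is made in code, the table can be encoded as a lookup from task type to model ID. The `TaskKind` names and the `pickModel` helper are illustrative assumptions, and only the Anthropic rows are mapped here to keep the sketch short; the model IDs themselves are taken from the table.

```typescript
// Task categories loosely based on the "Best For" column; the names are assumptions.
type TaskKind =
  | "classification"
  | "extraction"
  | "summarization"
  | "analysis"
  | "multiStepAnalysis";

const MODEL_BY_TASK: Record<TaskKind, string> = {
  classification: "anthropic.claude-haiku-4.5",
  extraction: "anthropic.claude-haiku-4.5",
  summarization: "anthropic.claude-sonnet-4.5",
  analysis: "anthropic.claude-sonnet-4.5",
  multiStepAnalysis: "anthropic.claude-opus-4.5",
};

// Look up the suggested model ID for a task type.
function pickModel(task: TaskKind): string {
  return MODEL_BY_TASK[task];
}

console.log(pickModel("extraction"));
```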
Typing responses
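A generic result type lets the compiler track the shape of structured output, while a runtime guard confirms that the returned JSON actually matches it. The `InferenceResult<T>` wrapper, the `Sentiment` example type, and the `parseOutput` helper are hypothetical and assume the job output arrives as a JSON string.

```typescript
// Hypothetical wrapper around a completed job's structured output.
interface InferenceResult<T> {
  jobId: string;
  output: T;
}

// Example output type that a JSON Schema for sentiment analysis might describe.
interface Sentiment {
  label: "positive" | "negative" | "neutral";
  confidence: number;
}

// Runtime check that an unknown value has the Sentiment shape.
function isSentiment(value: unknown): value is Sentiment {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  const label = candidate["label"];
  return (
    (label === "positive" || label === "negative" || label === "neutral") &&
    typeof candidate["confidence"] === "number"
  );
}

// Parse raw JSON and return it typed as T, or throw if the guard rejects it.
function parseOutput<T>(
  jobId: string,
  raw: string,
  guard: (value: unknown) => value is T,
): InferenceResult<T> {
  const parsed: unknown = JSON.parse(raw);
  if (!guard(parsed)) {
    throw new Error(`Job ${jobId} returned output that does not match the expected shape`);
  }
  return { jobId, output: parsed };
}

const sentimentResult = parseOutput("job-1", '{"label":"positive","confidence":0.93}', isSentiment);
console.log(sentimentResult.output.label);
```

The guard matters because generics are erased at compile time; without it, a mismatched response would only surface later as an undefined field.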
Use TypeScript generics to type the structured output.
Error handling
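A common pattern is to map each failure to an action before deciding what to do with it. The sketch below assumes that errors carry a numeric `statusCode` property following HTTP conventions; that property, the `ErrorAction` names, and the `classifyError` helper are assumptions rather than documented SDK behavior.

```typescript
type ErrorAction = "check-credentials" | "retry-later" | "fix-request" | "report";

// Narrow an unknown thrown value to something carrying a numeric status code.
function hasStatusCode(err: unknown): err is { statusCode: number } {
  return (
    typeof err === "object" &&
    err !== null &&
    "statusCode" in err &&
    typeof (err as { statusCode: unknown }).statusCode === "number"
  );
}

// Decide what to do with a failure, based on HTTP status code conventions.
function classifyError(err: unknown): ErrorAction {
  if (!hasStatusCode(err)) return "report";
  const code = err.statusCode;
  if (code === 401 || code === 403) return "check-credentials"; // bad or under-privileged API key
  if (code === 429 || code >= 500) return "retry-later"; // rate limit or server-side fault
  if (code >= 400) return "fix-request"; // other client errors, e.g. invalid parameters
  return "report";
}

try {
  // Stand-in for a failed SDK call.
  throw { statusCode: 429, message: "rate limited" };
} catch (err) {
  console.log(classifyError(err));
}
```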
Handle common inference errors.
Best practices
| Practice | Description |
|---|---|
| Use specific schemas | Define precise JSON Schema to get consistent outputs |
| Choose appropriate models | Use smaller models for simple tasks to save cost and time |
| Set reasonable max_tokens | Avoid unnecessarily large values that increase latency |
| Include system prompts | Guide model behavior with clear instructions |
| Handle failures gracefully | Implement retries for transient errors |
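The last practice, retrying transient errors, could be implemented with exponential backoff. This is a generic sketch, not an SDK feature: `withRetries`, its default attempt count and delay, and the `isTransient` callback are all assumptions, and the caller decides which errors count as transient.

```typescript
// Retry an async operation with exponential backoff while errors are transient.
async function withRetries<T>(
  operation: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Stop on non-transient errors or when the attempt budget is spent.
      if (!isTransient(err) || attempt === maxAttempts) break;
      // Wait baseDelayMs, then 2x, then 4x, and so on.
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

Wrapping the submit or poll call in `withRetries` keeps retry policy in one place instead of scattering it across call sites.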
Related content
- Structured Output Guide: Deep dive into JSON Schema for inference
- Choosing Models: Select the right model for your task
- Model Inference API: Complete API reference
- Tracking Jobs: Monitor job status and handle completion

