Key capabilities
- 1M context window — 1M tokens by default, with 128K max output tokens
- Adaptive thinking — On by default; control depth with
output_config.effort, or usethinking: {"type": "disabled"}to turn it off - Same API shape — Requests, responses, and streaming keep the same shape as Claude Sonnet 4.6
- New tokenizer — The same text produces about 30% more tokens than Claude Sonnet 4.6
- Model ID — Use
claude-sonnet-5
Quick example
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Must be claude-sonnet-5 |
messages | array | Yes | List of { role, content } objects |
max_tokens | integer | Yes | Maximum tokens to generate. Claude Sonnet 5 supports up to 128K output tokens. |
output_config | object | No | Use {"effort":"low" | "medium" | "high" | "xhigh" | "max"} to control adaptive thinking depth. Default: high |
thinking | object | No | Omit it to use default adaptive thinking, or use {"type":"disabled"} to turn thinking off. |
stream | boolean | No | Enable SSE streaming. Default: false |
stop_sequences | array | No | Sequences that stop generation |
Claude Sonnet 5 uses adaptive thinking by default. Use
output_config.effort to tune reasoning depth (low, medium, high, xhigh, or max). Manual extended thinking (thinking: {type: "enabled", budget_tokens: N}) returns a 400 error, and non-default temperature, top_p, or top_k values also return a 400 error. Use thinking: {type: "disabled"} to turn thinking off.Claude Sonnet 5 uses a new tokenizer. The same text produces approximately 30% more tokens than on Claude Sonnet 4.6, so prompt counts and
max_tokens budgets should be recalculated before migrating.Requests involving prohibited or high-risk cybersecurity topics may be refused. Refusals return
stop_reason: "refusal".API Reference
View the interactive API playground for Claude Sonnet 5.