Key capabilities
- OpenAI-compatible — Works as a drop-in replacement with the OpenAI SDK
- 256K context — 262,144 tokens for large documents and multi-turn conversations
- Multimodal input — Accepts text, image, and video content
- Thinking mode — Toggle via the
thinkingparameter; returnsreasoning_contentand supports Preserved Thinking - Long-horizon coding — More reliable across languages (Rust, Go, Python) and tasks (front-end, ops, performance)
- Rich features — Tool Calls (function calling), JSON Mode, Partial Mode, web search, and automatic context caching
Quick example
Note:image_urlandvideo_urlaccept two formats: a base64 data URI (data:image/png;base64,.../data:video/mp4;base64,...) or a file reference (ms://<file_id>). Tokens served from the context cache are reported inusage.prompt_tokens_details.cached_tokens.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Must be kimi-k2.6 |
messages | array | Yes | List of { role, content } objects. content may be a string or a multimodal array of text/image_url/video_url |
thinking | object | No | Controls thinking mode, e.g. {"type": "enabled"} (default) or {"type": "disabled"}; keep: "all" enables Preserved Thinking |
max_completion_tokens | integer | No | Maximum tokens to generate. (max_tokens is deprecated and not honored) |
temperature | float | No | 0–2. Controls randomness. Default: 1 |
stream | boolean | No | Enable SSE streaming. Default: false |
top_p | float | No | Nucleus sampling threshold. Default: 1 |
response_format | object | No | Set to {"type": "json_object"} to enable JSON Mode |
tools | array | No | A list of tools the model may call (function calling) |
stop | string / array | No | Sequences that stop generation (up to 5, max 32 bytes each) |
API Reference
View the interactive API playground for Kimi-K2.6.