GLM-5.2 - Anyfast

GLM-5.2 is Zhipu AI's flagship model for long-running tasks, available through Anyfast via an OpenAI-compatible interface. It pairs a 1M-token context window with state-of-the-art coding performance among open-source models, enough to take a project from requirements to deployable, multi-platform artifacts in a single long-running task.

Key capabilities

OpenAI-compatible: Works as a drop-in replacement with the OpenAI SDK
1M context window: Lossless context that stays stable across the full 1M tokens on long-horizon tasks, not just a nominal limit
128K max output: Generates up to 128K tokens in a single response
Thinking mode: Chain-of-thought reasoning; when enabled, GLM-5.2 always thinks before answering
Adjustable reasoning effort: reasoning_effort tunes how hard the model thinks
Open-source SOTA coding: Top-ranked open model on long-horizon coding benchmarks, comparable to the strongest closed models
Function calling, structured output & MCP: Robust tool use, JSON output, and MCP tool/data-source integration
Streaming: Real-time token streaming via SSE
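
To make the streaming capability concrete, here is a minimal sketch of how a client might reassemble text from an SSE stream. It assumes the stream follows the OpenAI chat-completions convention (one `data: {json}` line per chunk, terminated by `data: [DONE]`), which this page implies through OpenAI compatibility but does not spell out. The helper name `iter_stream_text` and the sample events are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines.

    Assumption: each event is a `data: {json}` line and the stream
    ends with `data: [DONE]`, as in the OpenAI streaming format.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hypothetical sample events, hand-written for illustration only.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

With the OpenAI SDK, passing `stream=True` handles this parsing for you; the sketch is only useful when reading the raw HTTP response.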

Output specifications

Property	Value
Input modality	Text
Output modality	Text
Context window	1M tokens
Max output tokens	128K

Quick example

curl https://www.anyfast.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.2",
    "messages": [
      { "role": "user", "content": "Explain quantum entanglement in simple terms." }
    ]
  }'
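
For readers who prefer Python without extra dependencies, the same request can be sketched with the standard library. The endpoint, headers, and body mirror the curl call above; the helper name `build_chat_request` is hypothetical, and the response shape in the commented-out part is assumed to follow the OpenAI chat-completions format.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://www.anyfast.ai/v1/chat/completions"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a single-turn chat completion request."""
    body = {
        "model": "glm-5.2",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("YOUR_API_KEY", "Explain quantum entanglement in simple terms.")

# Sending requires a real key and network access, so it is left commented out.
# The response shape below is an assumption based on OpenAI compatibility.
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```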

Thinking mode

GLM-5.2 supports a chain-of-thought thinking mode. When thinking.type is enabled (the default), GLM-5.2 always thinks before answering. Set it to disabled to skip reasoning for lightweight tasks.

Python

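from openai import OpenAI

# Point the OpenAI SDK at Anyfast's OpenAI-compatible endpoint
# (base URL taken from the curl example above).
client = OpenAI(
    base_url="https://www.anyfast.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="glm-5.2",
    messages=[
        {"role": "user", "content": "Design a REST API for a blogging platform."}
    ],
    # thinking and reasoning_effort are sent through extra_body.
    extra_body={
        "thinking": {"type": "enabled"},
        "reasoning_effort": "max"
    }
)

print(response.choices[0].message.content)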

reasoning_effort controls how hard the model reasons and takes effect only when thinking is enabled. GLM-5.2 accepts max, xhigh, high, medium, low, minimal, and none. For compatibility, none and minimal make the model skip thinking, low and medium map to high, and xhigh maps to max. Default: max.
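
The mapping described above can be written out as a small client-side lookup. This is only an illustration of the documented behaviour, not the service's implementation; the function name `effective_effort` and the use of `None` to mean "thinking is skipped" are choices made for this sketch.

```python
from typing import Optional

# Documented aliases: low/medium behave as high, xhigh behaves as max.
_ALIASES = {
    "low": "high",
    "medium": "high",
    "high": "high",
    "xhigh": "max",
    "max": "max",
}
# Documented values that make the model skip thinking.
_SKIP = {"none", "minimal"}


def effective_effort(effort: str = "max", thinking: str = "enabled") -> Optional[str]:
    """Return the effort level that applies, or None if thinking is skipped."""
    if effort not in _ALIASES and effort not in _SKIP:
        raise ValueError(f"unsupported reasoning_effort: {effort!r}")
    # reasoning_effort only matters while thinking is enabled.
    if thinking != "enabled" or effort in _SKIP:
        return None
    return _ALIASES[effort]


for value in ("max", "xhigh", "high", "medium", "low", "minimal", "none"):
    print(f"{value:>7} -> {effective_effort(value)}")
```

In practice this means only two distinct reasoning levels exist (high and max), plus the option of no thinking at all.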

Parameters

Parameter	Type	Required	Description
`model`	string	Yes	Must be `glm-5.2`
`messages`	array	Yes	List of `{ role, content }` objects
`thinking`	object	No	`{ "type": "enabled" | "disabled" }`. Controls chain-of-thought. Default: `enabled` (forced thinking)
`reasoning_effort`	string	No	`max`, `xhigh`, `high`, `medium`, `low`, `minimal`, `none`. Effective when thinking is enabled. Default: `max`
`max_tokens`	integer	No	Maximum tokens to generate (up to 131072). Recommended ≥ 1024
`temperature`	float	No	`0`–`1`. Controls randomness. Default: `1`
`top_p`	float	No	Nucleus sampling threshold. Default: `0.95`
`stream`	boolean	No	Enable SSE streaming. Default: `false`
`tools`	array	No	Function/MCP tool definitions for tool use
`response_format`	object	No	`{ "type": "json_object" }` for structured JSON output
`stop`	string / array	No	Sequences that stop generation
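
As a rough sketch of how the table translates into a request body, the helper below assembles a payload and checks the documented ranges before anything is sent. The function `build_payload`, its keyword names, and the `get_weather` tool are hypothetical; the tool definition assumes the OpenAI function-calling shape, which this page does not show explicitly.

```python
from typing import Optional

MAX_OUTPUT_TOKENS = 131072  # 128K, the documented ceiling for max_tokens


def build_payload(
    messages: list,
    *,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    stream: bool = False,
    json_mode: bool = False,
    tools: Optional[list] = None,
) -> dict:
    """Assemble a chat-completions body, enforcing the documented limits."""
    if not messages:
        raise ValueError("messages must contain at least one item")
    payload = {"model": "glm-5.2", "messages": messages}
    if max_tokens is not None:
        if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
            raise ValueError(f"max_tokens must be in 1..{MAX_OUTPUT_TOKENS}")
        payload["max_tokens"] = max_tokens
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        payload["temperature"] = temperature
    if stream:
        payload["stream"] = True
    if json_mode:
        payload["response_format"] = {"type": "json_object"}
    if tools:
        payload["tools"] = tools
    return payload


# Hypothetical tool in the OpenAI function-calling shape (name and schema made up).
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = build_payload(
    [{"role": "user", "content": "What is the weather in Paris? Reply in JSON."}],
    max_tokens=2048,
    temperature=0.2,
    json_mode=True,
    tools=[weather_tool],
)
```

The resulting dictionary can be serialized as the request body, or its non-standard keys can be passed through `extra_body` when using the OpenAI SDK.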

API Reference

View the interactive API playground for GLM-5.2.
