GLM-5.2 is Zhipu AI’s flagship model for the long-task era, available through Anyfast via an OpenAI-compatible interface. It pairs a genuinely usable 1M-token context window with open-source SOTA coding performance — strong enough to take a project from requirements all the way to deployable, multi-platform artifacts in a single long-running task.

Key capabilities

  • OpenAI-compatible — Works as a drop-in replacement with the OpenAI SDK
  • 1M context window — Solid, lossless 1M-token context that stays stable on long-horizon tasks, not just nominal length
  • 128K max output — Generates up to 128K tokens in a single response
  • Thinking mode — Chain-of-thought reasoning (forced thinking when enabled on GLM-5.2)
  • Adjustable reasoning effort — reasoning_effort tunes how hard the model thinks
  • Open-source SOTA coding — Top-ranked open model on long-horizon coding benchmarks, comparable to the strongest closed models
  • Function calling, structured output & MCP — Robust tool use, JSON output, and MCP tool/data-source integration
  • Streaming — Real-time token streaming via SSE
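
As a rough illustration of the streaming capability, the sketch below turns SSE lines into text fragments. It assumes the stream follows the usual OpenAI-compatible convention: `data: {json}` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`. This page does not spell out the wire format, so treat those field names as assumptions. `iter_deltas` is a hypothetical helper, and the sample lines are canned rather than a real response.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from the SSE lines of a streamed chat completion.

    Assumption: OpenAI-style chunks ("data: {...}" lines, ending with "data: [DONE]").
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Canned lines standing in for a real response body (no network call here).
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))
```

In a real client the `lines` argument would come from the HTTP response body of a request sent with `"stream": true`, decoded line by line.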

Output specifications

| Property | Value |
| --- | --- |
| Input modality | Text |
| Output modality | Text |
| Context window | 1M tokens |
| Max output tokens | 128K |

Quick example

curl https://www.anyfast.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.2",
    "messages": [
      { "role": "user", "content": "Explain quantum entanglement in simple terms." }
    ]
  }'
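
For readers who prefer Python without extra dependencies, the same request can be sketched with the standard library. The URL, headers, and body mirror the curl example. `build_chat_request` is a hypothetical helper name, and the request is only built here, not sent.

```python
import json
import urllib.request

# Endpoint taken from the curl example.
API_URL = "https://www.anyfast.ai/v1/chat/completions"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "model": "glm-5.2",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("YOUR_API_KEY", "Explain quantum entanglement in simple terms.")
print(req.get_method(), req.full_url)
```

To send it, pass `req` to `urllib.request.urlopen` and decode the JSON body. With an OpenAI-compatible response, the reply text should sit at `choices[0].message.content`.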

Thinking mode

GLM-5.2 supports a chain-of-thought thinking mode. When thinking.type is enabled (the default), GLM-5.2 always thinks before answering. Set it to disabled to skip reasoning for lightweight tasks.
Python
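from openai import OpenAI

# Point the OpenAI SDK at Anyfast's OpenAI-compatible endpoint
# (same host and /v1 prefix as the curl example).
client = OpenAI(
    base_url="https://www.anyfast.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="glm-5.2",
    messages=[
        {"role": "user", "content": "Design a REST API for a blogging platform."}
    ],
    # extra_body forwards these fields in the request body unchanged.
    extra_body={
        "thinking": {"type": "enabled"},
        "reasoning_effort": "max"
    }
)

print(response.choices[0].message.content)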
reasoning_effort controls how hard the model reasons and takes effect only when thinking is enabled. GLM-5.2 accepts max, xhigh, high, medium, low, minimal, and none. For compatibility, none and minimal make the model skip thinking, low and medium map to high, and xhigh maps to max, so the effective levels are no thinking, high, and max. Default: max.
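
The compatibility mapping can be written out as a small lookup. This is a client-side illustration of the rules stated above, not something the API requires, since the service applies the mapping itself. `effective_effort` is a hypothetical helper name.

```python
from typing import Optional

# Compatibility mapping as described in the text: none/minimal skip thinking,
# low/medium behave as high, xhigh behaves as max.
_EFFORT_ALIASES = {
    "none": None,
    "minimal": None,
    "low": "high",
    "medium": "high",
    "high": "high",
    "xhigh": "max",
    "max": "max",
}


def effective_effort(reasoning_effort: str = "max", thinking_enabled: bool = True) -> Optional[str]:
    """Return the effort level effectively used, or None when the model skips thinking."""
    if reasoning_effort not in _EFFORT_ALIASES:
        raise ValueError(f"unsupported reasoning_effort: {reasoning_effort!r}")
    if not thinking_enabled:
        return None  # reasoning_effort only matters when thinking is enabled
    return _EFFORT_ALIASES[reasoning_effort]


for level in ("none", "low", "medium", "xhigh"):
    print(level, "->", effective_effort(level))
```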

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Must be glm-5.2 |
| messages | array | Yes | List of { role, content } objects |
| thinking | object | No | { "type": "enabled" } or { "type": "disabled" }. Controls chain-of-thought. Default: enabled (forced thinking) |
| reasoning_effort | string | No | max, xhigh, high, medium, low, minimal, none. Effective when thinking is enabled. Default: max |
| max_tokens | integer | No | Maximum tokens to generate (up to 131072). Recommended ≥ 1024 |
| temperature | float | No | Range 0 to 1. Controls randomness. Default: 1 |
| top_p | float | No | Nucleus sampling threshold. Default: 0.95 |
| stream | boolean | No | Enable SSE streaming. Default: false |
| tools | array | No | Function/MCP tool definitions for tool use |
| response_format | object | No | { "type": "json_object" } for structured JSON output |
| stop | string / array | No | Sequences that stop generation |
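
To show how these parameters combine in a request body, here is a minimal sketch of a payload builder that rejects values outside the ranges listed in the table. `build_payload` and its keyword names are hypothetical, and the 0 to 1 temperature range is this page's reading of the table, so adjust it if the service accepts a wider range.

```python
from typing import Any, Optional

MAX_OUTPUT_TOKENS = 131072  # upper bound for max_tokens from the parameter table


def build_payload(
    messages: list[dict[str, Any]],
    *,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    stream: bool = False,
    json_output: bool = False,
    stop: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Assemble a request body, rejecting values outside the documented ranges."""
    if not messages:
        raise ValueError("messages must contain at least one entry")
    payload: dict[str, Any] = {"model": "glm-5.2", "messages": messages}
    if max_tokens is not None:
        if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
            raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")
        payload["max_tokens"] = max_tokens
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        payload["temperature"] = temperature
    if stream:
        payload["stream"] = True
    if json_output:
        payload["response_format"] = {"type": "json_object"}
    if stop:
        payload["stop"] = stop
    return payload


body = build_payload(
    [{"role": "user", "content": "List three HTTP methods as JSON."}],
    max_tokens=2048,
    temperature=0.2,
    json_output=True,
)
print(body)
```

Optional fields are left out of the body unless set, so the service defaults listed in the table still apply.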

API Reference

View the interactive API playground for GLM-5.2.