# GLM-5.2

> Zhipu AI's GLM-5.2 flagship chat model via OpenAI-compatible API. Built for long-horizon tasks with a usable 1M-token context, 128K output, and open-source SOTA coding.

GLM-5.2 is Zhipu AI's flagship model for the long-task era, available through Anyfast via an OpenAI-compatible interface. It pairs a 1M-token context window that remains usable at full length with state-of-the-art coding performance among open-source models, which is strong enough to take a project from requirements to deployable, multi-platform artifacts in a single long-running task.

## Key capabilities

* **OpenAI-compatible** — Works as a drop-in replacement with the OpenAI SDK
* **1M context window** — Solid, lossless 1M-token context that stays stable on long-horizon tasks, not just nominal length
* **128K max output** — Generates up to 128K tokens in a single response
* **Thinking mode** — Chain-of-thought reasoning (forced thinking when enabled on GLM-5.2)
* **Adjustable reasoning effort** — `reasoning_effort` tunes how hard the model thinks
* **Open-source SOTA coding** — Top-ranked open model on long-horizon coding benchmarks, comparable to the strongest closed models
* **Function calling, structured output & MCP** — Robust tool use, JSON output, and MCP tool/data-source integration
* **Streaming** — Real-time token streaming via SSE
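
As an illustration of the function-calling capability, the sketch below defines one tool and dispatches a tool call returned by the model. It assumes the endpoint accepts the OpenAI-style `tools` schema and returns OpenAI-style `tool_calls`, which is consistent with the OpenAI-compatible interface but not confirmed on this page. `get_weather`, `HANDLERS`, and `run_tool_call` are hypothetical names, and the weather data is a stub.

```python
import json
from typing import Any

# OpenAI-style function tool definition (assumed schema).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def get_weather(city: str) -> dict:
    """Stub implementation; a real tool would query a weather service."""
    return {"city": city, "forecast": "sunny"}


# Maps tool names to the local Python callables that implement them.
HANDLERS = {"get_weather": get_weather}


def run_tool_call(tool_call: Any) -> dict:
    """Execute one tool call and return a `tool` role message for the model."""
    handler = HANDLERS[tool_call.function.name]
    # Arguments arrive as a JSON-encoded string, not a dict.
    arguments = json.loads(tool_call.function.arguments)
    result = handler(**arguments)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    }
```

In use, `tools=TOOLS` would be passed to `client.chat.completions.create`, each entry of `response.choices[0].message.tool_calls` run through `run_tool_call`, and the resulting messages appended to `messages` before calling the API again for the final answer.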

## Output specifications

| Property          | Value     |
| ----------------- | --------- |
| Input modality    | Text      |
| Output modality   | Text      |
| Context window    | 1M tokens |
| Max output tokens | 128K      |

## Quick example

<CodeGroup>
  ```bash cURL
  curl https://www.anyfast.ai/v1/chat/completions \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "glm-5.2",
      "messages": [
        { "role": "user", "content": "Explain quantum entanglement in simple terms." }
      ]
    }'
  ```

  ```python Python
  from openai import OpenAI

  client = OpenAI(
      api_key="YOUR_API_KEY",
      base_url="https://www.anyfast.ai/v1"
  )

  response = client.chat.completions.create(
      model="glm-5.2",
      messages=[
          {"role": "user", "content": "Explain quantum entanglement in simple terms."}
      ]
  )

  print(response.choices[0].message.content)
  ```

  ```python Streaming
  from openai import OpenAI

  client = OpenAI(
      api_key="YOUR_API_KEY",
      base_url="https://www.anyfast.ai/v1"
  )

  stream = client.chat.completions.create(
      model="glm-5.2",
      messages=[
          {"role": "user", "content": "Write a short poem about the sea."}
      ],
      stream=True
  )

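  for chunk in stream:
      # Skip chunks that carry no choices (some servers send a final usage-only chunk).
      if not chunk.choices:
          continue
      print(chunk.choices[0].delta.content or "", end="", flush=True)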
  ```
</CodeGroup>
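
When the streamed text is needed as a whole (for logging or post-processing) rather than only printed, the deltas can be accumulated. The helper below is a minimal sketch and not part of the Anyfast or OpenAI SDK: `collect_stream_text` is a hypothetical name, and it assumes chunks follow the OpenAI chat-completions streaming shape (`choices[0].delta.content`), where `content` may be `None` and `choices` may be empty.

```python
from typing import Any, Iterable


def collect_stream_text(stream: Iterable[Any]) -> str:
    """Join the text deltas of a chat-completions stream into one string.

    Assumes OpenAI-style chunks: each has a `choices` list whose first item
    carries `delta.content` (a string or None).
    """
    parts = []
    for chunk in stream:
        # Skip chunks without choices (for example, a trailing usage-only chunk).
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:  # None or "" contributes nothing
            parts.append(content)
    return "".join(parts)
```

Given a `stream` object created with `stream=True`, `text = collect_stream_text(stream)` returns the full reply once the stream ends.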

## Thinking mode

GLM-5.2 supports a chain-of-thought thinking mode. When `thinking.type` is `enabled` (the default), GLM-5.2 always thinks before answering. Set it to `disabled` to skip reasoning for lightweight tasks.

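```python Python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://www.anyfast.ai/v1"
)

# Thinking is enabled by default; it is set explicitly here alongside the effort level.
response = client.chat.completions.create(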
    model="glm-5.2",
    messages=[
        {"role": "user", "content": "Design a REST API for a blogging platform."}
    ],
    extra_body={
        "thinking": {"type": "enabled"},
        "reasoning_effort": "max"
    }
)

print(response.choices[0].message.content)
```

`reasoning_effort` controls how hard the model reasons and takes effect only when thinking is enabled. GLM-5.2 accepts `max`, `xhigh`, `high`, `medium`, `low`, `minimal`, and `none`. Several of these are compatibility aliases: `none` and `minimal` make the model skip thinking, `low` and `medium` map to `high`, and `xhigh` maps to `max`. Default: `max`.
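
The compatibility mapping described above can be written out as a lookup table. This is a client-side sketch of the documented behavior, not Anyfast's server implementation: `effective_effort` and `thinking_extra_body` are hypothetical helper names, and representing "skip thinking" as `None` is an illustrative choice.

```python
from typing import Optional

# Documented aliases: value sent -> level the model actually applies.
# None means the model skips thinking.
_EFFORT_ALIASES = {
    "max": "max",
    "xhigh": "max",
    "high": "high",
    "medium": "high",
    "low": "high",
    "minimal": None,
    "none": None,
}


def effective_effort(reasoning_effort: str = "max") -> Optional[str]:
    """Return the effort level GLM-5.2 applies for a given `reasoning_effort`."""
    if reasoning_effort not in _EFFORT_ALIASES:
        raise ValueError(f"unsupported reasoning_effort: {reasoning_effort!r}")
    return _EFFORT_ALIASES[reasoning_effort]


def thinking_extra_body(enabled: bool = True, reasoning_effort: str = "max") -> dict:
    """Build the `extra_body` dict for `client.chat.completions.create`."""
    effective_effort(reasoning_effort)  # validate the value early
    body = {"thinking": {"type": "enabled" if enabled else "disabled"}}
    if enabled:
        # reasoning_effort only takes effect when thinking is enabled.
        body["reasoning_effort"] = reasoning_effort
    return body
```

For a lightweight task, `extra_body=thinking_extra_body(enabled=False)` sends only `{"thinking": {"type": "disabled"}}`.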

## Parameters

| Parameter          | Type           | Required | Description                                                                                                    |
| ------------------ | -------------- | -------- | -------------------------------------------------------------------------------------------------------------- |
| `model`            | string         | Yes      | Must be `glm-5.2`                                                                                              |
| `messages`         | array          | Yes      | List of `{ role, content }` objects                                                                            |
| `thinking`         | object         | No       | `{ "type": "enabled" \| "disabled" }`. Controls chain-of-thought. Default: `enabled` (forced thinking)         |
| `reasoning_effort` | string         | No       | `max`, `xhigh`, `high`, `medium`, `low`, `minimal`, `none`. Effective when thinking is enabled. Default: `max` |
| `max_tokens`       | integer        | No       | Maximum tokens to generate (up to 131072). Recommended ≥ 1024                                                  |
| `temperature`      | float          | No       | `0`–`1`. Controls randomness. Default: `1`                                                                     |
| `top_p`            | float          | No       | Nucleus sampling threshold. Default: `0.95`                                                                    |
| `stream`           | boolean        | No       | Enable SSE streaming. Default: `false`                                                                         |
| `tools`            | array          | No       | Function/MCP tool definitions for tool use                                                                     |
| `response_format`  | object         | No       | `{ "type": "json_object" }` for structured JSON output                                                         |
| `stop`             | string / array | No       | Sequences that stop generation                                                                                 |
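
As a sketch of how the optional parameters combine, the helper below assembles keyword arguments for `client.chat.completions.create` and checks the documented limits before any request is sent. `build_request` is a hypothetical name, and validating on the client side is an assumption of this example; the API applies its own checks. `thinking` goes through `extra_body` because it is not a standard OpenAI chat-completions argument.

```python
from typing import Optional

MAX_OUTPUT_TOKENS = 131072  # documented upper bound for max_tokens


def build_request(
    messages: list,
    *,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    json_output: bool = False,
    thinking_enabled: bool = True,
) -> dict:
    """Return kwargs for client.chat.completions.create(**kwargs)."""
    if not messages:
        raise ValueError("messages must contain at least one entry")

    kwargs: dict = {
        "model": "glm-5.2",
        "messages": messages,
        "extra_body": {
            "thinking": {"type": "enabled" if thinking_enabled else "disabled"}
        },
    }
    if max_tokens is not None:
        if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
            raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")
        kwargs["max_tokens"] = max_tokens
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        kwargs["temperature"] = temperature
    if json_output:
        # Request structured JSON output, as listed in the table above.
        kwargs["response_format"] = {"type": "json_object"}
    return kwargs
```

A call such as `client.chat.completions.create(**build_request(messages, max_tokens=2048, json_output=True))` then sends only the parameters that were explicitly set, leaving the rest at their documented defaults.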

<Card title="API Reference" icon="code" href="/api-reference/model-api/zhipu/glm-5.2">
  View the interactive API playground for GLM-5.2.
</Card>

