Official documentation: https://ai.google.dev/gemini-api/docs/text-generation
Use Gemini models with thinking/reasoning capabilities through the standard OpenAI Chat Completions API format.
Overview
This endpoint exposes Gemini’s thinking mode through the OpenAI-compatible Chat Completions format. Set the reasoning_effort parameter to control how much reasoning the model performs before it responds.
Authentication
All requests require a Bearer token in the Authorization header:
Authorization: Bearer YOUR_API_KEY
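As a minimal sketch of the header described above, the helper below builds the request headers in Python. The function name auth_headers and the environment variable ANYFAST_API_KEY are hypothetical names chosen for illustration; they are not part of the API.

```python
import os


def auth_headers(api_key: str) -> dict:
    """Return headers for an authenticated JSON request."""
    if not api_key:
        raise ValueError("API key must not be empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# ANYFAST_API_KEY is an assumed variable name, not one defined by the API.
headers = auth_headers(os.environ.get("ANYFAST_API_KEY", "YOUR_API_KEY"))
```

Reading the key from the environment keeps it out of source control; any other secret store would work equally well.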
Request Parameters
model: The Gemini model ID, for example gemini-2.5-pro or gemini-2.5-flash.
messages: A list of messages comprising the conversation.
reasoning_effort: Controls the thinking effort level. One of low, medium, or high.
temperature: Sampling temperature between 0 and 2.
top_p: Nucleus sampling parameter.
max_tokens: Maximum number of tokens to generate.
stream: Whether to stream responses.
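The parameters above can be assembled and checked on the client before sending. The sketch below is one possible way to do that; build_payload is a hypothetical helper, the medium default for reasoning_effort is an assumption made for this example, and the max_tokens field name is assumed from the OpenAI Chat Completions format.

```python
from __future__ import annotations

from typing import Any, Optional

ALLOWED_EFFORTS = ("low", "medium", "high")


def build_payload(
    model: str,
    messages: list[dict[str, str]],
    reasoning_effort: str = "medium",  # assumed default, not documented
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stream: bool = False,
) -> dict[str, Any]:
    """Build a request body, rejecting values outside the documented ranges."""
    if reasoning_effort not in ALLOWED_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {ALLOWED_EFFORTS}")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    payload: dict[str, Any] = {
        "model": model,
        "messages": messages,
        "reasoning_effort": reasoning_effort,
        "stream": stream,
    }
    # Only include optional sampling parameters that were explicitly set.
    optional = {"temperature": temperature, "top_p": top_p, "max_tokens": max_tokens}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Omitting unset optional fields leaves the server free to apply its own defaults.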
Request Example
curl -X POST https://www.anyfast.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.1,
    "top_p": 1.0,
    "stream": false,
    "reasoning_effort": "low"
  }'
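For readers calling the endpoint from code rather than the shell, here is a rough Python equivalent of the curl command using only the standard library. It sends a non-streaming request so the reply arrives as a single JSON object. The helper names make_request and send are hypothetical, and the network call is left commented out so the snippet can be run without a key.

```python
import json
import urllib.request

API_URL = "https://www.anyfast.ai/v1/chat/completions"


def make_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare a POST request; nothing is sent until send() is called."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = make_request(
    "YOUR_API_KEY",
    {
        "model": "gemini-2.5-pro",
        "messages": [{"role": "user", "content": "Hello!"}],
        "temperature": 0.1,
        "top_p": 1.0,
        "stream": False,
        "reasoning_effort": "low",
    },
)
# completion = send(request)  # uncomment to perform the network call
```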
Response Example
{
  "id": "chatcmpl-gemini-think-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gemini-2.5-pro",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  }
}
Response Fields
id: Unique identifier for the completion.
object: Object type, always chat.completion.
created: Unix timestamp of when the completion was created.
choices: List of completion choices.
usage: Token usage statistics for the request.
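To illustrate how these fields are typically read, the sketch below parses a response shaped like the example and pulls out the assistant text, the finish reason, and the token total. The summarize helper is a hypothetical name, and the sample data is copied from the response example rather than taken from a live call.

```python
import json

# Sample body copied from the response example; not a live API result.
RAW_RESPONSE = """
{
  "id": "chatcmpl-gemini-think-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gemini-2.5-pro",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 10, "completion_tokens": 8, "total_tokens": 18}
}
"""


def summarize(completion: dict) -> dict:
    """Extract the first choice's text plus basic usage information."""
    choice = completion["choices"][0]
    usage = completion.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
    }


summary = summarize(json.loads(RAW_RESPONSE))
print(summary["text"])
```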
Reasoning Effort Levels
| Level | Description |
|---|---|
| low | Minimal thinking, faster responses |
| medium | Balanced thinking and speed |
| high | Maximum reasoning depth |
Available Models
gemini-2.5-pro
gemini-2.5-flash