This endpoint requires a `session_id` obtained from the Identify Face step. Always call Identify Face first.
Workflow overview
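The workflow has three steps. First, call Identify Face to get a `session_id`. Next, create the Advanced Lip Sync task with that `session_id`. Finally, poll the task until it succeeds or fails. The Python sketch below only encodes that ordering. The three step functions (`identify_face`, `create_task`, `wait_for_video`) are hypothetical callables you would supply, because this page does not document their requests or responses.

```python
from typing import Callable


def run_lip_sync(
    identify_face: Callable[[], str],      # hypothetical: returns a session_id
    create_task: Callable[[str], str],     # hypothetical: session_id -> task_id
    wait_for_video: Callable[[str], str],  # hypothetical: task_id -> video URL
) -> str:
    """Run the three steps in the order this endpoint requires."""
    # Step 1: Identify Face must run first; its session_id feeds step 2.
    session_id = identify_face()
    if not session_id:
        raise ValueError("Identify Face did not return a session_id")
    # Step 2: create the Advanced Lip Sync task for that session.
    task_id = create_task(session_id)
    # Step 3: poll until the task finishes and return the video URL.
    return wait_for_video(task_id)


if __name__ == "__main__":
    # Stub steps standing in for real API calls.
    url = run_lip_sync(
        identify_face=lambda: "sess-123",
        create_task=lambda sid: f"task-for-{sid}",
        wait_for_video=lambda tid: f"https://example.com/{tid}.mp4",
    )
    print(url)  # https://example.com/task-for-sess-123.mp4
```

Injecting the steps keeps the ordering rule testable without network access. In practice each callable would wrap one HTTP call.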
Input modes
Text mode — built-in TTS
Provide `text`, `voice_id`, and `voice_language`. The platform synthesizes speech from the text in the specified voice and uses that audio to drive the lip movements.
Audio mode — custom audio file
Provide `audio_url` to drive the lip movements directly from an existing audio recording.
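As a minimal sketch, the following Python function assembles the `input` object for one mode or the other and rejects mixed or incomplete inputs. Treating the two modes as mutually exclusive is an assumption, because this page does not say what the service does if both `text` and `audio_url` are sent. The function name `build_lip_sync_input` is hypothetical.

```python
from typing import Optional


def build_lip_sync_input(
    session_id: str,
    *,
    text: Optional[str] = None,
    voice_id: Optional[str] = None,
    voice_language: Optional[str] = None,
    audio_url: Optional[str] = None,
    face_image_url: Optional[str] = None,
) -> dict:
    """Return the `input` object for text mode or audio mode (hypothetical helper)."""
    text_mode = any(v is not None for v in (text, voice_id, voice_language))
    audio_mode = audio_url is not None
    # Assumption: exactly one mode per request.
    if text_mode == audio_mode:
        raise ValueError("use either text mode or audio mode, not both or neither")

    payload = {"session_id": session_id}
    if face_image_url is not None:
        payload["face_image_url"] = face_image_url  # optional in either mode

    if text_mode:
        # Text mode: all three TTS fields are required.
        if not (text and voice_id and voice_language):
            raise ValueError("text mode needs text, voice_id, and voice_language")
        if voice_language not in ("zh", "en"):
            raise ValueError("voice_language must be 'zh' or 'en'")
        payload.update(text=text, voice_id=voice_id, voice_language=voice_language)
    else:
        # Audio mode: the recording drives the lips directly.
        payload["audio_url"] = audio_url
    return payload


if __name__ == "__main__":
    print(build_lip_sync_input("sess-123", audio_url="https://example.com/voice.mp3"))
    # {'session_id': 'sess-123', 'audio_url': 'https://example.com/voice.mp3'}
```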
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `input.session_id` | string | Yes | Session ID from the Identify Face step |
| `input.face_image_url` | string | No | Reference face image URL, used to keep the character's identity consistent |
| `input.text` | string | Yes (text mode) | Text for the character to speak |
| `input.voice_id` | string | Yes (text mode) | Voice ID for TTS synthesis. See the Voice ID reference for available voices with audio previews. |
| `input.voice_language` | string | Yes (text mode) | Language code: `zh` or `en` |
| `input.audio_url` | string | Yes (audio mode) | Public URL of an audio file |
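The dotted names in the table suggest that these fields are nested under an `input` object in the JSON request body. The sketch below builds such a request with Python's standard `urllib.request` but does not send it. The host, the create path (`POST /kling/v1/videos/advanced-lip-sync`, inferred from the polling path), and the Bearer authorization header are all assumptions, not documented values.

```python
import json
import urllib.request

# Hypothetical values: the host, create path, and auth scheme are not documented here.
BASE_URL = "https://api.example.com"
CREATE_PATH = "/kling/v1/videos/advanced-lip-sync"


def make_create_request(params: dict, api_key: str) -> urllib.request.Request:
    """Nest dotted names like 'input.session_id' into a JSON body and build a POST."""
    body: dict = {}
    for dotted_name, value in params.items():
        node = body
        *parents, leaf = dotted_name.split(".")
        for key in parents:
            node = node.setdefault(key, {})  # create intermediate objects as needed
        node[leaf] = value
    return urllib.request.Request(
        BASE_URL + CREATE_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # hypothetical auth header
        },
        method="POST",
    )


if __name__ == "__main__":
    req = make_create_request(
        {"input.session_id": "sess-123", "input.audio_url": "https://example.com/voice.mp3"},
        api_key="YOUR_API_KEY",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode())
```

Sending it would be `urllib.request.urlopen(req)`, after replacing the placeholder host and auth with the values your account actually uses.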
Polling
After the task is created, poll `GET /kling/v1/videos/advanced-lip-sync/{task_id}` (the Task Query endpoint) until the task reaches a terminal status. Status transitions: `queued` → `processing` → `succeeded` or `failed`.
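A polling loop for the documented status transitions could look like the following sketch. This page does not specify how the status is read from the Task Query response, so `fetch_status` is a hypothetical caller-supplied function that performs the GET and returns `(status, parsed_body)`. The 5-second interval and 600-second timeout are arbitrary defaults, not platform limits.

```python
import time
from typing import Callable, Tuple


def wait_for_task(
    fetch_status: Callable[[str], Tuple[str, dict]],  # hypothetical: does the GET
    task_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task leaves queued/processing; return the succeeded body."""
    waited = 0.0
    while True:
        status, body = fetch_status(task_id)
        if status == "succeeded":
            return body
        if status == "failed":
            raise RuntimeError(f"task {task_id} failed")
        if status not in ("queued", "processing"):
            raise RuntimeError(f"unexpected status {status!r}")
        if waited >= timeout_s:
            raise TimeoutError(f"task {task_id} still {status} after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s


if __name__ == "__main__":
    statuses = iter(["queued", "processing", "succeeded"])
    result = wait_for_task(lambda tid: (next(statuses), {"task_id": tid}), "task-1",
                           sleep=lambda s: None)
    print(result)  # {'task_id': 'task-1'}
```

Counting the requested sleep time, rather than reading a clock, keeps the timeout deterministic when `sleep` is stubbed out in tests.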
On success, the video download URL is available at `data.data.task_result.videos[0].url`.
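To pull the URL out of a succeeded response, a small helper can walk the documented path and fail loudly if any level is missing. This sketch assumes the path is relative to the parsed JSON body (so `body["data"]["data"]...`). If your HTTP client already unwraps one `data` level, drop one key. The helper name is hypothetical.

```python
def extract_video_url(body: dict) -> str:
    """Return data.data.task_result.videos[0].url from a parsed response body."""
    try:
        return body["data"]["data"]["task_result"]["videos"][0]["url"]
    except (KeyError, IndexError, TypeError) as exc:
        # Missing key, empty videos list, or a null level along the path.
        raise ValueError("response has no data.data.task_result.videos[0].url") from exc


if __name__ == "__main__":
    sample = {"data": {"data": {"task_result": {"videos": [{"url": "https://example.com/out.mp4"}]}}}}
    print(extract_video_url(sample))  # https://example.com/out.mp4
```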
Prerequisites: Identify Face
You must call this first to obtain a session_id.
Voice ID Reference
Browse all available voice IDs with audio previews to choose the right voice for your lip sync.
API Reference
View the interactive API playground for Kling Advanced Lip Sync.