Seedance 2.0 - Anyfast

Seedance 2.0 is ByteDance’s latest video generation model, available through the Anyfast API. It supports text-to-video, image-to-video, multimodal reference input, video editing, video extension, and synchronized audio generation.

Key capabilities

Feature	Description
Text-to-video	Generate video from text prompts
Image-to-video (first frame)	Use an image as the first frame
Image-to-video (first + last frame)	Use two images as first and last frames
Multimodal reference	Combine images, videos, and audio as references (0–9 images, up to 3 videos, up to 3 audio clips)
Video editing	Modify elements in an existing video using reference images
Video extension	Extend and concatenate reference videos
Audio generation	Auto-generate synchronized voice, sound effects, and background music
Web search	Enhance generation with real-time internet content (text-to-video only)
Return last frame	Retrieve the last frame of generated video

Output specifications

Property	Value
Resolution	480p, 720p, 1080p, 4k
Aspect ratio	16:9, 4:3, 1:1, 3:4, 9:16, 21:9, adaptive
Duration	4–15 seconds
Format	mp4
Frame rate	24 fps

Workflow

POST /v1/video/generations  →  task_id
Poll GET /v1/video/generations/{task_id}  →  status
When status = "succeeded"  →  download video URL (valid 24 hours)
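
The submit-then-poll flow above can be sketched as a small polling loop. This is a minimal sketch rather than an official client: the helper names `http_json` and `wait_for_task`, the `failed` status value, the assumption that `status` sits at the top level of the JSON response, and the interval and timeout defaults are all illustrative. Only the host, the endpoint paths, and the `succeeded` status come from this page.

```python
import json
import time
import urllib.request

BASE_URL = "https://www.anyfast.ai"  # host used by the curl examples on this page


def http_json(method, path, api_key, body=None):
    """Send one JSON request with the standard library and decode the reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_task(fetch_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call fetch_status() until the task finishes or the timeout elapses.

    fetch_status is any zero-argument callable that returns the decoded JSON
    of GET /v1/video/generations/{task_id}.
    """
    waited = 0.0
    while True:
        task = fetch_status()
        status = task.get("status")  # assumption: top-level "status" field
        if status == "succeeded":
            return task
        if status == "failed":  # assumption: name of the terminal error state
            raise RuntimeError(f"generation failed: {task}")
        if waited >= timeout:
            raise TimeoutError(f"task still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

A caller would typically pass `lambda: http_json("GET", f"/v1/video/generations/{task_id}", api_key)` as `fetch_status`. Taking the fetcher and `sleep` as parameters keeps the loop testable without network access.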

Asset management workflow

To use persistent image, video, or audio assets (such as a fixed character reference), upload them via the Asset Management API and reference them with asset://<ID> in generation requests.

Step 1: Create an asset group

Create an asset group to get a group ID.

curl https://www.anyfast.ai/volc/asset/CreateAssetGroup \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "volc-asset",
    "Name": "your-custom-name"
}'

Response example:

{
    "Id": "group-20260427160000-xxxxx"
}

Step 2: Create an asset in the group

Use the group ID from Step 1 to upload an image asset (e.g., a character face reference photo).

curl https://www.anyfast.ai/volc/asset/CreateAsset \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "volc-asset",
    "GroupId": "group-20260427160000-xxxxx",
    "Name": "character-reference",
    "AssetType": "Image",
    "URL": "https://example.com/example.png"
}'

Response example:

{
    "Id": "asset-20260427160000-xxxxx"
}

Step 3: Generate a video using the asset

Reference the asset ID from Step 2 to generate a video.

curl https://www.anyfast.ai/v1/video/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedance-2.0",
    "content": [
      {
        "type": "text",
        "text": "The person from @image1 walks along a sunlit street, cinematic quality"
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "asset://asset-20260427160000-xxxxx"
        },
        "role": "reference_image"
      }
    ],
    "resolution": "720p",
    "duration": 5
  }'

The API returns an asynchronous task ID (prefixed with asyn).
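
To make the link between Step 2 and Step 3 explicit, here is a minimal sketch that turns an asset ID into an `asset://<ID>` reference and assembles the same request body as the curl command above. The function names are hypothetical helpers, not part of the API. The field names and values mirror the Step 3 example.

```python
def asset_url(asset_id):
    """Turn an asset ID returned by CreateAsset into an asset:// reference."""
    if not asset_id or asset_id.startswith("asset://"):
        raise ValueError("pass the bare asset ID, without the asset:// scheme")
    return f"asset://{asset_id}"


def reference_image_item(asset_id):
    """Build an image_url content item that points at a stored asset."""
    return {
        "type": "image_url",
        "image_url": {"url": asset_url(asset_id)},
        "role": "reference_image",
    }


def build_asset_request(prompt, asset_id, resolution="720p", duration=5):
    """Assemble the Step 3 request body for a single image asset."""
    return {
        "model": "seedance-2.0",
        "content": [
            {"type": "text", "text": prompt},
            reference_image_item(asset_id),
        ],
        "resolution": resolution,
        "duration": duration,
    }
```

Serializing the result of `build_asset_request(...)` with `json.dumps` gives a body that can be sent to `/v1/video/generations`.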

Step 4: Poll for the result

Query the generation status using the task ID.

curl https://www.anyfast.ai/v1/video/generations/asynxxxx \
  -H "Authorization: Bearer YOUR_API_KEY"

Once the task completes, the response includes a pre-signed S3 download link.

Notes

Download links expire after 12 hours; re-fetch after expiry.

If the task reaches 100% but returns an error, the content was likely blocked by the provider’s content moderation system (e.g., celebrity likeness or copyrighted content). Try modifying the prompt or replacing the reference image.

Examples

Text-to-video

curl https://www.anyfast.ai/v1/video/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedance-2.0",
    "content": [
      {
        "type": "text",
        "text": "A cat playing piano in a sunlit room, cinematic lighting"
      }
    ],
    "generate_audio": true,
    "ratio": "16:9",
    "duration": 8
  }'

Multimodal reference (image + video + audio)

Reference assets in the prompt with @image1, @video1, @audio1 — numbered per media type by their order in the content array.

Important: Content items must be passed in strict order: text, image_url, video_url, audio_url. Reordering them may cause errors. Keep all items of one type together and do not interleave other asset types between them.

curl https://www.anyfast.ai/v1/video/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedance-2.0",
    "content": [
      {
        "type": "text",
        "text": "The character from @image1 dances in the scene from @video1, with @audio1 as background music"
      },
      {
        "type": "image_url",
        "image_url": {"url": "https://example.com/character.jpg"},
        "role": "reference_image"
      },
      {
        "type": "video_url",
        "video_url": {"url": "https://example.com/clip.mp4"},
        "role": "reference_video"
      },
      {
        "type": "audio_url",
        "audio_url": {"url": "https://example.com/bgm.mp3"},
        "role": "reference_audio"
      }
    ],
    "generate_audio": true,
    "ratio": "16:9",
    "duration": 11
  }'
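
The ordering rule and the per-type limits from the capabilities table can be enforced on the client before a request is sent. The following is a minimal sketch under that assumption. `order_content` and `check_reference_limits` are hypothetical helper names, and the server may apply additional checks not covered here.

```python
TYPE_ORDER = ["text", "image_url", "video_url", "audio_url"]

# Limits quoted in the "Key capabilities" table for multimodal reference.
MAX_PER_TYPE = {"image_url": 9, "video_url": 3, "audio_url": 3}


def order_content(items):
    """Return items grouped as text, image_url, video_url, audio_url.

    sorted() is stable, so items of one type keep their relative order and
    @image1, @image2, ... still point at the same images afterwards.
    """
    for item in items:
        if item.get("type") not in TYPE_ORDER:
            raise ValueError(f"unknown content type: {item.get('type')!r}")
    return sorted(items, key=lambda item: TYPE_ORDER.index(item["type"]))


def check_reference_limits(items):
    """Raise ValueError if a media type exceeds its documented maximum."""
    for media_type, limit in MAX_PER_TYPE.items():
        count = sum(1 for item in items if item.get("type") == media_type)
        if count > limit:
            raise ValueError(f"{count} {media_type} items, limit is {limit}")
```

Sorting instead of rejecting is a design choice: because the sort is stable, the per-type numbering used by the `@` placeholders does not change.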

Video editing

curl https://www.anyfast.ai/v1/video/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedance-2.0",
    "content": [
      {
        "type": "text",
        "text": "Replace the water bottle in @video1 with the perfume bottle from @image1, keep camera movement unchanged"
      },
      {
        "type": "image_url",
        "image_url": {"url": "https://example.com/perfume.jpg"},
        "role": "reference_image"
      },
      {
        "type": "video_url",
        "video_url": {"url": "https://example.com/original.mp4"},
        "role": "reference_video"
      }
    ],
    "generate_audio": true,
    "ratio": "16:9",
    "duration": 5
  }'

Web search enhanced (text-to-video only)

curl https://www.anyfast.ai/v1/video/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedance-2.0",
    "content": [
      {
        "type": "text",
        "text": "Macro shot of cherry blossoms in spring, petals falling slowly"
      }
    ],
    "generate_audio": true,
    "ratio": "16:9",
    "duration": 11,
    "tools": [{"type": "web_search"}]
  }'

Parameters

Parameter	Type	Required	Description
`model`	string	Yes	`seedance-2.0`
`content`	array	Yes	Input content array (text, image_url, video_url, audio_url items)
`content[].type`	string	Yes	`text`, `image_url`, `video_url`, or `audio_url`
`content[].text`	string	Text items	Text prompt (max 500 Chinese chars / 1000 English words)
`content[].image_url.url`	string	Image items	Image URL, Base64 data URI, or `asset://<ID>`
`content[].video_url.url`	string	Video items	Video URL or `asset://<ID>` (mp4/mov, max 50 MB, 2–15s)
`content[].audio_url.url`	string	Audio items	Audio URL, Base64 data URI, or `asset://<ID>` (wav/mp3, max 15 MB)
`content[].role`	string	Conditional	`first_frame`, `last_frame`, `reference_image`, `reference_video`, `reference_audio`
`generate_audio`	boolean	No	Generate synchronized audio. Default: `true`
`resolution`	string	No	`480p`, `720p`, `1080p`, or `4k`. Default: `720p`. (4k is supported only by Seedance 2.0)
`ratio`	string	No	`16:9`, `4:3`, `1:1`, `3:4`, `9:16`, `21:9`, `adaptive`. Default: `adaptive`
`duration`	integer	No	4–15 seconds. Default: `5`
`tools`	array	No	`[{"type": "web_search"}]` for web search (text-to-video only)
`watermark`	boolean	No	Add watermark. Default: `false`
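
The constraints in the table above can be checked locally before a request is submitted. This is a minimal sketch, assuming the table is complete. `validate_request` is a hypothetical helper, and the API may accept or reject requests differently than this check does. The model name is only required to be present, because this page also mentions other Seedance variants.

```python
RESOLUTIONS = {"480p", "720p", "1080p", "4k"}
RATIOS = {"16:9", "4:3", "1:1", "3:4", "9:16", "21:9", "adaptive"}


def validate_request(body):
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    if not body.get("model"):
        problems.append("model is required")

    content = body.get("content")
    if not isinstance(content, list) or not content:
        problems.append("content must be a non-empty array")
        content = []

    # Omitted optional fields fall back to the defaults listed in the table.
    if body.get("resolution", "720p") not in RESOLUTIONS:
        problems.append("resolution must be 480p, 720p, 1080p, or 4k")
    if body.get("ratio", "adaptive") not in RATIOS:
        problems.append("ratio is not one of the documented values")

    duration = body.get("duration", 5)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int) or not 4 <= duration <= 15:
        problems.append("duration must be an integer from 4 to 15")

    uses_search = any(tool.get("type") == "web_search" for tool in body.get("tools", []))
    if uses_search and any(item.get("type") != "text" for item in content):
        problems.append("web_search is available for text-to-video only")
    return problems
```

Returning a list instead of raising on the first error lets a caller report every problem at once.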

Input modes

Mode	Content items	`role` values
Text-to-video	1× text	—
Image-to-video (first frame)	text (optional) + 1× image_url	`first_frame` or omit
Image-to-video (first + last frame)	text (optional) + 2× image_url	`first_frame` + `last_frame`
Multimodal reference	text (optional) + image/video/audio	`reference_image`, `reference_video`, `reference_audio`
Video editing	text + image_url + video_url	`reference_image` + `reference_video`
Video extension	text + video_url(s)	`reference_video`

Note: First frame, first+last frame, and multimodal reference are mutually exclusive — do not mix them in the same request.
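
The mutual-exclusion rule can be expressed as a small classifier over the `role` values. This is a sketch under assumptions, not the service's own logic. `detect_mode` is a hypothetical helper. It groups multimodal reference, video editing, and video extension under one `reference` label because all three use `reference_*` roles and differ mainly in the prompt. It also assumes that only a single role-less image is treated as a first frame.

```python
FRAME_ROLES = {"first_frame", "last_frame"}
REFERENCE_ROLES = {"reference_image", "reference_video", "reference_audio"}


def detect_mode(content):
    """Classify a content array, raising ValueError on mixed input modes."""
    media = [item for item in content if item.get("type") != "text"]
    if not media:
        return "text-to-video"

    roles = [item.get("role") for item in media]
    if any(role in REFERENCE_ROLES for role in roles):
        # Reference mode: every media item must carry a reference_* role.
        if not all(role in REFERENCE_ROLES for role in roles):
            raise ValueError("reference roles cannot be mixed with frame roles")
        return "reference"

    if any(item.get("type") != "image_url" for item in media):
        raise ValueError("only image items can serve as first or last frame")
    if len(media) == 1 and roles[0] in (None, "first_frame"):
        return "first-frame"
    if len(media) == 2 and set(roles) == FRAME_ROLES:
        return "first-and-last-frame"
    raise ValueError(f"unsupported role combination: {roles}")
```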

Referencing assets in prompts

Inside the text prompt, you can refer to media items using @<type><N> placeholders — numbered per media type by their order in the content array:

Placeholder	Refers to
`@image1`, `@image2`, …	The 1st, 2nd, … `image_url` item
`@video1`, `@video2`, …	The 1st, 2nd, … `video_url` item
`@audio1`, `@audio2`, …	The 1st, 2nd, … `audio_url` item

Example: "The character from @image1 walks through the scene in @video1 with @audio1 as background".

Note: Content items must be passed in strict order: text, image_url, video_url, audio_url. Reordering them may cause errors. Keep all items of one type together and do not interleave other asset types between them.
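
The numbering rule for placeholders can be checked on the client, so that a prompt never mentions an item that is missing from the content array. The following is a minimal sketch. `resolve_placeholders` and the regular expression are illustrative, and the service may parse prompts differently.

```python
import re

# Matches @image1, @video2, @audio3, ... as described in the table above.
PLACEHOLDER = re.compile(r"@(image|video|audio)(\d+)")


def resolve_placeholders(content):
    """Map each @<type><N> placeholder in the text prompt to its content item."""
    by_type = {"image": [], "video": [], "audio": []}
    prompt_parts = []
    for item in content:
        if item.get("type") == "text":
            prompt_parts.append(item.get("text", ""))
            continue
        kind = str(item.get("type", "")).split("_")[0]  # "image_url" -> "image"
        if kind in by_type:
            by_type[kind].append(item)

    resolved = {}
    for match in PLACEHOLDER.finditer(" ".join(prompt_parts)):
        kind, number = match.group(1), int(match.group(2))
        candidates = by_type[kind]
        # Placeholders are 1-based and counted separately for each media type.
        if not 1 <= number <= len(candidates):
            raise ValueError(f"{match.group(0)} has no matching {kind}_url item")
        resolved[match.group(0)] = candidates[number - 1]
    return resolved
```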

Resolution pixel values

Resolution	16:9	4:3	1:1	3:4	9:16	21:9
480p	864x496	752x560	640x640	560x752	496x864	992x432
720p	1280x720	1112x834	960x960	834x1112	720x1280	1470x630
1080p	1920x1080	1664x1248	1440x1440	1248x1664	1080x1920	2208x944
4k	3840x2160	3326x2494	2880x2880	2494x3326	2160x3840	4398x1886

Note: 4k resolution is supported only by Seedance 2.0 (not Fast, Mini, or Ultra).
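
For client code that needs the output dimensions in advance, the table can be held as a lookup. This is a minimal sketch: the values are copied from the table above, while `output_size` is a hypothetical helper. The `adaptive` ratio is rejected because the table lists no fixed size for it.

```python
# (width, height) per resolution and aspect ratio, copied from the table above.
PIXELS = {
    "480p": {"16:9": (864, 496), "4:3": (752, 560), "1:1": (640, 640),
             "3:4": (560, 752), "9:16": (496, 864), "21:9": (992, 432)},
    "720p": {"16:9": (1280, 720), "4:3": (1112, 834), "1:1": (960, 960),
             "3:4": (834, 1112), "9:16": (720, 1280), "21:9": (1470, 630)},
    "1080p": {"16:9": (1920, 1080), "4:3": (1664, 1248), "1:1": (1440, 1440),
              "3:4": (1248, 1664), "9:16": (1080, 1920), "21:9": (2208, 944)},
    "4k": {"16:9": (3840, 2160), "4:3": (3326, 2494), "1:1": (2880, 2880),
           "3:4": (2494, 3326), "9:16": (2160, 3840), "21:9": (4398, 1886)},
}


def output_size(resolution, ratio):
    """Return (width, height) in pixels for a resolution and aspect ratio."""
    if ratio == "adaptive":
        raise ValueError("adaptive has no fixed size in the table")
    try:
        return PIXELS[resolution][ratio]
    except KeyError:
        raise ValueError(f"unsupported combination: {resolution} {ratio}") from None
```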

API Reference

View the interactive API playground for Seedance 2.0.
