Seedance 2.0 Video Generation — Create Task API Reference

Use this endpoint to submit a video generation task with Seedance 2.0, ByteDance’s latest generation video model. Seedance 2.0 supports a rich set of multi-modal reference inputs — you can guide generation using reference images, reference videos, reference audio clips, or by pinning the first and last frames. Two model tiers are available (standard and Pro), each with a fast variant. The API responds immediately with a task id — you then poll the Query Video Task endpoint until the video is ready.

Video generation is asynchronous. The create endpoint returns a task id with status: "queued". Use the Query Video Task endpoint to check progress and retrieve the final video_url.
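The create-then-poll flow can be sketched as below. This page does not define the Query Video Task endpoint's URL or its terminal status names, so the sketch takes a caller-supplied `fetch_task` function and a `terminal_statuses` set; both names, along with `poll_task` itself, are hypothetical placeholders rather than part of the API.

```python
import time
from typing import Callable, Dict, Iterable


def poll_task(
    fetch_task: Callable[[str], Dict],
    task_id: str,
    terminal_statuses: Iterable[str],
    interval_seconds: float = 5.0,
    max_attempts: int = 120,
) -> Dict:
    """Call fetch_task(task_id) until its status is terminal or attempts run out.

    fetch_task is assumed to wrap the Query Video Task endpoint and return the
    decoded JSON body as a dict; that endpoint is not documented on this page.
    """
    terminal = set(terminal_statuses)
    for attempt in range(max_attempts):
        task = fetch_task(task_id)
        if task.get("status") in terminal:
            return task
        # Sleep between polls, but not after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} polls")
```

Passing the terminal statuses in explicitly keeps the loop independent of status names that only the Query Video Task reference can confirm.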

Base URL & Endpoint

Base URL:  https://zcbservice.aizfw.cn/kyyReactApiServer
Endpoint:  POST /v1/kyyvideo2/videos
Full URL:  https://zcbservice.aizfw.cn/kyyReactApiServer/v1/kyyvideo2/videos

All requests must include your API key as a Bearer token:

Authorization: Bearer YOUR_API_KEY

Supported Models

Model	Reference Video Required	Speed	Quality
`seedance_2_0`	✅ Required	Standard	Standard
`seedance_2_0_fast`	✅ Required	Fast	Standard
`seedance_2_0_pro`	⭕ Optional	Standard	High
`seedance_2_0_fast_pro`	⭕ Optional	Fast	High

seedance_2_0 and seedance_2_0_fast require at least one reference video (referenceVideos). seedance_2_0_pro and seedance_2_0_fast_pro work without reference videos, making them suitable for pure text-to-video and first/last frame workflows.
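A client can catch a missing reference video before submitting. The following is a minimal sketch of such a pre-flight check; the `check_model` helper and the lookup table are illustrative and only mirror the table above, they are not part of the API.

```python
# Model names and requirements copied from the Supported Models table.
REQUIRES_REFERENCE_VIDEO = {
    "seedance_2_0": True,
    "seedance_2_0_fast": True,
    "seedance_2_0_pro": False,
    "seedance_2_0_fast_pro": False,
}


def check_model(payload: dict) -> None:
    """Raise ValueError if the model is unknown or lacks a required reference video."""
    model = payload.get("model")
    if model not in REQUIRES_REFERENCE_VIDEO:
        raise ValueError(f"unknown model: {model!r}")
    if REQUIRES_REFERENCE_VIDEO[model] and not payload.get("referenceVideos"):
        raise ValueError(f"{model} requires at least one entry in referenceVideos")
```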

Request Parameters

model

string

required

The Seedance 2.0 model to use. Accepted values: seedance_2_0, seedance_2_0_fast, seedance_2_0_pro, seedance_2_0_fast_pro.

prompt

string

required

A text description of the video you want to generate. Be specific about scene, style, motion, and lighting. Example: "A cat dancing in the rain, cinematic style"

duration

integer

Output video duration in seconds. Range: 4–15. Default: 5. Longer durations increase generation time proportionally.

aspect_ratio

string

Output video aspect ratio. Default: 16:9. Accepted values: 16:9, 4:3, 1:1, 3:4, 9:16, 21:9, adaptive.

generateAudio

boolean

Whether to include synthesised audio in the output video. Default: true.

true — output video includes synchronised audio
false — output video is silent
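The core parameters above can be assembled and range-checked on the client before sending. This is a sketch only: `build_basic_payload` is a hypothetical helper, its defaults mirror the documented ones, and it assumes the 4–15 duration range is inclusive at both ends.

```python
ASPECT_RATIOS = {"16:9", "4:3", "1:1", "3:4", "9:16", "21:9", "adaptive"}


def build_basic_payload(model, prompt, duration=5, aspect_ratio="16:9", generate_audio=True):
    """Return a request body dict built from the core parameters, or raise ValueError."""
    if not prompt:
        raise ValueError("prompt is required")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise ValueError("duration must be an integer number of seconds")
    if not 4 <= duration <= 15:
        raise ValueError("duration must be between 4 and 15 seconds")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    return {
        "model": model,
        "prompt": prompt,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
        "generateAudio": bool(generate_audio),
    }
```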

First / Last Frame Mode

Pin the exact start and/or end frames of the generated video. Mutually exclusive with referenceImages, referenceVideos, and referenceAudios.

first_image

string

The first frame of the video. Accepts a public URL or an asset library reference in the format asset://ASSET_ID.

Used alone: image-to-video mode (video starts from this frame)
Used with last_image: first/last frame mode

Image requirements:

Formats: JPEG, PNG, WebP, BMP, TIFF, GIF
Aspect ratio (width ÷ height): 0.4–2.5
Dimensions: 300–6000 px per side

Cannot be combined with referenceImages, referenceVideos, or referenceAudios.

last_image

string

The last frame of the video. Must be used together with first_image. Same format and dimension requirements as first_image. Cannot be combined with referenceImages, referenceVideos, or referenceAudios.
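The image size limits listed for first_image can be checked locally once the pixel dimensions are known. The sketch below assumes both ranges are inclusive, which this page does not state; `check_image_size` is a hypothetical helper and takes the width and height as plain integers, however you obtain them.

```python
def check_image_size(width: int, height: int) -> list:
    """Return a list of human-readable problems; an empty list means the size is acceptable."""
    problems = []
    for name, side in (("width", width), ("height", height)):
        if not 300 <= side <= 6000:
            problems.append(f"{name} {side}px is outside 300-6000px")
    # Aspect ratio is width divided by height; skip it for a non-positive height.
    if height > 0 and not 0.4 <= width / height <= 2.5:
        problems.append(f"aspect ratio {width / height:.2f} is outside 0.4-2.5")
    return problems
```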

Multi-Modal Reference Mode

Provide reference media to guide generation style and content. Mutually exclusive with first_image and last_image.

referenceImages

array

A list of reference image URLs or asset library references (asset://ASSET_ID) to guide visual style.

Count: 1–9 images
Formats: JPEG, PNG, WebP, BMP, TIFF, GIF
Aspect ratio: 0.4–2.5
Dimensions: 300–6000 px per side

Cannot be combined with first_image or last_image.

referenceVideos

array

A list of reference video URLs or asset library references (asset://ASSET_ID) to guide motion and content.

Count: max 3 videos
Resolutions: 480p or 720p
Duration per clip: 2–15 seconds
Total combined duration: ≤ 15 seconds

The system automatically selects the optimal backend model based on your reference videos. Cannot be combined with first_image or last_image.

referenceAudios

array

A list of reference audio URLs or asset library references (asset://ASSET_ID) to guide the generated audio track.

Count: max 3 clips
Formats: WAV, MP3
Duration per clip: 2–15 seconds
Total combined duration: ≤ 15 seconds
File size per clip: ≤ 15 MB

Cannot be combined with first_image or last_image.
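referenceVideos and referenceAudios share the same clip limits (at most 3 clips, 2–15 seconds each, 15 seconds combined), so one check can cover both. A minimal sketch, assuming you have already measured each clip's duration in seconds and that the per-clip range is inclusive; `check_reference_clips` is a hypothetical helper, and it does not cover resolution, format, or file size.

```python
def check_reference_clips(durations, max_clips=3, min_len=2, max_len=15, max_total=15):
    """Return a list of problems for a set of clip durations given in seconds."""
    problems = []
    if len(durations) > max_clips:
        problems.append(f"{len(durations)} clips supplied, maximum is {max_clips}")
    for index, seconds in enumerate(durations):
        if not min_len <= seconds <= max_len:
            problems.append(f"clip {index} lasts {seconds}s, allowed range is {min_len}-{max_len}s")
    if sum(durations) > max_total:
        problems.append(f"combined duration {sum(durations)}s exceeds {max_total}s")
    return problems
```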

Input Mode Mutual Exclusivity

The two input guidance modes are mutually exclusive. You cannot use both in the same request:

First/last frame mode: first_image, last_image
Multi-modal reference mode: referenceImages, referenceVideos, referenceAudios
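The exclusivity rule, together with the requirement that last_image accompany first_image, can be expressed as a small client-side check. The helper name `input_mode` and the mode labels it returns ("frame", "reference", "text") are illustrative only and are not values the API uses.

```python
FRAME_FIELDS = ("first_image", "last_image")
REFERENCE_FIELDS = ("referenceImages", "referenceVideos", "referenceAudios")


def input_mode(payload: dict) -> str:
    """Classify a request body, raising ValueError if it mixes the two modes."""
    uses_frames = any(payload.get(key) for key in FRAME_FIELDS)
    uses_references = any(payload.get(key) for key in REFERENCE_FIELDS)
    if uses_frames and uses_references:
        raise ValueError("first/last frame fields cannot be combined with reference fields")
    if payload.get("last_image") and not payload.get("first_image"):
        raise ValueError("last_image must be used together with first_image")
    if uses_frames:
        return "frame"
    if uses_references:
        return "reference"
    return "text"
```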

Response Fields

id

string

Unique identifier for the video generation task. Save this value — you will use it to poll the Query Video Task endpoint.

object

string

Object type. Always "video".

created

integer

Unix timestamp (seconds) of when the task was created.

model

string

The model name you specified in the request.

status

string

Initial task status. On successful creation this is always "queued".

error

string

Error message. Populated only when status is "failed"; otherwise null, as in the example success response.
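The response fields map naturally onto a small typed record. The sketch below parses a JSON response body into a dataclass; `CreatedTask` and `parse_created_task` are hypothetical names, and the code treats a missing or null error field as `None`.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreatedTask:
    id: str
    object: str
    created: int
    model: str
    status: str
    error: Optional[str] = None


def parse_created_task(body: str) -> CreatedTask:
    """Decode a create-task response body into a CreatedTask."""
    data = json.loads(body)
    return CreatedTask(
        id=data["id"],
        object=data["object"],
        created=data["created"],
        model=data["model"],
        status=data["status"],
        error=data.get("error"),
    )
```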

Code Examples

curl --request POST \
  --url https://zcbservice.aizfw.cn/kyyReactApiServer/v1/kyyvideo2/videos \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "seedance_2_0_pro",
    "prompt": "A cat dancing in the rain, cinematic style",
    "duration": 8,
    "aspect_ratio": "16:9",
    "generateAudio": true
  }'

Example success response:

{
  "id": "video_fd35ee52-2a98-44a6-b930-29a88ce9b8fd",
  "object": "video",
  "created": 1774836724,
  "model": "seedance_2_0_pro",
  "status": "queued",
  "error": null
}
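For callers who prefer Python, the same request can be sketched with the standard library alone. The URL and headers come from this page; the function names `build_create_request` and `create_task` and the 30-second timeout are assumptions made for illustration.

```python
import json
import urllib.request

CREATE_URL = "https://zcbservice.aizfw.cn/kyyReactApiServer/v1/kyyvideo2/videos"


def build_create_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for the create-task endpoint."""
    return urllib.request.Request(
        CREATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_task(api_key: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON body."""
    request = build_create_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect or log the request without touching the network.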

Tips

Use seedance_2_0_pro or seedance_2_0_fast_pro for text-to-video and first/last-frame tasks — these models work without a mandatory reference video, giving you the most flexibility.

If you are using reference images that contain human faces or virtual avatars, you must upload them to the asset library first for content review. Use asset://ASSET_ID references rather than direct URLs in those cases.
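A simple guard can flag reference entries that are direct URLs rather than asset library references. This sketch only inspects the string prefix; it cannot tell whether an image contains a face, so it is meant for lists you already know need review. `non_asset_references` is a hypothetical helper.

```python
ASSET_PREFIX = "asset://"


def non_asset_references(references):
    """Return the entries that are not of the form asset://ASSET_ID."""
    return [
        ref
        for ref in references
        # An entry must start with the prefix and carry a non-empty asset id.
        if not ref.startswith(ASSET_PREFIX) or len(ref) == len(ASSET_PREFIX)
    ]
```

An empty result means every entry uses the asset:// form.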

Keep your reference videos short (2–5 seconds each) and under the 15-second total cap. The system selects the optimal backend model automatically based on your reference video content.
