Submit an asynchronous video generation task using xAI's Grok video models. The API accepts a prompt plus optional duration, aspect ratio, resolution, and reference images, then immediately returns a task id for polling. Grok supports both text-to-video and image-to-video modes, with four distinct model variants offering different duration ranges, resolution caps, and reference image constraints.
Base URL
Endpoint
Authentication
Models
Choose the model that best fits your duration, resolution, and billing needs:

| Model | Duration | Aspect Ratios | Resolution | Reference Images | Billing |
|---|---|---|---|---|---|
| grok_video3 | 6–30s (default 10s) | 16:9 / 9:16 / 1:1 / 3:2 / 2:3 | 480p / 720p (default 720p) | Up to 7 | Per second |
| grok_video3_pro | Fixed 10s | 16:9 (default) | 720p (default) | Supported | Per call |
| grok_video3_max | 6 / 10 / 12 / 16 / 20 / 30s (default 10s) | 16:9 / 9:16 / 1:1 | 480p / 720p (default 720p) | Up to 5 (public URL only) | Per second |
| grok_video3_stable | 6 or 10s (default 10s) | 16:9 / 9:16 / 3:2 / 2:3 / 1:1 | 480p / 720p (default 720p) | Up to 7 (public URL only) | Per call |

grok_video3_max and grok_video3_stable require reference images to be publicly accessible URLs; base64-encoded images are not supported for these models.

Request Parameters
The Grok model to use. See the model table above for capabilities. Supported values:

- grok_video3
- grok_video3_pro
- grok_video3_max
- grok_video3_stable

Text description of the video you want to generate. Include subject, action, camera movement, and visual style for best results. Example:

"A cat dancing in the rain, cinematic style"

Output video duration in seconds. Behavior varies by model:

| Model | Supported Values | Default |
|---|---|---|
| grok_video3 | Any integer 6–30 | 10 |
| grok_video3_pro | Fixed 10; do not set | 10 |
| grok_video3_max | 6, 10, 12, 16, 20, 30 | 10 |
| grok_video3_stable | 6 or 10 | 10 |
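
The duration rules in the table can be checked on the client before a task is submitted. The following is a minimal sketch in Python, not part of the API: the helper name check_duration and the lookup table are hypothetical, and the allowed values are copied from the table above.

```python
from typing import Optional

# Allowed durations per model, copied from the duration table.
SUPPORTED_DURATIONS = {
    "grok_video3": set(range(6, 31)),            # any integer from 6 to 30
    "grok_video3_pro": {10},                     # fixed; the field should be omitted
    "grok_video3_max": {6, 10, 12, 16, 20, 30},  # enumerated values only
    "grok_video3_stable": {6, 10},
}

DEFAULT_DURATION = 10  # documented default for every model


def check_duration(model: str, duration: Optional[int] = None) -> int:
    """Return the effective duration in seconds, or raise ValueError."""
    if model not in SUPPORTED_DURATIONS:
        raise ValueError(f"unknown model: {model}")
    if duration is None:
        # Field omitted: the documented default applies.
        return DEFAULT_DURATION
    if model == "grok_video3_pro":
        raise ValueError("grok_video3_pro has a fixed 10s duration; omit the field")
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise ValueError("duration must be an integer number of seconds")
    if duration not in SUPPORTED_DURATIONS[model]:
        raise ValueError(f"{model} does not support duration={duration}")
    return duration
```

With this sketch, check_duration("grok_video3", 17) passes, while the same value for grok_video3_max raises because 17 is not one of the enumerated durations.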
Output video aspect ratio. Defaults to 16:9 for all models.

| Model | Supported Ratios |
|---|---|
| grok_video3 | 16:9, 9:16, 1:1, 3:2, 2:3 |
| grok_video3_pro | 16:9 |
| grok_video3_max | 16:9, 9:16, 1:1 |
| grok_video3_stable | 16:9, 9:16, 3:2, 2:3, 1:1 |
Output resolution. Defaults to 720p for all models.

| Model | Supported Resolutions |
|---|---|
| grok_video3 | 480p, 720p |
| grok_video3_pro | 720p |
| grok_video3_max | 480p, 720p |
| grok_video3_stable | 480p, 720p |
Array of reference image URLs for image-to-video generation. When omitted, the request is treated as text-to-video. Constraints by model:

- grok_video3: up to 7 images
- grok_video3_pro: reference images supported
- grok_video3_max: up to 5 images; must be public URLs
- grok_video3_stable: up to 7 images; must be public URLs

Example:

["https://example.com/ref1.jpg", "https://example.com/ref2.jpg"]

Response Fields
Unique task identifier. Save this; you'll use it to poll the Query Video Task endpoint.

Object type. Always "video".

Unix timestamp of when the task was created.

The model name used for this task.

Task status at creation. Always "queued" on successful submission.

Error message. null on successful submission.

Code Examples
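
As a hedged illustration of how a request body might be assembled, the sketch below builds the JSON payload only and sends nothing, because the base URL, endpoint path, and authentication header are not reproduced on this page. The field names duration and image_urls appear in this document; model, prompt, aspect_ratio, and resolution are assumed names, and build_video_request is a hypothetical helper.

```python
import json
from typing import Any, Dict, List, Optional


def build_video_request(
    prompt: str,
    model: str = "grok_video3",
    duration: Optional[int] = None,
    aspect_ratio: Optional[str] = None,  # assumed field name
    resolution: Optional[str] = None,    # assumed field name
    image_urls: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble a request body, leaving out every optional field that is unset."""
    # "model" and "prompt" are assumed field names, not confirmed by this page.
    body: Dict[str, Any] = {"model": model, "prompt": prompt}
    if duration is not None:
        body["duration"] = duration
    if aspect_ratio is not None:
        body["aspect_ratio"] = aspect_ratio
    if resolution is not None:
        body["resolution"] = resolution
    if image_urls:
        # Including image_urls makes this an image-to-video request.
        body["image_urls"] = list(image_urls)
    return body


if __name__ == "__main__":
    payload = build_video_request(
        "A cat dancing in the rain, cinematic style", duration=10
    )
    print(json.dumps(payload, indent=2))
```

Leaving unset fields out of the body, instead of sending nulls, lets the documented defaults (10s, 16:9, 720p) apply and keeps duration absent for grok_video3_pro.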
Example Response
Key Constraints
- Omit image_urls for text-to-video; include it for image-to-video
- For grok_video3_max: only the enumerated duration values are accepted; do not pass arbitrary seconds
- For grok_video3_max and grok_video3_stable: reference images must be publicly accessible URLs (no base64)
- More reference images generally mean longer queue and generation times
- grok_video3_pro has a fixed 10-second duration; do not set the duration field
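
The reference image constraints can likewise be checked before submission. This is a minimal sketch under stated assumptions: check_reference_images is a hypothetical helper, an empty list is treated the same as an omitted field, and the URL check only inspects the scheme and host, so it cannot confirm that an image is actually reachable from the public internet.

```python
from typing import List, Optional
from urllib.parse import urlparse

# Limits copied from the model table; None means no explicit count is documented.
MAX_REFERENCE_IMAGES = {
    "grok_video3": 7,
    "grok_video3_pro": None,
    "grok_video3_max": 5,
    "grok_video3_stable": 7,
}

# Models that accept public URLs only (no base64-encoded images).
PUBLIC_URL_ONLY = {"grok_video3_max", "grok_video3_stable"}


def check_reference_images(model: str, image_urls: Optional[List[str]]) -> str:
    """Return 'text-to-video' or 'image-to-video', or raise ValueError."""
    if model not in MAX_REFERENCE_IMAGES:
        raise ValueError(f"unknown model: {model}")
    if not image_urls:
        # Assumption: an empty list behaves like an omitted field.
        return "text-to-video"
    limit = MAX_REFERENCE_IMAGES[model]
    if limit is not None and len(image_urls) > limit:
        raise ValueError(f"{model} accepts at most {limit} reference images")
    if model in PUBLIC_URL_ONLY:
        for url in image_urls:
            parsed = urlparse(url)
            # Rejects data: URIs and anything without an http(s) scheme and a host.
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                raise ValueError(f"{model} requires public http(s) URLs: {url[:40]}")
    return "image-to-video"
```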
Next Steps
Use the returned id to poll the Query Video Task endpoint for status updates and to retrieve video_url when generation completes.
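
A polling loop for this step might look like the sketch below. The Query Video Task endpoint is documented elsewhere, so the HTTP call is passed in as a caller-supplied function (fetch_task, hypothetical) that returns the task as a dict. The loop assumes a finished task exposes video_url and a failed task carries a non-null error field; the name error and the interval and timeout defaults are assumptions, not documented values.

```python
import time
from typing import Any, Callable, Dict


def wait_for_video(
    task_id: str,
    fetch_task: Callable[[str], Dict[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the task exposes video_url, then return that URL."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_task(task_id)
        if task.get("error"):
            # Assumed failure signal: a non-null error message on the task.
            raise RuntimeError(f"task {task_id} failed: {task['error']}")
        if task.get("video_url"):
            return task["video_url"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout_s}s")
        sleep(interval_s)
```

Passing the fetch function and the sleep function as arguments keeps the loop independent of any HTTP client and makes it easy to exercise with canned responses.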