Generate a realistic AI-powered avatar to present or narrate your video content. AI Avatar is ideal for creating professional talking-head videos without needing to be on camera.

How It Works

The AI Avatar tile generates a virtual presenter based on the script and settings you provide. You choose the avatar’s appearance, voice, and delivery style, and Mosaic produces a video of the avatar speaking your script with natural lip sync and gestures.

Input & Settings

Avatar Selection

Choose from a library of AI-generated presenters with different appearances, genders, and styles. Common use cases include:
  • Corporate training and onboarding videos
  • Product explainers and demos
  • Educational content and tutorials
  • Social media content where you don’t want to be on camera

Script

Provide the text for the avatar to speak. Best practices:
  • Keep sentences concise and conversational
  • Use punctuation for natural pauses
  • Break long scripts into logical sections
  • Write as if speaking to someone directly

Voice

Select a voice style that matches the avatar and your content’s tone. Options vary by avatar, but typically include:
  • Professional — clear and authoritative
  • Friendly — warm and approachable
  • Energetic — upbeat and engaging

Language

Choose the language for the avatar’s speech. Multiple languages are supported for global content creation.

Usage Recommendations

Use AI Avatar to:
  • Create presenter-led videos without filming
  • Produce multilingual content at scale
  • Add a human touch to explainer or tutorial videos
  • Build consistent brand spokesperson content
AI Avatar works great when combined with:
  • AI B-Roll (add visual variety behind the avatar)
  • AI Music (add background score)
  • Captions (add subtitles for accessibility)
  • Reframe (adapt to different platforms)

API Info

  • Node ID: b3b4c9e2-2a47-4fa9-8ce8-0c1fa1d7b6ef

Node params

Each parameter is listed as: name — type; requirement; default. Notes.

  • brief — string; required; default "". High-level intent/context (validated, ~1–1000 chars).
  • script — string; conditional; default "". Spoken script. Required unless using Fabric 1 with an uploaded voiceover_id.
  • video_model — "kling-2.6-pro" | "kling-3-standard" | "kling-3-pro" | "fabric-1"; required; default "kling-2.6-pro". Generation model choice.
  • single_take — boolean; optional; default false. Enables the single-take rendering path and changes the allowed script length.
  • aspect_ratio — "9:16" | "16:9" | "auto"; optional; default "9:16". Output framing mode.
  • creation_mode — "manual" | "video_reference"; optional; default "manual". Avatar generation flow selection.
  • reference_video_id — string (UUID); conditional; unset. Required when creation_mode="video_reference".
  • reference_change_request — string; optional; default "". Optional instructions when using video-reference mode.
  • product_image_id — string (UUID); optional; unset. Product reference image.
  • character_image_id — string (UUID); optional; unset. Avatar/character image override.
  • voice_reference_id — string (UUID); optional; unset. Voice reference asset ID for cloning.
  • voice_reference_type — "audio"; conditional; unset. Set to "audio" when voice_reference_id is provided.
  • voiceover_id — string (UUID); conditional; unset. Optional voiceover upload for the Fabric 1 lip-sync path.
  • voiceover_type — "audio"; conditional; unset. Set to "audio" when voiceover_id is provided.
  • elevenlabs_model_id — string; optional; unset. Explicit ElevenLabs model override.
  • elevenlabs_voice_settings — object {stability?: number, similarity_boost?: number, style?: number, use_speaker_boost?: boolean, speed?: number}; optional; unset. Fine-grained TTS tuning.
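The conditional rules in the notes above can be expressed as a small client-side check. The following Python sketch is illustrative only, based on the parameter notes; it is not Mosaic's actual validation logic, and the `validate_params` helper is hypothetical:

```python
def validate_params(p: dict) -> list[str]:
    """Sketch of the conditional parameter rules from the table above.
    Returns a list of human-readable error strings (empty if the
    payload looks consistent). Illustrative, not Mosaic's validator."""
    errors = []

    # brief is required and roughly length-validated (~1-1000 chars)
    if not 1 <= len(p.get("brief", "")) <= 1000:
        errors.append("brief must be roughly 1-1000 characters")

    # script is required unless using Fabric 1 with an uploaded voiceover
    fabric_voiceover = p.get("video_model") == "fabric-1" and p.get("voiceover_id")
    if not p.get("script") and not fabric_voiceover:
        errors.append("script is required unless video_model='fabric-1' with voiceover_id")

    # reference_video_id is required in video-reference mode
    if p.get("creation_mode") == "video_reference" and not p.get("reference_video_id"):
        errors.append("reference_video_id is required when creation_mode='video_reference'")

    # the *_type fields must be "audio" when their matching ID is set
    if p.get("voice_reference_id") and p.get("voice_reference_type") != "audio":
        errors.append("voice_reference_type must be 'audio' when voice_reference_id is set")
    if p.get("voiceover_id") and p.get("voiceover_type") != "audio":
        errors.append("voiceover_type must be 'audio' when voiceover_id is set")

    return errors
```

Running this against the example payload below this section would return an empty list, while omitting `script` on a Kling model would surface an error.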

Parameter groups

  • Core generation: brief, script, video_model, single_take, aspect_ratio
  • Creation flow: creation_mode, reference_video_id, reference_change_request
  • Visual references: product_image_id, character_image_id
  • Voice references: voice_reference_id, voice_reference_type, voiceover_id, voiceover_type
  • Single-take voice tuning: elevenlabs_model_id, elevenlabs_voice_settings

Example

{
  "brief": "Confident founder-style product launch message",
  "script": "We built this to help teams publish polished videos in minutes.",
  "creation_mode": "manual",
  "aspect_ratio": "9:16",
  "video_model": "kling-2.6-pro",
  "single_take": false
}
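Building on the example above, the sketch below shows how the single-take voice-tuning parameters might be added to the same payload. Field names come from the params table; the numeric values are placeholders for illustration, not documented defaults:

```python
import json

# Same core fields as the example above, extended for the
# single-take path with fine-grained ElevenLabs voice tuning.
# The tuning values here are illustrative placeholders.
params = {
    "brief": "Confident founder-style product launch message",
    "script": "We built this to help teams publish polished videos in minutes.",
    "creation_mode": "manual",
    "aspect_ratio": "9:16",
    "video_model": "kling-2.6-pro",
    "single_take": True,
    "elevenlabs_voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.75,
        "speed": 1.0,
    },
}

# Serialize for the request body
payload = json.dumps(params, indent=2)
print(payload)
```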