AI Avatar creates a reusable presenter from a short reference clip or image/voice pair, then generates talking-head videos from a script. It is best for founder updates, explainers, UGC ads, training videos, and repeatable spokesperson content.

How It Works

The AI Avatar flow has two parts:
  • Create an avatar: provide a 4-15 second source video, or provide an image plus a 4-15 second voice reference. Mosaic processes this into reusable avatar references.
  • Generate a video: select the avatar profile, write the script, choose the model and take style, then run the agent.

Avatar Profiles

Avatars live in your workspace library and can be used from the editor or API. Mosaic prepares the reference video, reference image, and voice audio during processing.

Recommended source video:
  • 4-15 seconds long
  • Exactly one person on screen
  • Clear face, direct-to-camera framing
  • Natural speech with visible mouth movement from that person
  • Clean single-speaker audio from that person; avoid background speakers, music, heavy noise, or dubbing
  • Minimal cuts, overlays, or visual effects
If you use an image plus voice reference instead of a source video, the image should show the same single person you want to generate, and the voice reference should contain only that person’s clear speech.
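
The checklist above can be partly automated before upload. The following is a minimal sketch, not part of the Mosaic API: the function name and its arguments are hypothetical, and it assumes you already know the clip duration and the number of people on screen from your own tooling.

```python
from typing import List

# Duration bounds taken from the recommended source video guidance above.
MIN_REFERENCE_SECONDS = 4.0
MAX_REFERENCE_SECONDS = 15.0


def check_reference_clip(duration_seconds: float, people_on_screen: int = 1) -> List[str]:
    """Return a list of problems; an empty list means the clip fits the guidance.

    Hypothetical helper: it only covers the two checks that are easy to measure.
    """
    problems = []
    if not MIN_REFERENCE_SECONDS <= duration_seconds <= MAX_REFERENCE_SECONDS:
        problems.append("clip should be 4-15 seconds long")
    if people_on_screen != 1:
        problems.append("clip should show exactly one person")
    return problems
```

Qualities such as clean audio or direct-to-camera framing still need a human review; this sketch does not attempt to measure them.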

Single Take vs Multitake

Single take asks the model for one continuous delivery. Use it for short scripts where a natural, uninterrupted performance matters most.

Multitake lets Mosaic split longer scripts into sentence-complete chunks, render multiple takes, normalize them, and stitch the final output. Use it for longer scripts, retries, or workflows where reliability matters more than getting one uninterrupted clip from the model provider. In multitake mode, Mosaic still preserves the avatar identity and voice while chunking the script behind the scenes.

Seedance 2 and Seedance 2 Fast support both modes in Mosaic.
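
To make "sentence-complete chunks" concrete, here is an illustrative sketch of one way such chunking could work. It is not Mosaic's actual chunker, which is not documented here; the function name and the `max_chars` limit are assumptions chosen for the example.

```python
import re
from typing import List


def split_script(script: str, max_chars: int = 200) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A sentence is never split: one that exceeds max_chars becomes its own chunk.
    max_chars is a hypothetical limit, not a documented Mosaic value.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Because chunks always end on a sentence boundary, each take can be rendered and retried independently before stitching.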

Model Options

Seedance 2 and Seedance 2 Fast are the recommended AI Avatar models. They are Mosaic’s most cost-effective and life-like avatar generation options, and they support both single take and multitake generation with avatars from your library.
| Model | Best For | Notes |
| --- | --- | --- |
| seedance-2-fast | Recommended fast avatar generation | Most cost-effective option. Fast provider output is upscaled to 1080p when needed. |
| seedance-2 | Recommended high-quality avatar generation | Most life-like option. Supports 16:9 and 9:16. |
| avatar-v4 | Legacy Longcat 1 single-take avatars | Requires a voice reference or avatar profile. |
| fabric-1 | Legacy lip-sync style single-take generation | Can use an uploaded voiceover. |
| kling-2.6-pro | Legacy multitake avatar generation | Supports product and character references. |
| kling-3-standard | Legacy multitake avatar generation | Shown as Kling 3 in the editor. |
| kling-3-pro | Legacy multitake avatar generation | Higher-cost legacy Kling option. |
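
The take modes described in the table can be captured in a small lookup for client-side checks. This is a sketch derived only from the table descriptions; the mapping and helper name are hypothetical, and the API may accept or reject combinations differently.

```python
# Take modes per model, read from the "Best For" column above (an interpretation,
# not an official capability matrix).
TAKE_MODES = {
    "seedance-2-fast": {"single", "multi"},
    "seedance-2": {"single", "multi"},
    "avatar-v4": {"single"},
    "fabric-1": {"single"},
    "kling-2.6-pro": {"multi"},
    "kling-3-standard": {"multi"},
    "kling-3-pro": {"multi"},
}


def supports_take_mode(video_model: str, single_take: bool) -> bool:
    """Return True if the table lists the model for the requested take mode."""
    mode = "single" if single_take else "multi"
    return mode in TAKE_MODES.get(video_model, set())
```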

API Workflow

  1. Create an avatar with POST /avatar-profiles/create.
  2. Create an agent with POST /agent/create, then add an AI Avatar node with POST /agent/{agent_id}/update.
  3. Run the agent with POST /agent/{agent_id}/run and no video inputs.
Avatar processing starts automatically when you create the avatar. You can use the avatar ID in an AI Avatar tile immediately; if the avatar is still processing, the run waits for it to become ready. If the avatar’s processing_status is failed, create a new avatar before using it.

Create an avatar:
curl -X POST "https://api.mosaic.so/avatar-profiles/create" \
  -H "Authorization: Bearer mk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Founder Avatar",
    "sources": {
      "video_url": "https://example.com/founder-reference.mp4"
    }
  }'
Create an agent shell:
curl -X POST "https://api.mosaic.so/agent/create" \
  -H "Authorization: Bearer mk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Seedance Avatar Generator",
    "visibility": "private"
  }'
Add an AI Avatar node:
curl -X POST "https://api.mosaic.so/agent/AGENT_ID/update" \
  -H "Authorization: Bearer mk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "operations": [
      {
        "op": "create_node",
        "node_type_id": "b3b4c9e2-2a47-4fa9-8ce8-0c1fa1d7b6ef",
        "params_used": {
          "avatar_profile_id": "AVATAR_PROFILE_ID",
          "video_model": "seedance-2-fast",
          "single_take": false,
          "aspect_ratio": "9:16",
          "script": "Here is the exact script the avatar should say."
        }
      }
    ]
  }'
Run the AI Avatar agent:
curl -X POST "https://api.mosaic.so/agent/AGENT_ID/run" \
  -H "Authorization: Bearer mk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "callback_url": "https://your-webhook.com/mosaic"
  }'
Use video_model: "seedance-2" for the higher-quality Seedance 2 path. Set single_take: true for one continuous delivery, or false for multitake chunking.
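
The same requests can be assembled from Python. The sketch below uses only the standard library and mirrors the curl calls above; the helper names are hypothetical, and the requests are built but not sent, because the response fields (for example, where the new agent or avatar ID appears) are not documented in this section.

```python
import json
import urllib.request

API_BASE = "https://api.mosaic.so"
AI_AVATAR_NODE_TYPE_ID = "b3b4c9e2-2a47-4fa9-8ce8-0c1fa1d7b6ef"


def build_request(api_key: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def add_avatar_node_request(api_key, agent_id, avatar_profile_id, script,
                            video_model="seedance-2-fast", single_take=True,
                            aspect_ratio="9:16"):
    """Mirror the "Add an AI Avatar node" curl call; defaults follow the params table."""
    payload = {
        "operations": [{
            "op": "create_node",
            "node_type_id": AI_AVATAR_NODE_TYPE_ID,
            "params_used": {
                "avatar_profile_id": avatar_profile_id,
                "video_model": video_model,
                "single_take": single_take,
                "aspect_ratio": aspect_ratio,
                "script": script,
            },
        }]
    }
    return build_request(api_key, f"/agent/{agent_id}/update", payload)


def run_agent_request(api_key, agent_id, callback_url):
    """Mirror the "Run the AI Avatar agent" curl call."""
    return build_request(api_key, f"/agent/{agent_id}/run", {"callback_url": callback_url})
```

To send a request, pass it to `urllib.request.urlopen(...)` and parse the JSON body; how you read IDs out of the response depends on the actual response schema.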

API Info

  • Node ID (the node_type_id value in create_node operations): b3b4c9e2-2a47-4fa9-8ce8-0c1fa1d7b6ef

Node params

| Param | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| brief | string | Yes | "" | High-level intent/context (validated, ~1-1000 chars). |
| script | string | Yes | "" | Spoken script. |
| video_model | "seedance-2-fast" \| "seedance-2" \| "kling-2.6-pro" \| "kling-3-standard" \| "kling-3-pro" \| "fabric-1" \| "avatar-v4" | No | "seedance-2-fast" | Generation model choice. |
| single_take | boolean | No | true | true for one continuous delivery; false for multitake chunking/stitching. |
| aspect_ratio | "9:16" \| "16:9" \| "auto" | No | "9:16" | Output framing mode. |
| avatar_profile_id | string (UUID) | Conditional | unset | Reusable avatar. Required for Seedance 2 models. |
| creation_mode | "manual" \| "video_reference" | No | "manual" | Avatar generation flow selection. |
| reference_video_id | string (UUID) | Conditional | unset | Used when creation_mode="video_reference". |
| reference_change_request | string | No | "" | Optional instructions when using video-reference mode. |
| product_image_id | string (UUID) | No | unset | Product reference image. |
| character_image_id | string (UUID) | No | unset | Avatar/character image override. |
| voice_reference_id | string (UUID) | No | unset | Voice reference asset ID for cloning. |
| voice_reference_type | "audio" \| "video" | Conditional | unset | Required when voice_reference_id is provided. |
| voiceover_id | string (UUID) | Conditional | unset | Optional voiceover upload for Fabric 1 lip-sync path. |
| voiceover_type | "audio" \| "video" | Conditional | unset | Required when voiceover_id is provided. |
| elevenlabs_model_id | string | No | unset | Explicit ElevenLabs model override. |
| elevenlabs_voice_settings | {stability?:number,similarity_boost?:number,style?:number,use_speaker_boost?:boolean,speed?:number} | No | unset | Fine-grained TTS tuning object. |
| voice_dictionary | Array<{word:string,audio_id:string,enabled:boolean}> | No | [] | Up to three pronunciation references for Seedance 2. |
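
The conditional rules in the table can be checked client-side before calling the API. The following sketch is one reading of those rules, not the server's validation logic: the function name is hypothetical, `reference_video_id` is assumed to be required in video-reference mode, and `brief` is only length-checked when present because the examples on this page omit it even though the table marks it as required.

```python
from typing import List

SEEDANCE_MODELS = {"seedance-2-fast", "seedance-2"}
VIDEO_MODELS = SEEDANCE_MODELS | {
    "kling-2.6-pro", "kling-3-standard", "kling-3-pro", "fabric-1", "avatar-v4",
}
ASPECT_RATIOS = {"9:16", "16:9", "auto"}


def validate_avatar_params(params: dict) -> List[str]:
    """Return human-readable problems found in AI Avatar node params."""
    errors = []
    if not params.get("script"):
        errors.append("script is required")
    brief = params.get("brief")
    if brief is not None and not 1 <= len(brief) <= 1000:
        errors.append("brief should be roughly 1-1000 characters")
    model = params.get("video_model", "seedance-2-fast")
    if model not in VIDEO_MODELS:
        errors.append(f"unknown video_model: {model}")
    if params.get("aspect_ratio", "9:16") not in ASPECT_RATIOS:
        errors.append("aspect_ratio must be '9:16', '16:9', or 'auto'")
    if model in SEEDANCE_MODELS and not params.get("avatar_profile_id"):
        errors.append("avatar_profile_id is required for Seedance 2 models")
    # Assumption: video-reference mode needs a reference video to work from.
    if params.get("creation_mode") == "video_reference" and not params.get("reference_video_id"):
        errors.append("reference_video_id is expected when creation_mode is 'video_reference'")
    # Each asset ID must be accompanied by its matching type field.
    for id_key, type_key in (("voice_reference_id", "voice_reference_type"),
                             ("voiceover_id", "voiceover_type")):
        if params.get(id_key) and params.get(type_key) not in {"audio", "video"}:
            errors.append(f"{type_key} must be 'audio' or 'video' when {id_key} is set")
    if len(params.get("voice_dictionary", [])) > 3:
        errors.append("voice_dictionary accepts up to three entries")
    return errors
```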

Parameter groups

  • Core generation: brief, script, video_model, single_take, aspect_ratio
  • Avatar library: avatar_profile_id
  • Creation flow: creation_mode, reference_video_id, reference_change_request
  • Visual references: product_image_id, character_image_id
  • Voice references: voice_reference_id, voice_reference_type, voiceover_id, voiceover_type
  • Voice tuning: elevenlabs_model_id, elevenlabs_voice_settings, voice_dictionary

Example

{
  "avatar_profile_id": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa",
  "script": "We built this to help teams publish polished videos in minutes.",
  "aspect_ratio": "9:16",
  "video_model": "seedance-2-fast",
  "single_take": false
}
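
To see what the example resolves to once omitted fields take their documented defaults, the sketch below merges a params object over the defaults from the params table. The helper is hypothetical, and whether the server applies defaults in exactly this way is an assumption.

```python
# Defaults copied from the "Default" column of the node params table.
DOCUMENTED_DEFAULTS = {
    "video_model": "seedance-2-fast",
    "single_take": True,
    "aspect_ratio": "9:16",
    "creation_mode": "manual",
}


def effective_params(params: dict) -> dict:
    """Overlay caller-supplied params on the documented defaults."""
    return {**DOCUMENTED_DEFAULTS, **params}
```

Note that the example sets single_take to false explicitly: leaving it out would fall back to the documented default of true, which requests a single continuous take.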