Generate a realistic AI-powered avatar to present or narrate your video content. AI Avatar is ideal for creating professional talking-head videos without needing to be on camera.

How It Works

The AI Avatar tile generates a virtual presenter based on the script and settings you provide. You choose the avatar’s appearance, voice, and delivery style, and Mosaic produces a video of the avatar speaking your script with natural lip sync and gestures.

Input & Settings

Avatar Selection

Choose from a library of AI-generated presenters with different appearances, genders, and styles. Common use cases include:
  • Corporate training and onboarding videos
  • Product explainers and demos
  • Educational content and tutorials
  • Social media content where you don’t want to be on camera

Script

Provide the text for the avatar to speak. Best practices:
  • Keep sentences concise and conversational
  • Use punctuation for natural pauses
  • Break long scripts into logical sections
  • Write as if speaking to someone directly

Voice

Select a voice style that matches the avatar and your content’s tone. Options vary by avatar, but typically include:
  • Professional — clear and authoritative
  • Friendly — warm and approachable
  • Energetic — upbeat and engaging

Language

Choose the language for the avatar’s speech. Multiple languages are supported for global content creation.

Usage Recommendations

Use AI Avatar to:
  • Create presenter-led videos without filming
  • Produce multilingual content at scale
  • Add a human touch to explainer or tutorial videos
  • Build consistent brand spokesperson content
AI Avatar works great when combined with:
  • AI B-Roll (add visual variety behind the avatar)
  • AI Music (add background score)
  • Captions (add subtitles for accessibility)
  • Reframe (adapt to different platforms)

API Info

  • Node ID: b3b4c9e2-2a47-4fa9-8ce8-0c1fa1d7b6ef

Node params

Each parameter is listed as: name — type; requirement; default. Notes.

  • brief — string; required; default "". High-level intent/context (validated, ~1–1000 chars).
  • script — string; conditional; default "". Spoken script. Required unless using Fabric 1 with an uploaded voiceover_id.
  • video_model — "kling-2.6-pro" | "kling-3-standard" | "kling-3-pro" | "fabric-1"; required; default "kling-2.6-pro". Generation model choice.
  • single_take — boolean; optional; default false. Enables the single-take rendering path and changes the allowed script length.
  • aspect_ratio — "9:16" | "16:9" | "auto"; optional; default "9:16". Output framing mode.
  • creation_mode — "manual" | "video_reference"; optional; default "manual". Avatar generation flow selection.
  • reference_video_id — string (UUID); conditional; unset. Required when creation_mode="video_reference".
  • reference_change_request — string; optional; default "". Optional instructions when using video-reference mode.
  • product_image_id — string (UUID); optional; unset. Product reference image.
  • character_image_id — string (UUID); optional; unset. Avatar/character image override.
  • voice_reference_id — string (UUID); optional; unset. Voice reference asset ID for cloning.
  • voice_reference_type — "audio"; conditional; unset. Set to "audio" when voice_reference_id is provided.
  • voiceover_id — string (UUID); conditional; unset. Optional voiceover upload for the Fabric 1 lip-sync path.
  • voiceover_type — "audio"; conditional; unset. Set to "audio" when voiceover_id is provided.
  • elevenlabs_model_id — string; optional; unset. Explicit ElevenLabs model override.
  • elevenlabs_voice_settings — object {stability?: number, similarity_boost?: number, style?: number, use_speaker_boost?: boolean, speed?: number}; optional; unset. Fine-grained TTS tuning.
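The conditional rules in the notes above can be expressed as a small client-side check. The following Python sketch is illustrative only, based on the parameter notes; it is not Mosaic's actual validation logic, and the `validate_params` helper is hypothetical:

```python
def validate_params(p: dict) -> list[str]:
    """Sketch of the conditional parameter rules from the table above.
    Returns a list of human-readable error strings (empty if the
    payload looks consistent). Illustrative, not Mosaic's validator."""
    errors = []

    # brief is required and roughly length-validated (~1-1000 chars)
    if not 1 <= len(p.get("brief", "")) <= 1000:
        errors.append("brief must be roughly 1-1000 characters")

    # script is required unless using Fabric 1 with an uploaded voiceover
    fabric_voiceover = p.get("video_model") == "fabric-1" and p.get("voiceover_id")
    if not p.get("script") and not fabric_voiceover:
        errors.append("script is required unless video_model='fabric-1' with voiceover_id")

    # reference_video_id is required in video-reference mode
    if p.get("creation_mode") == "video_reference" and not p.get("reference_video_id"):
        errors.append("reference_video_id is required when creation_mode='video_reference'")

    # the *_type fields must be "audio" when their matching ID is set
    if p.get("voice_reference_id") and p.get("voice_reference_type") != "audio":
        errors.append("voice_reference_type must be 'audio' when voice_reference_id is set")
    if p.get("voiceover_id") and p.get("voiceover_type") != "audio":
        errors.append("voiceover_type must be 'audio' when voiceover_id is set")

    return errors
```

Running this against the example payload below this section would return an empty list, while omitting `script` on a Kling model would surface an error.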

Parameter groups

  • Core generation: brief, script, video_model, single_take, aspect_ratio
  • Creation flow: creation_mode, reference_video_id, reference_change_request
  • Visual references: product_image_id, character_image_id
  • Voice references: voice_reference_id, voice_reference_type, voiceover_id, voiceover_type
  • Single-take voice tuning: elevenlabs_model_id, elevenlabs_voice_settings

Example

{
  "brief": "Confident founder-style product launch message",
  "script": "We built this to help teams publish polished videos in minutes.",
  "creation_mode": "manual",
  "aspect_ratio": "9:16",
  "video_model": "kling-2.6-pro",
  "single_take": false
}
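Building on the example above, the sketch below shows how the single-take voice-tuning parameters might be added to the same payload. Field names come from the params table; the numeric values are placeholders for illustration, not documented defaults:

```python
import json

# Same core fields as the example above, extended for the
# single-take path with fine-grained ElevenLabs voice tuning.
# The tuning values here are illustrative placeholders.
params = {
    "brief": "Confident founder-style product launch message",
    "script": "We built this to help teams publish polished videos in minutes.",
    "creation_mode": "manual",
    "aspect_ratio": "9:16",
    "video_model": "kling-2.6-pro",
    "single_take": True,
    "elevenlabs_voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.75,
        "speed": 1.0,
    },
}

# Serialize for the request body
payload = json.dumps(params, indent=2)
print(payload)
```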