6. Audio Generation - Mosaic API Docs

Audio Generation creates new audio assets. Use voiceover mode for spoken narration and music mode for background tracks.

How It Works

For voiceover, provide a script and optionally attach a voice reference. For music, describe the mood, genre, instrumentation, and target length. The output can feed video-generation, AI Avatar, captions, or publishing workflows.

API Usage

Create an agent shell:

curl -X POST "https://api.mosaic.so/agent/create" \
  -H "Authorization: Bearer mk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Launch voiceover",
    "visibility": "private"
  }'

Add an Audio Generation node:

curl -X POST "https://api.mosaic.so/agent/AGENT_ID/update" \
  -H "Authorization: Bearer mk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "operations": [
      {
        "op": "create_node",
        "node_type_id": "14687f30-5fd0-468f-8239-2784d83df95b",
        "params_used": {
          "mode": "voiceover",
          "model": "eleven_v3",
          "script": "Introducing the fastest way to turn ideas into publish-ready videos.",
          "use_upstream_voice_reference": false
        }
      }
    ]
  }'

Run the agent with no video inputs:

curl -X POST "https://api.mosaic.so/agent/AGENT_ID/run" \
  -H "Authorization: Bearer mk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "callback_url": "https://your-webhook.com/mosaic"
  }'

API Info

Node Params & API Details

Node type ID (sent as `node_type_id` in `create_node`): 14687f30-5fd0-468f-8239-2784d83df95b

Node params

Param	Type	Required	Default	Notes
`mode`	`"voiceover" | "music"`	No	`"voiceover"`	Audio generation mode.
`model`	`"eleven_v3" | "music"`	No	`"eleven_v3"`	Audio model for the selected mode.
`script`	`string`	Conditional	`""`	Required for voiceover mode.
`voice_id`	`string`	No	curated default	Voice ID when no upstream voice reference is used.
`use_upstream_voice_reference`	`boolean`	No	`true`	Use selected upstream audio/video as a voice reference.
`selected_context`	`{audio:string[],videos:string[],voice_reference?:string}`	No	empty lists	Attached references.
`music_prompt`	`string`	Conditional	`""`	Required for music mode.
`music_length_ms`	`number`	No	`30000`	Music duration in milliseconds. Allowed range is 5-300 seconds (5000-300000 ms).
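
The conditional rules in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Mosaic API: `validate_audio_params` is a hypothetical helper name, and it assumes `music_length_ms` is passed as a whole number of milliseconds within the 5-300 second range stated above.

```shell
# Hypothetical client-side check that mirrors the parameter table.
# Usage: validate_audio_params MODE TEXT [LENGTH_MS]
#   MODE       "voiceover" or "music" (defaults to "voiceover", as in the table)
#   TEXT       the script (voiceover) or the music prompt (music)
#   LENGTH_MS  music duration in ms (defaults to 30000, as in the table)
validate_audio_params() {
  local mode="${1:-voiceover}"
  local text="${2:-}"
  local length_ms="${3:-30000}"

  case "$mode" in
    voiceover)
      # `script` is required for voiceover mode.
      [ -n "$text" ] || { echo "script is required for voiceover mode" >&2; return 1; }
      ;;
    music)
      # `music_prompt` is required for music mode.
      [ -n "$text" ] || { echo "music_prompt is required for music mode" >&2; return 1; }
      # Assumption: the duration is a whole number of milliseconds.
      case "$length_ms" in
        ''|*[!0-9]*) echo "music_length_ms must be a whole number" >&2; return 1 ;;
      esac
      # 5-300 seconds expressed in milliseconds.
      if [ "$length_ms" -lt 5000 ] || [ "$length_ms" -gt 300000 ]; then
        echo "music_length_ms must be between 5000 and 300000" >&2
        return 1
      fi
      ;;
    *)
      echo "mode must be voiceover or music" >&2
      return 1
      ;;
  esac
}
```

For example, `validate_audio_params music "Warm upbeat electronic" 45000` succeeds, while a length of `4000` is rejected. The server remains the authority on what is valid; this only catches obvious mistakes early.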

Voiceover example

{
  "mode": "voiceover",
  "model": "eleven_v3",
  "script": "This is the voiceover text.",
  "use_upstream_voice_reference": false
}

Music example

{
  "mode": "music",
  "model": "music",
  "music_prompt": "Warm upbeat electronic background music for a product launch",
  "music_length_ms": 45000
}
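
The music parameters can be sent with the same `create_node` operation used in API Usage. The sketch below wraps that request in two shell functions. The function names and the `MOSAIC_API_KEY` environment variable are assumptions made for illustration; the endpoint, node type ID, and parameter names come from this page. It also assumes the prompt contains no double quotes, backslashes, or control characters; for arbitrary text, build the JSON with a tool such as `jq`.

```shell
# Node type ID for Audio Generation, as listed on this page.
NODE_TYPE_ID="14687f30-5fd0-468f-8239-2784d83df95b"

# Hypothetical helper: print the update body for a music-mode node.
# Usage: music_node_payload PROMPT [LENGTH_MS]
# Assumes PROMPT needs no JSON escaping (no quotes, backslashes, control chars).
music_node_payload() {
  local prompt="$1"
  local length_ms="${2:-30000}"
  printf '{"operations":[{"op":"create_node","node_type_id":"%s","params_used":{"mode":"music","model":"music","music_prompt":"%s","music_length_ms":%s}}]}\n' \
    "$NODE_TYPE_ID" "$prompt" "$length_ms"
}

# Hypothetical helper: add the music node to an existing agent.
# Usage: add_music_node AGENT_ID PROMPT [LENGTH_MS]
# Reads the API key from MOSAIC_API_KEY (an assumed variable name).
add_music_node() {
  local agent_id="$1"
  shift
  curl -X POST "https://api.mosaic.so/agent/${agent_id}/update" \
    -H "Authorization: Bearer ${MOSAIC_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$(music_node_payload "$@")"
}
```

With the key exported, `add_music_node AGENT_ID "Warm upbeat electronic background music for a product launch" 45000` sends the Music example above as a new node.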
