Voice Clone Embeddings

POST /api/v1/tts/voice-clone/embeddings
Example request:

curl --request POST \
  --url https://api.vachana.ai/api/v1/tts/voice-clone/embeddings \
  --header 'Content-Type: multipart/form-data' \
  --header 'X-API-Key-ID: <x-api-key-id>' \
  --form audio_file='@example-file'
Example response (200):

{
  "success": true,
  "message": "Voice embeddings generated successfully",
  "data": {
    "voice_clone_embedding": {
      "embedding": "<string>",
      "shape": [
        1,
        768
      ],
      "dtype": "torch.bfloat16"
    }
  }
}
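The example response returns the embedding as an opaque string together with its `shape` and `dtype`. This page does not document how that string is encoded. The sketch below assumes, hypothetically, that it is base64 of raw little-endian bfloat16 bytes in row-major order, so verify this against a real response before relying on it. It uses only the standard library: a bfloat16 value is the upper 16 bits of an IEEE 754 float32, so each value is shifted into the high half of a 32-bit word and reinterpreted as float32. `decode_bf16_embedding` is a hypothetical helper name. If you only forward the embedding to a synthesis endpoint, you may not need to decode it at all.

```python
import base64
import math
import struct


def decode_bf16_embedding(obj: dict) -> list[float]:
    """Decode a voice_clone_embedding object into a flat, row-major list of floats.

    HYPOTHETICAL ENCODING: assumes obj["embedding"] is base64 of raw
    little-endian bfloat16 bytes. This is not documented; verify it first.
    """
    if obj.get("dtype") != "torch.bfloat16":
        raise ValueError(f"unexpected dtype: {obj.get('dtype')!r}")
    count = math.prod(obj["shape"])  # e.g. 1 * 768 in the example response
    raw = base64.b64decode(obj["embedding"])
    if len(raw) != 2 * count:
        raise ValueError(f"expected {2 * count} bytes, got {len(raw)}")
    # A bfloat16 is the high 16 bits of a float32: read each value as uint16,
    # move it into the upper half of a uint32, then reinterpret as float32.
    halves = struct.unpack(f"<{count}H", raw)
    widened = struct.pack(f"<{count}I", *(h << 16 for h in halves))
    return list(struct.unpack(f"<{count}f", widened))
```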

Voice Cloning Flow

Voice cloning is a two-step process. Complete Step 1 once per voice, then reuse the embedding across any synthesis endpoint.
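Because the embedding only needs to be generated once per voice, a client can cache it locally and skip the upload on later runs. The sketch below is one way to do that under stated assumptions: it keeps one JSON file per voice under a hypothetical `voice_embeddings/` directory, and it receives the actual HTTP call as an injected `fetch` callable (a hypothetical parameter) so the caching logic does not depend on any HTTP library. It stores only the `data.voice_clone_embedding` object from the response format shown above.

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical cache location; choose whatever suits your deployment.
CACHE_DIR = Path("voice_embeddings")


def load_or_create_embedding(
    voice_name: str,
    fetch: Callable[[], dict],
    cache_dir: Path = CACHE_DIR,
) -> dict:
    """Return the cached voice_clone_embedding for voice_name.

    fetch() is called only on a cache miss. It should POST the reference audio
    to /api/v1/tts/voice-clone/embeddings and return the parsed JSON response.
    voice_name is used as a file name, so keep it a simple, file-safe identifier.
    """
    cache_file = cache_dir / f"{voice_name}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    response = fetch()
    if not response.get("success"):
        raise RuntimeError(f"embedding request failed: {response.get('message')!r}")
    embedding = response["data"]["voice_clone_embedding"]
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(embedding))
    return embedding
```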
Step 1: Generate a voice embedding (this page)

Upload 5–30 seconds of clean reference audio to extract a speaker_embedding. Cache the result: you only need to generate it once per voice.
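You can check the 5 to 30 second guideline locally before uploading. This page does not list the accepted audio formats, so the sketch below assumes a PCM WAV file, which Python's standard `wave` module can read; other formats need a different decoder. `check_reference_duration` is a hypothetical helper name, and it cannot judge whether the audio is "clean".

```python
import wave

MIN_SECONDS = 5.0
MAX_SECONDS = 30.0


def check_reference_duration(path: str) -> float:
    """Return the clip length in seconds, raising ValueError if it is outside 5-30 s.

    Only handles PCM WAV input (stdlib `wave`).
    """
    with wave.open(path, "rb") as wav:
        seconds = wav.getnframes() / wav.getframerate()
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"reference clip is {seconds:.1f} s; expected 5-30 s")
    return seconds
```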
Step 2: Synthesize with your cloned voice

Pass the speaker_embedding from Step 1 to your preferred synthesis endpoint:

REST: the full audio is returned in a single response.

Streaming (SSE): audio is delivered progressively as it is synthesized.

Realtime (WebSocket): lowest latency; text is streamed in and audio is streamed out.
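Step 1's response exposes the embedding as `data.voice_clone_embedding`, while this flow calls it `speaker_embedding`. The synthesis request schema is not shown on this page, so the sketch below is hypothetical: it assumes the synthesis endpoints accept the `voice_clone_embedding` object unchanged under a `speaker_embedding` key in the request body. Check the REST, SSE, and WebSocket references for the real field name and shape. `with_speaker_embedding` is a hypothetical helper; it copies the payload so the caller's dict is not mutated.

```python
import copy


def with_speaker_embedding(payload: dict, embeddings_response: dict) -> dict:
    """Return a copy of a synthesis request body with the Step 1 embedding attached.

    HYPOTHETICAL: assumes the synthesis endpoints take the voice_clone_embedding
    object as-is under a "speaker_embedding" key.
    """
    if not embeddings_response.get("success"):
        raise ValueError("embedding response did not report success")
    embedding = embeddings_response["data"]["voice_clone_embedding"]
    body = copy.deepcopy(payload)
    body["speaker_embedding"] = copy.deepcopy(embedding)
    return body
```

Because the embedding is computed once, the same object can be attached to requests for any of the three transports.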

Overview

Generate a speaker_embedding from a reference audio clip. Upload the file and receive an embedding, returned as data.voice_clone_embedding (shape [1, 768] and dtype torch.bfloat16 in the example response), that you can pass to any Voice Cloned TTS endpoint.

Headers

X-API-Key-ID (string, required)

API key ID used for authentication.

Body

Content type: multipart/form-data

audio_file (file, required)

The reference audio file to generate the embedding from.
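For clients without a multipart helper, the request from the curl example can be assembled with the Python standard library. This is a sketch under assumptions: `build_embeddings_request` and `send` are hypothetical helper names, the whole file is held in memory, and the part's MIME type is guessed from the file name because this page does not list accepted audio formats. A multipart `Content-Type` header must carry the `boundary` parameter that separates the parts.

```python
import json
import mimetypes
import urllib.request
import uuid

EMBEDDINGS_URL = "https://api.vachana.ai/api/v1/tts/voice-clone/embeddings"


def build_embeddings_request(api_key_id: str, filename: str, audio: bytes) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST with one audio_file part."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    # filename is inserted as-is; avoid double quotes in it.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        method="POST",
        headers={
            "X-API-Key-ID": api_key_id,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and parse the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

To use it, call `send(build_embeddings_request(key_id, "reference.wav", audio_bytes))`, where `audio_bytes` is the file's contents. `urlopen` raises `urllib.error.HTTPError` for non-2xx responses.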

Response

200 (application/json)

Voice embeddings generated successfully

success (boolean)
Example: true

message (string)
Example: "Voice embeddings generated successfully"

data (object)