Voice Clone Embeddings

curl --request POST \
  --url https://api.vachana.ai/api/v1/tts/voice-clone/embeddings \
  --header 'Content-Type: multipart/form-data' \
  --header 'X-API-Key-ID: <x-api-key-id>' \
  --form audio_file='@example-file'

{
  "success": true,
  "message": "Voice embeddings generated successfully",
  "data": {
    "voice_clone_embedding": {
      "embedding": "<string>",
      "shape": [1, 768],
      "dtype": "torch.bfloat16"
    }
  }
}

Voice Cloning Flow

Voice cloning is a two-step process. Complete Step 1 once per voice, then reuse the embedding across any synthesis endpoint.

Step 1: Generate a voice embedding (this page)

Upload 5–30 seconds of clean reference audio to extract a speaker_embedding (returned in the response as data.voice_clone_embedding). Cache the result: you only need to generate it once per voice.
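Checking the clip length locally before uploading avoids a wasted request. The sketch below is a minimal example that assumes the reference audio is a WAV file (this page does not list the accepted formats) and reads its duration with Python's standard `wave` module. The function names and the exact 5 s and 30 s bounds are illustrative, taken from the guidance above rather than from a documented server-side limit.

```python
import io
import wave

# Bounds taken from the "5–30 seconds" guidance; not a documented hard limit.
MIN_SECONDS = 5.0
MAX_SECONDS = 30.0


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(data: bytes) -> float:
    """Return the clip duration, or raise ValueError if it is outside 5-30 s."""
    duration = wav_duration_seconds(data)
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(
            f"reference clip is {duration:.1f} s; "
            f"expected {MIN_SECONDS:.0f}-{MAX_SECONDS:.0f} s"
        )
    return duration


if __name__ == "__main__":
    # Build a 10-second silent mono 16 kHz clip in memory and validate it.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * 16000 * 10)
    print(check_reference_clip(buf.getvalue()))
```

Duration is only one part of "clean" audio; background noise and multiple speakers are not detected by this check.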

Step 2: Synthesize with your cloned voice

Pass the speaker_embedding from Step 1 to your preferred synthesis endpoint:

REST: full audio is returned in a single response.

Streaming (SSE): audio is received progressively as it is synthesized.

Realtime (WebSocket): lowest latency; stream text in and audio out.
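Because the same embedding is reused across the REST, streaming, and realtime endpoints, it is worth generating it once per voice and caching it. The sketch below is one way to do that with a JSON file per voice. `get_or_create_embedding`, the `voice_id` naming scheme, and the `generate` callable (which would wrap the upload on this page and return `data["voice_clone_embedding"]`) are all hypothetical, and `voice_id` is assumed to be a filesystem-safe name.

```python
import json
from pathlib import Path
from typing import Callable


def get_or_create_embedding(
    voice_id: str,
    cache_dir: Path,
    generate: Callable[[], dict],
) -> dict:
    """Return the cached embedding object for voice_id, calling generate() only on a miss.

    `generate` is a hypothetical callable that uploads the reference clip and
    returns the data["voice_clone_embedding"] object from the response.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{voice_id}.json"  # assumes voice_id is a safe file name
    if path.exists():
        # Cache hit: reuse the stored embedding without calling the API.
        return json.loads(path.read_text())
    embedding = generate()
    path.write_text(json.dumps(embedding))
    return embedding
```

A file cache keeps the sketch dependency-free; a database or key-value store works the same way as long as the lookup key identifies the voice rather than the request.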

Overview

Generate a speaker_embedding from a reference audio clip. Upload the file and receive an embedding tensor (shape [1, 768] and dtype torch.bfloat16 in the example response) that you can pass to any voice-cloned TTS endpoint.

Headers

X-API-Key-ID (string, required): API key ID used for authentication.

Body

multipart/form-data

audio_file (file, required): The reference audio file to generate the embedding from.
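The curl example above can be reproduced without third-party packages by building the multipart/form-data body by hand. This is a minimal sketch: it constructs a `urllib.request.Request` with the `X-API-Key-ID` header and a single `audio_file` part but does not send it. The function name and the `audio/wav` default part type are assumptions, not documented requirements.

```python
import urllib.request
import uuid

# Endpoint, header, and form field name come from the curl example on this page.
EMBEDDINGS_URL = "https://api.vachana.ai/api/v1/tts/voice-clone/embeddings"


def build_embeddings_request(
    api_key_id: str, audio: bytes, filename: str, mime_type: str = "audio/wav"
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST with one audio_file part."""
    boundary = uuid.uuid4().hex  # random boundary that will not appear in the header lines
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"\r\n'
        f"Content-Type: {mime_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=head + audio + tail,
        method="POST",
        headers={
            "X-API-Key-ID": api_key_id,
            # The boundary parameter is required so the server can split the parts.
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass it to `urllib.request.urlopen` and parse the body with `json.load`. The sketch does not handle retries or non-200 responses.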

Response

200 - application/json

Voice embeddings generated successfully

success (boolean). Example: true

message (string). Example: "Voice embeddings generated successfully"

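data (object): Contains voice_clone_embedding, an object with these child attributes:

embedding (string): The encoded embedding.

shape (array of integers): Tensor dimensions, for example [1, 768].

dtype (string): Tensor data type, for example "torch.bfloat16".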
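If you need the embedding values client-side (for example, to compare voices), the `embedding` string has to be decoded. This page does not specify its encoding, so the sketch below assumes it is base64 of raw little-endian bfloat16 bytes; if the service actually returns a serialized tensor, this will not apply. Under that assumption a [1, 768] tensor would be 1,536 bytes, or 2,048 base64 characters. The decoder uses the fact that bfloat16 is the upper 16 bits of an IEEE-754 float32. `decode_bf16_embedding` is a hypothetical helper.

```python
import base64
import struct


def decode_bf16_embedding(payload: dict) -> list[float]:
    """Decode data.voice_clone_embedding into a flat list of floats.

    ASSUMPTION: `embedding` is base64-encoded raw little-endian bfloat16 bytes.
    """
    if not payload.get("success"):
        raise ValueError("response did not report success")
    emb = payload["data"]["voice_clone_embedding"]
    if emb["dtype"] != "torch.bfloat16":
        raise ValueError(f"unexpected dtype {emb['dtype']!r}")
    raw = base64.b64decode(emb["embedding"])
    count = 1
    for dim in emb["shape"]:
        count *= dim
    if len(raw) != 2 * count:  # bfloat16 uses 2 bytes per value
        raise ValueError(f"expected {2 * count} bytes for shape {emb['shape']}, got {len(raw)}")
    # Shift each 16-bit value into the high half of a 32-bit word and
    # reinterpret the words as float32.
    halves = struct.unpack(f"<{count}H", raw)
    words = struct.pack(f"<{count}I", *(h << 16 for h in halves))
    return list(struct.unpack(f"<{count}f", words))
```

If you only pass the embedding to a synthesis endpoint, you can forward the object unchanged and skip decoding entirely.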
