POST /api/v1/tts/inference
TTS Inference
curl --request POST \
  --url https://api.vachana.ai/api/v1/tts/inference \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key-ID: <x-api-key-id>' \
  --data '
{
  "audio_config": {
    "bitrate": "192k",
    "container": "mp3",
    "encoding": "linear_pcm",
    "num_channels": 1,
    "sample_rate": 44100,
    "sample_width": 2
  },
  "model": "vachana-voice-v3",
  "text": "नमस्ते, आप कैसे हैं?"
}
'

Overview

Step 1: Generate a voice embedding
Upload a reference audio clip to get a speaker_embedding. See Voice Clone Embeddings.

Step 2: Synthesize with your cloned voice (this page)
Pass the speaker_embedding from Step 1 to this endpoint to synthesize audio in your cloned voice.

This endpoint takes your text together with the speaker_embedding obtained from Voice Clone Embeddings and returns the full audio in a single response. For streaming playback, see Voice Cloning Streaming or Voice Cloning Realtime.

Endpoint

POST https://api.vachana.ai/api/v1/tts/inference

Authentication

Header        Required  Description                      Example
X-API-Key-ID  Yes       Your API key for authentication  your-api-key-id
Content-Type  Yes       Must be application/json         application/json

Request Body

text (string, required): The text to synthesize into speech.
model (string, required): Voice cloning model to use. Currently supported: vachana-vc-v1
audio_config (object, required): Audio output configuration.
speaker_embedding (object, required): Voice clone embedding obtained from the Voice Clone Embeddings endpoint.

Response

Returns binary audio data in the format specified by audio_config.container:
  • audio/wav for WAV files
  • audio/mpeg for MP3 files
  • audio/ogg for OGG files
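
As a rough illustration of the mapping above, the following Python sketch picks a file extension from the Content-Type of a response. The helper name extension_for and the ".bin" fallback are assumptions made for illustration and are not part of the API.

```python
# Content types listed above, mapped to a matching file extension.
CONTENT_TYPE_TO_EXTENSION = {
    "audio/wav": ".wav",
    "audio/mpeg": ".mp3",
    "audio/ogg": ".ogg",
}


def extension_for(content_type: str) -> str:
    """Return a file extension for a response Content-Type header value.

    Hypothetical helper: any parameters after ";" are dropped, and unknown
    types fall back to ".bin" (an assumption, not documented API behavior).
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    return CONTENT_TYPE_TO_EXTENSION.get(media_type, ".bin")
```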

Example Request

curl -X POST https://api.vachana.ai/api/v1/tts/inference \
  -H "X-API-Key-ID: your-api-key-id" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "नमस्ते, आप कैसे हैं?",
    "model": "vachana-vc-v1",
    "audio_config": {
      "sample_rate": 44100,
      "encoding": "linear_pcm",
      "container": "wav"
    },
    "speaker_embedding": {
      "embedding": "your-embedding-string",
      "shape": [1, 768],
      "dtype": "torch.bfloat16"
    }
  }' \
  --output audio.wav
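
The same request can be sketched in Python using only the standard library. The function names build_request and synthesize and the 60-second timeout are assumptions made for illustration; the URL, headers, and body fields are copied from the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.vachana.ai/api/v1/tts/inference"


def build_request(text: str, speaker_embedding: dict, api_key_id: str) -> urllib.request.Request:
    """Build the POST request; the body mirrors the curl example."""
    body = {
        "text": text,
        "model": "vachana-vc-v1",
        "audio_config": {
            "sample_rate": 44100,
            "encoding": "linear_pcm",
            "container": "wav",
        },
        "speaker_embedding": speaker_embedding,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key-ID": api_key_id, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, speaker_embedding, api_key_id, out_path="audio.wav"):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_request(text, speaker_embedding, api_key_id)
    # The 60 s timeout is an arbitrary choice for this sketch.
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Calling synthesize("नमस्ते, आप कैसे हैं?", embedding, "your-api-key-id") would write audio.wav, where embedding is the speaker_embedding object obtained in Step 1.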

Error Responses

400 Bad Request: Invalid text or audio configuration
{
  "success": false,
  "error": {
    "type": "INVALID_REQUEST_ERROR",
    "message": "Invalid text or audio configuration."
  }
}
429 Too Many Requests: Rate limit exceeded
{
  "success": false,
  "error": {
    "type": "RATE_LIMIT_ERROR",
    "message": "Rate limit exceeded. Please try again later."
  }
}
500 Internal Server Error: Unexpected error occurred
{
  "success": false,
  "error": {
    "type": "API_ERROR",
    "message": "An unexpected error occurred while processing."
  }
}
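
Since every error body shown above shares the same shape, a client can reduce it to a single log line. The Python sketch below is one way to do that; the function name describe_error and the "UNKNOWN" fallbacks are assumptions for illustration, not documented behavior.

```python
import json


def describe_error(status: int, body: bytes) -> str:
    """Summarize an error response as 'status type: message'.

    Assumes the JSON shape shown above and falls back to a generic
    description if the body is not JSON or lacks the expected keys.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # covers json.JSONDecodeError and bad UTF-8
        return f"{status} UNKNOWN: response body was not valid JSON"
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return f"{status} UNKNOWN: no error object in response"
    return f"{status} {error.get('type', 'UNKNOWN')}: {error.get('message', '')}"
```

With urllib, the status and body would come from the code attribute and read() method of a caught urllib.error.HTTPError.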

Headers

X-API-Key-ID (string, required)

Body

Request body for TTS inference (application/json).

text (string, required)
model (enum<string>, required): Supported TTS models. Available options: vachana-voice-v3
audio_config (AudioConfig object, required): Audio output configuration.
voice (enum<string>): ID of a pre-defined voice. Ignored if speaker_embedding is provided. Available options: Karan, Simran, Nara, Riya, Viraj, Raju
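
To make the interaction between voice and speaker_embedding concrete, here is a minimal Python sketch of a body builder. The function name build_body and the client-side check against the voice list are assumptions for illustration. Only the field names, the voice options, and the rule that voice is ignored when speaker_embedding is present come from this page.

```python
PREDEFINED_VOICES = {"Karan", "Simran", "Nara", "Riya", "Viraj", "Raju"}


def build_body(text, model, audio_config, voice=None, speaker_embedding=None):
    """Assemble a request body with either a pre-defined voice or an embedding."""
    body = {"text": text, "model": model, "audio_config": audio_config}
    if speaker_embedding is not None:
        # voice is ignored when speaker_embedding is provided, so this
        # sketch leaves it out to keep the intent explicit.
        body["speaker_embedding"] = speaker_embedding
    elif voice is not None:
        # Client-side validation is an assumption of this sketch.
        if voice not in PREDEFINED_VOICES:
            raise ValueError(f"unknown voice: {voice!r}")
        body["voice"] = voice
    return body
```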

Response

Successful audio synthesis. The response is of type file.