Voice Cloned TTS (Streaming)

TTS Stream

curl --request POST \
  --url https://api.vachana.ai/api/v1/tts/sse \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key-ID: <x-api-key-id>' \
  --data '
{
  "audio_config": {
    "bitrate": "192k",
    "container": "mp3",
    "encoding": "linear_pcm",
    "num_channels": 1,
    "sample_rate": 44100,
    "sample_width": 2
  },
  "model": "vachana-voice-v3",
  "text": "नमस्ते, आप कैसे हैं?"
}
'

"event: audio_chunk\ndata: UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAEAfAAABAAgAZGF0YQAAAAA=\n\nevent: completed\ndata: {\"status\": \"success\"}\n"

Overview

Step 1 — Generate a voice embedding

Upload a reference audio clip to get a speaker_embedding. See Voice Clone Embeddings.

Step 2 — Synthesize with your cloned voice — this page

Pass the speaker_embedding from Step 1 to this endpoint to stream cloned voice audio progressively.

Stream cloned voice audio as it is generated using Server-Sent Events (SSE). Pass the speaker_embedding from Voice Clone Embeddings to use your cloned voice. Because audio arrives progressively, playback can begin before synthesis finishes, which reduces latency compared to Voice Cloned TTS (REST). For the lowest latency, see Voice Cloned TTS (Realtime).

Endpoint

POST https://api.vachana.ai/api/v1/tts/sse

Authentication

Header	Required	Description	Example
`X-API-Key-ID`	Yes	Your API key for authentication	`your-api-key-id`
`Content-Type`	Yes	Must be `application/json`	`application/json`

Request Body

text

string

required

The text to synthesize into speech

model

string

required

Voice cloning model to use. Currently supported: vachana-vc-v1

audio_config

object

required

Audio output configuration

Properties of audio_config:

sample_rate

integer

Sample rate in Hz (8000-44100)

num_channels

integer

Number of audio channels (1-8)

sample_width

integer

Sample width in bytes (1-4)

encoding

string

Audio encoding format: linear_pcm or oggopus

container

string

Audio container format: raw, mp3, wav, mulaw, or ogg

bitrate

string

MP3 bitrate (only when container=mp3): 96k, 128k, or 192k
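
The ranges and allowed values listed for audio_config can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper names `validate_audio_config` and `pcm_bytes_per_second` are hypothetical, and treating a bitrate on a non-mp3 container as a problem is an assumption (the server may simply ignore it).

```python
# Allowed values as listed in the audio_config properties above.
ALLOWED_ENCODINGS = {"linear_pcm", "oggopus"}
ALLOWED_CONTAINERS = {"raw", "mp3", "wav", "mulaw", "ogg"}
ALLOWED_BITRATES = {"96k", "128k", "192k"}
INTEGER_RANGES = {
    "sample_rate": (8000, 44100),   # Hz
    "num_channels": (1, 8),
    "sample_width": (1, 4),         # bytes per sample
}


def validate_audio_config(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for key, (low, high) in INTEGER_RANGES.items():
        value = cfg.get(key)
        if value is not None and not (isinstance(value, int) and low <= value <= high):
            problems.append(f"{key} must be an integer in [{low}, {high}], got {value!r}")
    encoding = cfg.get("encoding")
    if encoding is not None and encoding not in ALLOWED_ENCODINGS:
        problems.append(f"encoding must be one of {sorted(ALLOWED_ENCODINGS)}")
    container = cfg.get("container")
    if container is not None and container not in ALLOWED_CONTAINERS:
        problems.append(f"container must be one of {sorted(ALLOWED_CONTAINERS)}")
    bitrate = cfg.get("bitrate")
    if bitrate is not None:
        # Assumption: flag a bitrate that is set for a non-mp3 container.
        if container != "mp3":
            problems.append("bitrate applies only when container is mp3")
        elif bitrate not in ALLOWED_BITRATES:
            problems.append(f"bitrate must be one of {sorted(ALLOWED_BITRATES)}")
    return problems


def pcm_bytes_per_second(sample_rate: int, num_channels: int, sample_width: int) -> int:
    """Data rate of uncompressed linear PCM audio."""
    return sample_rate * num_channels * sample_width
```

For uncompressed linear PCM, the data rate is the product of the three integer fields, so 44100 Hz mono audio with 2-byte samples amounts to 88200 bytes per second.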

speaker_embedding

object

required

Voice clone embedding obtained from the Voice Clone Embeddings endpoint

Properties of speaker_embedding:

embedding

string

required

The voice clone embedding string

shape

array

required

Shape of the embedding tensor, e.g., [1, 768]

dtype

string

required

Data type of the embedding, e.g., torch.bfloat16
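
To illustrate how the fields above fit together, here is a hedged sketch of assembling the request body in Python. The function name `build_payload` is hypothetical, and it assumes the speaker_embedding object is passed through unchanged from the Voice Clone Embeddings step; only the field names and the model identifier come from this page.

```python
# The three fields this page marks as required inside speaker_embedding.
REQUIRED_EMBEDDING_FIELDS = ("embedding", "shape", "dtype")


def build_payload(text: str, speaker_embedding: dict, audio_config: dict,
                  model: str = "vachana-vc-v1") -> dict:
    """Assemble the JSON body for the streaming endpoint (illustrative helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    missing = [f for f in REQUIRED_EMBEDDING_FIELDS if f not in speaker_embedding]
    if missing:
        raise ValueError(f"speaker_embedding is missing fields: {missing}")
    return {
        "text": text,
        "model": model,
        "audio_config": dict(audio_config),
        # Assumption: the embedding object is forwarded exactly as received.
        "speaker_embedding": dict(speaker_embedding),
    }


payload = build_payload(
    text="नमस्ते, आप कैसे हैं?",
    speaker_embedding={
        "embedding": "your-embedding-string",
        "shape": [1, 768],
        "dtype": "torch.bfloat16",
    },
    audio_config={"sample_rate": 44100, "encoding": "linear_pcm", "container": "wav"},
)
```

Failing early on a missing embedding field avoids spending a request on a body the server would presumably reject.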

Response

The server streams audio data via Server-Sent Events (SSE). Each audio_chunk event carries a chunk of audio data encoded in base64, and a final completed event signals the end of the stream.

Event Types

audio_chunk

event

Contains base64-encoded audio data

event: audio_chunk
data: UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAEAfAAABAAgAZGF0YQAAAAA=

completed

event

Signals the end of the audio stream

event: completed
data: {"status": "success"}
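
The two event types above can be consumed with a small line-based SSE parser. The sketch below uses only the Python standard library. The names `iter_sse_events` and `collect_audio` are hypothetical, and simply concatenating the decoded chunks is an assumption about how the chunks combine into one audio stream.

```python
import base64
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
    # Lenient choice: also flush a final event that lacks a trailing blank line.
    if data:
        yield event, "\n".join(data)


def collect_audio(lines: Iterable[str]) -> bytes:
    """Decode and concatenate audio_chunk payloads until a completed event."""
    audio = bytearray()
    for event, data in iter_sse_events(lines):
        if event == "audio_chunk":
            audio.extend(base64.b64decode(data))
        elif event == "completed":
            break
    return bytes(audio)


sample = (
    "event: audio_chunk\n"
    "data: UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAEAfAAABAAgAZGF0YQAAAAA=\n"
    "\n"
    "event: completed\n"
    'data: {"status": "success"}\n'
)
audio = collect_audio(sample.splitlines(keepends=True))
```

The sample chunk shown on this page decodes to a 44-byte RIFF/WAVE header with no audio samples, so `audio` here starts with `b"RIFF"`. For real playback, chunks could be handed to an audio sink as they arrive instead of being buffered.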

Example Request

curl -X POST https://api.vachana.ai/api/v1/tts/sse \
  -H "X-API-Key-ID: your-api-key-id" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -N \
  -d '{
    "text": "नमस्ते, आप कैसे हैं?",
    "model": "vachana-vc-v1",
    "audio_config": {
      "sample_rate": 44100,
      "encoding": "linear_pcm",
      "container": "wav"
    },
    "speaker_embedding": {
      "embedding": "your-embedding-string",
      "shape": [1, 768],
      "dtype": "torch.bfloat16"
    }
  }'
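
The same request can be issued from Python. The sketch below mirrors the curl command using only the standard library `urllib.request`. The function names `build_request` and `stream_lines` are hypothetical, the timeout is an arbitrary choice, and the code has not been verified against the live service.

```python
import json
import urllib.request
from typing import Iterator

API_URL = "https://api.vachana.ai/api/v1/tts/sse"


def build_request(api_key_id: str, payload: dict) -> urllib.request.Request:
    """Create the POST request with the same headers as the curl example."""
    # ensure_ascii=False keeps Devanagari text readable; the body is UTF-8 bytes.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "X-API-Key-ID": api_key_id,
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
        },
    )


def stream_lines(request: urllib.request.Request, timeout: float = 60.0) -> Iterator[str]:
    """Open the connection and yield decoded SSE lines as they arrive."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        for raw in response:  # the response object iterates line by line
            yield raw.decode("utf-8")


request = build_request(
    "your-api-key-id",
    {
        "text": "नमस्ते, आप कैसे हैं?",
        "model": "vachana-vc-v1",
        "audio_config": {"sample_rate": 44100, "encoding": "linear_pcm", "container": "wav"},
        "speaker_embedding": {
            "embedding": "your-embedding-string",
            "shape": [1, 768],
            "dtype": "torch.bfloat16",
        },
    },
)
# for line in stream_lines(request): ...  # performs the network call
```

Building the request separately from sending it keeps the header and body logic testable without network access. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses.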

Error Responses

400

Bad Request

Invalid text or audio configuration

{
  "success": false,
  "error": {
    "type": "INVALID_REQUEST_ERROR",
    "message": "Invalid text or audio configuration."
  }
}

429

Too Many Requests

Rate limit exceeded

{
  "success": false,
  "error": {
    "type": "RATE_LIMIT_ERROR",
    "message": "Rate limit exceeded. Please try again later."
  }
}

500

Internal Server Error

Unexpected error occurred

{
  "success": false,
  "error": {
    "type": "API_ERROR",
    "message": "An unexpected error occurred while processing."
  }
}
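
All three error bodies share the same shape, so a client can extract the type and message with one helper. The sketch below is illustrative: the names `parse_error`, `should_retry`, and `backoff_delay` are hypothetical, and retrying 429 and 500 with exponential backoff is a common client convention assumed here, not a policy stated on this page.

```python
import json
from typing import Tuple


def parse_error(body: str) -> Tuple[str, str]:
    """Return (error_type, message) from an error response body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return ("UNKNOWN", body)
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return ("UNKNOWN", body)
    return (error.get("type", "UNKNOWN"), error.get("message", ""))


def should_retry(status: int) -> bool:
    # Assumption: rate limits and server errors are transient; 400 is not.
    return status in (429, 500)


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff in seconds: base * 2**attempt, capped at `cap`."""
    return min(cap, base * (2 ** attempt))
```

With the defaults, successive retries would wait 1, 2, 4, 8 seconds and so on, up to a 30-second ceiling.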

Headers

X-API-Key-ID

string

required

Body

application/json

Request body for TTS inference.

text

string

required

model

enum<string>

required

Supported TTS models.

Available options:

vachana-voice-v3

audio_config

AudioConfig · object

required

Audio output configuration.

voice

enum<string>

ID of a pre-defined voice. Ignored if speaker_embedding is provided.

Available options:

Karan,

Simran,

Nara,

Riya,

Viraj,

Raju

Response

Successful Server-Sent Events stream

The response is of type string.
