Text-to-Speech (Streaming)

TTS Stream

curl --request POST \
  --url https://api.vachana.ai/api/v1/tts/sse \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key-ID: <x-api-key-id>' \
  --data '
{
  "text": "नमस्ते, आप कैसे हैं?",
  "model": "timbre-v2.5",
  "audio_config": {
    "encoding": "linear_pcm",
    "container": "wav",
    "num_channels": 1,
    "sample_rate": 48000,
    "sample_width": 2
  }
}
'

import requests

url = "https://api.vachana.ai/api/v1/tts/sse"

payload = {
    "text": "नमस्ते, आप कैसे हैं?",
    "model": "timbre-v2.5",
    "audio_config": {
        "encoding": "linear_pcm",
        "container": "wav",
        "num_channels": 1,
        "sample_rate": 48000,
        "sample_width": 2
    }
}
headers = {
    "X-API-Key-ID": "<x-api-key-id>",
    "Content-Type": "application/json"
}

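# stream=True keeps the connection open so events can be read as they arrive.
response = requests.post(url, json=payload, headers=headers, stream=True)

# Print each non-empty SSE line instead of waiting for the full body.
for line in response.iter_lines():
    if line:
        print(line.decode("utf-8"))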

const options = {
  method: 'POST',
  headers: {'X-API-Key-ID': '<x-api-key-id>', 'Content-Type': 'application/json'},
  body: JSON.stringify({
    text: 'नमस्ते, आप कैसे हैं?',
    model: 'timbre-v2.5',
    audio_config: {
      encoding: 'linear_pcm',
      container: 'wav',
      num_channels: 1,
      sample_rate: 48000,
      sample_width: 2
    }
  })
};

fetch('https://api.vachana.ai/api/v1/tts/sse', options)
  // The body is a Server-Sent Events stream, not JSON, so read it as text.
  .then(res => res.text())
  .then(text => console.log(text))
  .catch(err => console.error(err));

"event: audio_chunk\ndata: UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAEAfAAABAAgAZGF0YQAAAAA=\n\nevent: completed\ndata: {\"status\": \"success\"}"

{
  "success": false,
  "error": {
    "type": "INVALID_REQUEST_ERROR",
    "message": "Invalid text or audio configuration."
  }
}

{
  "success": false,
  "error": {
    "type": "RATE_LIMIT_ERROR",
    "message": "Rate limit exceeded. Please try again later."
  }
}

{
  "success": false,
  "error": {
    "type": "API_ERROR",
    "message": "An unexpected error occurred while processing."
  }
}

{
  "success": false,
  "error": {
    "type": "SERVICE_UNAVAILABLE",
    "message": "Text-to-speech service is temporarily unavailable."
  }
}
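
The error bodies above share one shape: success is false, and error carries a type and a message. A minimal sketch of turning such a body into a Python exception follows. The names TTSError and raise_for_error_body are hypothetical, and treating RATE_LIMIT_ERROR and SERVICE_UNAVAILABLE as retryable is an assumption, not documented behavior.

```python
import json


class TTSError(Exception):
    """Hypothetical exception wrapping the documented error body."""

    def __init__(self, error_type: str, message: str) -> None:
        super().__init__(f"{error_type}: {message}")
        self.error_type = error_type
        self.message = message

    @property
    def retryable(self) -> bool:
        # Assumption: rate-limit and availability errors are worth retrying.
        return self.error_type in {"RATE_LIMIT_ERROR", "SERVICE_UNAVAILABLE"}


def raise_for_error_body(body: str) -> None:
    """Raise TTSError if body is a JSON error document; otherwise return."""
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        return  # not JSON, for example an SSE stream
    if isinstance(doc, dict) and doc.get("success") is False:
        err = doc.get("error") or {}
        raise TTSError(err.get("type", "UNKNOWN"), err.get("message", ""))
```

Calling raise_for_error_body(response.text) on a non-streaming error response would surface the type and message; a caller could then back off and retry when the retryable property is true.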

Currently in beta. You’re on the priority waitlist and among the first to get access.

Overview

This endpoint streams audio in chunks as it is generated, so playback can start before synthesis finishes. This reduces latency compared to TTS REST. For the lowest latency, see TTS Realtime.

Passing numbers, IDs, dates, or currency as raw strings causes mispronunciations. See the Input Formatting Guide for correct formatting of phone numbers, account numbers, PINs, Aadhaar, vehicle registration numbers, GSTIN, currency, and more.

Available Voices

See the Available Voices page for the list of supported voices.

Endpoint

POST https://api.vachana.ai/api/v1/tts/sse

Authentication

All requests require the following headers:

Header	Required	Description	Example
`Content-Type`	Yes	Must be `application/json`	`application/json`
`X-API-Key-ID`	Yes	Your API key for authentication	`<your-api-key-id>`

Request Parameters

Send a JSON body with the following structure:

{
  "text": "नमस्ते, आप कैसे हैं?",
  "voice": "Pranav",
  "model": "timbre-v2.5",
  "audio_config": {
    "sample_rate": 48000,
    "num_channels": 1,
    "sample_width": 2,
    "encoding": "linear_pcm",
    "container": "wav"
  }
}

text

string

required

The text to synthesize into speech. Pass numbers, dates, and currency as spoken words to avoid mispronunciations; see the Input Formatting Guide.

model

string

required

TTS model to use. Supported values: timbre-v2.0, timbre-v2.5.

voice

string

ID of a pre-defined voice. See Available Voices.

audio_config.sample_rate

integer

default: 48000

Sample rate in Hz. Supported: 8000, 16000, 22050, 24000, 44100, 48000.

audio_config.num_channels

integer

required

Number of audio channels (e.g., 1 for mono, 2 for stereo).

audio_config.sample_width

integer

required

Sample width in bytes (e.g., 2 for 16-bit audio).

audio_config.encoding

string

default: "linear_pcm"

Audio encoding. Options: linear_pcm, pcm_s16le, pcm_mulaw, pcm_alaw, oggopus. For telephony, prefer container=mulaw or container=alaw over encoding=pcm_mulaw or encoding=pcm_alaw; the container and encoding forms produce the same output. Use oggopus (or container=ogg) for a playable OGG Opus file.

audio_config.container

string

default: "wav"

Output container. Options: wav, raw, mp3, ogg, mulaw, alaw. Use ogg for OGG Opus. Use mulaw or alaw for G.711 telephony (forces 8000 Hz regardless of sample_rate).

audio_config.bitrate

string

default: "128k"

MP3 bitrate. Only used when container is mp3. Supported: 32k, 64k, 96k, 128k, 192k.
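
The supported values listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API or SDK: build_audio_config is a hypothetical helper, and placing bitrate inside audio_config is an assumption based on the parameter listing.

```python
SUPPORTED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}
SUPPORTED_ENCODINGS = {"linear_pcm", "pcm_s16le", "pcm_mulaw", "pcm_alaw", "oggopus"}
SUPPORTED_CONTAINERS = {"wav", "raw", "mp3", "ogg", "mulaw", "alaw"}
SUPPORTED_BITRATES = {"32k", "64k", "96k", "128k", "192k"}


def build_audio_config(
    sample_rate: int = 48000,
    num_channels: int = 1,
    sample_width: int = 2,
    encoding: str = "linear_pcm",
    container: str = "wav",
    bitrate: str = "128k",
) -> dict:
    """Hypothetical helper: validate values and return an audio_config dict."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    if container not in SUPPORTED_CONTAINERS:
        raise ValueError(f"unsupported container: {container}")
    config = {
        "sample_rate": sample_rate,
        "num_channels": num_channels,
        "sample_width": sample_width,
        "encoding": encoding,
        "container": container,
    }
    # bitrate is only used for MP3 output, so it is omitted otherwise.
    if container == "mp3":
        if bitrate not in SUPPORTED_BITRATES:
            raise ValueError(f"unsupported bitrate: {bitrate}")
        config["bitrate"] = bitrate  # assumption: bitrate lives in audio_config
    return config
```

The returned dict can be passed as the audio_config value of the JSON body. Rejecting bad values locally is a design choice that avoids a round trip ending in INVALID_REQUEST_ERROR.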

Audio Format Reference

`container`	`encoding`	Output	`sample_rate`	`bitrate`	Content-Type
`wav`	`linear_pcm`	WAV file (with header)	8000–48000 Hz	—	`audio/wav`
`raw`	`linear_pcm`	Raw 16-bit PCM	8000–48000 Hz	—	`application/octet-stream`
`mp3`	—	MP3 file	8000–48000 Hz	`32k`–`192k`	`audio/mpeg`
`ogg`	—	OGG Opus file	8000–48000 Hz	—	`audio/ogg`
`mulaw`	—	Raw G.711 µ-law	forced 8000 Hz	—	`audio/basic`
`alaw`	—	Raw G.711 A-law	forced 8000 Hz	—	`audio/alaw`

Encoding aliases — produce identical output to the container rows above:

`container`	`encoding`	Equivalent to
`raw`	`pcm_mulaw`	`container=mulaw`
`raw`	`pcm_alaw`	`container=alaw`
`raw` or `ogg`	`oggopus`	`container=ogg`

bitrate only applies when container=mp3. container=mulaw/alaw override sample_rate to 8000 Hz.

When container=ogg or encoding=oggopus is requested, all audio chunks are encoded into a single OGG Opus file delivered as one chunk — streaming delivery is not possible for OGG.
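
The two tables and the notes above can be condensed into a small lookup. This is a sketch under stated assumptions: resolve_output is a hypothetical client-side helper, and combinations that the tables do not list are passed through unchanged rather than guessed at.

```python
CONTENT_TYPES = {
    "wav": "audio/wav",
    "raw": "application/octet-stream",
    "mp3": "audio/mpeg",
    "ogg": "audio/ogg",
    "mulaw": "audio/basic",
    "alaw": "audio/alaw",
}
# Encoding aliases that apply when container is "raw".
RAW_ENCODING_ALIASES = {"pcm_mulaw": "mulaw", "pcm_alaw": "alaw"}


def resolve_output(container: str = "wav", encoding: str = "linear_pcm",
                   sample_rate: int = 48000) -> dict:
    """Hypothetical helper: predict the output format from the reference tables."""
    effective = container
    if encoding == "oggopus" and container in ("raw", "ogg"):
        effective = "ogg"
    elif container == "raw" and encoding in RAW_ENCODING_ALIASES:
        effective = RAW_ENCODING_ALIASES[encoding]
    if effective not in CONTENT_TYPES:
        raise ValueError(f"unknown container: {container}")
    forced_8k = effective in ("mulaw", "alaw")  # G.711 overrides sample_rate
    return {
        "container": effective,
        "content_type": CONTENT_TYPES[effective],
        "sample_rate": 8000 if forced_8k else sample_rate,
        # OGG Opus arrives as a single chunk, so it is not incremental.
        "incremental": effective != "ogg",
    }
```

For example, resolve_output("raw", "pcm_mulaw", 48000) reports the mulaw container at 8000 Hz, which matches the alias row and the sample-rate override described above.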

Response

The server streams audio as Server-Sent Events. Each audio_chunk event carries a base64-encoded audio fragment. A final completed event signals that synthesis is finished.

event: audio_chunk
data: UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAEAfAAABAAgAZGF0YQAAAAA=

event: completed
data: {"status": "success"}
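
To turn that stream into audio bytes, a client splits it into events and base64-decodes each audio_chunk payload. The sketch below assumes the stream contains only the event: and data: fields shown above; iter_sse_events and collect_audio are hypothetical helper names, not SDK functions.

```python
import base64
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from already-decoded SSE lines."""
    event, data = "message", []
    for line in lines:
        if line == "":  # a blank line ends the current event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
    if data:  # stream ended without a trailing blank line
        yield event, "\n".join(data)


def collect_audio(lines: Iterable[str]) -> bytes:
    """Concatenate decoded audio_chunk payloads until the completed event."""
    audio = bytearray()
    for event, data in iter_sse_events(lines):
        if event == "audio_chunk":
            audio += base64.b64decode(data)
        elif event == "completed":
            break
    return bytes(audio)
```

With the requests library, the lines could come from (raw.decode("utf-8") for raw in response.iter_lines()) on a response opened with stream=True. For playback while streaming, each decoded chunk would be handed to the player instead of being accumulated.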

Code Example

curl -X POST https://api.vachana.ai/api/v1/tts/sse \
  -H "Content-Type: application/json" \
  -H "X-API-Key-ID: <your-api-key>" \
  -d '{
    "text": "नमस्ते, आप कैसे हैं?",
    "voice": "Pranav",
    "model": "timbre-v2.5",
    "audio_config": {
      "sample_rate": 48000,
      "num_channels": 1,
      "sample_width": 2,
      "encoding": "linear_pcm",
      "container": "wav"
    }
  }'

const response = await fetch("https://api.vachana.ai/api/v1/tts/sse", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-API-Key-ID": "<your-api-key>",
  },
  body: JSON.stringify({
    text: "नमस्ते, आप कैसे हैं?",
    voice: "Pranav",
    model: "timbre-v2.5",
    audio_config: {
      sample_rate: 48000,
      num_channels: 1,
      sample_width: 2,
      encoding: "linear_pcm",
      container: "wav",
    },
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Handle SSE chunks (audio_chunk / completed events).
  // stream: true keeps multi-byte characters split across reads intact.
  console.log(decoder.decode(value, { stream: true }));
}

import requests

response = requests.post(
    "https://api.vachana.ai/api/v1/tts/sse",
    headers={
        "Content-Type": "application/json",
        "X-API-Key-ID": "<your-api-key>",
    },
    json={
        "text": "नमस्ते, आप कैसे हैं?",
        "voice": "Pranav",
        "model": "timbre-v2.5",
        "audio_config": {
            "sample_rate": 48000,
            "num_channels": 1,
            "sample_width": 2,
            "encoding": "linear_pcm",
            "container": "wav",
        },
    },
    stream=True,  # read the response incrementally
)

for line in response.iter_lines():
    if line:
        # Handle SSE chunks (audio_chunk / completed events)
        print(line.decode("utf-8"))

Python SDK

The SDK’s streaming client handles SSE parsing and chunk reassembly for you — you just iterate and write.

Installation

pip install gnani-vachana

Requires Python 3.10+.

Authentication

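Pass the API key directly:

from gnani.tts import GnaniTTSStreamClient

client = GnaniTTSStreamClient(api_key="your-api-key")

Or set the GNANI_API_KEY environment variable and create the client without arguments:

export GNANI_API_KEY="your-api-key"

from gnani.tts import GnaniTTSStreamClient

client = GnaniTTSStreamClient()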

Stream Audio to a File

synthesize_stream yields audio chunks as they arrive. Playback or writing can begin before the full response is complete.

from gnani.tts import GnaniTTSStreamClient

client = GnaniTTSStreamClient(api_key="your-api-key")

with open("output.wav", "wb") as f:
    for chunk in client.synthesize_stream(
        "Streaming TTS response in Hindi",
        voice="Kaveri",
    ):
        f.write(chunk)
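
One way to see the benefit of writing chunks as they arrive is to record how long the first chunk takes. The sketch below works on any iterable of bytes, so it does not depend on SDK internals; write_stream is a hypothetical helper, and passing it the result of synthesize_stream is an assumption based on the example above.

```python
import time
from typing import BinaryIO, Iterable, Optional, Tuple


def write_stream(chunks: Iterable[bytes], sink: BinaryIO) -> Tuple[int, Optional[float]]:
    """Write chunks to sink; return (total bytes, seconds until first chunk)."""
    start = time.monotonic()
    first_chunk_after: Optional[float] = None
    total = 0
    for chunk in chunks:
        if first_chunk_after is None:
            # Time from starting iteration to receiving the first chunk.
            first_chunk_after = time.monotonic() - start
        sink.write(chunk)
        total += len(chunk)
    return total, first_chunk_after
```

Inside a with open("output.wav", "wb") as f block, a call such as write_stream(client.synthesize_stream("...", voice="Kaveri"), f) would write the file and report the time to first audio. The second value is None when the stream yields nothing.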

With Custom Audio Config

from gnani.tts import GnaniTTSStreamClient, AudioConfig

client = GnaniTTSStreamClient(api_key="your-api-key")

with open("output.wav", "wb") as f:
    for chunk in client.synthesize_stream(
        "नमस्ते, आप कैसे हैं?",
        voice="Pranav",
        audio_config=AudioConfig(
            sample_rate=48000,
            encoding="linear_pcm",
            container="wav",
        ),
    ):
        f.write(chunk)

Supported Languages

The Gnani Timbre v2.0 API supports 2 languages.

Language	Native Script	Example
English	Latin	“I am going to the market”
Hindi	Devanagari (हिन्दी)	“मैं बाज़ार जा रहा हूँ”

Headers

X-API-Key-ID

string

required

Body

application/json

Request body for TTS inference.

text

string

required

model

enum<string>

required

Supported TTS models.

Available options:

timbre-v2.5,

timbre-v2.0

audio_config

AudioConfig · object

required

Audio output configuration.

voice

string

Voice name from the Timbre catalog.

language

string

Language code. Use IND-IN for Timbre v2.0. For Timbre v2.5 use auto, hi-IN, en-IN, ta-IN, te-IN, kn-IN, ml-IN, mr-IN, pa-IN, bn-IN, or gu-IN.

speed

number

Playback speed multiplier (Timbre v2.5 only). Range: 0.85–1.15.
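
The model-specific rules for language and speed can be enforced before sending a request. This is a minimal sketch: optional_fields is a hypothetical helper, and raising an error when speed is supplied with timbre-v2.0 is a client-side choice, since the behavior of the API in that case is not stated here.

```python
V25_LANGUAGES = frozenset({
    "auto", "hi-IN", "en-IN", "ta-IN", "te-IN", "kn-IN",
    "ml-IN", "mr-IN", "pa-IN", "bn-IN", "gu-IN",
})


def optional_fields(model: str, language: str = None, speed: float = None) -> dict:
    """Hypothetical helper: validate language and speed for the given model."""
    if model not in ("timbre-v2.0", "timbre-v2.5"):
        raise ValueError(f"unknown model: {model}")
    fields = {}
    if language is not None:
        allowed = {"IND-IN"} if model == "timbre-v2.0" else V25_LANGUAGES
        if language not in allowed:
            raise ValueError(f"language {language!r} is not valid for {model}")
        fields["language"] = language
    if speed is not None:
        if model != "timbre-v2.5":
            raise ValueError("speed is documented for timbre-v2.5 only")
        if not 0.85 <= speed <= 1.15:
            raise ValueError("speed must be within 0.85 to 1.15")
        fields["speed"] = speed
    return fields
```

The returned dict can be merged into the request body alongside text, model, and audio_config.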

Response

Successful Server-Sent Events stream

The response is of type string.
