Quick Start - Inya Docs

Prerequisites

Get your API key

Install the SDK

pip install gnani-vachana

Requires Python 3.9+.

Set environment variables

export GNANI_API_KEY="your-api-key"
export GNANI_ORGANIZATION_ID="your-org-id"   # required for STT
export GNANI_USER_ID="your-user-id"           # required for STT
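A missing variable is easier to diagnose before the first API call than after it. The sketch below checks for the three variable names exported above; `missing_env` is a hypothetical helper written for this guide, not part of the SDK.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = ["GNANI_API_KEY", "GNANI_ORGANIZATION_ID", "GNANI_USER_ID"]

def missing_env(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

if __name__ == "__main__":
    missing = missing_env()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

If you only use Text-to-Speech, passing `names=["GNANI_API_KEY"]` should be enough, since the other two are marked as required for STT only.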

Speech-to-Text

REST

Supported formats: WAV, MP3, OGG, FLAC, AAC, M4A · Sample rate: 8 – 44.1 kHz · Max duration: 60 s

from gnani.stt import GnaniSTTClient

# reads GNANI_API_KEY, GNANI_ORGANIZATION_ID, GNANI_USER_ID from env
client = GnaniSTTClient()

result = client.transcribe("audio.wav", language_code="hi-IN")
print(result["transcript"])

Replace:

audio.wav — path to your audio file
hi-IN — see Language Codes

Response:

{
  "success": true,
  "transcript": "नमस्ते, आप कैसे हैं?"
}
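Before uploading, it can help to check a file against the limits listed at the top of this section. The sketch below uses only the standard-library `wave` module, so it covers uncompressed WAV files only; MP3, OGG, FLAC, AAC, and M4A would need other tooling. `check_wav` is a hypothetical helper, not an SDK function.

```python
import wave

MAX_DURATION_S = 60    # max duration from the limits above
MIN_RATE_HZ = 8_000    # 8 kHz
MAX_RATE_HZ = 44_100   # 44.1 kHz

def check_wav(path):
    """Return a list of problems found in a WAV file; an empty list means it fits the limits."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    problems = []
    if not MIN_RATE_HZ <= rate <= MAX_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is outside 8-44.1 kHz")
    if duration > MAX_DURATION_S:
        problems.append(f"duration {duration:.1f} s exceeds {MAX_DURATION_S} s")
    return problems
```

Calling `check_wav("audio.wav")` before `client.transcribe(...)` lets you report a clear local error instead of waiting for the service to reject the file.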

Realtime (WebSocket)

Stream raw PCM audio and receive transcriptions as they arrive.

Format: PCM 16-bit · Sample rate: 8 or 16 kHz · Channels: Mono

import asyncio
from gnani.stt import GnaniSTTStreamClient

async def main():
    # reads GNANI_API_KEY from env
    async with GnaniSTTStreamClient(
        language_code="hi-IN",
        sample_rate=16000,
    ) as stream:
        with open("audio.pcm", "rb") as f:
            await stream.stream_audio(
                f,
                on_transcript=lambda t: print(f"Transcript: {t.text}"),
                realtime_pace=True,
            )

asyncio.run(main())

Response per segment:

{
  "type": "transcript",
  "text": "Hello, how are you today?",
  "segment_id": "seg_abc123",
  "latency": 320
}
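The streaming example reads `audio.pcm`, a headerless file. If your recording is an uncompressed WAV, the standard-library `wave` module can strip the header. This is a minimal sketch that assumes the WAV is already 16-bit mono at 8 or 16 kHz (it does no resampling) and that the service accepts the little-endian byte order WAV uses; `wav_to_pcm` is a hypothetical helper, not part of the SDK.

```python
import wave

ALLOWED_RATES = (8000, 16000)  # from the format line above

def wav_to_pcm(wav_path, pcm_path):
    """Write the raw frames of a 16-bit mono WAV to pcm_path and return the sample rate."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        rate = wav.getframerate()
        if rate not in ALLOWED_RATES:
            raise ValueError(f"expected 8 or 16 kHz, got {rate} Hz")
        frames = wav.readframes(wav.getnframes())
    with open(pcm_path, "wb") as out:
        out.write(frames)
    return rate
```

The returned rate can be passed as `sample_rate` when opening the stream, so the declared rate always matches the file.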

Text-to-Speech

REST

Synthesize speech and get back the full audio in one call.

from gnani.tts import GnaniTTSClient

client = GnaniTTSClient()  # reads GNANI_API_KEY from env

audio = client.synthesize("नमस्ते, आप कैसे हैं?", voice="sia")

with open("output.wav", "wb") as f:
    f.write(audio)
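A quick sanity check on the result can catch an empty or truncated response. The sketch below assumes `synthesize` returns a complete WAV file as bytes, which the `output.wav` filename suggests but this page does not state outright; `describe_wav` is a hypothetical helper built on the standard-library `wave` module.

```python
import io
import wave

def describe_wav(audio_bytes):
    """Parse WAV bytes and return basic properties of the audio."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
        }
```

If the bytes are not an uncompressed WAV, `wave.open` raises `wave.Error`, which is itself a useful signal that the response was not what you expected.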

Streaming (SSE)

Receive audio progressively as it is synthesized, which lowers latency for longer texts.

from gnani.tts import GnaniTTSStreamClient

client = GnaniTTSStreamClient()  # reads GNANI_API_KEY from env

with open("output.wav", "wb") as f:
    for chunk in client.synthesize_stream("नमस्ते, आप कैसे हैं?", voice="sia"):
        f.write(chunk)
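Since the point of streaming is lower latency, it is worth measuring how long the first chunk takes to arrive. The sketch below works on any iterable of byte chunks, so it makes no assumptions about the SDK beyond what the loop above already shows; `save_stream` is a hypothetical helper.

```python
import time

def save_stream(chunks, path):
    """Write an iterable of byte chunks to path.

    Returns (total_bytes, seconds_to_first_chunk); the second value is None
    if the iterable produced no chunks.
    """
    start = time.monotonic()
    first_chunk_s = None
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if first_chunk_s is None:
                first_chunk_s = time.monotonic() - start
            f.write(chunk)
            total += len(chunk)
    return total, first_chunk_s
```

For example, `save_stream(client.synthesize_stream(text, voice="sia"), "output.wav")` would replace the loop above. The timer starts when `save_stream` is called, so whether it includes the request round trip depends on whether `synthesize_stream` sends the request eagerly or on first iteration.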

Realtime (WebSocket)

Lowest latency: stream text in and receive PCM audio chunks in real time.

import asyncio
from gnani.tts import GnaniTTSRealtimeClient

async def main():
    async with GnaniTTSRealtimeClient() as client:  # reads GNANI_API_KEY from env
        with open("output.wav", "wb") as f:
            async for chunk in client.synthesize("नमस्ते, आप कैसे हैं?", voice="sia"):
                f.write(chunk)

asyncio.run(main())
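The description above says the realtime client yields PCM chunks. If those chunks are headerless, a file written directly from them will lack a WAV header and most players will not open it. The sketch below wraps raw PCM in a WAV container using the standard-library `wave` module. It assumes 16-bit mono audio, and the 16000 Hz default is a placeholder rather than a documented value, so check the TTS Reference for the actual output format; `pcm_to_wav` is a hypothetical helper.

```python
import wave

def pcm_to_wav(pcm_bytes, wav_path, sample_rate=16000, channels=1, sample_width=2):
    """Wrap headerless PCM in a WAV container.

    The format defaults (16 kHz, mono, 16-bit) are assumptions; pass the
    values that match the audio you actually received.
    """
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample: 2 means 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
```

To use it, collect the chunks into a single bytes object (for example with `b"".join(...)`) instead of writing them to the file one by one, then pass the result to `pcm_to_wav`.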

Voice Cloning

Voice cloning is a two-step process: extract a voice embedding from a reference clip, then pass that embedding to any synthesis endpoint.

Step 1: Generate a voice embedding

Upload 5–30 seconds of clean speech (WAV or MP3):

import os
import requests

with open("reference.wav", "rb") as f:
    response = requests.post(
        "https://api.vachana.ai/api/v1/tts/voice-clone/embeddings",
        headers={"X-API-Key-ID": os.environ["GNANI_API_KEY"]},
        files={"audio_file": f},
    )

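response.raise_for_status()  # fail early on HTTP errors instead of parsing an error body

speaker_embedding = response.json()
# Example response: {"embedding": "...", "shape": [1, 768], "dtype": "torch.bfloat16"}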

Cache speaker_embedding — you only need to generate it once per voice.
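One simple way to cache the embedding is a JSON file on disk, since the response shown above is already a JSON object. The sketch below is an assumption about how you might structure this, not an SDK feature; `load_or_create_embedding` is a hypothetical helper and `create` is any zero-argument function that performs the upload above and returns `response.json()`.

```python
import json
from pathlib import Path

def load_or_create_embedding(cache_path, create):
    """Return the cached embedding if present; otherwise call create() and cache its result."""
    path = Path(cache_path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    embedding = create()
    path.write_text(json.dumps(embedding), encoding="utf-8")
    return embedding
```

Using one cache file per voice (for example `voices/alice.json`) keeps the lookup trivial and means the reference clip is uploaded only on the first run.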

Step 2: Synthesize with your cloned voice

Pass speaker_embedding from Step 1 to any synthesis endpoint:

REST

import os
import requests

response = requests.post(
    "https://api.vachana.ai/api/v1/tts/inference",
    headers={
        "Content-Type": "application/json",
        "X-API-Key-ID": os.environ["GNANI_API_KEY"],
    },
    json={
        "text": "नमस्ते, आप कैसे हैं?",
        "model": "vachana-vc-v1",
        "audio_config": {
            "sample_rate": 44100,
            "encoding": "linear_pcm",
            "container": "wav",
        },
        "speaker_embedding": speaker_embedding,
    },
)

response.raise_for_status()  # avoid writing an error body into the audio file

with open("cloned_voice.wav", "wb") as f:
    f.write(response.content)

Streaming (SSE)

import os
import base64
import requests

with requests.post(
    "https://api.vachana.ai/api/v1/tts/sse",
    headers={
        "Content-Type": "application/json",
        "X-API-Key-ID": os.environ["GNANI_API_KEY"],
    },
    json={
        "text": "नमस्ते, आप कैसे हैं?",
        "model": "vachana-vc-v1",
        "speaker_embedding": speaker_embedding,
    },
    stream=True,
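) as response:
    response.raise_for_status()  # stop on HTTP errors before reading the stream
    with open("cloned_voice.wav", "wb") as f:
        for line in response.iter_lines():
            # SSE data lines carry base64-encoded audio; skip status events
            if line.startswith(b"data:"):
                payload = line[5:].strip()
                if payload and b"status" not in payload:
                    f.write(base64.b64decode(payload))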
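The parsing logic in the loop above can be pulled into a small generator, which makes it testable without a network connection. The sketch below keeps the same heuristic as the loop (any `data:` payload containing `status` is treated as a non-audio event), which is inferred from the example rather than from a documented event schema; `decode_sse_audio` is a hypothetical helper.

```python
import base64

def decode_sse_audio(lines):
    """Yield decoded audio bytes from an iterable of raw SSE lines (bytes)."""
    prefix = b"data:"
    for line in lines:
        if not line.startswith(prefix):
            continue  # skip blank separator lines and non-data fields
        payload = line[len(prefix):].strip()
        if not payload or b"status" in payload:
            continue  # same heuristic as the loop above for non-audio events
        yield base64.b64decode(payload)
```

With this in place, the body of the `with open(...)` block reduces to `for chunk in decode_sse_audio(response.iter_lines()): f.write(chunk)`.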

Next Steps

STT Reference

Full parameter reference, language codes, and batch transcription

TTS Reference

Voice options, audio config, and all streaming modes

Voice Cloning

Embedding API, quality tips, and synthesis options