Prerequisites
Get your API key
Sign up on the Vachana platform and generate an API key from your dashboard.
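The Python examples on this page read credentials from environment variables: GNANI_API_KEY everywhere, plus GNANI_ORGANIZATION_ID and GNANI_USER_ID for the STT REST client. A minimal sketch of a startup check that reports missing variables before any request is made; the helper name `missing_credentials` is illustrative and not part of the SDK:

```python
import os

# Variable names taken from the comments in the examples on this page.
REQUIRED_VARS = ("GNANI_API_KEY", "GNANI_ORGANIZATION_ID", "GNANI_USER_ID")


def missing_credentials(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Typical use at program start:
#   missing = missing_credentials()
#   if missing:
#       raise SystemExit("Missing: " + ", ".join(missing))
```

Passing `required=("GNANI_API_KEY",)` narrows the check for the clients that only need the API key.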
- Speech-to-Text
- Text-to-Speech
- Voice Cloning
- REST
- Realtime (WebSocket)
Supported formats: WAV, MP3, OGG, FLAC, AAC, M4A · Sample rate: 8 – 44.1 kHz · Max duration: 60 s
from gnani.stt import GnaniSTTClient
# reads GNANI_API_KEY, GNANI_ORGANIZATION_ID, GNANI_USER_ID from env
client = GnaniSTTClient()
result = client.transcribe("audio.wav", language_code="hi-IN")
print(result["transcript"])
- audio.wav: path to your audio file
- hi-IN: see Language Codes
{
  "success": true,
  "transcript": "नमस्ते, आप कैसे हैं?"
}
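The REST limits stated above (sample rate between 8 and 44.1 kHz, at most 60 s of audio) can be checked locally before uploading. A minimal sketch for WAV input only, using the standard-library `wave` module, which reads uncompressed PCM WAV files; the function name `check_wav` is illustrative, and the other supported formats would need a different decoder:

```python
import wave

MIN_RATE_HZ = 8_000     # lower sample-rate bound stated above
MAX_RATE_HZ = 44_100    # upper sample-rate bound stated above
MAX_DURATION_S = 60.0   # maximum clip length stated above


def check_wav(path):
    """Return a list of limit violations for a WAV file (empty if none)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    problems = []
    if not MIN_RATE_HZ <= rate <= MAX_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is outside 8-44.1 kHz")
    if duration > MAX_DURATION_S:
        problems.append(f"duration {duration:.1f} s exceeds 60 s")
    return problems
```

An empty list means the file is within the documented limits; it does not guarantee the service will accept it.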
Stream raw PCM audio and receive transcriptions as they arrive. Each transcribed segment produces a response like the JSON shown after the example.
Format: PCM 16-bit · Sample rate: 8 or 16 kHz · Channels: Mono
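If the source audio is a WAV file, the raw PCM the stream expects can be extracted with the standard-library `wave` module. A minimal sketch, assuming the WAV already matches the format above (16-bit, mono, 8 or 16 kHz); it validates and copies the frames but does not resample or downmix, and the name `wav_to_pcm` is illustrative:

```python
import wave

ALLOWED_RATES_HZ = (8_000, 16_000)  # sample rates listed above


def wav_to_pcm(wav_path, pcm_path):
    """Copy the frames of a 16-bit mono WAV into a headerless .pcm file.

    Returns the sample rate so it can be passed on as sample_rate.
    Raises ValueError if the WAV does not match the stated format.
    """
    with wave.open(wav_path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        rate = wav.getframerate()
        if rate not in ALLOWED_RATES_HZ:
            raise ValueError(f"expected 8 or 16 kHz, got {rate} Hz")
        frames = wav.readframes(wav.getnframes())
    with open(pcm_path, "wb") as out:
        out.write(frames)
    return rate
```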
import asyncio
from gnani.stt import GnaniSTTStreamClient

async def main():
    # reads GNANI_API_KEY from env
    async with GnaniSTTStreamClient(
        language_code="hi-IN",
        sample_rate=16000,
    ) as stream:
        with open("audio.pcm", "rb") as f:
            await stream.stream_audio(
                f,
                on_transcript=lambda t: print(f"Transcript: {t.text}"),
                realtime_pace=True,
            )

asyncio.run(main())
{
  "type": "transcript",
  "text": "Hello, how are you today?",
  "segment_id": "seg_abc123",
  "latency": 320
}
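When working with the raw per-segment messages rather than the SDK callback, the segments can be stitched into one transcript. A minimal sketch, assuming each message arrives as a JSON string shaped like the one above and that messages of other types may also appear (the "status" type used in the usage note is hypothetical):

```python
import json


def collect_transcript(messages):
    """Join the text of all "transcript" messages, in arrival order."""
    parts = []
    for raw in messages:
        msg = json.loads(raw)
        # Ignore anything that is not a transcript segment with text.
        if msg.get("type") == "transcript" and msg.get("text"):
            parts.append(msg["text"])
    return " ".join(parts)
```

For example, passing two transcript messages with a `{"type": "status"}` message between them returns the two texts joined by a space.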
- REST
- Streaming (SSE)
- Realtime (WebSocket)
Synthesize speech and get back the full audio in one call.
from gnani.tts import GnaniTTSClient
client = GnaniTTSClient() # reads GNANI_API_KEY from env
audio = client.synthesize("नमस्ते, आप कैसे हैं?", voice="sia")
with open("output.wav", "wb") as f:
    f.write(audio)
Receive audio progressively as it is synthesized, which lowers latency for longer texts.
from gnani.tts import GnaniTTSStreamClient
client = GnaniTTSStreamClient() # reads GNANI_API_KEY from env
with open("output.wav", "wb") as f:
    for chunk in client.synthesize_stream("नमस्ते, आप कैसे हैं?", voice="sia"):
        f.write(chunk)
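To see what streaming buys for a given text, the time until the first chunk can be compared with the total synthesis time. A minimal sketch that works on any iterable of byte chunks, such as the one returned by `synthesize_stream` above; `measure_stream` and its `clock` parameter are illustrative names, not part of the SDK:

```python
import time


def measure_stream(chunks, clock=time.perf_counter):
    """Consume an iterable of byte chunks and report simple timing stats."""
    start = clock()
    first_chunk_s = None
    total_bytes = 0
    for chunk in chunks:
        if first_chunk_s is None:
            # Seconds from starting to iterate until the first chunk arrived.
            first_chunk_s = clock() - start
        total_bytes += len(chunk)
    return {
        "first_chunk_s": first_chunk_s,
        "total_s": clock() - start,
        "total_bytes": total_bytes,
    }


# Possible use with the streaming client shown above:
#   stats = measure_stream(client.synthesize_stream(text, voice="sia"))
```

The chunks are discarded here; wrap the iterable instead of replacing the write loop if the audio is also needed.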
Lowest latency: stream text in and receive PCM audio chunks in real time.
import asyncio
from gnani.tts import GnaniTTSRealtimeClient

async def main():
    async with GnaniTTSRealtimeClient() as client:  # reads GNANI_API_KEY from env
        with open("output.wav", "wb") as f:
            async for chunk in client.synthesize("नमस्ते, आप कैसे हैं?", voice="sia"):
                f.write(chunk)

asyncio.run(main())
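This page does not say whether the realtime chunks already carry a WAV header. If they turn out to be headerless PCM, they can be wrapped in a WAV container with the standard-library `wave` module. A minimal sketch under that assumption; the sample rate, channel count, and sample width must match what the service actually sends, and the defaults below are guesses rather than documented values:

```python
import wave


def write_pcm_as_wav(chunks, path, sample_rate, channels=1, sample_width=2):
    """Write headerless PCM chunks to `path` inside a WAV container."""
    # Join first so a sample split across two chunks is still written whole.
    pcm = b"".join(chunks)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
```

In the async example above, the chunks could be appended to a list inside the `async for` loop and passed to this function once the loop finishes.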
Voice cloning is a two-step process: extract a voice embedding from a reference clip, then pass that embedding to any synthesis endpoint.
Generate a voice embedding
Upload 5–30 seconds of clean speech (WAV or MP3):
import os
import requests

with open("reference.wav", "rb") as f:
    response = requests.post(
        "https://api.vachana.ai/api/v1/tts/voice-clone/embeddings",
        headers={"X-API-Key-ID": os.environ["GNANI_API_KEY"]},
        files={"audio_file": f},
    )

speaker_embedding = response.json()
# {"embedding": "...", "shape": [1, 768], "dtype": "torch.bfloat16"}
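Because the embedding response is plain JSON, it can be stored on disk and reused across runs instead of being regenerated. A minimal sketch of a file-based cache; `load_or_create_embedding` and the `create` callable (which would wrap the POST request above) are illustrative names:

```python
import json
from pathlib import Path


def load_or_create_embedding(cache_path, create):
    """Return the cached embedding, calling `create()` only on a cache miss."""
    path = Path(cache_path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    embedding = create()  # expected to return a JSON-serializable dict
    path.write_text(json.dumps(embedding), encoding="utf-8")
    return embedding
```

Using one cache file per voice keeps the lookup simple; delete the file to force a fresh embedding.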
Cache speaker_embedding: you only need to generate it once per voice.
Synthesize with your cloned voice
Pass speaker_embedding from Step 1 to any synthesis endpoint:
- REST
- Streaming (SSE)
import os
import requests

response = requests.post(
    "https://api.vachana.ai/api/v1/tts/inference",
    headers={
        "Content-Type": "application/json",
        "X-API-Key-ID": os.environ["GNANI_API_KEY"],
    },
    json={
        "text": "नमस्ते, आप कैसे हैं?",
        "model": "vachana-vc-v1",
        "audio_config": {
            "sample_rate": 44100,
            "encoding": "linear_pcm",
            "container": "wav",
        },
        "speaker_embedding": speaker_embedding,
    },
)

with open("cloned_voice.wav", "wb") as f:
    f.write(response.content)
import os
import base64
import requests

with requests.post(
    "https://api.vachana.ai/api/v1/tts/sse",
    headers={
        "Content-Type": "application/json",
        "X-API-Key-ID": os.environ["GNANI_API_KEY"],
    },
    json={
        "text": "नमस्ते, आप कैसे हैं?",
        "model": "vachana-vc-v1",
        "speaker_embedding": speaker_embedding,
    },
    stream=True,
) as response:
    with open("cloned_voice.wav", "wb") as f:
        for line in response.iter_lines():
            if line.startswith(b"data:"):
                payload = line[5:].strip()
                if payload and b"status" not in payload:
                    f.write(base64.b64decode(payload))
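The line handling in the SSE loop can be pulled into a small generator that is testable without a network call. The letters in "status" are all valid base64 characters, so a substring check could in principle skip a genuine audio chunk. The sketch below instead skips payloads that start with `{`, a character that never occurs in base64. It assumes status events are sent as JSON objects, which this page does not confirm, and the name `iter_sse_audio` is illustrative:

```python
import base64


def iter_sse_audio(lines):
    """Yield decoded audio bytes from SSE `data:` lines, skipping the rest."""
    prefix = b"data:"
    for line in lines:
        if not line.startswith(prefix):
            continue  # blank keep-alive lines, comments, other SSE fields
        payload = line[len(prefix):].strip()
        if not payload or payload.startswith(b"{"):
            continue  # assumed: non-audio events are JSON objects
        yield base64.b64decode(payload)


# Possible use with the request shown above:
#   for audio in iter_sse_audio(response.iter_lines()):
#       f.write(audio)
```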
Next Steps
STT Reference
Full parameter reference, language codes, and batch transcription
TTS Reference
Voice options, audio config, and all streaming modes
Voice Cloning
Embedding API, quality tips, and synthesis options