Available APIs

Speech-to-Text: Gnani Prisma v2.5

| API | Description |
| --- | --- |
| STT REST | Transcribe short audio files (≤ 60 s) via a single HTTP request |
| STT Realtime | Stream live audio over a WebSocket connection and receive transcript segments in real time |
| STT Batch | Submit long or multiple audio files for asynchronous transcription; poll for results via job_id |
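
As a rough illustration of how the three STT modes divide the work, the sketch below picks a mode from the audio's duration and then polls a batch job until it finishes. It is a minimal sketch, not the service's client: `choose_stt_api`, `poll_batch_job`, the injected `fetch_status` callable, and the `"completed"` / `"failed"` status values are all hypothetical names chosen for illustration. Only the 60-second limit and the `job_id` polling model come from the table above.

```python
import time
from typing import Any, Callable, Dict

REST_MAX_SECONDS = 60.0  # STT REST limit stated in the table above


def choose_stt_api(duration_s: float, live: bool = False, file_count: int = 1) -> str:
    """Pick an STT mode. The return labels are illustrative, not API names."""
    if live:
        return "realtime"  # live audio goes over the WebSocket API
    if file_count > 1 or duration_s > REST_MAX_SECONDS:
        return "batch"  # long or multiple files are transcribed asynchronously
    return "rest"  # one short file fits a single HTTP request


def poll_batch_job(
    job_id: str,
    fetch_status: Callable[[str], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status(job_id) until the job completes, fails, or times out.

    fetch_status is a hypothetical callable wrapping whatever HTTP request
    returns the job state; the "status" key and its values are assumptions.
    """
    waited = 0.0
    while True:
        result = fetch_status(job_id)
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"batch job {job_id} failed: {result}")
        if waited >= timeout_s:
            raise TimeoutError(f"batch job {job_id} not done after {timeout_s} s")
        sleep(interval_s)
        waited += interval_s
```

Passing `fetch_status` and `sleep` in as arguments keeps the loop independent of any particular HTTP library and makes it easy to exercise with a stub before wiring in real requests.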

Text-to-Speech: Gnani Timbre v2.0

| API | Description |
| --- | --- |
| TTS REST | Synthesize text to audio in a single synchronous HTTP call |
| TTS Streaming | Submit text via an HTTP request and receive synthesized audio progressively as a server-sent event stream |
| TTS Realtime | Stream text incrementally and receive audio simultaneously over a persistent WebSocket connection for low latency |
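
To make the streaming mode more concrete, here is a minimal sketch of how a client might reassemble audio from a server-sent event stream. The parsing follows the generic SSE wire format (`data:` fields, blank line ends an event, lines starting with `:` are comments). The assumption that each event's data is a base64-encoded audio chunk is purely illustrative; this page does not specify the payload format, and `iter_sse_data` and `collect_audio` are hypothetical helper names.

```python
import base64
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a server-sent event stream."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "data":
            buffer.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.
    # An event not terminated by a blank line is discarded.


def collect_audio(lines: Iterable[str]) -> bytes:
    """Concatenate audio chunks, ASSUMING each event carries base64 audio."""
    return b"".join(base64.b64decode(data) for data in iter_sse_data(lines))
```

In practice the `lines` iterable would come from a streaming HTTP response, and a player would consume chunks as they arrive rather than joining them at the end.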

Voice Cloning

| API | Description |
| --- | --- |
| VC Embeddings | Upload a reference audio file to generate a speaker_embedding for use in voice cloning |
| Voice Cloned TTS REST | Synthesize audio in your cloned voice via a single synchronous HTTP call |
| Voice Cloned TTS Streaming | Stream cloned voice audio progressively using Server-Sent Events |
| Voice Cloned TTS Realtime | Stream text and receive cloned voice audio in real time over a WebSocket connection |

Key Capabilities

| Feature | Detail |
| --- | --- |
| 10+ Indian Languages | Native script transcription and synthesis across 10+ Indian languages |
| Language Detection | Automatic by default; set language_code to target a specific language |
| Code-Switching | Handles code-mixed speech naturally |
| Audio Flexibility | STT accepts WAV, MP3, OGG, FLAC, AAC, M4A |
| Voice Cloning | Clone any voice from a short audio sample using speaker embeddings |
| Latency | P95 latency of 200 ms for STT; streaming TTS supported |
| Processing Modes | Real-time streaming and batch |
| Accuracy | Sub-4% WER on Indian English and 20-30% better accuracy for major Indian languages |
| Transcript Formatting | Auto-punctuation, inverse text normalization (numerals, dates, currency) |
| SSML Support | Full SSML for fine-grained speech synthesis control |
| Noise Robustness | Optimized for telephony-grade and noisy real-world audio |
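
For readers who want to check the accuracy figure against their own audio, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a generic implementation of that definition. It is not the evaluation procedure behind the numbers quoted above, and it applies no text normalization (casing, punctuation), which real evaluations usually do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c d")` returns 0.25, since one of four reference words was substituted. A sub-4% WER corresponds to fewer than one error per 25 reference words.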

Get Started

Ready to begin? Head over to the Quick Start Guide to make your first API call.