Speech-to-Text (REST)
Quick transcription of audio clips up to 60 seconds via HTTP.
Overview
The REST endpoint transcribes an audio file in a single synchronous HTTP request and returns the transcript immediately. It is best suited for short, pre-recorded audio clips.

| Use case | Recommended endpoint |
|---|---|
| Short clips ≤ 60 s (ideal ≤ 30 s) | This endpoint |
| Live microphone / real-time audio | STT Realtime (WebSocket) |
| Large files or bulk jobs | STT Batch |
Endpoint
Authentication
Pass your API key in the request header.

| Header | Type | Required | Description |
|---|---|---|---|
| X-API-Key-ID | string | ✅ | Your Vachana API key. Obtain one from the Vachana dashboard. |
Request Parameters
All parameters are sent as multipart/form-data fields. The audio file and language_code fields are described under Body.

- Preferred language (optional): must be one of the languages in language_code. Useful to improve accuracy when the audio is predominantly one language.
- format: verbatim gives raw spoken-form output. transcribe enables Inverse Text Normalization (ITN): numbers, currency, dates, and phone numbers are written in their conventional form. See ITN below.
- itn_native_numerals: when format=transcribe, set true to render digits in the native script of the target language (e.g. ₹५,००० instead of ₹5,000 for Hindi). Has no effect when format=verbatim. Currently supported for hi-IN and en-IN only.

Response
200 — Success
| Field | Type | Description |
|---|---|---|
| success | boolean | true when transcription completed without error. |
| request_id | string | Unique identifier for this request. Use it when contacting support or correlating logs. |
| timestamp | string | Server-side request timestamp in YYYYMMDD_HHMMSS.mmm format. |
| transcript | string | The transcribed text. Format depends on the format parameter. |
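
The fields above could be read into a typed object along the following lines. This is a minimal sketch that assumes the response body is JSON (the page does not state the encoding); `TranscriptionResult`, `parse_response`, and the sample payload are illustrative names and values, not part of the API.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TranscriptionResult:
    success: bool
    request_id: str
    timestamp: datetime
    transcript: str


def parse_response(body: str) -> TranscriptionResult:
    """Parse a success response, assuming it is a JSON object with the four documented fields."""
    data = json.loads(body)
    return TranscriptionResult(
        success=bool(data["success"]),
        request_id=data["request_id"],
        # YYYYMMDD_HHMMSS.mmm: %f accepts the three millisecond digits.
        timestamp=datetime.strptime(data["timestamp"], "%Y%m%d_%H%M%S.%f"),
        transcript=data["transcript"],
    )


# Made-up sample payload, shaped after the field table.
sample = (
    '{"success": true, "request_id": "req-123", '
    '"timestamp": "20250120_101530.250", "transcript": "hello"}'
)
result = parse_response(sample)
```

Keeping `request_id` on the parsed object makes it easy to log alongside your own identifiers when correlating with support.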
Error Responses
| Status | Meaning |
|---|---|
| 400 | Bad request: invalid parameters or unsupported audio format. |
| 429 | Rate limit exceeded: slow down or contact support to increase limits. |
| 500 | Internal server error: transient issue on our side; retry with backoff. |
| 503 | Service unavailable: the STT service is temporarily down. |
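
One way to act on these status codes is a small retry wrapper with exponential backoff. The sketch below is an assumption-laden illustration: it treats 429, 500, and 503 as retryable (the table only explicitly recommends backoff for 500), and `send` stands for any function of yours that performs the request and returns a `(status_code, body)` pair.

```python
import time

# Assumption: 429 and 503 are retried in the same way as 500.
RETRYABLE = {429, 500, 503}


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send: zero-argument callable returning (status_code, body).
    sleep: injectable so the delays can be observed or skipped in tests.
    """
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE or last_attempt:
            return status, body
        # Delays grow as base_delay * 1, 2, 4, ...
        sleep(base_delay * (2 ** attempt))
```

A 400 is returned to the caller immediately, since resending an invalid request would not help.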
Code Example
Python SDK
The official Python SDK handles multipart construction, authentication headers, and retries automatically.
Installation
Authentication
The client requires three credentials: organization_id, api_key, and user_id. You can pass them directly or load them from environment variables.
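
Loading the three credentials from the environment might look like the sketch below. The environment variable names and the `Credentials` and `load_credentials` helpers are hypothetical; they are not taken from the SDK, whose actual client interface is not shown on this page.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Credentials:
    organization_id: str
    api_key: str
    user_id: str


# Hypothetical variable names; use whatever names your deployment defines.
ENV_NAMES = {
    "organization_id": "VACHANA_ORGANIZATION_ID",
    "api_key": "VACHANA_API_KEY",
    "user_id": "VACHANA_USER_ID",
}


def load_credentials(env=os.environ) -> Credentials:
    """Read the three credentials from a mapping, failing loudly if any is missing."""
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Credentials(**{field: env[name] for field, name in ENV_NAMES.items()})
```

Failing at startup when a variable is missing avoids a less obvious authentication error on the first request.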
Transcribe Audio
Custom Request ID
Pass a request_id to correlate SDK calls with your own logs or support tickets.
Error Handling
Supported Languages
The Vachana API supports 10 Indian languages.

| Language | Code | Native Script | Example |
|---|---|---|---|
| Bengali | bn-IN | Bengali (বাংলা) | “আমি ভাত খাই” |
| English | en-IN | Latin | “I am going to the market” |
| Gujarati | gu-IN | Gujarati (ગુજરાતી) | “હું બજાર જાઉં છું” |
| Hindi | hi-IN | Devanagari (हिन्दी) | “मैं बाज़ार जा रहा हूँ” |
| Kannada | kn-IN | Kannada (ಕನ್ನಡ) | “ನಾನು ಮಾರುಕಟ್ಟೆಗೆ ಹೋಗುತ್ತೇನೆ” |
| Malayalam | ml-IN | Malayalam (മലയാളം) | “ഞാൻ ചന്തയിലേക്ക് പോകുന്നു” |
| Marathi | mr-IN | Devanagari (मराठी) | “मी बाजारात जातोय” |
| Punjabi | pa-IN | Gurmukhi (ਪੰਜਾਬੀ) | “ਮੈਂ ਬਾਜ਼ਾਰ ਜਾ ਰਿਹਾ ਹਾਂ” |
| Tamil | ta-IN | Tamil (தமிழ்) | “நான் சந்தைக்கு செல்கிறேன்” |
| Telugu | te-IN | Telugu (తెలుగు) | “నేను మార్కెట్కి వెళ్తున్నాను” |
| Hinglish (experimental) | en-hi-in-cm | Latin + Devanagari | “मैं market जा रहा हूँ” |
| Auto-detect (experimental) | en-IN,hi-IN,ta-IN,… | All supported | Pass all desired codes comma-separated |

To use auto-detect, pass all desired codes as a comma-separated list in language_code. For example: en-IN,hi-IN,ta-IN,te-IN,kn-IN,ml-IN,gu-IN,mr-IN,bn-IN,pa-IN.

Inverse Text Normalization (ITN)
ITN converts the spoken-form output of the ASR engine into the conventional written form a reader expects: numbers become digits, currency gets the ₹ symbol, dates are formatted, and phone numbers are compacted, all in one pass, immediately after transcription.

How to enable: Set format=transcribe in the request body.
Currently supported for Hindi (hi-IN) and English (en-IN) only. All other languages use verbatim output regardless of the format value.
What ITN Normalizes
1 — Cardinal & Ordinal Numbers
Whole numbers and positional ranks are formatted using Indian comma grouping: the rightmost three digits form one group, and the remaining digits are grouped in twos.

| Spoken input (ASR) | Written output (ITN) | Rule |
|---|---|---|
| दो हज़ार | 2,000 | Indian comma grouping |
| पाँच लाख बीस हज़ार | 5,20,000 | Lakh-scale grouping |
| उन्नीस सौ चौरानवे | 1,994 | Hundred-base year form |
| five lakh | 5,00,000 | English lakh convention |
| पहला / twenty first | 1st / 21st | Ordinal suffix |
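
The grouping rule in the table can be expressed in a few lines. This is an illustrative sketch of the rule only, not the service's implementation; `indian_grouping` is a made-up helper name.

```python
def indian_grouping(n: int) -> str:
    """Format an integer with Indian comma grouping (last three digits, then pairs)."""
    digits = str(abs(n))
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel pairs off the right-hand end of everything left of the last three digits.
    while head:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.append(tail)
    return ("-" if n < 0 else "") + ",".join(groups)


examples = {n: indian_grouping(n) for n in (2000, 520000, 1994, 500000)}
```

With the inputs above, the results match the table rows: 2,000, 5,20,000, 1,994, and 5,00,000.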
2 — Currency & Money
All Indian currency expressions, including paise fractions and lakh/crore scales, are formatted with the ₹ symbol and Indian comma grouping.

| Spoken input (ASR) | Written output (ITN) | Rule |
|---|---|---|
| पाँच सौ रुपये | ₹500 | ₹ + amount |
| तीन रुपये पचास पैसे | ₹3.50 | ₹ + rupees.paise |
| दस लाख रुपये | ₹10,00,000 | ₹ + lakh grouping |
| I need five thousand rupees | ₹5,000 | English India pipeline |
3 — Dates
| Spoken input (ASR) | Written output (ITN) | Rule |
|---|---|---|
| बीस जनवरी दो हज़ार पच्चीस | 20 जनवरी 2025 | DD Month YYYY (hi) |
| fifteenth january twenty twenty five | 15th January 2025 | Ordinal Month YYYY (en) |
4 — Times
Indian time-of-day words (सुबह, दोपहर, शाम, रात) automatically map to 24-hour HH:MM output.

| Spoken input (ASR) | Written output (ITN) | Rule |
|---|---|---|
| सुबह पाँच बजे | सुबह 05:00 | सुबह = AM |
| शाम पाँच बजे | शाम 17:00 | शाम = evening (16–20 h) |
| रात के दस बजे | रात 22:00 | रात = night (20–24 h) |
| meeting at five fifteen in the evening | meeting 17:15 in the evening | en — 24-hour |
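
The Hindi rows above can be reproduced with a small lookup of hour windows. The sketch below is a guess at the logic, not the service's algorithm: it treats each documented window as half-open, picks whichever of the spoken hour or the spoken hour plus 12 falls inside it, and omits दोपहर because the table gives no range for it.

```python
# Windows from the table, treated as half-open [start, end) hours (an assumption).
WINDOWS = {"सुबह": (0, 12), "शाम": (16, 20), "रात": (20, 24)}


def to_24h(period: str, hour: int, minute: int = 0) -> str:
    """Map a time-of-day word plus a 12-hour clock reading to 'period HH:MM'."""
    start, end = WINDOWS[period]
    base = hour % 12  # 12 o'clock becomes 0 before the optional +12 shift
    for candidate in (base, base + 12):
        if start <= candidate < end:
            return f"{period} {candidate:02d}:{minute:02d}"
    raise ValueError(f"{hour} does not fall in the window for {period}")
```

Raising on an out-of-window hour keeps the sketch from silently guessing in cases the table does not cover.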
5 — Phone Numbers & PIN Codes
Digit streams are concatenated into compact numeric strings. 10-digit streams → mobile number; 6-digit streams → PIN. Repeat prefixes (double/डबल, triple/ट्रिपल) are expanded.

| Spoken input (ASR) | Written output (ITN) | Rule |
|---|---|---|
| नौ आठ सात छह पाँच चार तीन दो एक शून्य | 9876543210 | 10 digits → phone |
| एक एक शून्य शून्य शून्य एक | 110001 | 6 digits → PIN |
| डबल आठ नौ शून्य एक दो तीन चार पाँच छह | 8890123456 | double prefix |
| one two three four five six | 123456 | English digit words |
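
The digit-stream rules can be sketched as a token-by-token pass. This is an illustration of the behaviour in the table, with made-up helper names (`compact_digits`, `classify`); the real ITN pipeline presumably handles many more spellings and contexts.

```python
DIGITS = {
    "शून्य": "0", "एक": "1", "दो": "2", "तीन": "3", "चार": "4",
    "पाँच": "5", "छह": "6", "सात": "7", "आठ": "8", "नौ": "9",
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
REPEATS = {"double": 2, "डबल": 2, "triple": 3, "ट्रिपल": 3}


def compact_digits(utterance: str) -> str:
    """Concatenate spoken digits, expanding double/triple prefixes."""
    out = []
    repeat = 1
    for token in utterance.split():
        if token in REPEATS:
            repeat = REPEATS[token]  # applies to the next digit only
        elif token in DIGITS:
            out.append(DIGITS[token] * repeat)
            repeat = 1
        else:
            raise ValueError(f"not a digit word: {token!r}")
    return "".join(out)


def classify(digits: str) -> str:
    """Label a digit string by length: 10 digits is a phone number, 6 is a PIN."""
    return {10: "phone", 6: "PIN"}.get(len(digits), "other")
```

For example, `compact_digits("डबल आठ नौ शून्य एक दो तीन चार पाँच छह")` yields a 10-digit string that `classify` labels as a phone number.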
6 — Mixed & Code-Mixed Utterances
A single sentence may contain multiple entity types or blend Hindi and English. ITN handles all in one pass, normalizing each entity independently.

| Spoken input (ASR) | Written output (ITN) |
|---|---|
| कल थ्री फिफ्टी पीएम को पाँच सौ रुपये transfer करना है | कल 15:50 को ₹500 transfer करना है |
Native Script Digits — itn_native_numerals
By default, ITN outputs Western Arabic digits (0–9) regardless of language. Set itn_native_numerals=true to render digits in the native script of the target language.
| Language | Spoken input | false (default) | true — native script |
|---|---|---|---|
| Hindi hi-IN | पाँच हज़ार रुपये | ₹5,000 | ₹५,००० |
| English en-IN | five thousand rupees | ₹5,000 | ₹5,000 (Latin, no change) |
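
The effect of the flag on already-normalized text amounts to a digit-for-digit substitution. The sketch below illustrates that idea for Hindi only; `render_numerals` is a hypothetical helper, not an API call, and the service performs this step server-side.

```python
# Western Arabic digits mapped to Devanagari digits.
DEVANAGARI = str.maketrans("0123456789", "०१२३४५६७८९")


def render_numerals(text: str, language_code: str, native: bool = False) -> str:
    """Swap digits for Devanagari when native numerals are requested for hi-IN."""
    if native and language_code == "hi-IN":
        return text.translate(DEVANAGARI)
    # en-IN (and the default case) keeps Western Arabic digits.
    return text


hindi_native = render_numerals("₹5,000", "hi-IN", native=True)
english_native = render_numerals("₹5,000", "en-IN", native=True)
```

The currency symbol and commas pass through untouched because only the ten digit characters are in the translation table.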
What ITN Does Not Change
ITN intentionally preserves idiomatic and ambiguous phrases to avoid incorrect normalization.
- दो तीन (meaning a few) stays as text, not 2 or 3
- कर दो / ले दो (imperative verbs) are kept as words, not treated as cardinal 2
Authorizations
API key for authentication. Sign up in Vachana to get the API Key.
Body
Audio file to transcribe. Supported formats - WAV, MP3, OGG, FLAC, AAC, M4A. Maximum duration - 60 seconds (Ideal duration is 30 seconds).
Language code for transcription. Use one of the supported language codes.
Supported values: bn-IN, en-IN, gu-IN, hi-IN, kn-IN, ml-IN, mr-IN, pa-IN, ta-IN, te-IN
Example: "hi-IN"
Optional preferred language for processing when multiple languages are specified.
Must be one of the languages in language_code. When set, forces processing with the single-language model for the specified language, which may improve accuracy for predominantly single-language audio.
Supported values: bn-IN, en-IN, gu-IN, hi-IN, kn-IN, ml-IN, mr-IN, pa-IN, ta-IN, te-IN
Example: "hi-IN"
Output format for the transcript.
- verbatim (default): Returns the raw spoken-form transcript as recognised by the ASR engine. No post-processing is applied.
- transcribe: Enables Inverse Text Normalization (ITN). Spoken numeric expressions, currency, dates, times, and phone numbers are automatically converted to their written form (e.g. "five thousand rupees" → "₹5,000"). Currently supported for hi-IN and en-IN only.
Supported values: verbatim, transcribe
Example: "transcribe"
When format=transcribe, set to true to render digits in the native script of the target language instead of Western Arabic digits (0–9).
For example, with hi-IN: "पाँच हज़ार रुपये" → "₹५,०००" instead of "₹5,000".
Has no effect when format=verbatim. Currently supported for hi-IN and en-IN only (English always uses Western Arabic digits regardless of this setting).
Example: true
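
Putting the header and body fields together, a request could be assembled with the standard library as sketched below. Several details are assumptions rather than documented facts: the endpoint URL and the audio field name are placeholders (neither appears on this page), the method is assumed to be POST, and `build_request` is an illustrative helper. The request is built but not sent.

```python
import uuid
import urllib.request

# Placeholders: this page does not show the endpoint URL or the name of the
# audio form field, so both values below are hypothetical.
ENDPOINT_URL = "https://api.example.com/stt/rest"
AUDIO_FIELD = "audio_file"


def build_request(api_key: str, audio: bytes, filename: str, fields: dict) -> urllib.request.Request:
    """Build (without sending) a multipart/form-data POST carrying the audio and text fields."""
    boundary = uuid.uuid4().hex
    chunks = []
    # Plain text fields such as language_code, format, itn_native_numerals.
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    # The audio file part.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{AUDIO_FIELD}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    chunks.append(file_header.encode("utf-8") + audio + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        ENDPOINT_URL,
        data=b"".join(chunks),
        method="POST",
        headers={
            "X-API-Key-ID": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


request = build_request(
    api_key="YOUR_API_KEY",
    audio=b"\x00\x01\x02",  # stand-in bytes; read a real clip from disk in practice
    filename="clip.wav",
    fields={"language_code": "hi-IN", "format": "transcribe", "itn_native_numerals": "true"},
)
```

Once the real URL and field name are filled in, the request can be sent with `urllib.request.urlopen(request)`.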