Speech-to-Text (REST)

curl --request POST \
  --url https://api.vachana.ai/stt/v3 \
  --header 'Content-Type: multipart/form-data' \
  --header 'X-API-Key-ID: <api-key>' \
  --form audio_file='@example-file' \
  --form language_code=hi-IN

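import requests

url = "https://api.vachana.ai/stt/v3"

payload = {"language_code": "hi-IN"}
headers = {"X-API-Key-ID": "<api-key>"}

# Open the audio in binary mode; the context manager closes the file after the upload.
with open("example-file", "rb") as audio:
    files = {"audio_file": ("example-file", audio)}
    # requests builds the multipart/form-data body and sets the boundary itself.
    response = requests.post(url, data=payload, files=files, headers=headers)

print(response.text)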

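// audioFile must be a File or Blob (for example, from an <input type="file">), not a string.
const form = new FormData();
form.append('audio_file', audioFile);
form.append('language_code', 'hi-IN');

// Do not set Content-Type manually; fetch adds the multipart boundary for FormData bodies.
const options = {method: 'POST', headers: {'X-API-Key-ID': '<api-key>'}, body: form};

fetch('https://api.vachana.ai/stt/v3', options)
  .then(res => res.json())
  .then(res => console.log(res))
  .catch(err => console.error(err));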

{
  "success": true,
  "request_id": "req_abc123",
  "timestamp": "20251226_143052.123",
  "transcript": "नमस्ते, आप कैसे हैं?"
}

{
  "success": false,
  "error": {
    "type": "INVALID_REQUEST_ERROR",
    "message": "Audio duration exceeds maximum limit of 60 seconds, while the ideal duration is 30 seconds."
  }
}

{
  "success": false,
  "error": {
    "type": "RATE_LIMIT_ERROR",
    "message": "Rate limit exceeded. Please try again later."
  }
}

{
  "success": false,
  "error": {
    "type": "API_ERROR",
    "message": "An unexpected error occurred while processing."
  }
}

{
  "success": false,
  "error": {
    "type": "SERVICE_UNAVAILABLE",
    "message": "Speech recognition service is temporarily unavailable."
  }
}

Overview

The REST endpoint transcribes an audio file in a single synchronous HTTP request and returns the transcript immediately. It is best suited for short, pre-recorded audio clips.

Use case	Recommended endpoint
Short clips ≤ 60 s (ideal ≤ 30 s)	This endpoint
Live microphone / real-time audio	STT Realtime (WebSocket)
Large files or bulk jobs	STT Batch
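
Because this endpoint rejects clips longer than 60 seconds, it can help to check the duration before uploading. The sketch below does this for uncompressed WAV files only, using Python's standard `wave` module. The helper names `wav_duration_seconds` and `pick_endpoint` are illustrative and not part of any Gnani SDK; the returned labels simply mirror the rows of the table above, and live audio is out of scope here.

```python
import wave

MAX_REST_SECONDS = 60.0  # hard limit stated for this endpoint


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def pick_endpoint(duration_seconds):
    """Hypothetical helper: choose between the REST and Batch rows of the table."""
    if duration_seconds <= MAX_REST_SECONDS:
        return "rest"
    return "batch"
```

Compressed formats such as MP3 or AAC would need a different way to read the duration; this sketch does not cover them.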

Endpoint

POST https://api.vachana.ai/stt/v3
Content-Type: multipart/form-data

Authentication

Pass your API key in the request header.

Header	Type	Required	Description
`X-API-Key-ID`	`string`	✅	Your Gnani Prisma v2.5 API key. Obtain one from the Gnani APIs dashboard.
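
To keep the key out of source code, one option is to read it from the environment and build the header at call time. This is a minimal sketch: the `auth_headers` helper is hypothetical, and reusing the `GNANI_API_KEY` variable name (which this page mentions only for the Python SDK) for raw REST calls is an assumption.

```python
import os


def auth_headers(env_var="GNANI_API_KEY"):
    """Build the X-API-Key-ID header from an environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} before calling the API.")
    return {"X-API-Key-ID": api_key}
```

The returned dictionary can be passed as the `headers` argument of `requests.post`.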

Request Parameters

All parameters are sent as multipart/form-data fields.

Parameter	Type	Required	Default	Description
`audio_file`	`file`	✅	-	Audio file to transcribe. Supported formats: WAV, MP3, OGG, FLAC, AAC, M4A. Maximum duration: 60 seconds (ideal ≤ 30 s).
`language_code`	`string`	✅	-	BCP-47 language code. See Supported Languages below.
`format`	`enum`	-	`verbatim`	`verbatim`: raw spoken-form output. `transcribe`: enables Inverse Text Normalization (ITN), so numbers, currency, dates, and phone numbers are written in their conventional form. See ITN below.
`itn_native_numerals`	`boolean`	-	`false`	When `format=transcribe`, set to `true` to render digits in the native script of the target language (e.g. ₹५,००० instead of ₹5,000 for Hindi). Has no effect when `format=verbatim`.
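
The parameter rules above can be checked on the client before a request is sent. The following is a sketch under stated assumptions: `build_form_fields` is a hypothetical helper, the language list is copied from the Supported Languages section, and the boolean is serialized as the string `true`, matching the cURL example on this page.

```python
SUPPORTED_LANGUAGES = {
    "bn-IN", "en-IN", "gu-IN", "hi-IN", "kn-IN",
    "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN",
}
FORMATS = {"verbatim", "transcribe"}


def build_form_fields(language_code, fmt="verbatim", itn_native_numerals=False):
    """Validate the text parameters and return them as form fields."""
    if language_code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language_code: {language_code!r}")
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {sorted(FORMATS)}")
    fields = {"language_code": language_code, "format": fmt}
    # itn_native_numerals has no effect with format=verbatim, so only send it
    # when ITN is enabled.
    if fmt == "transcribe" and itn_native_numerals:
        fields["itn_native_numerals"] = "true"
    return fields
```

The result is intended for the `data` argument of `requests.post`, with the audio passed separately through `files`.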

Response

200 — Success

{
  "success": true,
  "request_id": "req_abc123",
  "timestamp": "20251226_143052.123",
  "transcript": "नमस्ते, आप कैसे हैं?"
}

Field	Type	Description
`success`	`boolean`	`true` when transcription completed without error.
`request_id`	`string`	Unique identifier for this request. Use it when contacting support or correlating logs.
`timestamp`	`string`	Server-side request timestamp in `YYYYMMDD_HHMMSS.mmm` format.
`transcript`	`string`	The transcribed text. Format depends on the `format` parameter.
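
A small sketch of how a client might unpack this response, assuming the JSON has already been decoded into a dictionary. `parse_response` and `TranscriptionFailed` are hypothetical names, and because the page does not state the time zone of `timestamp`, the parsed value is left as a naive `datetime`.

```python
from datetime import datetime


class TranscriptionFailed(Exception):
    """Hypothetical exception for responses where success is false."""


def parse_response(payload):
    """Return (transcript, timestamp) or raise if the request failed."""
    if not payload.get("success"):
        error = payload.get("error", {})
        raise TranscriptionFailed(f"{error.get('type')}: {error.get('message')}")
    # YYYYMMDD_HHMMSS.mmm; %f accepts the three-digit millisecond field.
    when = datetime.strptime(payload["timestamp"], "%Y%m%d_%H%M%S.%f")
    return payload["transcript"], when
```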

Error Responses

Status	Meaning
`400`	Bad request — invalid parameters or unsupported audio format.
`429`	Rate limit exceeded — slow down or contact support to increase limits.
`500`	Internal server error — transient issue on our side; retry with backoff.
`503`	Service unavailable — the STT service is temporarily down.
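
The table suggests retrying transient failures with backoff. One way to sketch that is below. `post_with_retry` is a hypothetical helper, and treating 429, 500, and 503 as retryable with delays that double from one second is an assumption rather than documented server guidance; a 400 is returned immediately because resending the same request would not help.

```python
import time

RETRYABLE_STATUSES = {429, 500, 503}


def post_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning an object with .status_code.
    """
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in RETRYABLE_STATUSES:
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    return response
```

When `send` wraps `requests.post` with an open file, it should reopen or rewind the file on each call, since the previous attempt will have consumed it.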

Code Example

curl --request POST \
  --url https://api.vachana.ai/stt/v3 \
  --header 'Content-Type: multipart/form-data' \
  --header 'X-API-Key-ID: <api-key>' \
  --form audio_file='@recording.wav' \
  --form language_code=hi-IN \
  --form format=transcribe \
  --form itn_native_numerals=true

from gnani.stt import GnaniSTTClient

client = GnaniSTTClient(
    api_key="your-api-key",
)

result = client.transcribe("recording.wav", language_code="hi-IN")
print(result["transcript"])

Python SDK

The official Python SDK handles multipart construction, authentication headers, and retries automatically.

Installation

pip install gnani-vachana

Requires Python 3.10+.

Authentication

The client requires an api_key credential. You can pass it directly or load it from environment variables.

from gnani.stt import GnaniSTTClient

client = GnaniSTTClient(
    api_key="your-api-key"
)

export GNANI_API_KEY="your-api-key"

from gnani.stt import GnaniSTTClient

# Picks up credentials from environment automatically
client = GnaniSTTClient()

Transcribe Audio

# From a file path
result = client.transcribe("recording.wav", language_code="hi-IN")
print(result["transcript"])

# From an open binary file object
with open("recording.wav", "rb") as f:
    result = client.transcribe(f, language_code="hi-IN")
print(result["transcript"])

# From raw bytes already in memory
with open("recording.wav", "rb") as f:
    audio_bytes = f.read()

result = client.transcribe(audio_bytes, language_code="hi-IN")
print(result["transcript"])

Custom Request ID

Pass a request_id to correlate SDK calls with your own logs or support tickets.

result = client.transcribe(
    "call.flac",
    language_code="hi-IN",
    request_id="my-trace-123",
)

Error Handling

from gnani.stt import (
    AuthenticationError,
    InvalidAudioError,
    APIError,
)

try:
    result = client.transcribe("audio.wav", language_code="hi-IN")
    print(result["transcript"])
except AuthenticationError:
    print("Invalid credentials: check your api_key.")
except InvalidAudioError as e:
    print(f"Bad audio file: {e}")
except APIError as e:
    print(f"API error {e.status_code}: {e}")

Supported Languages

The Gnani Prisma v2.5 API supports 10 Indian languages.

Language	Code	Native Script	Example
Bengali	`bn-IN`	Bengali (বাংলা)	“আমি ভাত খাই”
English	`en-IN`	Latin	“I am going to the market”
Gujarati	`gu-IN`	Gujarati (ગુજરાતી)	“હું બજાર જાઉં છું”
Hindi	`hi-IN`	Devanagari (हिन्दी)	“मैं बाज़ार जा रहा हूँ”
Kannada	`kn-IN`	Kannada (ಕನ್ನಡ)	“ನಾನು ಮಾರುಕಟ್ಟೆಗೆ ಹೋಗುತ್ತೇನೆ”
Malayalam	`ml-IN`	Malayalam (മലയാളം)	“ഞാൻ ചന്തയിലേക്ക് പോകുന്നു”
Marathi	`mr-IN`	Devanagari (मराठी)	“मी बाजारात जातोय”
Punjabi	`pa-IN`	Gurmukhi (ਪੰਜਾਬੀ)	“ਮੈਂ ਬਾਜ਼ਾਰ ਜਾ ਰਿਹਾ ਹਾਂ”
Tamil	`ta-IN`	Tamil (தமிழ்)	“நான் சந்தைக்கு செல்கிறேன்”
Telugu	`te-IN`	Telugu (తెలుగు)	“నేను మార్కెట్‌కి వెళ్తున్నాను”

Inverse Text Normalization (ITN)

ITN converts the spoken-form output of the ASR engine into the conventional written form a reader expects: numbers become digits, currency gets the ₹ symbol, dates are formatted, and phone numbers are compacted. It runs in a single pass, immediately after transcription.

How to enable: set format=transcribe in the request body.

What ITN Normalizes

1 — Cardinal & Ordinal Numbers

Whole numbers and positional ranks are formatted using Indian comma grouping: the rightmost three digits form one group, and the remaining digits are grouped in twos (for example, 5,20,000).

Spoken input (ASR)	Written output (ITN)	Rule
दो हज़ार	2,000	Indian comma grouping
पाँच लाख बीस हज़ार	5,20,000	Lakh-scale grouping
उन्नीस सौ चौरानवे	1,994	Hundred-base year form
five lakh	5,00,000	English lakh convention
पहला / twenty first	1st / 21st	Ordinal suffix
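
To make the grouping rule concrete, here is a minimal sketch of Indian comma grouping for non-negative integers. It illustrates the output format only; `indian_grouping` is a hypothetical helper and not the service's implementation.

```python
def indian_grouping(n):
    """Format a non-negative integer with Indian comma grouping."""
    digits = str(n)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel two-digit groups off the right of the remaining digits.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return ",".join(groups + [tail])
```

For example, `indian_grouping(520000)` gives `5,20,000`, matching the lakh-scale row above.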

2 — Currency & Money

All Indian currency expressions — including paise fractions and lakh/crore scales — are formatted with the ₹ symbol and Indian comma grouping.

Spoken input (ASR)	Written output (ITN)	Rule
पाँच सौ रुपये	₹500	₹ + amount
तीन रुपये पचास पैसे	₹3.50	₹ + rupees.paise
दस लाख रुपये	₹10,00,000	₹ + lakh grouping
I need five thousand rupees	₹5,000	English India pipeline

3 — Dates

Spoken input (ASR)	Written output (ITN)	Rule
बीस जनवरी दो हज़ार पच्चीस	20 जनवरी 2025	DD Month YYYY (hi)
fifteenth january twenty twenty five	15th January 2025	Ordinal Month YYYY (en)

4 — Times

Indian time-of-day words (सुबह, दोपहर, शाम, रात) automatically map to 24-hour HH:MM output.

Spoken input (ASR)	Written output (ITN)	Rule
सुबह पाँच बजे	सुबह 05:00	सुबह = AM
शाम पाँच बजे	शाम 17:00	शाम = evening (16–20 h)
रात के दस बजे	रात 22:00	रात = night (20–24 h)
meeting at five fifteen in the evening	meeting 17:15 in the evening	en — 24-hour

5 — Phone Numbers & PIN Codes

Digit streams are concatenated into compact numeric strings. 10-digit streams → mobile number; 6-digit streams → PIN. Repeat prefixes (double/डबल, triple/ट्रिपल) are expanded.

Spoken input (ASR)	Written output (ITN)	Rule
नौ आठ सात छह पाँच चार तीन दो एक शून्य	9876543210	10 digits → phone
एक एक शून्य शून्य शून्य एक	110001	6 digits → PIN
डबल आठ नौ शून्य एक दो तीन चार पाँच छह	8890123456	double prefix
one two three four five six	123456	English digit words
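
The digit-stream rule can be sketched as a word-to-digit lookup plus repeat-prefix expansion. This is an illustration of the behavior described above, not the service's code: `compact_digits` and `classify` are hypothetical helpers, the word lists cover only the Hindi and English forms shown in the table, and any other token raises an error.

```python
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "शून्य": "0", "एक": "1", "दो": "2", "तीन": "3", "चार": "4",
    "पाँच": "5", "छह": "6", "सात": "7", "आठ": "8", "नौ": "9",
}
REPEAT_WORDS = {"double": 2, "डबल": 2, "triple": 3, "ट्रिपल": 3}


def compact_digits(utterance):
    """Join spoken digit words into one numeric string."""
    out = []
    repeat = 1
    for word in utterance.split():
        if word in REPEAT_WORDS:
            repeat = REPEAT_WORDS[word]  # applies to the next digit only
            continue
        if word not in DIGIT_WORDS:
            raise ValueError(f"not a digit word: {word!r}")
        out.append(DIGIT_WORDS[word] * repeat)
        repeat = 1
    return "".join(out)


def classify(digits):
    """10 digits is treated as a phone number, 6 digits as a PIN code."""
    return {10: "phone", 6: "pin"}.get(len(digits), "other")
```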

6 — Mixed & Code-Mixed Utterances

A single sentence may contain multiple entity types or blend Hindi and English. ITN handles all in one pass, normalizing each entity independently.

Spoken input (ASR)	Written output (ITN)
कल थ्री फिफ्टी पीएम को पाँच सौ रुपये transfer करना है	कल 15:50 को ₹500 transfer करना है

Native Script Digits — `itn_native_numerals`

By default, ITN outputs Western Arabic digits (0–9) regardless of language. Set itn_native_numerals=true to render digits in the native script of the target language.

Language	Spoken input	`false` (default)	`true` — native script
Hindi `hi-IN`	पाँच हज़ार रुपये	₹5,000	₹५,०००
English `en-IN`	five thousand rupees	₹5,000	₹5,000 (Latin — no change)
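
For Hindi, the difference between the two columns is a one-to-one digit substitution. The sketch below reproduces it with `str.translate`. `to_devanagari_digits` is a hypothetical helper that covers Devanagari only; other scripts have their own digit sets, and the service may implement this differently.

```python
# Map Western Arabic digits to Devanagari digits, position by position.
DEVANAGARI_DIGITS = str.maketrans("0123456789", "०१२३४५६७८९")


def to_devanagari_digits(text):
    """Replace 0-9 with Devanagari digits, leaving other characters as-is."""
    return text.translate(DEVANAGARI_DIGITS)
```

Applied to `₹5,000`, this yields `₹५,०००`, as in the Hindi row above.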

What ITN Does Not Change

ITN intentionally preserves idiomatic and ambiguous phrases to avoid incorrect normalization.

दो तीन (meaning a few) stays as text, not 2 or 3
कर दो / ले दो (imperative verbs) are kept as words, not treated as cardinal 2

If a word or phrase is unchanged in the output, treat it as a failure only when the input was unambiguously a numeric entity.

Authorizations

X-API-Key-ID

string

header

required

API key for authentication. Sign up in Vachana to get the API Key.

Body

multipart/form-data

audio_file

file

required

Audio file to transcribe. Supported formats - WAV, MP3, OGG, FLAC, AAC, M4A. Maximum duration - 60 seconds (Ideal duration is 30 seconds).

language_code

enum<string>

required

Language code for transcription. Use one of the supported language codes.

Available options: bn-IN, en-IN, gu-IN, hi-IN, kn-IN, ml-IN, mr-IN, pa-IN, ta-IN, te-IN

Example: "hi-IN"

format

enum<string>

default:verbatim

Output format for the transcript.

verbatim (default) — Returns the raw spoken-form transcript as recognised by the ASR engine. No post-processing is applied.
transcribe — Enables Inverse Text Normalization (ITN). Spoken numeric expressions, currency, dates, times, and phone numbers are automatically converted to their written form (e.g. "five thousand rupees" → "₹5,000").

Available options: verbatim, transcribe

Example: "transcribe"

itn_native_numerals

boolean

default:false

When format=transcribe, set to true to render digits in the native script of the target language instead of Western Arabic digits (0–9).

For example, with hi-IN: "पाँच हज़ार रुपये" → "₹५,०००" instead of "₹5,000".

Has no effect when format=verbatim.

Example: true

Response

Successful transcription

success

boolean

Indicates if the transcription was successful

timestamp

string

Request timestamp in format YYYYMMDD_HHMMSS.mmm

transcript

string

The transcribed text from the audio
