Speech-to-Text (Batch)

Overview

Submit one or more audio files for transcription and receive a job_id immediately. Poll the status endpoint on a fixed interval until the job completes and transcripts are available. Ideal for long recordings, bulk files, or offline pipelines where you do not need a live response. For real-time transcription, see STT Realtime. For short clips under 60 seconds, see STT REST.

Endpoints

Operation	Method	URL
Submit Job	POST	`https://api.vachana.ai/stt/v3/batch/submit`
Check Job Status	GET	`https://api.vachana.ai/stt/v3/batch/status/{job_id}`

Limits & Specifications

Item	Limit
Max audio duration	Less than 1 hour per file
Max files per request	10 files per API call
Max total payload size	80 MB across all files and form fields combined
Minimum poll interval	60 seconds between status calls for the same `job_id`
Speaker diarization	At most 2 speakers per file (two-party diarization). Audio with more than two distinct speakers is not supported.

Supported Audio Formats

AAC · WAV · FLAC · ALAC · OGG (Vorbis) · Opus

Use standard file extensions and MIME types (e.g. .m4a for AAC, .wav, .flac, .ogg).
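The count and size limits can be checked client-side before uploading. The following is a minimal sketch, not part of the API: the helper name `check_batch` is hypothetical, and reading 80 MB as 80 × 1024 × 1024 bytes is an assumption. It only sums file sizes, so the real payload will be slightly larger once form fields and multipart framing are added, and it cannot detect duration or format problems.

```python
from pathlib import Path

MAX_FILES = 10                        # "10 files per API call"
MAX_PAYLOAD_BYTES = 80 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_batch(paths):
    """Return a list of problems found; an empty list means the batch looks OK."""
    problems = []
    paths = [Path(p) for p in paths]
    if not paths:
        problems.append("no files supplied")
    if len(paths) > MAX_FILES:
        problems.append(f"{len(paths)} files exceeds the limit of {MAX_FILES}")
    total = 0
    for p in paths:
        size = p.stat().st_size
        if size == 0:
            problems.append(f"{p.name} is empty")
        total += size
    if total > MAX_PAYLOAD_BYTES:
        problems.append(f"total size {total} bytes exceeds {MAX_PAYLOAD_BYTES}")
    return problems
```

Calling `check_batch(["/path/to/first.wav", "/path/to/second.wav"])` before submitting gives an early, local signal for the most common validation failures.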

Authentication

Send these headers on every request, for both submit and status calls.

Header	Required	Description
`X-API-Key-ID`	Yes	Your API key. Required for all requests.
`X-API-Request-ID`	No	A unique trace ID (e.g. UUID) you assign. Used to correlate your logs with platform logs or support. If omitted, the platform may generate one.

Do not set Content-Type: application/json on the submit request. Use multipart/form-data. curl sets the correct boundary automatically when you use -F / --form.
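As an illustration of the header scheme above, the sketch below builds the two headers and generates a UUID trace ID when none is supplied. The function name `build_headers` is hypothetical; only the header names come from this page. It deliberately sets no Content-Type, leaving that to the HTTP client.

```python
import uuid


def build_headers(api_key, request_id=None):
    """Headers shared by submit and status calls."""
    return {
        "X-API-Key-ID": api_key,
        # Optional trace ID; a fresh UUID is generated when none is given.
        "X-API-Request-ID": request_id or str(uuid.uuid4()),
    }
```

Logging the generated `X-API-Request-ID` next to each call makes it easier to correlate a request with platform logs later.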

Submit a Transcription Job

`POST /stt/v3/batch/submit`

Upload audio files and start an asynchronous transcription job. The response returns a job_id immediately; the files are not yet transcribed at this point.

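Request — Form Fields

Field	Required	Description
`audio_files`	Yes	One or more audio files. Repeat the field once per file, up to 10 files.
`language_code`	Yes	Language of the audio. Use one of the codes under Supported Language Codes.
`is_multi_channel`	No	`true` or `false`. Indicates whether the audio is multi-channel.
`format`	No	Set to `transcribe` to run Inverse Text Normalization (ITN) on the transcript.
`itn_native_numerals`	No	`true` or `false` (default `false`). With `format=transcribe`, renders digits in the native script of the target language.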

Supported Language Codes

The Gnani Prisma v2.5 API supports these 10 Indian languages:

Language	Code	Native Script	Example Text
Bengali	`bn-IN`	Bengali (বাংলা)	“আমি ভাত খাই”
English	`en-IN`	Latin	“I am going to the market”
Gujarati	`gu-IN`	Gujarati (ગુજરાતી)	“હું બજાર જાઉં છું”
Hindi	`hi-IN`	Devanagari (हिन्दी)	“मैं बाज़ार जा रहा हूँ”
Kannada	`kn-IN`	Kannada (ಕನ್ನಡ)	“ನಾನು ಮಾರುಕಟ್ಟೆಗೆ ಹೋಗುತ್ತೇನೆ”
Malayalam	`ml-IN`	Malayalam (മലയാളം)	“ഞാൻ ചന്തയിലേക്ക് പോകുന്നു”
Marathi	`mr-IN`	Devanagari (मराठी)	“मी बाजारात जातोय”
Punjabi	`pa-IN`	Gurmukhi (ਪੰਜਾਬੀ)	“ਮੈਂ ਬਾਜ਼ਾਰ ਜਾ ਰਿਹਾ ਹਾਂ”
Tamil	`ta-IN`	Tamil (தமிழ்)	“நான் சந்தைக்கு செல்கிறேன்”
Telugu	`te-IN`	Telugu (తెలుగు)	“నేను మార్కెట్‌కి వెళ్తున్నాను”

Example — curl

curl --location --request POST 'https://api.vachana.ai/stt/v3/batch/submit' \
  --header 'X-API-Key-ID: <YOUR_API_KEY>' \
  --header 'X-API-Request-ID: 550e8400-e29b-41d4-a716-446655440000' \
  --form 'language_code=hi-IN' \
  --form 'is_multi_channel=false' \
  --form 'format=transcribe' \
  --form 'audio_files=@"/path/to/first.wav"' \
  --form 'audio_files=@"/path/to/second.wav"'
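For callers not using curl, the same request can be assembled with the Python standard library. This is a sketch under stated assumptions rather than an official client: `build_submit_request` is a hypothetical helper, the multipart body is hand-built, and the per-file MIME type is guessed from the file extension. It builds the request without sending it.

```python
import mimetypes
import uuid
from pathlib import Path
from urllib.request import Request

SUBMIT_URL = "https://api.vachana.ai/stt/v3/batch/submit"


def build_submit_request(api_key, audio_paths, language_code, **form_fields):
    """Build (but do not send) a multipart/form-data submit request."""
    boundary = uuid.uuid4().hex
    chunks = []

    # Plain text form fields, e.g. language_code, format, is_multi_channel.
    fields = {"language_code": language_code, **form_fields}
    for name, value in fields.items():
        chunks.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )

    # One audio_files part per file, mirroring the repeated --form flags.
    for path in map(Path, audio_paths):
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        chunks.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="audio_files"; '
            f'filename="{path.name}"\r\n'
            f"Content-Type: {mime}\r\n\r\n".encode("utf-8")
            + path.read_bytes()
            + b"\r\n"
        )
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))

    headers = {
        "X-API-Key-ID": api_key,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return Request(SUBMIT_URL, data=b"".join(chunks), headers=headers, method="POST")
```

Pass form values as strings (for example `is_multi_channel="false"`, `format="transcribe"`) so they match the curl example. The returned object can be sent with `urllib.request.urlopen(req)` and the response body decoded with `json.load`.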

Response — `200 OK`

{
  "job_id": "batch_7f3a92c1d4e8",
  "status": "submitted",
  "file_count": 2,
  "message": "Job accepted. Poll the status endpoint every 60 seconds for results."
}

Field	Type	Description
`job_id`	string	Identifier for this job. Use it in the status URL.
`status`	string	Initial value is always `submitted`.
`file_count`	integer	Number of files accepted into the job.
`message`	string	Short confirmation with polling instructions.

Errors

HTTP Status	When
`400`	No files uploaded, empty file, more than 10 files, payload over 80 MB, unsupported format, or other client-side validation failure.
`500`	Server error.

Check Job Status

`GET /stt/v3/batch/status/{job_id}`

Poll this endpoint to check progress and retrieve transcription results once the job finishes. Call it at most once every 60 seconds per job_id; do not poll more frequently.

Path Parameter

Parameter	Required	Description
`job_id`	Yes	The `job_id` returned from the Submit response.

Example — curl

curl --location --request GET 'https://api.vachana.ai/stt/v3/batch/status/<JOB_ID>' \
  --header 'X-API-Key-ID: <YOUR_API_KEY>' \
  --header 'X-API-Request-ID: 550e8400-e29b-41d4-a716-446655440000'

Response — `200 OK`

{
  "job_id": "batch_7f3a92c1d4e8",
  "status": "completed",
  "total_files": 2,
  "completed_files": 2,
  "failed_files": 0,
  "overall_progress": 100,
  "error": null,
  "results": [
    {
      "filename": "first.wav",
      "status": "completed",
      "full_transcript": "नमस्ते, आप कैसे हैं?",
      "total_duration": 45.3,
      "error": null,
      "segments": [
        {
          "segment_id": 0,
          "start_time": 0.0,
          "end_time": 3.2,
          "text": "नमस्ते, आप कैसे हैं?",
          "speaker_id": 1,
          "language_detected": "hi-IN"
        }
      ]
    }
  ]
}

Job-Level Response Fields

Field	Type	Description
`job_id`	string	Job identifier.
`status`	string	`submitted` — accepted or in progress. `processing` — actively transcribing. `completed` — done. `failed` — job-level failure.
`total_files`	integer	Total number of files in the job.
`completed_files`	integer	Files finished successfully. Meaningful only when the job has reached a final state.
`failed_files`	integer	Files that failed. Meaningful only when the job has reached a final state.
`overall_progress`	integer	Approximate progress from `0` to `100` while the job is running.
`results`	array or `null`	Per-file results. `null` while the job is `submitted` or `processing`.
`error`	string or `null`	Top-level error message for the job, if any.
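The status values above suggest a simple polling loop: keep calling the status endpoint, waiting 60 seconds between calls, until `status` is `completed` or `failed`. The sketch below assumes a caller-supplied `fetch_status` function (hypothetical) that performs the GET request and returns the decoded JSON body as a dict; `wait_for_job` and `max_polls` are likewise illustrative names, not part of the API.

```python
import time

POLL_INTERVAL_SECONDS = 60  # documented minimum between status calls
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(fetch_status, job_id, max_polls=120, sleep=time.sleep):
    """Poll until the job reaches a terminal state and return the final body.

    fetch_status is a caller-supplied function (hypothetical here) that
    performs the GET request and returns the decoded JSON as a dict.
    """
    for attempt in range(max_polls):
        if attempt:
            sleep(POLL_INTERVAL_SECONDS)  # never poll faster than the limit
        body = fetch_status(job_id)
        if body["status"] in TERMINAL_STATES:
            return body
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

Capping the number of polls keeps a stuck job from blocking a pipeline forever; with the defaults shown, the loop gives up after roughly two hours.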

Per-File Result Fields — `results[]`

Field	Type	Description
`filename`	string	Original file name as submitted.
`full_transcript`	string	Complete transcribed text for the file.
`segments`	array	Time-aligned transcript segments (see below).
`total_duration`	number	Audio duration in seconds.
`status`	string	`completed` or `failed` for this individual file.
`error`	string or `null`	Error message for this file if it failed.

Per-Segment Fields — `results[].segments[]`

Field	Type	Description
`segment_id`	integer	Segment index (zero-based).
`start_time`	number	Segment start time in seconds.
`end_time`	number	Segment end time in seconds.
`text`	string	Transcribed text for this segment.
`speaker_id`	integer	Speaker identifier. Populated for multi-channel audio.
`language_detected`	string	BCP-47 code of the detected language for this segment.
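To show how the job, file, and segment levels nest, the sketch below walks a decoded status response and yields one readable line per segment, plus one line for each failed file. The function name `summarize_results` and the output format are illustrative choices, not part of the API.

```python
def summarize_results(status_body):
    """Yield one readable line per segment, plus a line per failed file."""
    # results is null while the job is still submitted or processing.
    for file_result in status_body.get("results") or []:
        name = file_result["filename"]
        if file_result["status"] != "completed":
            yield f"{name}: FAILED ({file_result.get('error')})"
            continue
        for seg in file_result["segments"]:
            yield (
                f"{name} [{seg['start_time']:.1f}-{seg['end_time']:.1f}s] "
                f"speaker {seg['speaker_id']}: {seg['text']}"
            )
```

Because a job can complete with some files failed, checking the per-file `status` before reading `segments` avoids treating a partial result as a full one.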

Errors

HTTP Status	When
`404`	`job_id` not found — unknown ID or the job is no longer available.
`500`	Server error.

Inverse Text Normalization (ITN)

When format=transcribe is passed in the form body, ITN runs on every file’s transcript after recognition — converting spoken-form numbers, currency, dates, times, and phone numbers into the compact written form a reader expects.

What ITN Normalizes

ITN recognizes six categories of spoken expressions. Every matching span is transformed; all other words pass through unchanged.

1 — Cardinal & Ordinal Numbers

Whole numbers and positional ranks are formatted using Indian comma grouping: the rightmost three digits form one group, and the remaining digits are grouped in twos.

Spoken input (ASR)	Written output (ITN)	Format rule
दो हज़ार	2,000	Indian comma grouping
पाँच लाख बीस हज़ार	5,20,000	Lakh-scale grouping
five lakh	5,00,000	English lakh convention
पहला / twenty first	1st / 21st	Ordinal suffix
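The grouping rule in the table can be reproduced in a few lines, which is handy when writing tests that compare transcripts against expected values. This is an illustrative sketch of the formatting rule only, not the service's ITN implementation; `indian_group` is a hypothetical name.

```python
def indian_group(n):
    """Format a non-negative integer with Indian comma grouping."""
    digits = str(n)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]  # last three digits form one group
    groups = []
    while head:
        groups.insert(0, head[-2:])        # remaining digits in groups of two
        head = head[:-2]
    return ",".join(groups + [tail])
```

For the table rows above, `indian_group(2000)` gives `2,000` and `indian_group(520000)` gives `5,20,000`.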

2 — Currency & Money

All Indian currency expressions — including paise fractions and lakh/crore scales — are formatted with the ₹ symbol and Indian comma grouping.

Spoken input (ASR)	Written output (ITN)	Format rule
पाँच सौ रुपये	₹500	₹ + amount
तीन रुपये पचास पैसे	₹3.50	₹ + rupees.paise
I need five thousand rupees	₹5,000	English India pipeline
pay do lakh rupees	₹2,00,000	Code-mixed en/hi

3 — Dates

Spoken input (ASR)	Written output (ITN)	Format rule
बीस जनवरी दो हज़ार पच्चीस	20 जनवरी 2025	DD Month YYYY (hi)
fifteenth january twenty twenty five	15th January 2025	Ordinal Month YYYY (en)

4 — Times

Indian time-of-day words (सुबह, दोपहर, शाम, रात) automatically map to 24-hour HH:MM output.

Spoken input (ASR)	Written output (ITN)	Format rule
सुबह पाँच बजे	सुबह 05:00	सुबह = AM
शाम पाँच बजे	शाम 17:00	शाम = evening (16–20h)
रात के दस बजे	रात 22:00	रात = night (20–24h)
meeting at five fifteen in the evening	meeting 17:15 in the evening	en — 24-hour

5 — Phone Numbers & PIN Codes

10-digit streams → mobile number; 6-digit streams → PIN. Repeat prefixes (double/डबल) are expanded.

Spoken input (ASR)	Written output (ITN)	Format rule
नौ आठ सात छह पाँच चार तीन दो एक शून्य	9876543210	10 digits → phone
एक एक शून्य शून्य शून्य एक	110001	6 digits → PIN
one two three four five six	123456	English digit words
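The digit-stream rule can be illustrated with a small sketch: map each spoken digit word to a digit, expand a repeat prefix, then label the result by length. This only mirrors the behaviour described above and is not the service's implementation. The names `digit_stream`, `DIGIT_WORDS`, and `REPEAT_WORDS` are hypothetical, and the word lists cover only Hindi and English.

```python
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "शून्य": "0", "एक": "1", "दो": "2", "तीन": "3", "चार": "4",
    "पाँच": "5", "छह": "6", "सात": "7", "आठ": "8", "नौ": "9",
}
REPEAT_WORDS = {"double": 2, "डबल": 2}


def digit_stream(spoken):
    """Collapse a run of spoken digit words into a digit string and a label."""
    digits = []
    repeat = 1
    for word in spoken.lower().split():
        if word in REPEAT_WORDS:
            repeat = REPEAT_WORDS[word]  # applies to the next digit word
            continue
        digits.append(DIGIT_WORDS[word] * repeat)  # KeyError on non-digit words
        repeat = 1
    number = "".join(digits)
    label = {10: "phone", 6: "pin"}.get(len(number), "number")
    return number, label
```

With this sketch, `digit_stream("one two three four five six")` returns `("123456", "pin")`, and `"double five"` expands to `55`.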

6 — Mixed & Code-Mixed Utterances

A single file may contain segments with multiple entity types or blend Hindi and English. ITN normalizes each entity independently in one pass.

Spoken input (ASR)	Written output (ITN)
कल थ्री फिफ्टी पीएम को पाँच सौ रुपये transfer करना है	कल 15:50 को ₹500 transfer करना है
pay do lakh rupees by fifteenth march	pay ₹2,00,000 by 15th March

Native Script Digits — `itn_native_numerals`

By default, ITN outputs Western Arabic digits (0–9) regardless of language. When format=transcribe is set, you can additionally pass itn_native_numerals=true to render digits in the native script of the target language.

Language	Spoken input	`false` (default)	`true` — native script
Hindi `hi-IN`	पाँच हज़ार रुपये	₹5,000	₹५,०००
English `en-IN`	five thousand rupees	₹5,000	₹5,000 (Latin — no change)

English always outputs Western Arabic digits. itn_native_numerals=true has no effect for en-IN.
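The effect of `itn_native_numerals=true` for Hindi can be mimicked client-side with a digit translation table, for instance to build expected values in tests. The sketch below is an assumption-laden illustration: `to_native_numerals` is a hypothetical helper, and it covers only the `hi-IN` case shown in the table, leaving every other language code unchanged.

```python
# Map ASCII digits 0-9 to Devanagari digits U+0966..U+096F.
DEVANAGARI_DIGITS = {ord(str(i)): chr(0x0966 + i) for i in range(10)}


def to_native_numerals(text, language_code):
    """Render Western Arabic digits in Devanagari for hi-IN text."""
    if language_code == "hi-IN":
        return text.translate(DEVANAGARI_DIGITS)
    # en-IN is documented as unchanged; other scripts are not covered here.
    return text
```

Currency symbols and comma separators pass through untouched, so `₹5,000` keeps its grouping and only the digits change.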

What ITN Does Not Change

ITN intentionally preserves idiomatic and ambiguous phrases.

दो तीन (meaning a few) stays as text, not 2 or 3
कर दो / ले दो (imperative verbs) are kept as words, not treated as cardinal 2

Flow Summary

Submit — POST https://api.vachana.ai/stt/v3/batch/submit with X-API-Key-ID, optional X-API-Request-ID, and form fields audio_files, language_code, and optionally is_multi_channel, format, and itn_native_numerals.
Save the job_id from the submit response.
Poll — GET https://api.vachana.ai/stt/v3/batch/status/{job_id} (same auth headers) every 60 seconds until status is completed or failed and results is populated.
