Submit one or more audio files for transcription and receive a job_id immediately. Poll the status endpoint on a fixed interval until the job completes and transcripts are available. Ideal for long recordings, bulk files, or offline pipelines where you do not need a live response. For real-time transcription, see STT Realtime. For short clips under 60 seconds, see STT REST.
Send these headers on every request, both submit and status calls.
| Header | Required | Description |
|---|---|---|
| X-API-Key-ID | Yes | Your API key. Required for all requests. |
| X-API-Request-ID | No | A unique trace ID (e.g. a UUID) that you assign. Used to correlate your logs with platform logs or support requests. If omitted, the platform may generate one. |
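A minimal Python sketch of the header set above. `build_headers` is a hypothetical helper, not part of any official SDK. It always fills X-API-Request-ID with a fresh UUID4 (unless you pass your own) so that you, rather than the platform, control the trace ID you later quote to support.

```python
import uuid
from typing import Dict, Optional


def build_headers(api_key: str, request_id: Optional[str] = None) -> Dict[str, str]:
    """Headers for both the submit call and the status call.

    X-API-Key-ID is required. X-API-Request-ID is optional; this sketch
    always sends one: the caller's ID if given, otherwise a new UUID4.
    """
    return {
        "X-API-Key-ID": api_key,
        "X-API-Request-ID": request_id or str(uuid.uuid4()),
    }


headers = build_headers("YOUR_API_KEY")
```

Log the generated X-API-Request-ID alongside each call so a failing request can be traced end to end.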
Do not set Content-Type: application/json on the submit request; the body must be multipart/form-data. When you use -F / --form, curl sets the Content-Type header, including the correct boundary, automatically.
POST https://api.vachana.ai/stt/v3/batch/submit uploads audio files and starts an asynchronous transcription job. The response returns a job_id immediately; the files have not been transcribed yet at that point.
Form fields (multipart/form-data):

| Field | Required | Type | Description |
|---|---|---|---|
| audio_files | Yes | file | Audio files to transcribe. Add one audio_files field per file. Accepts 1–10 files, each under 1 hour long. Formats: AAC, WAV, FLAC, ALAC, OGG, Opus. The total request body must not exceed 80 MB. |
| language_code | Yes | string | Language of the audio, e.g. hi-IN or en-IN. |
| is_multi_channel | No | boolean | Set to true if the audio is multi-channel (e.g. stereo or per-speaker tracks). Set to false for standard mono audio. Defaults to false. |
| format | No | string | Output format for transcripts. Set to transcribe to enable Inverse Text Normalization (ITN), which converts numbers, currency, dates, and phone numbers to written form. Set to verbatim for raw spoken-form output. Defaults to verbatim. ITN is currently supported for hi-IN and en-IN only. |
| itn_native_numerals | No | boolean | When format=transcribe, set to true to render digits in the native script of the target language (e.g. Devanagari numerals for Hindi). Has no effect when format=verbatim. Defaults to false. See the ITN section below for full details. |
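To make the wire format concrete, here is a standard-library-only Python sketch that encodes the submit form with one audio_files part per file. `encode_submit_form` is a hypothetical helper. Three details are assumptions not stated above: booleans are sent as the strings "true"/"false", each file part is labelled application/octet-stream, and the 80 MB limit is read as 80 × 1024 × 1024 bytes.

```python
import uuid
from typing import List, Tuple

MAX_FILES = 10
# Assumption: "80 MB" treated as 80 MiB; confirm the exact limit.
MAX_BODY_BYTES = 80 * 1024 * 1024


def encode_submit_form(
    files: List[Tuple[str, bytes]],
    language_code: str,
    is_multi_channel: bool = False,
    fmt: str = "verbatim",
    itn_native_numerals: bool = False,
) -> Tuple[bytes, str]:
    """Encode the submit form as multipart/form-data.

    Returns (body, content_type). content_type carries the boundary, so
    send it unchanged instead of hard-coding a Content-Type header.
    """
    if not 1 <= len(files) <= MAX_FILES:
        raise ValueError("submit accepts between 1 and 10 audio files")
    if fmt not in ("verbatim", "transcribe"):
        raise ValueError("format must be 'verbatim' or 'transcribe'")
    boundary = uuid.uuid4().hex
    parts: List[bytes] = []

    def add_text(name: str, value: str) -> None:
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )

    # One audio_files part per file.
    for filename, data in files:
        head = (f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="audio_files"; filename="{filename}"\r\n'
                "Content-Type: application/octet-stream\r\n\r\n")
        parts.append(head.encode("utf-8") + data + b"\r\n")

    add_text("language_code", language_code)
    # Assumption: booleans travel as the literal strings "true"/"false".
    add_text("is_multi_channel", "true" if is_multi_channel else "false")
    add_text("format", fmt)
    add_text("itn_native_numerals", "true" if itn_native_numerals else "false")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    body = b"".join(parts)
    if len(body) > MAX_BODY_BYTES:
        raise ValueError("request body exceeds the 80 MB limit")
    return body, f"multipart/form-data; boundary={boundary}"
```

If the third-party requests library is available, passing `files=[("audio_files", ("a.wav", f1)), ("audio_files", ("b.wav", f2))]` together with `data={"language_code": "hi-IN"}` produces an equivalent body without a hand-written encoder.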
```json
{
  "job_id": "batch_7f3a92c1d4e8",
  "status": "submitted",
  "file_count": 2,
  "message": "Job accepted. Poll the status endpoint every 60 seconds for results."
}
```
| Field | Type | Description |
|---|---|---|
| job_id | string | Identifier for this job. Use it in the status URL. |
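A short Python sketch that pulls job_id out of the submit response and builds the status URL from it. `parse_submit_response` is a hypothetical helper; only job_id is relied on, since it is the only response field documented above. Percent-encoding the ID is a defensive choice, not a documented requirement.

```python
import json
from typing import Tuple
from urllib.parse import quote

STATUS_URL = "https://api.vachana.ai/stt/v3/batch/status/{job_id}"


def parse_submit_response(raw: str) -> Tuple[str, str]:
    """Return (job_id, status_url) from a submit response body."""
    payload = json.loads(raw)
    job_id = payload.get("job_id")
    if not isinstance(job_id, str) or not job_id:
        raise ValueError(f"submit response has no job_id: {payload!r}")
    # Defensive percent-encoding; documented IDs look like batch_7f3a92c1d4e8.
    return job_id, STATUS_URL.format(job_id=quote(job_id, safe=""))
```

Persist the returned job_id before you start polling, so a client restart does not orphan the job.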
GET https://api.vachana.ai/stt/v3/batch/status/{job_id} reports the job's progress and returns the transcription results once the job finishes. Poll it once every 60 seconds per job_id; do not poll more frequently.
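A minimal polling loop in Python that respects the 60-second interval. It assumes the status response is JSON with a top-level status field and a results field (the two fields this page names); intermediate status values and the shape of results are not documented here. `fetch_status`, `poll_until_done` and the `max_polls` cap are hypothetical; `fetch_status` is whatever function performs the GET and decodes the JSON.

```python
import time
from typing import Any, Callable, Dict

POLL_INTERVAL_S = 60  # documented: one status call per minute per job_id


def is_finished(status: Dict[str, Any]) -> bool:
    """Terminal when failed, or completed with results populated."""
    state = status.get("status")
    return state == "failed" or (state == "completed" and bool(status.get("results")))


def poll_until_done(
    fetch_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 240,  # hypothetical cap: 4 hours at 60 s per poll
) -> Dict[str, Any]:
    """Call fetch_status(job_id) every POLL_INTERVAL_S seconds until finished."""
    for _ in range(max_polls):
        # Wait first: a freshly submitted job is not done yet.
        sleep(POLL_INTERVAL_S)
        status = fetch_status(job_id)
        if is_finished(status):
            return status
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Injecting `sleep` keeps the loop testable without real waiting; in production leave it as `time.sleep`.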
When format=transcribe is passed in the form body, ITN runs on every file’s transcript after recognition — converting spoken-form numbers, currency, dates, times, and phone numbers into the compact written form a reader expects.
ITN is currently supported for Hindi (hi-IN) and English (en-IN) only. Enabling ITN for other languages has no effect — transcripts are returned as-is.
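Because ITN depends on both the format field and the language, a client can decide up front which optional fields to send. A small Python sketch; `itn_fields` is a hypothetical helper, the language set mirrors the two codes listed above and will need updating if support widens, and encoding the boolean as the string "true" is an assumption.

```python
from typing import Dict

# Languages for which format=transcribe (ITN) currently has an effect.
ITN_LANGUAGES = {"hi-IN", "en-IN"}


def itn_fields(language_code: str, native_numerals: bool = False) -> Dict[str, str]:
    """Optional ITN form fields to merge into a submit request.

    For unsupported languages ITN has no effect, so nothing is sent and
    format stays at its default (verbatim).
    """
    if language_code not in ITN_LANGUAGES:
        return {}
    fields = {"format": "transcribe"}
    # itn_native_numerals has no effect for en-IN, so only send it for hi-IN.
    if native_numerals and language_code == "hi-IN":
        fields["itn_native_numerals"] = "true"
    return fields
```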
By default, ITN outputs Western Arabic digits (0–9) regardless of language. When format=transcribe is set, you can additionally pass itn_native_numerals=true to render digits in the native script of the target language.
| Language | Spoken input | itn_native_numerals=false (default) | itn_native_numerals=true (native script) |
|---|---|---|---|
| Hindi (hi-IN) | पाँच हज़ार रुपये | ₹5,000 | ₹५,००० |
| English (en-IN) | five thousand rupees | ₹5,000 | ₹5,000 (Latin digits, no change) |
English always outputs Western Arabic digits. itn_native_numerals=true has no effect for en-IN.
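In the Hindi row, the two output columns differ only in digit script: each Western Arabic digit 0–9 becomes the corresponding Devanagari digit (U+0966 to U+096F), while ₹ and the comma separator are unchanged. The Python sketch below reproduces that substitution client-side, for example to compare transcripts produced with and without the flag. It illustrates the table; it is not the platform's ITN implementation.

```python
# Map Western Arabic digits 0-9 to Devanagari digits U+0966..U+096F.
TO_DEVANAGARI = str.maketrans(
    "0123456789", "".join(chr(0x0966 + i) for i in range(10))
)


def to_devanagari_digits(text: str) -> str:
    """Replace ASCII digits with Devanagari digits; leave everything else."""
    return text.translate(TO_DEVANAGARI)


print(to_devanagari_digits("₹5,000"))  # ₹५,०००
```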
1. Submit: POST https://api.vachana.ai/stt/v3/batch/submit with X-API-Key-ID, optional X-API-Request-ID, and the form fields audio_files and language_code, plus optionally is_multi_channel, format, and itn_native_numerals.
2. Save the job_id from the submit response.
3. Poll: GET https://api.vachana.ai/stt/v3/batch/status/{job_id} (same auth headers) every 60 seconds until status is completed or failed and the results field is populated.
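The three steps map onto two HTTP requests. This standard-library Python sketch builds them with `urllib.request`. `body` and `content_type` are assumed to be a multipart/form-data payload you have already encoded (with curl -F, the requests library, or your own encoder), and `submit_request`, `status_request` and `send_json` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://api.vachana.ai/stt/v3/batch"


def _auth(api_key: str, request_id: Optional[str]) -> Dict[str, str]:
    """Headers shared by submit and status calls."""
    headers = {"X-API-Key-ID": api_key}
    if request_id:
        headers["X-API-Request-ID"] = request_id
    return headers


def submit_request(api_key: str, body: bytes, content_type: str,
                   request_id: Optional[str] = None) -> urllib.request.Request:
    """Step 1: POST the encoded multipart body to the submit endpoint."""
    headers = _auth(api_key, request_id)
    headers["Content-Type"] = content_type  # must carry the multipart boundary
    return urllib.request.Request(f"{BASE_URL}/submit", data=body,
                                  headers=headers, method="POST")


def status_request(api_key: str, job_id: str,
                   request_id: Optional[str] = None) -> urllib.request.Request:
    """Step 3: GET the status endpoint for the saved job_id."""
    url = f"{BASE_URL}/status/{urllib.parse.quote(job_id, safe='')}"
    return urllib.request.Request(url, headers=_auth(api_key, request_id), method="GET")


def send_json(req: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send a request and decode its JSON body (urlopen raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A typical flow is `job = send_json(submit_request(key, body, ctype))`, then `send_json(status_request(key, job["job_id"]))` once every 60 seconds until the job reaches a terminal status.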