Audio transcription
POST /api/v1/audio/transcriptions (also available at the
OpenAI-style alias /v1/audio/transcriptions) transcribes
uploaded audio using Whisper-class models. Requests are multipart form
uploads, with a limit of 25 MB per file.
Supported audio formats
mp3, mp4, mpeg, mpga,
m4a, wav, webm, flac,
ogg.
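Files outside these limits are presumably rejected by the server, so a client-side check before uploading can save a round trip. A minimal sketch: `upload_problems` and `check_file` are hypothetical helpers, the format is judged by file extension only (a heuristic), and reading "25 MB" as 25 * 1024 * 1024 bytes is an assumption, since this page does not say whether MB is decimal or binary.

```python
import os
from typing import List

# Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes; the server's exact
# byte limit may differ.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024
# Extensions from the supported-formats list on this page.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm", "flac", "ogg"}


def upload_problems(filename: str, size_bytes: int) -> List[str]:
    """Return reasons an upload is likely to be rejected; empty if none."""
    problems = []
    # Heuristic: judge the format by file extension only, case-insensitively.
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(no extension)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"{size_bytes} bytes exceeds the {MAX_UPLOAD_BYTES}-byte limit")
    return problems


def check_file(path: str) -> List[str]:
    """Convenience wrapper that reads the file size from disk."""
    return upload_problems(path, os.path.getsize(path))
```

An empty list means the file passes both checks; the server remains the final authority on what it accepts.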
Request
curl https://api.pendra.ai/api/v1/audio/transcriptions \
-H "Authorization: Bearer pdr_sk_..." \
-F file=@meeting.mp3 \
-F model=whisper-large-v3-turbo \
-F language=en
Fields
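file: the audio file (multipart upload).
model: Whisper-class model ID, e.g. whisper-large-v3-turbo.
language: optional ISO 639-1 code (e.g. en, cy).
prompt: optional priming text, such as acronyms or names that appear in the audio.
temperature: sampling temperature; default 0.
response_format: json (default), verbose_json, text, srt, or vtt.
timestamp_granularities: optional; requests word- or segment-level timestamps in verbose_json output.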
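For environments without curl or the SDK, the same multipart request can be assembled with the Python standard library. This is a minimal sketch: `build_transcription_request` and `ENDPOINT` are hypothetical names, the API key is a placeholder, and labelling the file part `application/octet-stream` is an assumption. The form field names are the ones listed above.

```python
import urllib.request
import uuid
from typing import Dict

# Endpoint from this page; the API key passed in is a placeholder.
ENDPOINT = "https://api.pendra.ai/api/v1/audio/transcriptions"


def build_transcription_request(
    api_key: str, filename: str, audio: bytes, fields: Dict[str, str]
) -> urllib.request.Request:
    """Build, but do not send, a multipart/form-data POST equivalent to the curl example."""
    boundary = uuid.uuid4().hex
    parts = []
    # One part per plain form field (model, language, response_format, ...).
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    # The audio itself goes in the "file" part. Assumption: a generic
    # application/octet-stream content type is acceptable to the server.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + audio + b"\r\n")
    # The closing boundary marks the end of the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        ENDPOINT,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` sends it; the response body can then be read with `.read()` and decoded according to the requested response_format.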
Response
The response shape depends on response_format. The default,
json, returns only the transcript:
{
"text": "We're starting the meeting now. First item on the agenda is the Q3 forecast."
}
verbose_json includes timing and language metadata, plus
optional word- or segment-level timestamps when
timestamp_granularities is set:
{
"task": "transcribe",
"language": "en",
"duration": 12.48,
"text": "We're starting the meeting now. First item on the agenda is the Q3 forecast.",
"segments": [
{
"id": 0,
"start": 0.0,
"end": 4.2,
"text": "We're starting the meeting now.",
"avg_logprob": -0.21,
"no_speech_prob": 0.01
},
{
"id": 1,
"start": 4.2,
"end": 12.48,
"text": "First item on the agenda is the Q3 forecast.",
"avg_logprob": -0.18,
"no_speech_prob": 0.02
}
]
}
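Each segment's start and end are offsets in seconds, which is all a caption file needs. As a sketch, assuming you already hold a verbose_json response (for example, to edit or filter segments before producing captions), the segments can be rendered as SRT locally. `srt_timestamp` and `segments_to_srt` are hypothetical helpers, and the server's own srt output may differ in details such as line wrapping.

```python
from typing import Any, Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format an offset in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)  # round to avoid float artifacts like 4199.999
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict[str, Any]]) -> str:
    """Render verbose_json segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)
```

Applied to the two segments above, this yields the same cue numbers, timings, and text as the srt example under Subtitle output.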
With response_format=text, you get a plain UTF-8 string body
(no JSON envelope).
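Because the body's shape depends on response_format, client code has to decode it accordingly. A minimal sketch with a hypothetical `parse_transcription` helper: the JSON formats are parsed into a dict, while text, srt, and vtt bodies are returned as strings. Decoding srt and vtt bodies as UTF-8 is an assumption; this page states the encoding only for text.

```python
import json
from typing import Any, Dict, Union

JSON_FORMATS = {"json", "verbose_json"}
STRING_FORMATS = {"text", "srt", "vtt"}


def parse_transcription(body: bytes, response_format: str = "json") -> Union[Dict[str, Any], str]:
    """Decode a raw response body according to the response_format that was requested."""
    decoded = body.decode("utf-8")
    if response_format in JSON_FORMATS:
        # json and verbose_json both return a JSON object with a "text" key.
        return json.loads(decoded)
    if response_format in STRING_FORMATS:
        # text is the bare transcript; srt and vtt are the subtitle file itself.
        return decoded
    raise ValueError(f"unexpected response_format: {response_format!r}")
```

Keeping the requested format alongside the body avoids guessing the shape from content, since a text transcript could itself look like JSON.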
Subtitle output
Set response_format=srt (or vtt) for time-coded
captions instead of a plain JSON transcript. The body is the raw subtitle
file; an srt response looks like this:
1
00:00:00,000 --> 00:00:04,200
We're starting the meeting now.

2
00:00:04,200 --> 00:00:12,480
First item on the agenda is the Q3 forecast.

curl https://api.pendra.ai/api/v1/audio/transcriptions \
-H "Authorization: Bearer pdr_sk_..." \
-F file=@meeting.mp3 \
-F model=whisper-large-v3-turbo \
-F response_format=srt
Python
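from pendra import Pendra

client = Pendra()

# Open the audio in binary mode and upload it for transcription.
with open("meeting.mp3", "rb") as f:
    result = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=f,
        language="en",
    )

# With the default json format, the transcript is in result.text.
print(result.text)
Available backends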
Transcription requests are routed to workers running Speaches, our Whisper-family transcription backend.