Audio transcription

POST /api/v1/audio/transcriptions (also available at the OpenAI-style alias /v1/audio/transcriptions) transcribes uploaded audio using Whisper-class models. Requests are multipart form uploads, limited to 25 MB per file.

Supported audio formats

mp3, mp4, mpeg, mpga, m4a, wav, webm, flac, ogg.
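
A client can check a file against these limits before spending an upload. This is a minimal sketch, not part of any Pendra SDK: check_audio_upload is a hypothetical helper, it judges the format by file extension only, and it assumes "25 MB" means 25 * 1024 * 1024 bytes. The server may count differently or inspect the actual audio.

```python
from pathlib import Path

# Extensions from the "Supported audio formats" list.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm", "flac", "ogg"}

# Assumption: "25 MB" is read as 25 MiB; the server's exact byte limit may differ.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_audio_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file is unlikely to be accepted (hypothetical helper)."""
    # Only the extension is checked; the file contents are not inspected.
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")
```

Call it as check_audio_upload(path, os.path.getsize(path)) before opening the file for upload.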

Request

curl
curl https://api.pendra.ai/api/v1/audio/transcriptions \
  -H "Authorization: Bearer pdr_sk_..." \
  -F file=@meeting.mp3 \
  -F model=whisper-large-v3-turbo \
  -F language=en
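
Without the SDK, the curl request above maps onto a plain multipart/form-data POST. The sketch below builds that request with only the Python standard library. build_transcription_request is a hypothetical name, the multipart encoder is deliberately minimal (the file part is always sent as application/octet-stream), and nothing here is an official Pendra client.

```python
import urllib.request
import uuid

API_URL = "https://api.pendra.ai/api/v1/audio/transcriptions"


def build_transcription_request(
    api_key: str,
    filename: str,
    audio: bytes,
    fields: dict[str, str],
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST equivalent to the curl example."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    # Plain form fields such as model, language, response_format.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The audio part, mirroring curl's -F file=@meeting.mp3.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode()
        + audio
        + b"\r\n"
    )
    # Closing boundary terminates the multipart body.
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending it is then urllib.request.urlopen(req); with the default json format, the response body decodes with json.loads. Keep the real key out of source code, for example by reading it from an environment variable of your choosing.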

Fields

  • file: the audio file (multipart).
  • model: Whisper-class model ID, e.g. whisper-large-v3-turbo.
  • language: optional ISO 639-1 code (e.g. en, cy).
  • prompt: optional priming text (acronyms, names).
  • temperature: sampling temperature; default 0.
  • response_format: json (default), text, srt, vtt, or verbose_json.
  • timestamp_granularities: optional; requests word- or segment-level timestamps in verbose_json output.
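
The fields above can be assembled and sanity-checked on the client before the request is built. A sketch, with transcription_fields as a hypothetical helper: it checks response_format against the formats named on this page and treats language as a two-letter lowercase ISO 639-1 code. Sending temperature as a decimal string such as "0.0" is an assumption about what the server accepts (multipart values are always strings).

```python
import re
from typing import Optional

# Response formats named on this page.
RESPONSE_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}


def transcription_fields(
    model: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    temperature: float = 0.0,
    response_format: str = "json",
) -> dict:
    """Return the non-file form fields as strings, omitting unset optional ones."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unknown response_format: {response_format}")
    # ISO 639-1 codes are two lowercase letters (en, cy, ...).
    if language is not None and not re.fullmatch(r"[a-z]{2}", language):
        raise ValueError(f"not an ISO 639-1 code: {language}")
    fields = {
        "model": model,
        "temperature": str(temperature),
        "response_format": response_format,
    }
    if language is not None:
        fields["language"] = language
    if prompt:
        # Priming text for acronyms and names.
        fields["prompt"] = prompt
    return fields
```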

Response

The response shape depends on response_format. The default, json, returns only the transcript text:

{
  "text": "We're starting the meeting now. First item on the agenda is the Q3 forecast."
}

With response_format=verbose_json, the response adds task, language, and duration (in seconds) metadata alongside the transcript, plus word- or segment-level timestamps when timestamp_granularities is set:

{
  "task": "transcribe",
  "language": "en",
  "duration": 12.48,
  "text": "We're starting the meeting now. First item on the agenda is the Q3 forecast.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 4.2,
      "text": "We're starting the meeting now.",
      "avg_logprob": -0.21,
      "no_speech_prob": 0.01
    },
    {
      "id": 1,
      "start": 4.2,
      "end": 12.48,
      "text": "First item on the agenda is the Q3 forecast.",
      "avg_logprob": -0.18,
      "no_speech_prob": 0.02
    }
  ]
}
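
The per-segment avg_logprob and no_speech_prob values can be used to flag passages worth a human check. This sketch assumes the verbose_json shape shown above; flag_uncertain_segments is a hypothetical helper, and its cutoffs (-1.0 and 0.6) are borrowed from the default thresholds of the open-source openai-whisper package's transcribe(), not from this API.

```python
import json


def flag_uncertain_segments(
    verbose_json: str,
    min_avg_logprob: float = -1.0,
    max_no_speech_prob: float = 0.6,
) -> list:
    """Return (start, end, text) for segments that look unreliable."""
    data = json.loads(verbose_json)
    flagged = []
    # Responses without segments simply yield no flags.
    for seg in data.get("segments", []):
        low_confidence = seg["avg_logprob"] < min_avg_logprob
        likely_silence = seg["no_speech_prob"] > max_no_speech_prob
        if low_confidence or likely_silence:
            flagged.append((seg["start"], seg["end"], seg["text"]))
    return flagged
```

Both thresholds are heuristics; tune them against your own audio rather than treating them as calibrated probabilities.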

With response_format=text, you get a plain UTF-8 string body (no JSON envelope).
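
Because the body type changes with response_format, client code needs one decode path per format. A minimal sketch, with decode_transcription as a hypothetical helper; it assumes every format is UTF-8 encoded, which this page states explicitly only for text.

```python
import json


def decode_transcription(body: bytes, response_format: str) -> str:
    """Return the transcript (json, verbose_json, text) or raw captions (srt, vtt)."""
    decoded = body.decode("utf-8")
    if response_format in ("json", "verbose_json"):
        # Both JSON shapes carry the full transcript in "text".
        return json.loads(decoded)["text"]
    if response_format in ("text", "srt", "vtt"):
        # No JSON envelope: the body is the transcript or subtitle file itself.
        return decoded
    raise ValueError(f"unknown response_format: {response_format}")
```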

Subtitle output

Set response_format=srt (or vtt) for time-coded captions instead of a JSON transcript:

curl
curl https://api.pendra.ai/api/v1/audio/transcriptions \
  -H "Authorization: Bearer pdr_sk_..." \
  -F file=@meeting.mp3 \
  -F model=whisper-large-v3-turbo \
  -F response_format=srt

The response body is the raw subtitle file:

1
00:00:00,000 --> 00:00:04,200
We're starting the meeting now.

2
00:00:04,200 --> 00:00:12,480
First item on the agenda is the Q3 forecast.
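
Each SRT cue corresponds to one verbose_json segment: the segment's start and end become the cue's time range. If you already hold verbose_json segments, captions can be rendered locally, for instance to edit the text before export. A sketch with hypothetical helper names; it is not a description of how the service renders captions. WebVTT differs from SRT here only in its "WEBVTT" header, optional cue numbers, and "." as the millisecond separator.

```python
def format_timestamp(seconds: float, decimal_sep: str) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_sep}{ms:03d}"


def segments_to_captions(segments: list, fmt: str = "srt") -> str:
    """Render verbose_json segments as an SRT or WebVTT document."""
    if fmt not in ("srt", "vtt"):
        raise ValueError(f"unsupported caption format: {fmt}")
    sep = "," if fmt == "srt" else "."
    cues = []
    for number, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], sep)
        end = format_timestamp(seg["end"], sep)
        text = seg["text"].strip()
        if fmt == "srt":
            # SRT cues are numbered from 1.
            cues.append(f"{number}\n{start} --> {end}\n{text}\n")
        else:
            cues.append(f"{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    body = "\n".join(cues)
    return body if fmt == "srt" else "WEBVTT\n\n" + body
```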

Python

transcribe.py
from pendra import Pendra

client = Pendra()

with open("meeting.mp3", "rb") as f:
    result = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=f,
        language="en",
    )
print(result.text)

Available backends

Transcription requests are routed to workers running Speaches, our Whisper-family transcription backend.