Whisper API: Self-Hostable Speech-to-Text Transcription

Whisper API is an open-source, self-hosted service for speech-to-text. It runs whisper.cpp under the hood and exposes a Deepgram-compatible surface (/v1/listen over REST and WebSocket), so you keep full control of your audio and data while reusing familiar integration patterns.


Key features

Familiar API — Drop-in style compatibility with /v1/listen for uploads, JSON bodies (including transcribe-from-URL), and streaming.
Rich output — JSON with timings, plus SRT and VTT export options.
Live streaming — Real-time transcription over WebSockets (16 kHz PCM).
Advanced options — Custom vocabulary-style prompting, optional audio windowing (start / duration) where supported.
Secure operations — API keys via a small CLI; URL ingest includes SSRF-oriented limits (size caps, host controls; redirects off by default).
Documentation — Browse the live whisper.api docs (installation, auth, REST, WebSocket streaming, models, Docker, examples). The same site is built from the docs/ folder (Astro Starlight); run it locally with Bun if you are developing the docs.
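
The live-streaming feature above expects 16 kHz PCM. As a rough sketch of how a client might slice raw audio into fixed-duration frames before sending them over the WebSocket, the helper below assumes 16-bit mono samples and a 100 ms frame. Both values, and the name pcm_frames, are illustrative assumptions rather than documented requirements of the service.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000   # from the feature list: 16 kHz PCM
BYTES_PER_SAMPLE = 2      # assumption: 16-bit samples
CHANNELS = 1              # assumption: mono audio


def pcm_frames(pcm: bytes, frame_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive frames of raw PCM, each covering frame_ms of audio.

    The final frame may be shorter when the input does not divide evenly.
    """
    if frame_ms <= 0:
        raise ValueError("frame_ms must be positive")
    # Bytes needed to hold frame_ms milliseconds of audio.
    frame_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * frame_ms // 1000
    for offset in range(0, len(pcm), frame_bytes):
        yield pcm[offset:offset + frame_bytes]
```

Each yielded frame could then be sent as one binary WebSocket message; check the WebSocket streaming docs for the frame size and encoding the server actually expects.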

Stack (high level)

FastAPI, SQLAlchemy (SQLite by default; PostgreSQL-friendly), async HTTP and WebSockets, and the whisper-cli binary from whisper.cpp. See the repo requirements.txt and setup_whisper.sh for the full setup path.

Quick start

Install dependencies, configure the environment, and pull/build whisper assets:

pip install -r requirements.txt
cp .env.example .env
chmod +x setup_whisper.sh
./setup_whisper.sh

Initialize storage and create an API key:

python -m app.cli init
python -m app.cli create --name "MyAdminKey"

Start the server (default dev port 7860):

uvicorn app.main:app --host 0.0.0.0 --port 7860

For local-only testing, Swagger can expose POST /v1/auth/test-token when ENABLE_TEST_TOKEN_ENDPOINT=true in .env. That flag defaults to off and should stay off in production.

Example: transcribe a file

curl -X POST 'http://localhost:7860/v1/listen' \
  -H "Authorization: Token <YOUR_KEY>" \
  -H "Content-Type: audio/wav" \
  --data-binary @audio.wav
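
The same upload can be sketched in Python using only the standard library. The function names build_file_request and send are hypothetical, and the response is returned as parsed JSON without assuming any particular field layout.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # default dev port from the quick start


def build_file_request(audio: bytes, api_key: str,
                       content_type: str = "audio/wav") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/listen request carrying raw audio bytes."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/listen",
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To transcribe a file, read it in binary mode, pass the bytes and your key to build_file_request, and hand the result to send.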

Example: transcribe from URL

The server fetches the audio, with configurable download limits and safety defaults:

curl -X POST 'http://localhost:7860/v1/listen' \
  -H "Authorization: Token <YOUR_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/audio.mp3"}'
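
A Python equivalent of the URL-based request might look like the following sketch. It only constructs the request, mirroring the JSON body shown in the curl example; build_url_request is a hypothetical helper name, and the base URL assumes the default dev port.

```python
import json
import urllib.request


def build_url_request(audio_url: str, api_key: str,
                      base_url: str = "http://localhost:7860") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/listen request asking the server to fetch audio_url."""
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/listen",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen sends it. Whether the fetch succeeds depends on the server's download limits and host controls.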

Documentation

Online: https://whisper.vedgupta.in/docs/ — architecture overview, setup, authentication, REST and WebSocket API reference, models, Docker deployment, code examples, and contributing.

Local (from the repo):

cd docs && bun install && bun run dev