Whisper (Local)
Local speech-to-text with the Whisper CLI. No API key needed, runs entirely on-device.
Tags: transcription, speech-to-text, whisper, local, offline
Category: AI
Use Cases
- Privacy-sensitive transcription: medical, legal, confidential meetings
- Offline transcription in air-gapped or low-connectivity environments
- Unlimited free transcription for high-volume audio processing
- Voice memo pipeline: record → transcribe → summarize → notes
- Podcast processing without API costs
- Multilingual transcription across 99 languages
- Automated transcription pipeline via cron for new audio files
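The cron-driven pipeline use case above can be sketched as a small shell script. This is a minimal sketch, not a turnkey tool: the `WATCH_DIR`/`OUT_DIR` paths and the skip-if-already-transcribed convention are assumptions; only the `whisper` invocation itself is the standard CLI.

```shell
#!/usr/bin/env sh
# Sketch: transcribe any audio file in a watched folder that has no
# matching .txt output yet. Paths below are hypothetical examples.
WATCH_DIR="$HOME/VoiceMemos"
OUT_DIR="$HOME/Transcripts"
mkdir -p "$OUT_DIR"

for f in "$WATCH_DIR"/*.mp3 "$WATCH_DIR"/*.wav; do
  [ -e "$f" ] || continue                     # skip unmatched globs
  base=$(basename "${f%.*}")
  [ -e "$OUT_DIR/$base.txt" ] && continue     # already transcribed
  whisper "$f" --model medium --language en \
    --output_format txt --output_dir "$OUT_DIR"
done
```

Run it from cron (e.g. every 15 minutes: `*/15 * * * * /path/to/transcribe.sh`) to approximate a watched-folder pipeline without any daemon.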
Tips
- Start with the 'medium' model for the best speed/accuracy tradeoff on Apple Silicon
- Use whisper.cpp instead of the Python CLI for 2-4x faster inference on macOS
- Use faster-whisper (CTranslate2) for the best performance on NVIDIA GPUs
- Add `--language en` to skip auto-detection and improve speed when you know the language
- Use `--output_format all` to get txt, srt, vtt, tsv, and json simultaneously
- For long recordings, split the audio into segments with ffmpeg first to reduce memory pressure
- Combine with OpenClaw cron for automated transcription of new audio files in a watched folder
- Pre-process noisy audio: `ffmpeg -i input.mp3 -af 'highpass=f=200,lowpass=f=3000' clean.wav`
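The long-recording tip can be made concrete with ffmpeg's segment muxer. A sketch, assuming a 10-minute chunk size (arbitrary; tune to your RAM budget):

```shell
# Split a long recording into 10-minute chunks without re-encoding
# (-c copy), then transcribe each chunk. Memory stays bounded per part.
ffmpeg -i long-recording.mp3 -f segment -segment_time 600 -c copy part_%03d.mp3
for part in part_*.mp3; do
  whisper "$part" --model medium --language en --output_format txt
done
```

Because `-c copy` avoids re-encoding, the split is fast and lossless; concatenate the resulting .txt files afterward if you need one transcript.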
Known Issues & Gotchas
- First run downloads model weights (large-v3 is ~3GB) — needs internet once, then fully offline
- Python-based — requires Python 3.8+ and PyTorch, which can be heavy on disk
- CPU-only transcription is 5-10x slower than real-time on the large model — GPU/MPS acceleration recommended
- The 'tiny' and 'base' models hallucinate more on noisy audio — use 'medium' or 'large' for reliability
- No streaming/real-time transcription — batch processing only (file in, text out)
- Whisper can hallucinate repeated phrases on silent segments — especially with the large model
- The Python CLI does not use MPS automatically on Apple Silicon (openai-whisper defaults to CPU when CUDA is absent, and PyTorch's MPS backend has gaps for Whisper's ops) — whisper.cpp with Metal is the practical GPU path on Macs
- Memory usage scales with model size: large-v3 needs ~10GB RAM during inference
Alternatives
- OpenAI Whisper API (cloud)
- whisper.cpp
- faster-whisper
- Deepgram Nova-2
- sherpa-onnx (on-device STT via ONNX Runtime)
Community Feedback
My self-hosted app uses local Whisper for transcription. The whole pipeline is self-hosted. It uses a locally-hosted Whisper or ASR model for the transcription, and all the smart features run locally too.
— Reddit r/LocalLLaMA
My experience with whisper.cpp — local no-dependency transcription. On Apple Silicon it's remarkably fast. The large-v3 model gives accuracy that rivals cloud APIs.
— Reddit r/LocalLLaMA
The open-source Whisper model downloaded and run locally from the GitHub repository is safe in the sense that your audio data is not sent anywhere.
— OpenAI Community
Users find OpenAI Whisper's ease of use exceptional, with simple integration and support for various platforms. The local version is free and unlimited.
— G2 Reviews
Configuration Examples
Install and first transcription
# Install via Homebrew (includes Python dependencies)
brew install openai-whisper
# Or via pip
pip install openai-whisper
# Transcribe with medium model (good balance)
whisper audio.mp3 --model medium --language en
# First run downloads the model (~1.5GB for medium)
Batch transcription with multiple outputs
# Generate all output formats
whisper meeting.wav --model large-v3 --output_format all --output_dir ./transcripts/
# Produces: meeting.txt, meeting.srt, meeting.vtt, meeting.tsv, meeting.json
whisper.cpp alternative (faster on Mac)
# Install whisper.cpp
brew install whisper-cpp
# Download model
whisper-cpp-download-ggml-model large-v3
# Transcribe (Metal accelerated on Apple Silicon)
whisper-cpp -m ~/.whisper/ggml-large-v3.bin -f audio.wav -otxt -osrt
Installation
brew install openai-whisper
Homepage: https://openai.com/research/whisper
Source: bundled