Deepgram (Audio Transcription)
Deepgram is a speech-to-text API that powers inbound audio and voice note transcription in OpenClaw. It uses the Nova-3 model for high-accuracy pre-recorded transcription at $0.0077/min and comes with $200 of free credit on signup. It is not an LLM provider: it handles transcription only.
Tags: transcription, speech-to-text, audio, voice-notes, nova-3, diarization, multilingual, enterprise
Use Cases
- Automatic voice note transcription for OpenClaw on Telegram and WhatsApp
- Voice-first AI interactions — speak to your agent instead of typing
- Transcribing meeting recordings for summarization by an LLM
- Multilingual voice support for international users (45+ languages)
- Accessibility — enabling voice input for users who prefer speaking over typing
- Podcast and audio content transcription for analysis by OpenClaw
Tips
- The $200 free credit is generous. At $0.0077/min it buys about 25,974 minutes, or roughly 430 hours of transcription, so most personal OpenClaw users won't need to pay.
- Enable smart formatting (`smart_format=true`) for more readable transcripts: it adds punctuation and casing and formats dates and numbers.
- Use keyterm prompting (Nova-3's `keyterm` parameter) to boost accuracy on domain-specific vocabulary such as product names, technical terms, or jargon.
- Speaker diarization (`diarize=true`) is free and labels which speaker said what, which is useful for multi-person audio such as group voice notes.
- For OpenClaw, Deepgram is configured once and works automatically — no per-message configuration needed.
- Check the Deepgram console for detailed usage analytics and remaining credit balance.
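OpenClaw makes the Deepgram call for you, but if you want to try the smart formatting, keyterm, and diarization options from the tips above directly against the API, the sketch below builds (without sending) a pre-recorded transcription request. It assumes Deepgram's `POST https://api.deepgram.com/v1/listen` endpoint with `Token` authorization and the query parameters `model`, `smart_format`, `diarize`, and `keyterm`; confirm them against the current Deepgram API reference. The helper name `build_listen_request` is hypothetical, and the `audio/ogg` default is an assumption based on Telegram voice notes typically being OGG/Opus.

```python
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import Request

# Assumed pre-recorded (REST) transcription endpoint; check Deepgram's API reference.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_request(
    api_key: str,
    audio: bytes,
    keyterms: Optional[List[str]] = None,
    content_type: str = "audio/ogg",  # assumption: Telegram voice notes are usually OGG/Opus
) -> Request:
    """Hypothetical helper: build (but do not send) a Nova-3 transcription request."""
    params = [
        ("model", "nova-3"),
        ("smart_format", "true"),  # punctuation, casing, date/number formatting
        ("diarize", "true"),       # per-speaker labels for multi-person audio
    ]
    # One keyterm parameter per term, to boost recognition of domain vocabulary.
    for term in keyterms or []:
        params.append(("keyterm", term))
    url = f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"
    return Request(
        url,
        data=audio,  # raw audio bytes as the request body
        headers={"Authorization": f"Token {api_key}", "Content-Type": content_type},
        method="POST",
    )


if __name__ == "__main__":
    req = build_listen_request("YOUR_DEEPGRAM_KEY", b"...", keyterms=["OpenClaw", "Nova-3"])
    print(req.full_url)
    # Actually sending it would be urllib.request.urlopen(req), which returns JSON.
```

Keeping request construction separate from sending makes it easy to inspect exactly which options are enabled before spending any credit.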
Known Issues & Gotchas
- Deepgram is NOT an LLM provider. It only handles audio transcription. You still need a separate chat provider (Anthropic, OpenAI, etc.) for AI responses.
- The $200 free credit doesn't expire but applies only to pay-as-you-go usage. Heavy production users should consider a Growth plan for lower per-minute rates.
- The Nova-3 multilingual model costs more ($0.0092/min) than the monolingual model ($0.0077/min). OpenClaw typically uses the monolingual model for single-language voice notes.
- Streaming transcription has slightly different pricing and concurrency limits than pre-recorded (REST) transcription.
- Speaker diarization, redaction, and smart formatting are included at no extra cost, but each must be enabled through its API parameter.
- Concurrency limits: pay-as-you-go accounts support up to 100 concurrent REST (pre-recorded) requests and 150 concurrent WSS (WebSocket streaming) connections. This is usually sufficient, but batch processing can hit the limit.
- Audio quality affects transcription accuracy. Low-quality voice notes (noisy environments, compression) may produce errors even with Nova-3.
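For batch jobs that might exceed the pay-as-you-go concurrency limit mentioned above, a common client-side approach is to cap the number of requests in flight. The sketch below uses `asyncio.Semaphore` with a limit of 100 to match the REST figure quoted in the list; `fake_transcribe` is a hypothetical stand-in for a real Deepgram call, and the limit itself should be checked against your own account.

```python
import asyncio
from typing import List, Tuple

MAX_CONCURRENT_REST = 100  # pay-as-you-go REST limit quoted above; verify for your account


async def fake_transcribe(clip_id: int) -> str:
    """Hypothetical stand-in for one pre-recorded Deepgram request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"transcript-{clip_id}"


async def transcribe_all(clip_ids: List[int], limit: int = MAX_CONCURRENT_REST) -> Tuple[List[str], int]:
    """Transcribe every clip without exceeding `limit` concurrent requests.

    Returns the transcripts in input order plus the peak concurrency observed.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def worker(clip_id: int) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # waits here once `limit` requests are active
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_transcribe(clip_id)
            finally:
                in_flight -= 1

    # gather preserves input order in its results.
    results = await asyncio.gather(*(worker(c) for c in clip_ids))
    return list(results), peak


if __name__ == "__main__":
    transcripts, peak = asyncio.run(transcribe_all(list(range(250))))
    print(len(transcripts), "transcripts, peak concurrency", peak)
```

A semaphore only throttles your own client; if several processes share one API key, the limit has to be coordinated across them.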
Alternatives
- OpenAI Whisper API
- Whisper (self-hosted)
- Google Cloud Speech-to-Text
- AssemblyAI
Community Feedback
Deepgram Nova-3 is the best API-based STT I've used. Significantly more accurate than Whisper API and much faster. The $200 free credit is incredibly generous.
— Reddit r/speechrecognition
Deepgram at $0.0077/min vs OpenAI Whisper at similar pricing — Deepgram wins on speed and accuracy. The real-time streaming is particularly impressive for live transcription.
— Hacker News
The $200 free credit on Deepgram is no-expiration, which is amazing. For personal use with occasional voice notes, it could last years. Way more generous than most AI free tiers.
— Reddit r/selfhosted
Frequently Asked Questions
Do I need Deepgram for OpenClaw?
Only if you want voice note support. If you receive voice messages on Telegram or WhatsApp, Deepgram transcribes them so your LLM can process them. If you only use text, you don't need Deepgram.
How long does Deepgram's $200 free credit last?
The credit never expires. At $0.0077/min (Nova-3), $200 covers approximately 430 hours of transcription. For personal voice note use with OpenClaw, this could last several years.
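The credit arithmetic is easy to check directly. This sketch uses only the per-minute prices quoted on this page (real rates may vary by plan and change over time), and the 10 minutes of voice notes per day in the usage line is a hypothetical figure chosen for illustration.

```python
def hours_covered(credit_usd: float, price_per_min_usd: float) -> float:
    """Hours of audio a credit balance pays for at a flat per-minute rate."""
    return credit_usd / price_per_min_usd / 60


def years_covered(credit_usd: float, price_per_min_usd: float, minutes_per_day: float) -> float:
    """Years the credit lasts at a steady daily transcription volume."""
    total_minutes = credit_usd / price_per_min_usd
    return total_minutes / minutes_per_day / 365


if __name__ == "__main__":
    for label, price in [("Nova-3 monolingual", 0.0077), ("Nova-3 multilingual", 0.0092)]:
        print(f"{label}: {hours_covered(200, price):.0f} hours")
    # Hypothetical usage rate: 10 minutes of voice notes per day.
    print(f"At 10 min/day (monolingual): {years_covered(200, 0.0077, 10):.1f} years")
```

At the monolingual rate this works out to about 433 hours, versus about 362 hours on the multilingual model; at the assumed 10 minutes a day, the credit would last roughly seven years.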
Is Deepgram better than OpenAI Whisper?
Deepgram Nova-3 generally outperforms the OpenAI Whisper API on accuracy and speed. Deepgram reports that, in its benchmarks, Nova-3 cuts word errors by more than half compared with competing services. The $200 free credit is also more generous than OpenAI's offering.
Can Deepgram handle multiple languages?
Yes. Nova-3 supports 45+ languages with automatic language detection, which makes it a good fit for international OpenClaw users. The multilingual model costs slightly more ($0.0092/min vs $0.0077/min for monolingual).
Does Deepgram work with Telegram voice notes?
Yes. OpenClaw automatically sends Telegram voice notes to Deepgram for transcription. The transcribed text is then processed by your configured LLM provider. No special configuration needed beyond setting the API key.
What's the difference between Deepgram and the LLM providers?
Deepgram is a speech-to-text service only — it converts audio to text. LLM providers (Anthropic, OpenAI, etc.) process text and generate AI responses. In OpenClaw, Deepgram handles the audio→text step, then the LLM handles the text→response step.
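The two-step flow can be pictured as two functions composed together. This is a conceptual sketch, not OpenClaw's actual code: `VoicePipeline`, `transcribe`, and `generate_reply` are hypothetical names standing in for the Deepgram call and the chat-provider call, and the stubs in the usage block return canned strings.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Transcriber = Callable[[bytes], str]  # audio bytes -> text (Deepgram's role)
ChatModel = Callable[[str], str]      # text prompt -> reply (the LLM provider's role)


@dataclass
class VoicePipeline:
    transcribe: Transcriber
    generate_reply: ChatModel

    def handle_message(self, text: Optional[str] = None, audio: Optional[bytes] = None) -> str:
        """Voice messages go through speech-to-text first; text goes straight to the LLM."""
        if audio is not None:
            text = self.transcribe(audio)  # step 1: audio -> text
        if not text:
            raise ValueError("message has neither text nor audio")
        return self.generate_reply(text)   # step 2: text -> response


if __name__ == "__main__":
    pipeline = VoicePipeline(
        transcribe=lambda audio: "what's on my calendar tomorrow",  # hypothetical stub
        generate_reply=lambda prompt: f"You asked: {prompt}",       # hypothetical stub
    )
    print(pipeline.handle_message(audio=b"\x00\x01"))
    print(pipeline.handle_message(text="hello"))
```

The text-only path never touches the transcriber, which mirrors why text-only users don't need Deepgram at all.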
Configuration Examples
Basic Deepgram setup for OpenClaw
providers:
  deepgram:
    apiKey: your-deepgram-api-key
    # Handles voice note transcription automatically
Deepgram with LLM provider
providers:
  anthropic:
    apiKey: sk-ant-xxxxx
    model: anthropic/claude-sonnet-4-6
  deepgram:
    apiKey: your-deepgram-api-key
    # Voice notes → Deepgram (transcription) → Anthropic (AI response)