Deepgram (Audio Transcription)

Deepgram is a speech-to-text API that powers inbound audio and voice note transcription in OpenClaw. It uses the Nova-3 model for pre-recorded transcription at $0.0077/min. It is not an LLM provider and handles transcription only. New accounts receive $200 in free credit on signup.

Deepgram is a specialized speech-to-text (STT) API provider that powers voice note and audio transcription in OpenClaw. Unlike the other providers in ClawPedia, which serve as LLM chat providers, Deepgram has a single purpose: converting audio to text with high accuracy and low latency.

OpenClaw uses Deepgram to automatically transcribe voice messages received on Telegram, WhatsApp, and other messaging channels. When a user sends a voice note, OpenClaw sends it to Deepgram's Nova-3 model for transcription, then passes the resulting text to the configured LLM provider. This enables voice-first interactions without the user needing to type.

Nova-3 is Deepgram's flagship speech-to-text model, positioned as having lower word error rates than competing services. It supports 45+ languages with features like speaker diarization (identifying who said what), smart formatting (punctuation, casing, dates), keyterm prompting (boosting accuracy for domain-specific terms), and automatic language detection. The model handles both pre-recorded audio and real-time streaming.

Pricing is straightforward: Nova-3 monolingual costs $0.0077/min (about $0.46/hr) on pay-as-you-go, with the Growth plan offering ~15% savings at $0.0065/min. Every new account receives $200 in free credit with no credit card required and no expiration. At the pay-as-you-go rate that is roughly 26,000 minutes, or approximately 430 hours of transcription, so most personal OpenClaw users will never need to pay for Deepgram.

Deepgram also offers Text-to-Speech (Aura-2) and a Voice Agent API, though OpenClaw currently uses only the STT capability. The platform is SOC 2 Type 2 certified, HIPAA compliant, and GDPR-ready with EU data residency options, which makes it suitable for enterprise and healthcare applications.

Tags: transcription, speech-to-text, audio, voice-notes, nova-3, diarization, multilingual, enterprise

Use Cases

  • Automatic voice note transcription for OpenClaw on Telegram and WhatsApp
  • Voice-first AI interactions — speak to your agent instead of typing
  • Transcribing meeting recordings for summarization by an LLM
  • Multilingual voice support for international users (45+ languages)
  • Accessibility — enabling voice input for users who prefer speaking over typing
  • Podcast and audio content transcription for analysis by OpenClaw

Tips

  • The $200 free credit is generous — at $0.0077/min, it covers ~430 hours of transcription. Most personal OpenClaw users won't need to pay.
  • Enable smart formatting in your API calls for better readability — automatic punctuation, casing, and date formatting.
  • Use keyterm prompting for domain-specific vocabulary. Boost accuracy for product names, technical terms, or jargon.
  • Speaker diarization is free and useful for transcribing multi-person audio in group voice notes.
  • For OpenClaw, Deepgram is configured once and works automatically — no per-message configuration needed.
  • Check the Deepgram console for detailed usage analytics and remaining credit balance.
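
The smart formatting, keyterm, and diarization tips above map to query parameters on Deepgram's pre-recorded endpoint. The following is a minimal sketch for calling the API directly, outside OpenClaw (which, per the tips, needs no per-message configuration). The helper names `build_listen_url` and `build_request` are hypothetical, the `audio/ogg` content type is an assumption about typical voice notes, and the parameter names should be checked against Deepgram's current API reference before use.

```python
from urllib.parse import urlencode
from urllib.request import Request

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(keyterms=(), diarize=False, smart_format=True, model="nova-3"):
    """Build the pre-recorded transcription URL with optional features enabled."""
    params = [("model", model)]
    if smart_format:
        params.append(("smart_format", "true"))
    if diarize:
        params.append(("diarize", "true"))
    # Assumption: one `keyterm` query parameter is sent per boosted term.
    params.extend(("keyterm", term) for term in keyterms)
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


def build_request(api_key, audio_bytes, content_type="audio/ogg", **options):
    """Prepare (but do not send) a POST request carrying raw audio bytes."""
    return Request(
        build_listen_url(**options),
        data=audio_bytes,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without an API key or network access.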

Known Issues & Gotchas

  • Deepgram is NOT an LLM provider. It only handles audio transcription. You still need a separate chat provider (Anthropic, OpenAI, etc.) for AI responses.
  • The $200 free credit doesn't expire but applies only to pay-as-you-go. Heavy production use should consider Growth plans for better rates.
  • Nova-3 multilingual is more expensive ($0.0092/min) than monolingual ($0.0077/min). OpenClaw typically uses monolingual for single-language voice notes.
  • Streaming transcription has slightly different pricing and concurrency limits than pre-recorded (REST) transcription.
  • Speaker diarization, redaction, and smart formatting are included at no extra cost, but each must be enabled through API parameters.
  • Concurrency limits: Pay-As-You-Go supports up to 100 concurrent REST (pre-recorded) requests and 150 concurrent WSS (streaming) connections. This is usually sufficient, but batch processing can reach the limit.
  • Audio quality affects transcription accuracy. Low-quality voice notes (noisy environments, compression) may produce errors even with Nova-3.
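
For batch jobs that risk reaching the concurrency limit noted above, one common approach is to cap the number of in-flight requests on the client side. The following is a sketch under stated assumptions: `transcribe_one` is a hypothetical async function that you supply to upload one audio item, and the default cap of 50 is an arbitrary value chosen to stay below the 100-request REST limit quoted in this entry.

```python
import asyncio


async def transcribe_all(audio_items, transcribe_one, max_concurrent=50):
    """Run transcribe_one over every item with at most max_concurrent in flight."""
    gate = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        # Each task waits here until one of the max_concurrent slots is free.
        async with gate:
            return await transcribe_one(item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in audio_items))
```

This only limits requests issued from a single process; several workers sharing one API key would need to split the budget between them.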

Alternatives

  • OpenAI Whisper API
  • Whisper (self-hosted)
  • Google Cloud Speech-to-Text
  • AssemblyAI

Community Feedback

Deepgram Nova-3 is the best API-based STT I've used. Significantly more accurate than Whisper API and much faster. The $200 free credit is incredibly generous.

— Reddit r/speechrecognition

Deepgram at $0.0077/min vs OpenAI Whisper at similar pricing — Deepgram wins on speed and accuracy. The real-time streaming is particularly impressive for live transcription.

— Hacker News

The $200 free credit on Deepgram is no-expiration, which is amazing. For personal use with occasional voice notes, it could last years. Way more generous than most AI free tiers.

— Reddit r/selfhosted

Frequently Asked Questions

Do I need Deepgram for OpenClaw?

Only if you want voice note support. If you receive voice messages on Telegram or WhatsApp, Deepgram transcribes them so your LLM can process them. If you only use text, you don't need Deepgram.

How long does Deepgram's $200 free credit last?

The credit never expires. At $0.0077/min (Nova-3), $200 covers approximately 430 hours of transcription. For personal voice note use with OpenClaw, this could last several years.
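
The arithmetic behind that estimate can be sketched as follows. The rates and credit amount are the ones quoted in this entry; the function names and the usage figure of 10 minutes per day are illustrative assumptions, not measured OpenClaw usage.

```python
NOVA3_MONOLINGUAL_PER_MIN = 0.0077   # USD per minute, pay-as-you-go
NOVA3_MULTILINGUAL_PER_MIN = 0.0092  # USD per minute, pay-as-you-go
FREE_CREDIT_USD = 200.0


def hours_covered(credit_usd, rate_per_min):
    """Hours of audio a credit balance pays for at a per-minute rate."""
    return credit_usd / rate_per_min / 60


def years_covered(credit_usd, rate_per_min, minutes_per_day):
    """Years a credit balance lasts at a steady daily usage."""
    return credit_usd / rate_per_min / minutes_per_day / 365


mono_hours = hours_covered(FREE_CREDIT_USD, NOVA3_MONOLINGUAL_PER_MIN)    # about 433
multi_hours = hours_covered(FREE_CREDIT_USD, NOVA3_MULTILINGUAL_PER_MIN)  # about 362
light_use_years = years_covered(FREE_CREDIT_USD, NOVA3_MONOLINGUAL_PER_MIN, 10)
```

At an assumed 10 minutes of voice notes per day, the monolingual rate stretches the credit to roughly seven years, which is consistent with the "several years" estimate.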

Is Deepgram better than OpenAI Whisper?

Deepgram Nova-3 generally outperforms the OpenAI Whisper API on accuracy and speed. In benchmarks, Nova-3 is reported to cut word errors by more than half compared to competing services, though real-world results depend on audio quality. The $200 free credit is also more generous than OpenAI's offering.

Can Deepgram handle multiple languages?

Yes, Nova-3 supports 45+ languages with automatic language detection. The multilingual model costs slightly more ($0.0092/min vs $0.0077/min monolingual). Great for international OpenClaw users.

Does Deepgram work with Telegram voice notes?

Yes. OpenClaw automatically sends Telegram voice notes to Deepgram for transcription. The transcribed text is then processed by your configured LLM provider. No special configuration needed beyond setting the API key.

What's the difference between Deepgram and the LLM providers?

Deepgram is a speech-to-text service only — it converts audio to text. LLM providers (Anthropic, OpenAI, etc.) process text and generate AI responses. In OpenClaw, Deepgram handles the audio→text step, then the LLM handles the text→response step.
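
The two-step flow can be sketched as below. This is not OpenClaw's actual implementation: `handle_voice_note`, the `transcribe` and `complete` callables, and the fallback message are hypothetical stand-ins. The response shape read by `extract_transcript` (`results.channels[0].alternatives[0].transcript`) follows Deepgram's pre-recorded response format and should be verified against the current API reference.

```python
from typing import Callable


def extract_transcript(response: dict) -> str:
    """Return the top transcript from a Deepgram-style response, or "" if absent."""
    channels = response.get("results", {}).get("channels", [])
    if not channels or not channels[0].get("alternatives"):
        return ""
    return channels[0]["alternatives"][0].get("transcript", "")


def handle_voice_note(
    audio: bytes,
    transcribe: Callable[[bytes], dict],  # audio -> STT response (Deepgram step)
    complete: Callable[[str], str],       # text -> reply (LLM provider step)
) -> str:
    """Transcribe a voice note, then hand the text to the LLM step."""
    text = extract_transcript(transcribe(audio))
    if not text.strip():
        # Hypothetical fallback for silent or unintelligible audio.
        return "Sorry, I couldn't make out that voice note."
    return complete(text)
```

Keeping the two steps as separate callables mirrors the separation described above: the STT provider never sees the conversation, and the LLM provider never sees the audio.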

Configuration Examples

Basic Deepgram setup for OpenClaw

providers:
  deepgram:
    apiKey: your-deepgram-api-key
    # Handles voice note transcription automatically

Deepgram with LLM provider

providers:
  anthropic:
    apiKey: sk-ant-xxxxx
    model: anthropic/claude-sonnet-4-6
  deepgram:
    apiKey: your-deepgram-api-key
    # Voice notes → Deepgram (transcription) → Anthropic (AI response)