OpenRouter Transcription

Multi-lingual audio transcription via OpenRouter (Gemini, etc). Available on ClawHub.

OpenRouter Transcription by @obviyus is a ClawHub skill that provides multi-lingual audio transcription using OpenRouter's model routing. Instead of being locked into a single transcription provider like Whisper, this skill routes audio through OpenRouter to access models like Gemini that excel at multi-lingual transcription with context awareness and speaker diarization. The skill is particularly valuable for OpenClaw users who already have an OpenRouter API key configured. Rather than setting up a separate Whisper API key or running a local Whisper instance, you can leverage your existing OpenRouter credits for transcription. The skill handles audio format conversion, chunking for long recordings, and language detection automatically. Available on ClawHub for one-command installation, this skill integrates cleanly with OpenClaw's voice note handling. When someone sends a voice message on Telegram or WhatsApp, the skill can automatically transcribe it and include the text in the conversation context. For multilingual users or those receiving voice notes in various languages, the auto-detection is a significant quality-of-life improvement over single-language transcription tools.

Tags: transcription, multilingual, skill

Category: voice

Tips

Install from ClawHub with `openclaw skill add obviyus/openrouter-transcribe` — requires an existing OpenRouter API key
Use Gemini Flash through OpenRouter for fast, cost-effective transcription of short voice notes
For longer audio files, the skill automatically handles chunking — but check your OpenRouter rate limits for large batches
Pair with Telegram's voice note feature for seamless voice-to-text in conversations

Community Feedback

Multi-lingual audio transcription via OpenRouter. Supports Gemini and other models for accurate transcription across dozens of languages.
— ClawHub

Multi-lingual audio transcription via OpenRouter (Gemini, etc). Available on ClawHub. No separate Whisper setup needed.
— OpenClaw Showcase

Frequently Asked Questions

How does this compare to OpenAI Whisper?

Whisper is purpose-built for transcription and is highly accurate. This skill offers flexibility — you can use Gemini or other models through OpenRouter, which may handle multilingual content and context better. It also avoids needing a separate OpenAI API key.

Which languages are supported?

Language support depends on the underlying model you route through OpenRouter. Gemini supports 100+ languages. The skill auto-detects the spoken language, so you don't need to specify it in advance.

Does it handle background noise well?

Modern vision/audio models like Gemini are reasonably robust to background noise, but quality depends on the recording. For noisy environments, recording closer to the speaker and using higher-quality voice notes helps significantly.