Sherpa ONNX TTS (Local)
Local text-to-speech via sherpa-onnx. Fully offline, no cloud required. Supports multiple voice models.
Tags: tts, voice, offline, local, speech
Category: Voice
Use Cases
- Offline voice output for OpenClaw on air-gapped or privacy-sensitive systems
- Free unlimited TTS for notifications, reminders, and alerts
- Embedded systems: Raspberry Pi, home automation voice announcements
- Multilingual TTS without cloud API costs
- Fallback TTS when internet is unavailable
- IoT and smart home voice announcements
- Accessibility: screen reader alternative for custom applications
Tips
- Start with the Kokoro English model for the best quality-to-speed ratio on macOS/Linux
- Use MeloTTS for multilingual needs — it handles Chinese, Japanese, Korean, and more
- Convert output to mp3 for smaller files: `ffmpeg -i output.wav -codec:a libmp3lame output.mp3`
- Pre-download models during setup — the download is a one-time step, after which everything runs fully offline
- On Apple Silicon, the runtime uses accelerated ONNX inference — performance is excellent
- For OpenClaw, set env vars in openclaw.json so the skill auto-discovers the runtime
- Pair with cron for scheduled spoken reminders without any API costs
- Test multiple voice models — quality varies dramatically between them
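Several of the tips above (env-var discovery, cron pairing) can be combined into a small helper. This is only a sketch: the binary name `sherpa-onnx-tts` and the `-o` flag are assumptions taken from the configuration examples in this document, and the function prints the command rather than running it so it can be logged or composed.

```shell
#!/bin/sh
# Sketch: assemble the synthesis command from the env vars the skill
# expects, failing fast if either is unset. The binary name and -o
# flag are assumptions based on this document's examples.
tts_cmd() {
    : "${SHERPA_ONNX_RUNTIME_DIR:?SHERPA_ONNX_RUNTIME_DIR is not set}"
    : "${SHERPA_ONNX_MODEL_DIR:?SHERPA_ONNX_MODEL_DIR is not set}"
    # Print the command instead of executing it, so callers can
    # inspect, log, or eval it.
    printf '%s/bin/sherpa-onnx-tts -o %s\n' "$SHERPA_ONNX_RUNTIME_DIR" "$1"
}
```

With the env vars set, `tts_cmd alert.wav` prints the full binary path plus the output flag; append the text to speak and run the result, then convert or play the WAV as shown in the configuration examples.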
Known Issues & Gotchas
- Setup is more involved than cloud TTS — you need to download both the runtime binary AND a voice model separately
- Voice quality is noticeably below ElevenLabs or Google Cloud TTS — good enough for notifications, not for production audio
- The runtime directory and model directory must be configured via environment variables (SHERPA_ONNX_RUNTIME_DIR, SHERPA_ONNX_MODEL_DIR)
- Model files can be large (100MB-1GB) — choose carefully for storage-constrained devices
- Not all voice models support all languages — check model compatibility before downloading
- Output is WAV only by default — you'll need ffmpeg to convert to mp3 or other formats
- CPU inference speed varies: fast on Apple Silicon, slower on older x86 hardware or Raspberry Pi
- No streaming playback built-in — generates the full WAV file before you can play it
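Since playback can only start once the full WAV exists, a common workaround for long text is to chunk it by sentence and synthesize/play each chunk so audio starts after the first sentence. A minimal sketch of the splitter (requires GNU sed); the playback loop is left as comments because the binary and player paths depend on your setup:

```shell
# Workaround sketch for the no-streaming gotcha: break text into
# sentences so each chunk can play while the next is synthesized.
split_sentences() {
    # Insert a newline after ., ! or ? followed by spaces
    # (GNU sed supports \n in the replacement).
    printf '%s' "$1" | sed -E 's/([.!?]) +/\1\n/g'
}

# Hypothetical playback loop (paths depend on your installation):
# split_sentences "$LONG_TEXT" | while IFS= read -r s; do
#     bin/sherpa-onnx-tts -o chunk.wav "$s"
#     afplay chunk.wav   # or aplay on Linux
# done
```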
Alternatives
- SAG (ElevenLabs TTS)
- Piper TTS
- macOS say
- Coqui TTS
- OpenClaw built-in tts tool
Community Feedback
There are local neural TTS engines for android that work pretty well and have flawless intonations. Two projects which work amazingly: Piper and sherpa-onnx. Things are going really great for on-device TTS.
— Reddit r/androidapps
whisper.cpp vs sherpa-onnx vs something else for speech? I'm looking to run my own endpoint on my server for my apps. Sherpa-onnx supports both STT and TTS in one framework.
— Reddit r/LocalLLaMA
sherpa-onnx-tts provides a local, offline command-line wrapper around the sherpa-onnx TTS runtime to synthesize speech without cloud services. Produces WAV output from text input.
— LobeHub Skills Marketplace
Squawk uses sherpa-onnx for real-time local text-to-speech with AI. The engine handles synthesis without any cloud API dependency.
— OBS Forum
Configuration Examples
Download runtime and model
# Download runtime for macOS ARM64
mkdir -p ~/.openclaw/tools/sherpa-onnx-tts
cd ~/.openclaw/tools/sherpa-onnx-tts
# Download from GitHub releases
# https://github.com/k2-fsa/sherpa-onnx/releases
# Download a voice model (e.g., Kokoro English)
# https://k2-fsa.github.io/sherpa/onnx/tts/all-in-one.html
Configure environment
# Set environment variables
export SHERPA_ONNX_RUNTIME_DIR=~/.openclaw/tools/sherpa-onnx-tts/runtime
export SHERPA_ONNX_MODEL_DIR=~/.openclaw/tools/sherpa-onnx-tts/models/kokoro-en
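As an alternative to shell exports, the same variables can go in openclaw.json. The fragment below is a hypothetical sketch — it assumes OpenClaw reads an `env` map from that file; check the OpenClaw documentation for the exact schema:

```json
{
  "env": {
    "SHERPA_ONNX_RUNTIME_DIR": "~/.openclaw/tools/sherpa-onnx-tts/runtime",
    "SHERPA_ONNX_MODEL_DIR": "~/.openclaw/tools/sherpa-onnx-tts/models/kokoro-en"
  }
}
```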
# Or configure in the openclaw.json env section
Generate speech
# Basic text-to-speech
"$SHERPA_ONNX_RUNTIME_DIR"/bin/sherpa-onnx-tts -o output.wav "Hello from OpenClaw, running fully offline!"
# Convert to mp3
ffmpeg -i output.wav -codec:a libmp3lame -qscale:a 2 output.mp3
# Play on macOS
afplay output.wav
Installation
# Download runtime + model (see SKILL.md)
Source: bundled