Hugging Face Inference Providers
Hugging Face Inference Providers route requests to 200+ models from leading providers through a single OpenAI-compatible API. Zero markup on pricing. Free monthly credits ($0.10 free, $2 PRO). Supports routing policies (:fastest, :cheapest, :provider) and BYOK.
Tags: gateway, open-source, free-tier, multi-model, routing-policies, byok, embeddings, zero-markup
Use Cases
- Access to 200+ open-source models through a single API for OpenClaw
- Free monthly credits for model experimentation and prototyping
- Open-source model access (DeepSeek, Qwen, Llama, Kimi) without self-hosting
- Routing policy optimization — :cheapest for batch work, :fastest for interactive
- Centralized billing across multiple AI providers for teams
- Embeddings and image generation alongside chat completions
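The cheapest-for-batch, fastest-for-interactive split above can be sketched as two provider entries, assuming your config allows multiple named entries (the aliases below are hypothetical):

```yaml
providers:
  # Interactive sessions: route to the lowest-latency host
  huggingface-fast:
    apiKey: hf_xxxxxxxxxxxxxxxxx
    model: huggingface/deepseek-ai/DeepSeek-V3.2:fastest
  # Batch and cron work: route to the lowest-cost host
  huggingface-cheap:
    apiKey: hf_xxxxxxxxxxxxxxxxx
    model: huggingface/deepseek-ai/DeepSeek-V3.2:cheapest
```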
Tips
- Use routing policies: append :cheapest to model name for lowest cost, :fastest for lowest latency.
- Start with free credits for experimentation; upgrade to PRO ($9/mo), which includes $2/month in credits, if HF becomes a regular provider.
- DeepSeek R1 and Kimi K2.5 through HF are often free/near-free — great for OpenClaw heartbeats and cron jobs.
- BYOK lets you route requests with your existing provider keys through HF, so you can switch providers without code changes.
- Check huggingface.co/settings/inference-providers/overview for detailed usage breakdown by model and provider.
- For embeddings, HF's hf-inference backend can run models like sentence-transformers at CPU-based pricing.
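Beyond :fastest and :cheapest, a model can be pinned to a single hosting provider by appending that provider's slug. A sketch — the slug below is illustrative, so check the model's page for which providers actually host it:

```yaml
providers:
  huggingface:
    apiKey: hf_xxxxxxxxxxxxxxxxx
    # Pin to one hosting provider instead of using a routing policy
    # ("together" is an example slug; availability varies per model)
    model: huggingface/deepseek-ai/DeepSeek-V3.2:together
```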
Known Issues & Gotchas
- Free tier is extremely limited ($0.10/month) — enough for testing, not daily use. PRO ($9/mo) gives $2/month.
- Not all models support all features through every backend. Check per-model capabilities before relying on tool use or vision.
- Custom provider keys (BYOK) don't receive free credit allowance — only HF-routed requests use credits.
- The 'hf-inference' backend is CPU-only for smaller models. GPU models route through external providers.
- Routing policies (:fastest, :cheapest) may not work with all models — depends on how many providers host the model.
- Team/Enterprise billing requires X-HF-Bill-To header. Easy to miss, defaults to personal billing.
- Model availability varies — some models have multiple providers for redundancy, others only one.
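For the team-billing gotcha above, a sketch of attaching the X-HF-Bill-To header, assuming the client config supports a custom headers map (the `headers` field name is hypothetical — check your client's docs):

```yaml
providers:
  huggingface:
    apiKey: hf_xxxxxxxxxxxxxxxxx
    model: huggingface/deepseek-ai/DeepSeek-V3.2
    # Hypothetical field; without it, requests default to personal billing
    headers:
      X-HF-Bill-To: my-org-name
```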
Alternatives
- OpenRouter
- Vercel AI Gateway
- Together AI
- Ollama (self-hosted)
Community Feedback
HF Inference Providers is the easiest way to try out open-source models without setting up infrastructure. The routing policies (:fastest, :cheapest) are a nice touch.
— Reddit r/MachineLearning
Zero markup on token pricing through HF is great. Same cost as going direct but with centralized billing and easy provider switching.
— Reddit r/LocalLLaMA
The $0.10/month free tier on HF is extremely limited — enough for maybe 5-10 requests with a big model. PRO's $2 is more usable.
— Hacker News
Configuration Examples
Basic Hugging Face setup
providers:
  huggingface:
    apiKey: hf_xxxxxxxxxxxxxxxxx
    model: huggingface/deepseek-ai/DeepSeek-V3.2
HF with routing policy (cheapest)
providers:
  huggingface:
    apiKey: hf_xxxxxxxxxxxxxxxxx
    model: huggingface/deepseek-ai/DeepSeek-R1:cheapest
    # Routes to lowest-cost provider hosting this model
HF as free fallback for open-source models
providers:
  anthropic:
    apiKey: sk-ant-xxxxx
    model: anthropic/claude-sonnet-4-6
  huggingface:
    apiKey: hf_xxxxxxxxxxxxxxxxx
    model: huggingface/moonshotai/Kimi-K2.5
    # Free Kimi K2.5 through HF as fallback