Hugging Face Inference Providers

Hugging Face Inference Providers route requests to 200+ models from leading providers through a single OpenAI-compatible API. Zero markup on pricing. Free monthly credits ($0.10 free, $2 PRO). Supports routing policies (:fastest, :cheapest, :provider) and BYOK.

Hugging Face Inference Providers is a managed gateway that routes inference requests to 200+ models from leading AI providers through a single OpenAI-compatible API. With zero markup on provider pricing, centralized billing, and free monthly credits, it is one of the most accessible ways to use AI models without managing infrastructure. Every HF account receives free monthly credits: $0.10 for free users, $2 for PRO subscribers ($9/mo), and $2/seat for Team/Enterprise organizations. After credits are exhausted, pay-as-you-go billing kicks in at exact provider rates, with no HF markup. The platform supports two billing modes: HF-routed (billed through HF) and custom provider keys (BYOK, billed directly by the provider).

A standout feature is routing policies: append :fastest, :cheapest, or a specific :provider suffix to a model name to control how requests are routed. This makes it easy to optimize for cost or speed without changing model-selection logic. The router automatically discovers providers for each model and handles failover.

The HF ecosystem integration is unmatched: any model on the Hub with Inference Provider support can be accessed through the API, including open-source models like DeepSeek R1, Qwen3, Llama 3.3, and Kimi K2.5, and even OpenAI's GPT-OSS models. For OpenClaw, HF is particularly valuable as a secondary provider that opens a broad catalog of open-source models through a single API key. The 'hf-inference' backend provides CPU inference for smaller models (BERT, GPT-2, embedding models), priced by compute time rather than tokens; GPU-accelerated models route through external providers like Together, Fireworks, or Replicate.
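Because the endpoint is OpenAI-compatible, any standard HTTP client works. A minimal sketch using only the Python standard library; the router URL follows HF's documented base, while HF_TOKEN, the model id, and the prompt are placeholders:

```python
# Minimal sketch of a chat-completions call to the HF router, stdlib only.
# Assumes the HF_TOKEN env var holds an hf_... token; model id and prompt
# are illustrative placeholders.
import json
import os
import urllib.request

ROUTER_URL = "https://router.huggingface.co/v1/chat/completions"

def chat(model: str, prompt: str) -> str:
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        ROUTER_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Append :cheapest / :fastest / :<provider> to steer routing.
    print(chat("deepseek-ai/DeepSeek-R1:cheapest", "Say hello."))
```

Swapping providers is then just a matter of changing the model string; the request shape stays identical.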

Tags: gateway, open-source, free-tier, multi-model, routing-policies, byok, embeddings, zero-markup

Use Cases

  • Access to 200+ open-source models through a single API for OpenClaw
  • Free monthly credits for model experimentation and prototyping
  • Open-source model access (DeepSeek, Qwen, Llama, Kimi) without self-hosting
  • Routing policy optimization — :cheapest for batch work, :fastest for interactive
  • Centralized billing across multiple AI providers for teams
  • Embeddings and image generation alongside chat completions

Tips

  • Use routing policies: append :cheapest to model name for lowest cost, :fastest for lowest latency.
  • Start with free credits for experimentation; upgrade to PRO ($9/mo) for $2/month in credits if HF becomes a regular provider.
  • DeepSeek R1 and Kimi K2.5 through HF are often free/near-free — great for OpenClaw heartbeats and cron jobs.
  • BYOK lets you route through HF with your existing provider keys (billed directly by the provider), so you can switch providers without code changes.
  • Check huggingface.co/settings/inference-providers/overview for detailed usage breakdown by model and provider.
  • For embeddings, HF's hf-inference backend can run models like sentence-transformers at CPU-based pricing.

Known Issues & Gotchas

  • Free tier is extremely limited ($0.10/month) — enough for testing, not daily use. PRO ($9/mo) gives $2/month.
  • Not all models support all features through every backend. Check per-model capabilities before relying on tool use or vision.
  • Custom provider keys (BYOK) don't receive free credit allowance — only HF-routed requests use credits.
  • The 'hf-inference' backend is CPU-only for smaller models. GPU models route through external providers.
  • Routing policies (:fastest, :cheapest) may not work with all models — depends on how many providers host the model.
  • Team/Enterprise billing requires X-HF-Bill-To header. Easy to miss, defaults to personal billing.
  • Model availability varies — some models have multiple providers for redundancy, others only one.

Alternatives

  • OpenRouter
  • Vercel AI Gateway
  • Together AI
  • Ollama (self-hosted)

Community Feedback

HF Inference Providers is the easiest way to try out open-source models without setting up infrastructure. The routing policies (:fastest, :cheapest) are a nice touch.

— Reddit r/MachineLearning

Zero markup on token pricing through HF is great. Same cost as going direct but with centralized billing and easy provider switching.

— Reddit r/LocalLLaMA

The $0.10/month free tier on HF is extremely limited — enough for maybe 5-10 requests with a big model. PRO's $2 is more usable.

— Hacker News

Configuration Examples

Basic Hugging Face setup

providers:
  huggingface:
    apiKey: hf_xxxxxxxxxxxxxxxxx
    model: huggingface/deepseek-ai/DeepSeek-V3.2

HF with routing policy (cheapest)

providers:
  huggingface:
    apiKey: hf_xxxxxxxxxxxxxxxxx
    model: huggingface/deepseek-ai/DeepSeek-R1:cheapest
    # Routes to lowest-cost provider hosting this model

HF as free fallback for open-source models

providers:
  anthropic:
    apiKey: sk-ant-xxxxx
    model: anthropic/claude-sonnet-4-6
  huggingface:
    apiKey: hf_xxxxxxxxxxxxxxxxx
    model: huggingface/moonshotai/Kimi-K2.5
    # Free Kimi K2.5 through HF as fallback