Ollama (Local Models)
Local LLM runtime for open-source models. Supports the native Ollama API with streaming and tool calling, and auto-discovers models from the local instance. Free — all inference runs locally at no API cost.
Tags: local, free, open-source, self-hosted
Use Cases
- Zero-cost heartbeat and cron job execution for OpenClaw
- Privacy-sensitive workflows where data must not leave your machine
- Offline agent operation without internet connectivity
- Local development and testing before deploying with cloud models
- Simple classification, summarization, and routing tasks
- Learning and experimenting with different open-source models
Tips
- Use Ollama for heartbeats, cron jobs, and simple classification tasks to save on API costs. Reserve cloud models for complex work.
- On Apple Silicon Macs with 32GB+ RAM, a quantized Llama 3.3 70B can run (more RAM allows less aggressive quantization) and handles most basic OpenClaw tasks.
- Keep a model loaded by setting OLLAMA_KEEP_ALIVE to a longer duration — avoids cold-start latency between requests.
- Use DeepSeek R1 32B for tasks that benefit from reasoning/thinking — it's one of the strongest reasoning models you can run locally.
- Set Ollama as a secondary provider in OpenClaw and switch to it per-session with /model for cost-free experimentation.
- Run `ollama list` to see downloaded models and their sizes. Remove unused models with `ollama rm` to free disk space.
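The keep-alive tip can also be applied per request: Ollama's REST API accepts a `keep_alive` field on generate/chat calls, overriding the server-wide `OLLAMA_KEEP_ALIVE` default. A minimal sketch that only constructs the request body (the model name, prompt, and duration are illustrative):

```python
import json

# Sketch of an Ollama /api/generate request body that keeps the model
# resident in memory for an hour after this call completes, so the next
# request skips the cold-start weight load.
payload = {
    "model": "llama3.3",
    "prompt": "Is this message spam? Answer yes or no: 'You won a prize!'",
    "stream": False,
    "keep_alive": "1h",  # per-request override of OLLAMA_KEEP_ALIVE
}

body = json.dumps(payload)
```

POSTing `body` to `http://127.0.0.1:11434/api/generate` runs the completion; setting `OLLAMA_KEEP_ALIVE=1h` in the server's environment achieves the same effect globally.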
Known Issues & Gotchas
- Local models are significantly less capable than frontier cloud models (Claude, GPT) for complex reasoning, creative writing, and multi-step tool use.
- GPU memory (VRAM) limits which models you can run. A 70B model needs ~140GB of VRAM at full 16-bit precision, or roughly 20-40GB when quantized, depending on the quantization level.
- Apple Silicon Macs use unified memory — a 32GB M-series Mac runs 30B-class models comfortably and 70B models only with aggressive quantization; performance depends heavily on how much memory stays free.
- Ollama requires a 'dummy' API key in OpenClaw config for provider detection — set any non-empty string.
- Tool calling support varies by model. Not all models handle OpenClaw's tool-use patterns well — Llama 3.3 and GLM 4.7 are among the best.
- First model load is slow (loading weights into memory). Subsequent requests are fast if the model stays loaded.
- The Cloud feature added proprietary model access, which some users consider a departure from Ollama's local-first mission.
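Because tool-calling support varies by model, it can help to probe a model before wiring it into OpenClaw. Ollama's /api/chat endpoint accepts OpenAI-style tool schemas; this sketch only builds the request body (the tool definition and model name are illustrative):

```python
import json

# A minimal OpenAI-style tool definition as accepted by Ollama's /api/chat.
# A model with working tool support should answer a prompt like this with a
# tool_calls entry in the response message, rather than plain text.
tool = {
    "type": "function",
    "function": {
        "name": "get_time",  # illustrative tool name
        "description": "Return the current time for a timezone",
        "parameters": {
            "type": "object",
            "properties": {"timezone": {"type": "string"}},
            "required": ["timezone"],
        },
    },
}

request_body = json.dumps({
    "model": "llama3.3",
    "messages": [{"role": "user", "content": "What time is it in UTC?"}],
    "tools": [tool],
    "stream": False,
})
```

If the response message contains plain text instead of a `tool_calls` entry, the model likely won't handle OpenClaw's tool-use patterns reliably.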
Alternatives
- vLLM
- Together AI
- Hugging Face Inference
- LM Studio
Community Feedback
Over the past few months, there's been a serious decline in the updates and update content that releases with Ollama. Then the Cloud update dropped — it feels like they are seriously straying away from the main purpose: to provide a secure inference platform for LOCAL AI models.
— Reddit r/LocalLLaMA
Ollama is still the easiest way to get started with local models. Pull a model, run it. No config files, no environment setup. It just works.
— Reddit r/ollama
For OpenClaw heartbeats and simple tasks, Ollama with a small model is perfect. Zero cost, fast enough, and keeps your API budget for the complex stuff.
— Reddit r/LocalLLaMA
Configuration Examples
Basic Ollama local setup
```yaml
providers:
  ollama:
    apiKey: ollama  # dummy key required
    baseUrl: http://127.0.0.1:11434
    model: ollama/llama3.3
```

Ollama as secondary provider
```yaml
providers:
  anthropic:
    apiKey: sk-ant-xxxxx
    model: anthropic/claude-sonnet-4-6
  ollama:
    apiKey: ollama
    baseUrl: http://127.0.0.1:11434
    model: ollama/llama3.3
# Switch per-session: /model ollama/llama3.3
```

Ollama with reasoning model
```yaml
providers:
  ollama:
    apiKey: ollama
    baseUrl: http://127.0.0.1:11434
    model: ollama/deepseek-r1:32b
    thinking: on  # enable thinking for the reasoning model
```
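At the API level, the reasoning-model setup corresponds to Ollama's thinking support: recent Ollama releases accept a `think` flag on /api/chat for models like DeepSeek R1, which separates the chain of thought from the final answer. A sketch that only constructs the request body (model name and prompt are illustrative):

```python
import json

# Request body enabling thinking for a reasoning model. When the server and
# model support it, the response carries the reasoning in a separate
# "thinking" field so the final "content" stays clean.
request_body = json.dumps({
    "model": "deepseek-r1:32b",
    "messages": [{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}],
    "think": True,
    "stream": False,
})
```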