Ollama (Local Models)

Local LLM runtime for open-source models. Provides a native API with streaming and tool calling, and auto-discovers models from the local instance. Free — all costs $0.

Ollama is a local LLM runtime that makes running open-source models as easy as pulling a Docker image. Install Ollama, run `ollama pull llama3.3`, and you have a local model serving on port 11434 with an API ready to go. No API keys, no cloud bills, no data leaving your machine.

For OpenClaw, Ollama serves as the zero-cost local inference option. It's ideal for heartbeats, simple cron jobs, and tasks where you don't need frontier-model intelligence. Models like Llama 3.3 70B and DeepSeek R1 32B can handle basic tool calling, summarization, and classification without spending a cent. The tradeoff is obvious: local models are significantly less capable than Claude or GPT for complex reasoning, creative writing, and nuanced instruction-following.

Ollama supports the full spectrum of open-source models, including Llama, DeepSeek, Qwen, GLM, Gemma, Mistral, and many others. It handles model quantization automatically (GGUF format), manages GPU/CPU offloading, and provides a native API with streaming and tool-calling support. On machines with capable GPUs (or Apple Silicon Macs with ample RAM), inference is surprisingly fast.

Recent developments have been controversial in the community. Ollama introduced a Cloud option for running proprietary models, which some users see as mission creep away from its core local-first philosophy. Despite the debate, Ollama remains the most popular and easiest way to run local models, with a massive model library and dead-simple setup.
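
The native API can be exercised without any SDK. Below is a minimal sketch of a request body for Ollama's chat endpoint (`POST /api/chat`), assuming a default local instance on port 11434; the `get_weather` tool is a made-up example used only to illustrate the tool-calling format:

```python
import json

# Request body for Ollama's native chat endpoint (POST /api/chat).
# Tools use the OpenAI-style function schema that Ollama accepts;
# get_weather is a hypothetical tool, not something Ollama ships.
payload = {
    "model": "llama3.3",
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin?"}
    ],
    "stream": False,  # set True to receive newline-delimited JSON chunks
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# With a running instance, this would be POSTed to
# http://127.0.0.1:11434/api/chat (e.g. via urllib.request or curl).
print(json.dumps(payload, indent=2))
```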

Tags: local, free, open-source, self-hosted

Use Cases

  • Zero-cost heartbeat and cron job execution for OpenClaw
  • Privacy-sensitive workflows where data must not leave your machine
  • Offline agent operation without internet connectivity
  • Local development and testing before deploying with cloud models
  • Simple classification, summarization, and routing tasks
  • Learning and experimenting with different open-source models
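
The "cheap local model for simple tasks, cloud model for complex work" split can be sketched as a small routing helper. This is a hypothetical illustration — OpenClaw does not ship a `pick_model` function — but the model IDs match the configuration examples later in this page:

```python
# Hypothetical routing sketch: send low-stakes task kinds to the free
# local model and reserve the cloud model for everything else.
CHEAP_TASKS = {"heartbeat", "cron", "classify", "summarize"}

def pick_model(task_kind: str) -> str:
    """Route simple tasks to the $0 local model, the rest to cloud."""
    if task_kind in CHEAP_TASKS:
        return "ollama/llama3.3"          # free, served locally
    return "anthropic/claude-sonnet-4-6"  # frontier model for complex work

print(pick_model("heartbeat"))
print(pick_model("refactor"))
```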

Tips

  • Use Ollama for heartbeats, cron jobs, and simple classification tasks to save on API costs. Reserve cloud models for complex work.
  • On Apple Silicon, a 4-bit quantized Llama 3.3 70B needs roughly 40GB of unified memory, so it runs well on 48GB+ Macs and handles most basic OpenClaw tasks; on 32GB machines, prefer ~30B-class models.
  • Keep a model loaded by setting OLLAMA_KEEP_ALIVE to a longer duration — avoids cold-start latency between requests.
  • Use DeepSeek R1 32B for tasks that benefit from explicit reasoning/thinking — it's among the strongest local reasoning models.
  • Set Ollama as a secondary provider in OpenClaw and switch to it per-session with /model for cost-free experimentation.
  • Run `ollama list` to see downloaded models and their sizes. Remove unused models with `ollama rm` to free disk space.
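
The keep-alive tip above is set via an environment variable before the server starts. A minimal sketch (the default unload timeout is around five minutes; the exact value may vary by version):

```shell
# Keep models resident in memory for an hour instead of the short default,
# avoiding cold-start latency between OpenClaw requests.
export OLLAMA_KEEP_ALIVE=1h

# ollama serve   # uncomment on a machine where Ollama is installed
echo "$OLLAMA_KEEP_ALIVE"
```

Individual API requests can also override this with a `keep_alive` field in the request body.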

Known Issues & Gotchas

  • Local models are significantly less capable than frontier cloud models (Claude, GPT) for complex reasoning, creative writing, and multi-step tool use.
  • GPU memory (VRAM) limits which models you can run. A 70B model needs roughly 140GB at full FP16 precision; common 4-bit quantized builds fit in about 40GB.
  • Apple Silicon Macs use unified memory. A 32GB M-series Mac can comfortably run quantized models up to roughly 30B parameters; 70B-class models generally need 48GB or more, and performance depends heavily on how much RAM stays free.
  • Ollama requires a 'dummy' API key in OpenClaw config for provider detection — set any non-empty string.
  • Tool calling support varies by model. Not all models handle OpenClaw's tool-use patterns well — Llama 3.3 and GLM 4.7 are among the best.
  • First model load is slow (loading weights into memory). Subsequent requests are fast if the model stays loaded.
  • The Cloud feature added proprietary model access, which some users consider a departure from Ollama's local-first mission.
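
The memory figures above follow from a back-of-envelope rule: weight memory is roughly parameters times bits-per-weight divided by 8. This sketch gives a lower bound only — real usage adds KV cache, activations, and runtime overhead:

```python
# Back-of-envelope estimate of memory needed just for model weights.
# Actual usage is higher (KV cache, activations, runtime overhead).
def weight_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate GB of weight memory: params * bits / 8."""
    return params_billions * bits_per_weight / 8

print(f"70B @ FP16 : {weight_gb(70, 16):.0f} GB")  # 140 GB
print(f"70B @ 4-bit: {weight_gb(70, 4):.0f} GB")   # 35 GB
print(f"32B @ 4-bit: {weight_gb(32, 4):.0f} GB")   # 16 GB
```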

Alternatives

  • vLLM
  • Together AI
  • Hugging Face Inference
  • LM Studio

Community Feedback

Over the past few months, there's been a serious decline in the updates and update content that releases with Ollama. Then the Cloud update dropped — it feels like they are seriously straying away from the main purpose: to provide a secure inference platform for LOCAL AI models.

— Reddit r/LocalLLaMA

Ollama is still the easiest way to get started with local models. Pull a model, run it. No config files, no environment setup. It just works.

— Reddit r/ollama

For OpenClaw heartbeats and simple tasks, Ollama with a small model is perfect. Zero cost, fast enough, and keeps your API budget for the complex stuff.

— Reddit r/LocalLLaMA

Configuration Examples

Basic Ollama local setup

providers:
  ollama:
    apiKey: ollama  # dummy key required
    baseUrl: http://127.0.0.1:11434
    model: ollama/llama3.3

Ollama as secondary provider

providers:
  anthropic:
    apiKey: sk-ant-xxxxx
    model: anthropic/claude-sonnet-4-6
  ollama:
    apiKey: ollama
    baseUrl: http://127.0.0.1:11434
    model: ollama/llama3.3
    # Switch per-session: /model ollama/llama3.3

Ollama with reasoning model

providers:
  ollama:
    apiKey: ollama
    baseUrl: http://127.0.0.1:11434
    model: ollama/deepseek-r1:32b
thinking: on  # Enable thinking for reasoning model