Best Local Models for OpenClaw (Free Tier Guide)
Last Updated: February 2026
Running OpenClaw with local models sounds ideal: no API costs, full privacy, no rate limits. The reality is more complicated.
This guide tells you exactly which local models work well with OpenClaw, which ones don't, and how to configure them so tool calls actually work.
The Core Problem with Local Models and OpenClaw
OpenClaw relies heavily on tool calling — the ability for the model to decide when to use tools (web search, file operations, running scripts) and format those calls correctly.
Most local models are poor at tool calling. They either:
- Ignore tools and just answer conversationally
- Hallucinate tool calls that don't match the schema
- Call tools in the right format but with wrong parameters
- Get confused by long tool lists and freeze
This is why the SmallClaw project exists — it's a simplified OpenClaw fork designed specifically for local models with limited tool-calling ability.
The good news: some local models handle it well. And the landscape is improving fast in 2026.
Which Local Models Actually Work
Tier 1: Reliable Tool Calling (Use These)
Qwen2.5-72B-Instruct (via Ollama)
- Best overall local model for OpenClaw
- Strong tool calling, good context handling
- Requires: 48GB+ VRAM (or use the Q4 quantized version on 24GB)
- Config:
ollama/qwen2.5:72b
Qwen2.5-32B-Instruct (via Ollama)
- Good balance of capability and hardware requirements
- Tool calling works reliably with proper system prompts
- Requires: 20GB+ VRAM
- Config:
ollama/qwen2.5:32b
DeepSeek-V3 (via Ollama or LM Studio)
- Excellent reasoning, good tool calling
- Large model — needs serious hardware
- Config:
ollama/deepseek-v3
Tier 2: Works With Limitations
Llama 3.3-70B (via Ollama)
- Good general capability, inconsistent tool calling
- Works for simple tasks, struggles with complex multi-tool chains
- Requires: 48GB VRAM
Mistral-7B-Instruct (via Ollama)
- Fast, low resource requirements (8GB VRAM)
- Basic tool calling — works for simple tasks only
- Good for: crons, summaries, simple lookups
- Not good for: browser automation, multi-step agentic tasks
Phi-4 (Microsoft, via Ollama)
- Surprisingly capable for its size (14B)
- Decent tool calling on simple tasks
- Config:
ollama/phi4
Tier 3: Avoid for Agentic Tasks
- Gemma 2 — poor tool calling
- Code Llama — code-focused, bad at agent tasks
- Nous Hermes variants — inconsistent tool behavior
- TinyLlama, Phi-2 — too small for reliable tool use
The WhatsApp Context Window Problem
One specific issue comes up frequently: WhatsApp sessions with local models fail with the error "Context window too small (4096 tokens)."
Why this happens: WhatsApp's message formatting adds overhead, and OpenClaw loads your full system context on each request. A 4096-token context window fills up immediately.
Fix:
- Use a model with at least 8K context window (most modern models)
- In your config, explicitly set context_length:
model: ollama/qwen2.5:32b
context_length: 32768
- Trim your MEMORY.md and workspace files — every token in those files loads before your message
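To see why a 4096-token window disappears so fast, here's a rough back-of-envelope check in Python. The ~4 characters per token heuristic and the file sizes are illustrative assumptions, not measurements:

```python
# Rough context-budget check: estimates whether your static context
# (system prompt + MEMORY.md + tool schemas) already exceeds a model's
# context window. Uses the common ~4 chars/token heuristic, so treat
# the numbers as ballpark, not exact.

def approx_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def context_budget(window: int, static_parts: dict[str, str]) -> dict:
    used = {name: approx_tokens(text) for name, text in static_parts.items()}
    total = sum(used.values())
    return {"used": used, "total": total, "remaining": window - total}

# Example: a 4096-token window with typical static context
report = context_budget(4096, {
    "system_prompt": "x" * 8000,   # ~2000 tokens of agent instructions
    "MEMORY.md":     "x" * 6000,   # ~1500 tokens of memory
    "tool_schemas":  "x" * 4000,   # ~1000 tokens of tool definitions
})
print(report["total"], report["remaining"])  # 4500 -404: over budget before you type a word
```

Everything here loads before your message does, which is why trimming MEMORY.md helps so much.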
Setting Up Ollama with OpenClaw
Install Ollama
# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh
# Pull your chosen model
ollama pull qwen2.5:32b
Configure OpenClaw to Use Ollama
openclaw config set model ollama/qwen2.5:32b
openclaw config set provider.ollama.host http://localhost:11434
Or set it in your config file:
model: ollama/qwen2.5:32b
providers:
  ollama:
    host: http://localhost:11434
Test Tool Calling
Before deploying, test that tool calling works:
Ask your agent: "What is 2+2? Use the calculator tool."
If it answers directly without using a tool, your model isn't calling tools properly. Switch models or check the system prompt.
Fixing Tool Calling Hallucinations
If your local model is hallucinating tool calls (calling tools that don't exist, or with wrong parameters):
1. Simplify the Tool List
Fewer tools = better tool selection. Remove any tools you don't actively use.
2. Add Tool Guidance to Your System Prompt
In AGENTS.md:
## Tool Use Rules
- Only use tools that are explicitly listed as available
- Always use exact tool names as provided
- If unsure which tool to use, ask rather than guess
- Do not invent tool parameters not in the schema
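You can also catch hallucinated calls before they execute. Here's a sketch of a validator, with a hypothetical tool list, that flags unknown tools and invented parameters:

```python
# Guardrail sketch: reject hallucinated tool calls before executing them.
# `schemas` maps each available tool name to its allowed parameter names;
# both the tool list and the call shape here are illustrative.

def validate_call(call: dict, schemas: dict[str, set[str]]) -> list[str]:
    errors = []
    name = call.get("name")
    if name not in schemas:
        errors.append(f"unknown tool: {name!r}")
        return errors
    extra = set(call.get("arguments", {})) - schemas[name]
    if extra:
        errors.append(f"invented parameters for {name}: {sorted(extra)}")
    return errors

schemas = {"web_search": {"query", "max_results"}, "read_file": {"path"}}

print(validate_call({"name": "web_search", "arguments": {"query": "ollama"}}, schemas))
print(validate_call({"name": "browse", "arguments": {"url": "x"}}, schemas))
print(validate_call({"name": "read_file", "arguments": {"path": "a", "mode": "rb"}}, schemas))
```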
3. Use a Tool-Specific System Prompt Format
Some models respond better to explicit tool formatting. OpenClaw supports custom system prompt templates per model — check the docs for your specific model's recommended format.
4. Reduce Context Load
Long contexts confuse local models more than frontier models. Keep MEMORY.md under 2,000 tokens when using local models.
Hardware Requirements at a Glance
| Model | VRAM | RAM | Speed |
|---|---|---|---|
| Qwen2.5-72B (Q4) | 48GB | 64GB | Slow |
| Qwen2.5-32B (Q4) | 20GB | 32GB | Medium |
| Llama 3.3-70B (Q4) | 48GB | 64GB | Slow |
| Qwen2.5-14B | 10GB | 16GB | Fast |
| Mistral-7B | 6GB | 8GB | Very fast |
No GPU? You can run smaller models on CPU with Ollama — just expect 5–10x slower responses.
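The table numbers follow from a simple formula: weight memory is parameter count times bits per weight divided by 8, plus headroom for the KV cache and runtime buffers. Here's a rough estimator (the flat 20% overhead figure is an assumption; real usage depends on context length and runtime):

```python
# Back-of-envelope VRAM estimate for a quantized model:
# weights ≈ parameter_count × bits_per_weight / 8, plus ~20% headroom
# for KV cache and buffers. Treat as a sanity check, not a guarantee.

def vram_gb(params_billions: float, bits_per_weight: float,
            overhead: float = 0.20) -> float:
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return round(weight_bytes * (1 + overhead) / 1e9, 1)

print(vram_gb(32, 4))   # Qwen2.5-32B at Q4 -> ~19.2 GB
print(vram_gb(7, 4))    # Mistral-7B at Q4  -> ~4.2 GB
```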
The Hybrid Approach (Best of Both Worlds)
You don't have to choose between local and cloud. Run a hybrid:
- Default model: Local (Qwen2.5-32B or similar) — zero cost for everyday tasks
- Heavy tasks: Claude Sonnet/Opus — when you need reliable reasoning or complex tool chains
- Crons: Local model — scheduled tasks don't need frontier-level capability
Configure different models per task type in OpenClaw:
model: ollama/qwen2.5:32b # default
# Override for specific crons
cron_model: ollama/mistral:7b
# Override for explicit heavy tasks
# User can say "use Sonnet for this task"
This setup: $0/month for 80% of tasks, pay as you go for the 20% that need cloud intelligence.
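The routing logic amounts to a small lookup table. Here's a sketch with illustrative model IDs and task categories (OpenClaw's actual config keys may differ):

```python
# Hybrid routing sketch: pick a model string per task type. The task
# categories and model IDs are illustrative, not OpenClaw's real config.

ROUTES = {
    "cron":    "ollama/mistral:7b",       # cheap, scheduled work
    "default": "ollama/qwen2.5:32b",      # everyday local tasks
    "heavy":   "anthropic/claude-sonnet", # complex reasoning / tool chains
}

def pick_model(task_type: str) -> str:
    """Fall back to the local default for any unrecognized task type."""
    return ROUTES.get(task_type, ROUTES["default"])

print(pick_model("cron"))     # ollama/mistral:7b
print(pick_model("unknown"))  # ollama/qwen2.5:32b
```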
Free Cloud Models (Alternative to Local)
If you don't have the GPU hardware but still want to reduce costs, free cloud models via OpenRouter are worth considering:
- google/gemini-2.0-flash-exp:free — free, 1M token context, solid tool calling
- meta-llama/llama-3.3-70b-instruct:free — free tier, good capability
- deepseek/deepseek-chat:free — free, strong reasoning
These run in the cloud (not on your machine), but cost $0 and have better tool-calling reliability than most local models.