
Best Local Models for OpenClaw (Free Tier Guide)

2026-02-25 · 8 min read

Last Updated: February 2026

Running OpenClaw with local models sounds ideal: no API costs, full privacy, no rate limits. The reality is more complicated.

This guide tells you exactly which local models work well with OpenClaw, which ones don't, and how to configure them properly so that tool calls actually work.


The Core Problem with Local Models and OpenClaw

OpenClaw relies heavily on tool calling — the ability for the model to decide when to use tools (web search, file operations, running scripts) and format those calls correctly.

Most local models are poor at tool calling. They either:

  • Ignore tools and just answer conversationally
  • Hallucinate tool calls that don't match the schema
  • Call tools in the right format but with wrong parameters
  • Get confused by long tool lists and freeze
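The schema-mismatch failure is worth making concrete. Below is a minimal sketch of the kind of validation an agent framework performs before executing a tool call; the tool names and schemas are hypothetical examples, not OpenClaw's actual tool set.

```python
# Minimal sketch: checking a model's proposed tool call against declared
# schemas before executing it. Tool names/schemas here are hypothetical.
AVAILABLE_TOOLS = {
    "web_search": {"required": {"query"}, "allowed": {"query", "max_results"}},
    "read_file":  {"required": {"path"},  "allowed": {"path"}},
}

def validate_tool_call(name, params):
    """Return a list of problems; an empty list means the call is valid."""
    schema = AVAILABLE_TOOLS.get(name)
    if schema is None:
        return [f"unknown tool: {name}"]              # hallucinated tool
    problems = []
    missing = schema["required"] - params.keys()
    extra = params.keys() - schema["allowed"]
    if missing:
        problems.append(f"missing params: {sorted(missing)}")
    if extra:
        problems.append(f"invented params: {sorted(extra)}")  # wrong parameters
    return problems

# A call weak local models often produce: right format, wrong parameter name.
print(validate_tool_call("web_search", {"q": "weather"}))
```

A frontier model rarely trips this check; a weak local model trips it constantly, which is why the call either errors out or silently does nothing.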

This is why the SmallClaw project exists — it's a simplified OpenClaw fork designed specifically for local models with limited tool-calling ability.

The good news: some local models handle it well. And the landscape is improving fast in 2026.


Which Local Models Actually Work

Tier 1: Reliable Tool Calling (Use These)

Qwen2.5-72B-Instruct (via Ollama)

  • Best overall local model for OpenClaw
  • Strong tool calling, good context handling
  • Requires: 48GB+ VRAM (or use the Q4 quantized version on 24GB)
  • Config: ollama/qwen2.5:72b

Qwen2.5-32B-Instruct (via Ollama)

  • Good balance of capability and hardware requirements
  • Tool calling works reliably with proper system prompts
  • Requires: 20GB+ VRAM
  • Config: ollama/qwen2.5:32b

DeepSeek-V3 (via Ollama or LM Studio)

  • Excellent reasoning, good tool calling
  • Large model — needs serious hardware
  • Config: ollama/deepseek-v3

Tier 2: Works With Limitations

Llama 3.3-70B (via Ollama)

  • Good general capability, inconsistent tool calling
  • Works for simple tasks, struggles with complex multi-tool chains
  • Requires: 48GB VRAM

Mistral-7B-Instruct (via Ollama)

  • Fast, low resource requirements (8GB VRAM)
  • Basic tool calling — works for simple tasks only
  • Good for: crons, summaries, simple lookups
  • Not good for: browser automation, multi-step agentic tasks

Phi-4 (Microsoft, via Ollama)

  • Surprisingly capable for its size (14B)
  • Decent tool calling on simple tasks
  • Config: ollama/phi4

Tier 3: Avoid for Agentic Tasks

  • Gemma 2 — poor tool calling
  • Code Llama — code-focused, bad at agent tasks
  • Nous Hermes variants — inconsistent tool behavior
  • TinyLlama, Phi-2 — too small for reliable tool use

The WhatsApp Context Window Problem

One specific issue reported frequently: WhatsApp sessions with local models fail with the error "Context window too small (4096 tokens)."

Why this happens: WhatsApp's message formatting adds overhead, and OpenClaw loads your full system context on each request. A 4096-token context window fills up immediately.
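The arithmetic makes the problem obvious. The numbers below are illustrative assumptions, not measured values, but they show how little room a 4096-token window leaves:

```python
# Rough token-budget arithmetic for a 4096-token context window.
# All per-item figures are illustrative assumptions, not measurements.
context_window = 4096
system_prompt  = 1500   # OpenClaw's system context (assumed)
memory_md      = 1200   # MEMORY.md + workspace files (assumed)
tool_schemas   = 800    # tool definitions sent on each request (assumed)
formatting     = 300    # WhatsApp message-wrapping overhead (assumed)

used = system_prompt + memory_md + tool_schemas + formatting
remaining = context_window - used
print(f"Used: {used} tokens, remaining for conversation: {remaining}")
```

A few hundred tokens of headroom is not enough for even a short exchange, which is why the session fails almost immediately.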

Fix:

  1. Use a model with at least an 8K context window (most modern models qualify)
  2. In your config, explicitly set context_length:

     model: ollama/qwen2.5:32b
     context_length: 32768

  3. Trim your MEMORY.md and workspace files — every token in those files loads before your message

Setting Up Ollama with OpenClaw

Install Ollama

# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Pull your chosen model
ollama pull qwen2.5:32b

Configure OpenClaw to Use Ollama

openclaw config set model ollama/qwen2.5:32b
openclaw config set provider.ollama.host http://localhost:11434

Or set it in your config file:

model: ollama/qwen2.5:32b
providers:
  ollama:
    host: http://localhost:11434

Test Tool Calling

Before deploying, test that tool calling works:

Ask your agent: "What is 2+2? Use the calculator tool."

If it answers directly without using a tool, your model isn't calling tools properly. Switch models or check the system prompt.
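You can also check this programmatically. The sketch below assumes Ollama's /api/chat response shape, where tool calls appear under message["tool_calls"]; it runs offline against example responses:

```python
# Offline sketch: did the model emit a tool call, or just answer in prose?
# Response shapes follow Ollama's /api/chat format (assumed here).
def used_a_tool(response: dict) -> bool:
    """True if the assistant message contains at least one tool call."""
    return bool(response.get("message", {}).get("tool_calls"))

# Example responses (abbreviated):
prose_only = {"message": {"role": "assistant", "content": "2+2 is 4."}}
tool_call = {"message": {"role": "assistant", "content": "",
                         "tool_calls": [{"function": {"name": "calculator",
                                                      "arguments": {"expression": "2+2"}}}]}}

print(used_a_tool(prose_only))  # False: model ignored the tool
print(used_a_tool(tool_call))   # True: tool calling works
```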


Fixing Tool Calling Hallucinations

If your local model is hallucinating tool calls (calling tools that don't exist, or with wrong parameters):

1. Simplify the Tool List

Fewer tools = better tool selection. Remove any tools you don't actively use.

2. Add Tool Guidance to Your System Prompt

In AGENTS.md:

## Tool Use Rules
- Only use tools that are explicitly listed as available
- Always use exact tool names as provided
- If unsure which tool to use, ask rather than guess
- Do not invent tool parameters not in the schema

3. Use a Tool-Specific System Prompt Format

Some models respond better to explicit tool formatting. OpenClaw supports custom system prompt templates per model — check the docs for your specific model's recommended format.

4. Reduce Context Load

Long contexts confuse local models more than frontier models. Keep MEMORY.md under 2,000 tokens when using local models.
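To keep yourself honest about that 2,000-token budget, a rough estimate is enough. The ~4 characters per token rule of thumb below is a heuristic, not your model's actual tokenizer:

```python
# Rough token estimate using the common ~4 characters/token heuristic
# for English text. This approximates, it does not tokenize.
def estimate_tokens(text: str) -> int:
    return len(text) // 4

sample = "Keep workspace notes short when running local models."
print(estimate_tokens(sample))

# For a real check against your workspace:
#   estimate_tokens(open("MEMORY.md").read()) < 2000
```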


Hardware Requirements at a Glance

Model                 VRAM    RAM     Speed
Qwen2.5-72B (Q4)      48GB    64GB    Slow
Qwen2.5-32B (Q4)      20GB    32GB    Medium
Llama 3.3-70B (Q4)    48GB    64GB    Slow
Qwen2.5-14B           10GB    16GB    Fast
Mistral-7B            6GB     8GB     Very fast

No GPU? You can run smaller models on CPU with Ollama — just expect 5–10x slower responses.


The Hybrid Approach (Best of Both Worlds)

You don't have to choose between local and cloud. Run a hybrid:

  • Default model: Local (Qwen2.5-32B or similar) — zero cost for everyday tasks
  • Heavy tasks: Claude Sonnet/Opus — when you need reliable reasoning or complex tool chains
  • Crons: Local model — scheduled tasks don't need frontier-level capability

Configure different models per task type in OpenClaw:

model: ollama/qwen2.5:32b  # default

# Override for specific crons
cron_model: ollama/mistral:7b

# Override for explicit heavy tasks
# User can say "use Sonnet for this task"

This setup: $0/month for 80% of tasks, pay as you go for the 20% that need cloud intelligence.


Free Cloud Models (Alternative to Local)

If you don't have the GPU hardware but still want to reduce costs, free cloud models via OpenRouter are worth considering:

  • google/gemini-2.0-flash-exp:free — free, 1M token context, solid tool calling
  • meta-llama/llama-3.3-70b-instruct:free — free tier, good capability
  • deepseek/deepseek-chat:free — free, strong reasoning

These run in the cloud (not on your machine), but cost $0 and have better tool-calling reliability than most local models.
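Pointing OpenClaw at OpenRouter mirrors the Ollama setup shown earlier. The provider key and field names below are assumptions, so check OpenClaw's provider docs before using this:

```yaml
# Hypothetical sketch: field names assumed, not confirmed against OpenClaw docs
model: openrouter/google/gemini-2.0-flash-exp:free
providers:
  openrouter:
    api_key: YOUR_OPENROUTER_KEY  # set your real OpenRouter key here
```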

