# Run OpenClaw Heartbeats Free with Ollama: Stop Paying for 1,440 API Calls Per Day
If your heartbeat runs every minute, that's 1,440 paid API calls per day — just to check if you're alive.
That's not a typo. 60 minutes × 24 hours = 1,440 heartbeat pings per day. Each one hits the Anthropic API. Each one burns tokens on a task so simple it barely deserves the name "task" — essentially, "are you there? respond with OK."
At Haiku pricing ($0.00025/1K input tokens), the per-call cost is tiny. But at 1,440 calls/day with a minimal prompt + response, it adds up to $5–15/month purely on heartbeats. Not on actual work. Just on existence checks.
Worse: these calls count against your API rate limits. If you're running busy automation sessions, your heartbeats are competing with real work for quota.
The fix is to stop using the Anthropic API for heartbeats entirely. Ollama runs a local LLM on your machine — no API calls, no tokens billed, no rate limit impact. The heartbeat cost goes from $5–15/month to exactly $0.
Here's how to set it up.
## What Heartbeats Are and Why They're Expensive by Default
OpenClaw heartbeats are periodic health checks. They fire on a schedule to confirm your agent is running, responsive, and able to process requests. They're also used to trigger lightweight monitoring tasks — checking for blockers, progress updates, or time-sensitive items that need attention.
By default, heartbeats use your configured primary model — whatever model OpenClaw uses for everything else. If that's Sonnet, your heartbeats are Sonnet calls. If you've already set Haiku as default (article 2), your heartbeats are Haiku calls. Either way, they're paid API calls.
The math:
- 1-minute interval: 1,440 calls/day × 30 days = 43,200 calls/month
- 5-minute interval: 288 calls/day × 30 days = 8,640 calls/month
- 1-hour interval: 24 calls/day × 30 days = 720 calls/month
At Haiku pricing, even the 1-hour interval accumulates. At 1-minute intervals it's a meaningful ongoing cost — again, just for the agent to confirm it's still running.
The heartbeat payload is simple. A prompt like "Check: Any blockers, opportunities, or progress updates needed?" plus a short response is maybe 200–500 tokens total. That doesn't justify the API overhead.
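To make "it adds up" concrete, here's a back-of-the-envelope calculation. The tokens-per-call figure and the per-token price are illustrative assumptions (roughly the Haiku-class input pricing quoted above); real bills land higher once output-token pricing and larger prompts are counted, which is where the $5–15/month range comes from.

```python
# Back-of-the-envelope heartbeat cost at a 1-minute interval.
# TOKENS_PER_CALL and PRICE_PER_MTOK are assumptions for illustration;
# plug in your own prompt size and your provider's real pricing.
CALLS_PER_DAY = 24 * 60     # 1-minute interval -> 1,440 calls/day
TOKENS_PER_CALL = 350       # assumed: minimal prompt + short response
PRICE_PER_MTOK = 0.25       # assumed Haiku-class input price, $ per 1M tokens

daily_tokens = CALLS_PER_DAY * TOKENS_PER_CALL
daily_cost = daily_tokens / 1_000_000 * PRICE_PER_MTOK
monthly_cost = daily_cost * 30

print(f"{CALLS_PER_DAY} calls/day, {daily_tokens:,} tokens/day")
print(f"~${daily_cost:.2f}/day, ~${monthly_cost:.2f}/month on heartbeats alone")
```

Even under these deliberately conservative assumptions, a few dollars a month goes to pure existence checks.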
## Installing Ollama
Ollama runs open-source LLMs locally. No API key, no cloud, no per-token billing. You download a model once and run it as many times as you want at zero marginal cost.
Install on macOS or Linux:
```bash
# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh
```
Or download the macOS app directly from ollama.ai — it installs as a menu bar app that keeps the server running automatically.
Pull the heartbeat model:
```bash
# Pull llama3.2:3b (2GB download, one-time)
ollama pull llama3.2:3b
```
That's the full install. Ollama starts a local API server at http://localhost:11434 that accepts the same format as OpenAI's API — which is what OpenClaw's Ollama integration talks to.
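If you want to poke at that local endpoint yourself, a minimal Python sketch looks like this. It targets Ollama's native `/api/generate` route with the same JSON body as the `curl` check later in this article; the helper names are mine, not part of any OpenClaw or Ollama SDK.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's local endpoint

def build_heartbeat_request(model: str, prompt: str) -> dict:
    """Build the JSON body for a one-shot (non-streaming) generation."""
    return {"model": model, "prompt": prompt, "stream": False}

def send_heartbeat(body: dict) -> str:
    """POST to the local Ollama server and return the model's text reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["response"]

body = build_heartbeat_request("llama3.2:3b", "respond with OK")
# With the Ollama server running locally:
#   print(send_heartbeat(body))
```

No API key anywhere in that request: the server is on your machine, so the only "auth" is having Ollama running.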
## Why llama3.2:3b Specifically
There's a 1b version (llama3.2:1b) that's smaller and faster — about 1.3GB vs 2GB. You'll see it mentioned in some guides. For heartbeat-only use cases, 1b works fine.
But this guide recommends llama3.2:3b for a specific reason: it handles complex context better than 1b in production use.
If your heartbeat prompt is simple ("respond with OK"), 1b is sufficient. But heartbeat prompts often include some agent context — recent activity, current project status, things to watch for. When that context gets more detailed, 1b can produce lower-quality responses or miss the point of the check. 3b is more reliable with richer prompts.
The practical guidance:
- Starting out with simple heartbeats → llama3.2:1b works, 1.3GB download
- Heartbeats with agent context or structured responses → llama3.2:3b, 2GB download
- When in doubt → use 3b. The extra 700MB is worth the reliability
Both run fast on modern hardware. On an M-series Mac, llama3.2:3b responds in under a second for simple prompts.
## The Full Heartbeat Config
Update your ~/.openclaw/openclaw.json to add the heartbeat section. Here's the full config including the model routing setup from article 2:
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-haiku-4-5"
      },
      "models": {
        "anthropic/claude-sonnet-4-5": {
          "alias": "sonnet"
        },
        "anthropic/claude-haiku-4-5": {
          "alias": "haiku"
        }
      }
    }
  },
  "heartbeat": {
    "every": "1h",
    "model": "ollama/llama3.2:3b",
    "session": "main",
    "target": "slack",
    "prompt": "Check: Any blockers, opportunities, or progress updates needed?"
  }
}
```
What each field does:
| Field | Value | What it does |
|---|---|---|
| `every` | `"1h"` | Fires every hour (adjust to your preference: `"5m"`, `"30m"`, `"1h"`, `"6h"`) |
| `model` | `"ollama/llama3.2:3b"` | Routes the heartbeat to local Ollama instead of the paid API |
| `session` | `"main"` | Which agent session handles the heartbeat |
| `target` | `"slack"` | Where heartbeat responses go (`slack`, `telegram`, or omit to suppress output) |
| `prompt` | `"Check: Any..."` | The actual question asked on each heartbeat |
The critical field is "model": "ollama/llama3.2:3b". This is what routes the heartbeat to your local machine instead of Anthropic's API. Everything else is configuration for frequency and routing.
Adjust the interval based on what you need:
```jsonc
"every": "5m"   // High-frequency monitoring (still free with Ollama)
"every": "1h"   // Standard health check
"every": "6h"   // Low-priority background agent
```
With a paid API, the interval directly affected cost. With Ollama, you can run every minute at zero cost if you need that level of monitoring. Set it based on how quickly you want to detect issues, not based on cost.
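The arithmetic behind the interval table earlier is easy to sketch. The interval grammar assumed here (an integer plus `m` for minutes or `h` for hours) is an illustration, not OpenClaw's parser:

```python
# Convert an interval string ("1m", "5m", "1h", "6h") into calls per day
# and per 30-day month, reproducing the table from earlier in the article.
def calls_per_day(interval: str) -> int:
    value, unit = int(interval[:-1]), interval[-1]
    minutes = value * 60 if unit == "h" else value
    return (24 * 60) // minutes

for interval in ("1m", "5m", "1h", "6h"):
    per_day = calls_per_day(interval)
    print(f"{interval:>3}: {per_day:>5} calls/day, {per_day * 30:>6} calls/month")
```

With a paid API those monthly call counts multiply directly into dollars; with Ollama they multiply into nothing.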
## Verifying Ollama Is Running Correctly
After installing and configuring, run these checks:
```bash
# Start the Ollama server (if not already running as a service)
ollama serve

# In a separate terminal — test that the model responds
ollama run llama3.2:3b "respond with OK"
# Should output something like "OK" within 1-2 seconds

# Verify the API endpoint is live
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2:3b", "prompt": "respond with OK", "stream": false}'
# Should return a JSON response with "response": "OK"
```
If `ollama serve` hangs or shows errors:

- Check whether another Ollama instance is already running (`ps aux | grep ollama`)
- On macOS with the menu bar app, Ollama starts automatically; there is no need to run `ollama serve` manually
- Check that port 11434 isn't blocked (`lsof -i :11434`)
After configuring heartbeats in openclaw.json:
```bash
openclaw shell
session_status
# Look for: Heartbeat: Ollama/local (not API)
```
If the heartbeat still shows API instead of Ollama/local, verify:

- The JSON syntax in openclaw.json is valid (no missing commas or brackets)
- Ollama is actually running (`ollama serve`, or the menu bar app is active)
- The model is pulled (`ollama list` should show `llama3.2:3b`)
## The Monthly Cost: $5–15 → $0
Here's what you're actually saving, broken down by heartbeat interval:
At 1-minute intervals (1,440 calls/day):
| | Paid API (Haiku) | Ollama |
|---|---|---|
| Daily calls | 1,440 | 1,440 |
| Token cost/day | ~$0.50 | $0 |
| Monthly | ~$15 | $0 |
| Annual | ~$180 | $0 |
At 5-minute intervals (288 calls/day):
| | Paid API (Haiku) | Ollama |
|---|---|---|
| Daily calls | 288 | 288 |
| Token cost/day | ~$0.10 | $0 |
| Monthly | ~$3 | $0 |
At 1-hour intervals (24 calls/day):
| | Paid API (Haiku) | Ollama |
|---|---|---|
| Daily calls | 24 | 24 |
| Token cost/day | ~$0.008 | $0 |
| Monthly | ~$0.25 | $0 |
The absolute dollar savings look small at hourly intervals. What matters is the principle: you're eliminating a recurring cost that provides zero value relative to a local alternative, and you're freeing up API quota for actual work.
At 1-minute or 5-minute intervals, the savings are meaningful ($3–15/month) and the Ollama switch is an obvious move.
## When You Might Still Want a Paid Model for Heartbeats
Most of the time, Ollama is the right call. But there are scenarios where you'd stick with a paid API:
The agent runs on a remote server without GPU/CPU capacity for Ollama. Ollama runs fine on most modern hardware, but if your agent is on a tiny VPS or a device with very limited resources, the local LLM might be too slow or fail to run at all. In this case, Haiku at 1-hour intervals is still cheap.
Your heartbeat prompt is doing real reasoning, not just health checks. If you've configured your heartbeat to summarize complex project status, generate reports, or make non-trivial decisions, a smarter model might produce better output. llama3.2:3b is capable but not on par with Sonnet for complex reasoning. If the heartbeat output is something you act on, consider whether Haiku is the right tradeoff.
You need the heartbeat output to be highly reliable in production. For most personal agent setups, occasional quirky Ollama output is fine — you review it manually. For production systems where heartbeat responses trigger automated workflows, you might want the consistency of a hosted model.
For everything else — the vast majority of OpenClaw setups — Ollama handles heartbeats perfectly and costs nothing.
## Before vs After
| | ❌ Before | ✓ After |
|---|---|---|
| Heartbeat model | Paid API (Haiku or Sonnet) | Ollama local LLM (free) |
| Daily API calls | 1,440 (at 1-min interval) | 0 |
| Monthly cost | $5–15 | $0 |
| Rate limit impact | Yes — competes with real work | None |
| Flexibility on interval | Limited by cost | Run as frequently as needed |
| Dependency | Anthropic API uptime | Local machine uptime |
The only downside to Ollama: if your machine is off, heartbeats don't run. For an always-on desktop or server, this isn't a concern. For a laptop you close overnight, factor in that heartbeats will pause when the machine sleeps.
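For the always-on Linux server case, the install script typically registers Ollama as a systemd service so it restarts on boot and after crashes. A minimal sketch of what such a unit looks like follows; the path and unit name are illustrative, so check what the installer actually created with `systemctl status ollama` rather than copying this verbatim.

```
# /etc/systemd/system/ollama.service — illustrative sketch only
[Unit]
Description=Ollama local LLM server
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
Restart=always

[Install]
WantedBy=multi-user.target
```

With `Restart=always`, a crashed Ollama process comes back on its own, which keeps your free heartbeats from silently stopping.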
## Key Takeaways
- 1,440 API calls/day at 1-minute intervals: This is what default heartbeat behavior looks like. It adds up to $5–15/month just for existence checks.
- One install, zero ongoing cost: `curl -fsSL https://ollama.ai/install.sh | sh` + `ollama pull llama3.2:3b`. That's it. No API keys, no billing, no rate limits.
- llama3.2:3b is the right size: 2GB, fast, and it handles production context better than 1b. Don't over-optimize to 1b and get inconsistent responses.
- The config change is one line: `"model": "ollama/llama3.2:3b"` in the heartbeat section of openclaw.json.
- Verify with three commands: `ollama serve`, `ollama run llama3.2:3b "respond with OK"`, then `session_status` in the openclaw shell.
- Free interval flexibility: With Ollama, you can run heartbeats every minute if you want without any cost concern. Set the interval based on operational needs, not API economics.
This is the third of five optimizations. Combined with session initialization (article 1) and model routing (article 2), you've already addressed three of the four major cost drivers. Add rate limits and prompt caching, and the full system takes you from $1,500+/month to under $50/month — no complex infrastructure, just smart configuration.