Reduce API Costs — Save Money on AI Usage
Slash your OpenClaw API bill by 50-80%. Learn model selection, caching, prompt optimization, and smart fallback strategies.
⚠️ The Problem
Your OpenClaw API bill is higher than expected:
- Claude API costs adding up fast:
  - "$50+ bill last month — not sure why it's so expensive"
  - "Just chatting casually but spending $5/day on API"
- Using expensive models for simple tasks:
  - "Using Claude Opus for everything including 'hello'"
  - "No fallback models configured"
- Context windows exploding:
  - "Long conversations hitting 200k tokens"
  - "Paying for the same context repeatedly"
- No visibility into spend:
  - "Don't know which conversations cost the most"
  - "No way to set spending limits"

🔍 Why This Happens
High API costs usually come from:
- Using the wrong model for the job — Claude Opus ($15/M input tokens) for tasks that Sonnet ($3/M) or Haiku ($0.25/M) handle perfectly. Most casual chat doesn't need the smartest model.
- No caching — Anthropic offers prompt caching that can cut the cost of repeated context by up to 90%. Without it, you pay full price every message.
- Context bloat — long conversations accumulate tokens. A 100k-token context costs roughly 10x more than a 10k context, even for a simple "yes" answer.
- No fallback strategy — when your primary model hits rate limits, you pay for retries. Smart fallbacks to cheaper models save money AND improve reliability.
- Tools generating large outputs — file reads, web scrapes, and code execution can dump thousands of tokens into context.
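To see how context bloat compounds, here's a back-of-envelope calculation in Python. It assumes Sonnet-class pricing of $3/M input and $15/M output tokens, with a short 50-token reply:

```python
# Rough cost of one assistant turn at different context sizes,
# assuming $3 per million input tokens and $15 per million output tokens.
INPUT_PRICE = 3 / 1_000_000    # $ per input token
OUTPUT_PRICE = 15 / 1_000_000  # $ per output token

def turn_cost(context_tokens: int, output_tokens: int = 50) -> float:
    """Dollar cost of a single message given its context size."""
    return context_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

small = turn_cost(10_000)   # ~10k-token context
large = turn_cost(100_000)  # ~100k-token context

print(f"10k context:  ${small:.4f} per turn")
print(f"100k context: ${large:.4f} per turn")
print(f"ratio: {large / small:.1f}x")
```

Every message re-sends the whole context, so that 10x per-turn gap multiplies across the entire conversation.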
✅ The Fix
Quick Wins (Immediate Savings)
1. Switch to Sonnet for Most Tasks
Claude Sonnet is roughly 5x cheaper than Opus and handles 90% of tasks equally well. Set it as your default:
```yaml
# ~/.openclaw/config.yaml
models:
  primary: anthropic/claude-sonnet-4-20250514
  fallbacks:
    - anthropic/claude-3-5-haiku-20241022
    - openai/gpt-4o-mini
```

Cost comparison per 1M tokens:
- Opus: $15 input / $75 output
- Sonnet: $3 input / $15 output
- Haiku: $0.25 input / $1.25 output
Use Opus only when you explicitly need it (complex reasoning, long documents).
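As a rough illustration of how those rates compound, here's a sketch that estimates monthly cost per model for an assumed workload of 50 messages a day, each sending ~2,000 input tokens and receiving ~500 output tokens (the per-1M prices are the figures above):

```python
# Monthly cost estimate per model for an assumed workload.
PRICES = {  # (input $/M tokens, output $/M tokens)
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku": (0.25, 1.25),
}

def monthly_cost(model, msgs_per_day=50, in_tok=2_000, out_tok=500, days=30):
    """Dollar cost per month for a fixed daily message volume."""
    p_in, p_out = PRICES[model]
    per_msg = (in_tok * p_in + out_tok * p_out) / 1e6
    return per_msg * msgs_per_day * days

for m in PRICES:
    print(f"{m:>6}: ${monthly_cost(m):.2f}/month")
```

At this workload Opus comes out 5x the price of Sonnet for the same traffic, before any caching or pruning.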
2. Enable Prompt Caching
Anthropic's prompt caching stores your system prompt and conversation context, billing cache reads at just 10% of the normal input price:
```yaml
models:
  anthropic:
    cacheControlTtl: 300  # Cache for 5 minutes
```

This alone can reduce costs by 50-80% for conversational use.
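A back-of-envelope look at why caching helps so much, assuming cache reads bill at 10% of the normal input price (cache writes, which cost slightly more than uncached input, are ignored here for simplicity):

```python
# Effect of prompt caching on input cost, assuming cached tokens
# are billed at 10% of the normal input price.
def input_cost(total_tokens: int, cached_tokens: int, price_per_m: float = 3.0) -> float:
    """Dollar cost of one request's input, with `cached_tokens` served from cache."""
    fresh = total_tokens - cached_tokens
    return (fresh + 0.1 * cached_tokens) * price_per_m / 1e6

no_cache = input_cost(50_000, 0)
cached = input_cost(50_000, 45_000)  # 90% of the context is a cache hit
print(f"uncached: ${no_cache:.4f}, cached: ${cached:.4f}")
print(f"savings: {1 - cached / no_cache:.0%}")
```

With 90% of a 50k-token context cached, the input bill for that turn drops by roughly 80%, which is where the 50-80% savings figure comes from.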
3. Configure Aggressive Context Pruning
Don't pay for old messages you don't need:
```yaml
contextPruning:
  mode: sliding
  maxMessages: 20   # Keep only last 20 messages
  maxTokens: 50000  # Cap at 50k tokens
```

For most conversations, 20 messages of context is plenty.
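Sliding-window pruning is easy to picture in code. This is a minimal sketch of the idea, not OpenClaw's actual implementation; `count_tokens` here is a crude stand-in for a real tokenizer:

```python
# Minimal sliding-window pruning: keep the most recent messages
# that fit under both a message cap and a token cap.
def count_tokens(msg: dict) -> int:
    return len(msg["content"]) // 4  # crude ~4 chars/token heuristic

def prune(messages, max_messages=20, max_tokens=50_000):
    """Return the most recent messages within the message and token caps."""
    kept, total = [], 0
    for msg in reversed(messages):  # walk newest -> oldest
        t = count_tokens(msg)
        if len(kept) >= max_messages or total + t > max_tokens:
            break
        kept.append(msg)
        total += t
    return list(reversed(kept))     # restore chronological order

history = [{"role": "user", "content": "x" * 400}] * 30  # 30 msgs, ~100 tokens each
print(len(prune(history)))  # capped at 20 messages
```

Whichever cap bites first wins, so a few huge tool outputs can crowd out many small messages.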
4. Use Haiku for Heartbeats & Cron
Automated tasks don't need the smartest model:
```yaml
agents:
  main:
    heartbeat:
      model: anthropic/claude-3-5-haiku-20241022
```

Haiku is 60x cheaper than Opus — perfect for scheduled checks.
Advanced Strategies
5. Set Up Model Routing
Route different tasks to appropriate models:
```yaml
# Use Haiku for simple queries, Sonnet for code, Opus for analysis
models:
  primary: anthropic/claude-sonnet-4-20250514
  routing:
    simple: anthropic/claude-3-5-haiku-20241022
    code: anthropic/claude-sonnet-4-20250514
    analysis: anthropic/claude-opus-4-20250514
```

6. Use Local Models for Drafts
Run Ollama locally for first drafts, only use paid APIs for final output:
```yaml
models:
  local: ollama/qwen2.5:7b
  primary: anthropic/claude-sonnet-4-20250514
```

Local models cost $0. Use them for brainstorming, then polish with Claude.
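The draft-then-polish pattern can be sketched as below; `call_model` is a hypothetical placeholder for your actual Ollama and Anthropic client calls, not a real API:

```python
# Hypothetical two-stage pipeline: draft with a free local model,
# then spend paid tokens only once, on the polish pass.
def call_model(model: str, prompt: str) -> str:
    # placeholder: real code would call Ollama or the Anthropic API here
    return f"[{model}] {prompt[:40]}"

def draft_then_polish(prompt: str) -> str:
    draft = call_model("ollama/qwen2.5:7b", prompt)          # $0, runs locally
    return call_model("anthropic/claude-sonnet-4-20250514",  # paid, single call
                      f"Polish this draft:\n{draft}")
```

The point is the shape: the expensive model sees one request per task instead of every brainstorming iteration.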
7. Truncate Tool Outputs
Large file reads and web scrapes bloat context. Limit tool output size:
```yaml
tools:
  read:
    maxChars: 10000  # Cap file reads at 10k chars
  web:
    maxChars: 5000   # Cap web fetches
```

8. Monitor with Session Status
Check your token usage regularly:
```bash
# In chat
/status

# From the CLI
openclaw status --usage
```

This shows tokens used and estimated cost per session.
9. Set Budget Alerts
Configure alerts in your provider dashboard:
- Anthropic Console: console.anthropic.com → Usage → Set alerts
- OpenAI: platform.openai.com → Settings → Limits
Set alerts at 50% and 80% of your monthly budget.
10. Use Batch Processing for Bulk Work
If you're processing many items (emails, documents), batch them instead of going one-by-one. Anthropic offers a batch API with a 50% discount.
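The arithmetic behind the batch discount, assuming a flat 50% reduction on token prices and a hypothetical workload of 1,000 items at ~1,500 input tokens each:

```python
# Savings from batch processing, assuming the batch API bills
# at 50% of the normal per-token price.
def bulk_cost(n_items, tok_per_item=1_500, price_per_m=3.0, batch=False):
    """Input-token cost in dollars of processing n_items, optionally at the batch rate."""
    rate = 0.5 if batch else 1.0
    return n_items * tok_per_item * price_per_m / 1e6 * rate

one_by_one = bulk_cost(1_000)
batched = bulk_cost(1_000, batch=True)
print(f"interactive: ${one_by_one:.2f}, batch: ${batched:.2f}")
```

The trade-off is latency: batch jobs complete asynchronously, so this only fits work that doesn't need an immediate answer.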
Cost Estimation Cheat Sheet
Typical monthly costs by usage pattern:
| Usage | Model | Est. Monthly Cost |
|---|---|---|
| Light (10 msgs/day) | Haiku | $1-3 |
| Medium (50 msgs/day) | Sonnet | $5-15 |
| Heavy (200+ msgs/day) | Sonnet + caching | $15-40 |
| Power user | Opus + Sonnet mix | $30-80 |
With caching enabled, expect 50-80% reduction from these estimates.
Example: Optimized Config
```yaml
# Optimized for cost
models:
  primary: anthropic/claude-sonnet-4-20250514
  fallbacks:
    - anthropic/claude-3-5-haiku-20241022
    - minimax/MiniMax-M2.1
  anthropic:
    cacheControlTtl: 300

contextPruning:
  mode: sliding
  maxMessages: 25
  maxTokens: 60000

agents:
  main:
    heartbeat:
      model: anthropic/claude-3-5-haiku-20241022

tools:
  read:
    maxChars: 15000
  web:
    maxChars: 8000
```

This config uses Sonnet by default, Haiku for automated tasks, aggressive caching, and limits on context size. Expected savings: 60-80% vs a naive Opus-only config.
📋 Quick Commands
| Command | Description |
|---|---|
| /status | Check current session token usage and cost |
| openclaw status --usage | View usage statistics from CLI |
| openclaw config set models.primary anthropic/claude-sonnet-4-20250514 | Switch default model to Sonnet |
| openclaw config set models.anthropic.cacheControlTtl 300 | Enable 5-minute prompt caching |
| openclaw config set contextPruning.maxMessages 20 | Limit context to last 20 messages |