
# Reduce API Costs — Save Money on AI Usage

Slash your OpenClaw API bill by 50-80%. Learn model selection, caching, prompt optimization, and smart fallback strategies.

## ⚠️ The Problem

Your OpenClaw API bill is higher than expected:

1. **Claude API costs adding up fast:** "$50+ bill last month — not sure why it's so expensive"; "Just chatting casually but spending $5/day on API"
2. **Using expensive models for simple tasks:** Using Claude Opus for everything, including "hello"; no fallback models configured
3. **Context windows exploding:** Long conversations hitting 200k tokens; paying for the same context repeatedly
4. **No visibility into spend:** Don't know which conversations cost the most; no way to set spending limits

## 🔍 Why This Happens

High API costs usually come from:

1. **Using the wrong model for the job** — Claude Opus ($15/M input tokens) for tasks that Sonnet ($3/M) or Haiku ($0.25/M) handle perfectly. Most casual chat doesn't need the smartest model.
2. **No caching** — Anthropic offers prompt caching that can reduce costs by up to 90% for repeated context. Without it, you pay full price every message.
3. **Context bloat** — Long conversations accumulate tokens. A 100k-token context costs 10x more than a 10k one, even for a simple "yes" answer.
4. **No fallback strategy** — When your primary model hits rate limits, you pay for retries. Smart fallbacks to cheaper models save money AND improve reliability.
5. **Tools generating large outputs** — File reads, web scrapes, and code execution can dump thousands of tokens into context.
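The context-bloat point is worth making concrete. A rough cost model (illustrative Sonnet-class prices, not a billing tool) shows why a one-word answer on a bloated context still costs real money — you pay for the entire context as input on every turn:

```python
# Rough per-turn cost model. Prices are illustrative:
# $3 per 1M input tokens, $15 per 1M output tokens (Sonnet-class).
INPUT_PRICE_PER_TOKEN = 3 / 1_000_000
OUTPUT_PRICE_PER_TOKEN = 15 / 1_000_000

def message_cost(context_tokens: int, output_tokens: int) -> float:
    """Cost of one reply: the whole context is billed as input every turn."""
    return (context_tokens * INPUT_PRICE_PER_TOKEN
            + output_tokens * OUTPUT_PRICE_PER_TOKEN)

# The same 10-token answer, on a small vs. a bloated context:
small = message_cost(10_000, 10)   # 10k-token context
large = message_cost(100_000, 10)  # 100k-token context
print(f"10k context: ${small:.4f}, 100k context: ${large:.4f}")
```

The 100k-context turn costs roughly 10x the 10k one, even though the answer is identical — which is exactly what context pruning (below) attacks.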

## The Fix

## Quick Wins (Immediate Savings)

### 1. Switch to Sonnet for Most Tasks

Claude Sonnet is 5x cheaper than Opus and handles roughly 90% of tasks equally well. Set it as your default:

```yaml
# ~/.openclaw/config.yaml
models:
  primary: anthropic/claude-sonnet-4-20250514
  fallbacks:
    - anthropic/claude-3-5-haiku-20241022
    - openai/gpt-4o-mini
```

Cost comparison per 1M tokens:

- Opus: $15 input / $75 output
- Sonnet: $3 input / $15 output
- Haiku: $0.25 input / $1.25 output

Use Opus only when you explicitly need it (complex reasoning, long documents).
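To see what the per-token gap means for a monthly bill, here is a back-of-envelope calculator using the prices listed above (the per-message token counts are illustrative assumptions):

```python
# Monthly cost per model. Prices per 1M tokens as (input, output),
# matching the comparison list above.
PRICES = {
    "opus":   (15.0, 75.0),
    "sonnet": (3.0,  15.0),
    "haiku":  (0.25, 1.25),
}

def monthly_cost(model: str, msgs_per_day: int,
                 in_tok: int = 2_000, out_tok: int = 500,
                 days: int = 30) -> float:
    """Estimated monthly cost, assuming in_tok input / out_tok output per message."""
    inp, outp = PRICES[model]
    per_msg = (in_tok * inp + out_tok * outp) / 1_000_000
    return per_msg * msgs_per_day * days

for model in PRICES:
    print(f"{model:>6}: ${monthly_cost(model, 50):.2f}/mo")
```

At 50 messages/day with these assumptions, Opus lands around $100/mo while Sonnet is around $20 — the 5x claim, made visible.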

### 2. Enable Prompt Caching

Anthropic's prompt caching stores your system prompt and conversation context, billing cached tokens at roughly 10% of the normal input price:

```yaml
models:
  anthropic:
    cacheControlTtl: 300  # Cache for 5 minutes
```

This alone can reduce costs by 50-80% for conversational use.
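A quick sanity check on that claim. The sketch below assumes cached reads bill at 10% of the normal input price and ignores the cache-write surcharge for simplicity; the 50k-token context and Sonnet-class price are illustrative:

```python
# Input cost of one turn, with some fraction of the context served from cache.
# Assumes cached tokens bill at 10% of the normal input price.
INPUT_PRICE = 3 / 1_000_000  # illustrative Sonnet-class input price per token

def turn_input_cost(context_tokens: int, cached_fraction: float) -> float:
    cached = context_tokens * cached_fraction
    fresh = context_tokens - cached
    return fresh * INPUT_PRICE + cached * INPUT_PRICE * 0.10

no_cache = turn_input_cost(50_000, 0.0)
mostly_cached = turn_input_cost(50_000, 0.9)  # long chat: most context repeats
print(f"uncached: ${no_cache:.4f}, 90% cached: ${mostly_cached:.4f}")
```

With 90% of a 50k-token context cached, the input cost of a turn drops by about 81% — squarely in the 50-80% range quoted above once cache writes and output tokens are factored back in.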

### 3. Configure Aggressive Context Pruning

Don't pay for old messages you don't need:

```yaml
contextPruning:
  mode: sliding
  maxMessages: 20  # Keep only last 20 messages
  maxTokens: 50000  # Cap at 50k tokens
```

For most conversations, 20 messages of context is plenty.
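For intuition, the sliding-window behavior this config describes can be sketched as follows. This is a minimal illustration, not OpenClaw's actual implementation, and the 4-chars-per-token heuristic is a rough assumption:

```python
# Sketch of sliding-window pruning: keep the system prompt, then the most
# recent messages that fit under both a message cap and a token cap.

def prune(messages, max_messages=20, max_tokens=50_000,
          count_tokens=lambda m: len(m["content"]) // 4):  # rough heuristic
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, total = [], 0
    for msg in reversed(rest):  # walk newest-first
        tokens = count_tokens(msg)
        if len(kept) >= max_messages or total + tokens > max_tokens:
            break
        kept.append(msg)
        total += tokens
    return system + list(reversed(kept))  # restore chronological order
```

Whichever cap is hit first wins: a chatty session trims to `maxMessages`, while a session full of big tool outputs trims to `maxTokens` sooner.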

### 4. Use Haiku for Heartbeats & Cron

Automated tasks don't need the smartest model:

```yaml
agents:
  main:
    heartbeat:
      model: anthropic/claude-3-5-haiku-20241022
```

Haiku is 60x cheaper than Opus — perfect for scheduled checks.

## Advanced Strategies

### 5. Set Up Model Routing

Route different tasks to appropriate models:

```yaml
# Use Haiku for simple queries, Sonnet for code, Opus for analysis
models:
  primary: anthropic/claude-sonnet-4-20250514
  routing:
    simple: anthropic/claude-3-5-haiku-20241022
    code: anthropic/claude-sonnet-4-20250514
    analysis: anthropic/claude-opus-4-20250514
```

### 6. Use Local Models for Drafts

Run Ollama locally for first drafts, only use paid APIs for final output:

```yaml
models:
  local: ollama/qwen2.5:7b
  primary: anthropic/claude-sonnet-4-20250514
```

Local models cost $0. Use them for brainstorming, then polish with Claude.

### 7. Truncate Tool Outputs

Large file reads and web scrapes bloat context. Limit tool output size:

```yaml
tools:
  read:
    maxChars: 10000  # Cap file reads at 10k chars
  web:
    maxChars: 5000   # Cap web fetches
```
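The idea behind a `maxChars` cap can be sketched as a simple helper (hypothetical, not OpenClaw's code): cap a tool's output before it enters the model context, keeping the head and tail so the model retains orientation:

```python
# Hypothetical truncation helper illustrating what a maxChars cap does.

def truncate_output(text: str, max_chars: int = 10_000) -> str:
    """Keep the first and last halves of the budget; note what was dropped."""
    if len(text) <= max_chars:
        return text
    head = text[: max_chars // 2]
    tail = text[-(max_chars // 2):]
    omitted = len(text) - len(head) - len(tail)
    return f"{head}\n...[{omitted} chars truncated]...\n{tail}"
```

Keeping both ends matters in practice: file headers and final results often live at the edges, while the middle is the most expendable.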

### 8. Monitor with Session Status

Check your token usage regularly:

```bash
# In chat
/status

# CLI
openclaw status --usage
```

This shows tokens used and estimated cost per session.

### 9. Set Budget Alerts

Configure alerts in your provider dashboard:

- Anthropic Console: console.anthropic.com → Usage → Set alerts
- OpenAI: platform.openai.com → Settings → Limits

Set alerts at 50% and 80% of your monthly budget.

### 10. Use Batch Processing for Bulk Work

If you're processing many items (emails, documents), batch them instead of sending them one by one. Anthropic offers a batch API with a 50% discount.
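The savings are straightforward arithmetic. This sketch assumes the 50% batch discount mentioned above; the item counts and token sizes are illustrative:

```python
# Cost of bulk work, interactive vs. batched (assumes a 50% batch discount).

def bulk_cost(items: int, tokens_per_item: int, price_per_m: float,
              batch: bool = False) -> float:
    cost = items * tokens_per_item * price_per_m / 1_000_000
    return cost * 0.5 if batch else cost

# 1,000 documents at ~3k input tokens each, Sonnet-class input pricing:
interactive = bulk_cost(1_000, 3_000, 3.0)
batched = bulk_cost(1_000, 3_000, 3.0, batch=True)
print(f"interactive: ${interactive:.2f}, batched: ${batched:.2f}")
```

The trade-off is latency: batch jobs complete asynchronously, so this only fits work that doesn't need an immediate answer.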

## Cost Estimation Cheat Sheet

Typical monthly costs by usage pattern:

| Usage | Model | Est. Monthly Cost |
|-------|-------|-------------------|
| Light (10 msgs/day) | Haiku | $1-3 |
| Medium (50 msgs/day) | Sonnet | $5-15 |
| Heavy (200+ msgs/day) | Sonnet + caching | $15-40 |
| Power user | Opus + Sonnet mix | $30-80 |

With caching enabled, expect 50-80% reduction from these estimates.

## Example: Optimized Config

```yaml
# Optimized for cost
models:
  primary: anthropic/claude-sonnet-4-20250514
  fallbacks:
    - anthropic/claude-3-5-haiku-20241022
    - minimax/MiniMax-M2.1
  anthropic:
    cacheControlTtl: 300
contextPruning:
  mode: sliding
  maxMessages: 25
  maxTokens: 60000
agents:
  main:
    heartbeat:
      model: anthropic/claude-3-5-haiku-20241022
tools:
  read:
    maxChars: 15000
  web:
    maxChars: 8000
```

This config uses Sonnet by default and Haiku for automated tasks, enables caching, and caps context size. Expected savings: 60-80% vs. a naive Opus-only config.


## 📋 Quick Commands

| Command | Description |
|---------|-------------|
| `/status` | Check current session token usage and cost |
| `openclaw status --usage` | View usage statistics from CLI |
| `openclaw config set models.primary anthropic/claude-sonnet-4-20250514` | Switch default model to Sonnet |
| `openclaw config set models.anthropic.cacheControlTtl 300` | Enable 5-minute prompt caching |
| `openclaw config set contextPruning.maxMessages 20` | Limit context to last 20 messages |


## Still stuck?

Join our Discord community for real-time help.