
Reduce API Costs — Save Money on AI Usage

Slash your OpenClaw API bill by 50-80%. Learn model selection, caching, prompt optimization, and smart fallback strategies.

⚠️ The Problem

Your OpenClaw API bill is higher than expected:

  1. Claude API costs adding up fast:

```
$50+ bill last month — not sure why it's so expensive
Just chatting casually but spending $5/day on API
```

  2. Using expensive models for simple tasks:

```
Using Claude Opus for everything including 'hello'
No fallback models configured
```

  3. Context windows exploding:

```
Long conversations hitting 200k tokens
Paying for the same context repeatedly
```

  4. No visibility into spend:

```
Don't know which conversations cost the most
No way to set spending limits
```

🔍 Why This Happens

High API costs usually come from:

  1. Using the wrong model for the job — Claude Opus ($15/M input tokens) for tasks that Sonnet ($3/M) or Haiku ($0.25/M) handle perfectly. Most casual chat doesn't need the smartest model.

  2. No caching — Anthropic offers prompt caching that can reduce costs by 90% for repeated context. Without it, you pay full price every message.

  3. Context bloat — Long conversations accumulate tokens. A 100k token context costs 10x more than a 10k context, even for a simple 'yes' answer.

  4. No fallback strategy — When your primary model hits rate limits, you pay for retries. Smart fallbacks to cheaper models save money AND improve reliability.

  5. Tools generating large outputs — File reads, web scrapes, and code execution can dump thousands of tokens into context.
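A quick back-of-the-envelope calculation shows how much context size alone drives cost. This is a sketch, assuming the $3/M Sonnet input rate cited in this guide:

```python
# Back-of-the-envelope input cost per message at different context sizes.
# Rate is dollars per million input tokens ($3/M, the Sonnet figure
# used elsewhere in this guide).
RATE_PER_M_INPUT = 3.00

def message_cost(context_tokens: int, rate_per_m: float = RATE_PER_M_INPUT) -> float:
    """Input-token cost of one message carrying `context_tokens` of context."""
    return context_tokens * rate_per_m / 1_000_000

small = message_cost(10_000)   # 10k-token context
large = message_cost(100_000)  # 100k-token context
print(f"10k context:  ${small:.3f}/message")
print(f"100k context: ${large:.3f}/message ({large / small:.0f}x more)")
```

Even a one-word reply pays for the whole context, so a 100k-token conversation really is 10x the price of a 10k one, every single turn.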

The Fix

Quick Wins (Immediate Savings)

1. Switch to Sonnet for Most Tasks

Claude Sonnet is 5x cheaper than Opus and handles 90% of tasks equally well. Set it as your default:

```yaml
# ~/.openclaw/config.yaml
models:
  primary: anthropic/claude-sonnet-4-20250514
  fallbacks:
    - anthropic/claude-3-5-haiku-20241022
    - openai/gpt-4o-mini
```

Cost comparison per 1M tokens:

  • Opus: $15 input / $75 output
  • Sonnet: $3 input / $15 output
  • Haiku: $0.25 input / $1.25 output

Use Opus only when you explicitly need it (complex reasoning, long documents).
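Those per-million rates translate into per-message costs like this (a sketch; the token counts are illustrative, and the prices are the figures from the list above):

```python
# Per-million-token rates (input, output) from the comparison above.
PRICES = {
    "opus":   (15.00, 75.00),
    "sonnet": ( 3.00, 15.00),
    "haiku":  ( 0.25,  1.25),
}

def chat_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one exchange on the given model."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A typical exchange: 5k tokens of context in, 500 tokens out.
for model in PRICES:
    print(f"{model:>6}: ${chat_cost(model, 5_000, 500):.4f}")
```

At this message size the Opus-to-Sonnet ratio is the 5x cited above, and the gap compounds over hundreds of messages a month.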

2. Enable Prompt Caching

Anthropic's prompt caching stores your system prompt and conversation context, billing cached tokens at roughly 10% of the normal input rate:

```yaml
models:
  anthropic:
    cacheControlTtl: 300  # Cache for 5 minutes
```

This alone can reduce costs by 50-80% for conversational use.
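The caching math works out like this (a sketch assuming cached tokens bill at 10% of the normal input rate, as described above, with the Sonnet rate from this guide):

```python
# Cached tokens bill at ~10% of the normal input rate (per this guide);
# fresh tokens pay full price. Sonnet input rate assumed.
SONNET_INPUT_RATE = 3.00  # dollars per million input tokens

def turn_cost(cached_tokens: int, fresh_tokens: int,
              rate_per_m: float = SONNET_INPUT_RATE) -> float:
    """Input cost of one turn with a warm cache."""
    cached = cached_tokens * rate_per_m * 0.10 / 1_000_000
    fresh = fresh_tokens * rate_per_m / 1_000_000
    return cached + fresh

# 40k tokens of stable system prompt and history, 1k of new text per turn.
without_cache = turn_cost(0, 41_000)
with_cache = turn_cost(40_000, 1_000)
print(f"no cache: ${without_cache:.4f}  cached: ${with_cache:.4f}")
print(f"savings: {1 - with_cache / without_cache:.0%}")
```

The more of each turn that is stable, repeated context, the closer savings get to the 90% ceiling.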

3. Configure Aggressive Context Pruning

Don't pay for old messages you don't need:

```yaml
contextPruning:
  mode: sliding
  maxMessages: 20   # Keep only last 20 messages
  maxTokens: 50000  # Cap at 50k tokens
```

For most conversations, 20 messages of context is plenty.
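A sliding-window pruner like the one configured above can be sketched in a few lines. This is an illustration of the idea, not OpenClaw's implementation, and the 4-characters-per-token estimate is a rough assumption:

```python
# Minimal sliding-window pruner mirroring the config above: keep at
# most max_messages recent messages, then drop the oldest until the
# estimated token total fits under max_tokens.
def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token (an assumption)."""
    return max(1, len(text) // 4)

def prune(messages: list[str], max_messages: int = 20,
          max_tokens: int = 50_000) -> list[str]:
    kept = messages[-max_messages:]
    while len(kept) > 1 and sum(map(estimate_tokens, kept)) > max_tokens:
        kept = kept[1:]  # drop the oldest remaining message
    return kept

history = [f"message {i}: " + "x" * 400 for i in range(100)]
print(len(prune(history)))  # 20: the message cap binds first
```

Whichever limit binds first wins, so short chatty messages hit the message cap while long tool dumps hit the token cap.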

4. Use Haiku for Heartbeats & Cron

Automated tasks don't need the smartest model:

```yaml
agents:
  main:
    heartbeat:
      model: anthropic/claude-3-5-haiku-20241022
```

Haiku is 60x cheaper than Opus — perfect for scheduled checks.

Advanced Strategies

5. Set Up Model Routing

Route different tasks to appropriate models:

```yaml
# Use Haiku for simple queries, Sonnet for code, Opus for analysis
models:
  primary: anthropic/claude-sonnet-4-20250514
  routing:
    simple: anthropic/claude-3-5-haiku-20241022
    code: anthropic/claude-sonnet-4-20250514
    analysis: anthropic/claude-opus-4-20250514
```

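OpenClaw applies this routing from the config; if you wanted to mimic the idea client-side, a minimal dispatcher might look like this (the keyword heuristic is purely illustrative; a real router could use a cheap classifier model):

```python
# Hypothetical client-side dispatcher mirroring the routing table above.
ROUTES = {
    "simple":   "anthropic/claude-3-5-haiku-20241022",
    "code":     "anthropic/claude-sonnet-4-20250514",
    "analysis": "anthropic/claude-opus-4-20250514",
}

def classify(prompt: str) -> str:
    """Crude keyword heuristic for the task category (illustrative only)."""
    p = prompt.lower()
    if any(w in p for w in ("analyze", "compare", "in depth")):
        return "analysis"
    if any(w in p for w in ("code", "function", "bug", "refactor")):
        return "code"
    return "simple"

def pick_model(prompt: str) -> str:
    return ROUTES[classify(prompt)]

print(pick_model("hello there"))             # the Haiku route
print(pick_model("refactor this function"))  # the Sonnet route
```

The design point is that the default path should be the cheap one, with escalation to Opus only when the task category demands it.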
6. Use Local Models for Drafts

Run Ollama locally for first drafts, only use paid APIs for final output:

```yaml
models:
  local: ollama/qwen2.5:7b
  primary: anthropic/claude-sonnet-4-20250514
```

Local models cost $0. Use them for brainstorming, then polish with Claude.

7. Truncate Tool Outputs

Large file reads and web scrapes bloat context. Limit tool output size:

```yaml
tools:
  read:
    maxChars: 10000  # Cap file reads at 10k chars
  web:
    maxChars: 5000   # Cap web fetches
```

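The maxChars behavior amounts to simple truncation before the tool output enters context. A sketch of the idea (the truncation marker text is illustrative, not OpenClaw's actual format):

```python
# Sketch of the maxChars idea: hard-cap a tool's output before it
# enters model context, noting how much was dropped.
def truncate_output(text: str, max_chars: int = 10_000) -> str:
    if len(text) <= max_chars:
        return text
    dropped = len(text) - max_chars
    return text[:max_chars] + f"\n[... truncated {dropped} chars]"

big_read = "line\n" * 5_000  # a 25,000-character file read
capped = truncate_output(big_read)
print(len(big_read), "->", len(capped))
```

At roughly 4 characters per token, capping a read at 10k characters keeps it near 2.5k tokens instead of letting a single file dump tens of thousands into context.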
8. Monitor with Session Status

Check your token usage regularly:

```bash
# In chat
/status

# CLI
openclaw status --usage
```

This shows tokens used and estimated cost per session.

9. Set Budget Alerts

Configure alerts in your provider dashboard:

  • Anthropic Console: console.anthropic.com → Usage → Set alerts
  • OpenAI: platform.openai.com → Settings → Limits

Set alerts at 50% and 80% of your monthly budget.

10. Use Batch Processing for Bulk Work

If you're processing many items (emails, documents), batch them instead of sending them one by one. Anthropic offers a batch API with a 50% discount.

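The batch discount math is straightforward (a sketch; item sizes are illustrative, the 50% figure is the one cited above, and rates are the Sonnet figures from this guide):

```python
# Batch vs one-by-one cost for a bulk job: 1,000 documents at
# ~2k input / 200 output tokens each on Sonnet.
IN_RATE, OUT_RATE = 3.00, 15.00  # dollars per million tokens

def bulk_cost(n_items: int, in_tok: int, out_tok: int,
              batch: bool = False) -> float:
    """Total cost for n_items; batch=True applies the 50% batch discount."""
    cost = n_items * (in_tok * IN_RATE + out_tok * OUT_RATE) / 1_000_000
    return cost * 0.5 if batch else cost

one_by_one = bulk_cost(1_000, 2_000, 200)
batched = bulk_cost(1_000, 2_000, 200, batch=True)
print(f"one-by-one: ${one_by_one:.2f}  batched: ${batched:.2f}")
```

The trade-off is latency: batch jobs are asynchronous, which is fine for bulk processing and wrong for interactive chat.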
Cost Estimation Cheat Sheet

Typical monthly costs by usage pattern:

| Usage | Model | Est. Monthly Cost |
|---|---|---|
| Light (10 msgs/day) | Haiku | $1-3 |
| Medium (50 msgs/day) | Sonnet | $5-15 |
| Heavy (200+ msgs/day) | Sonnet + caching | $15-40 |
| Power user | Opus + Sonnet mix | $30-80 |

With caching enabled, expect 50-80% reduction from these estimates.

Example: Optimized Config

```yaml
# Optimized for cost
models:
  primary: anthropic/claude-sonnet-4-20250514
  fallbacks:
    - anthropic/claude-3-5-haiku-20241022
    - minimax/MiniMax-M2.1
  anthropic:
    cacheControlTtl: 300

contextPruning:
  mode: sliding
  maxMessages: 25
  maxTokens: 60000

agents:
  main:
    heartbeat:
      model: anthropic/claude-3-5-haiku-20241022

tools:
  read:
    maxChars: 15000
  web:
    maxChars: 8000
```

This config uses Sonnet by default, Haiku for automated tasks, aggressive caching, and a capped context size. Expected savings: 60-80% vs. a naive Opus-only config.

🔥 Your AI should run your business, not just answer questions.

We'll show you how. Free to join.

Join Vibe Combinator →

📋 Quick Commands

| Command | Description |
|---|---|
| `/status` | Check current session token usage and cost |
| `openclaw status --usage` | View usage statistics from CLI |
| `openclaw config set models.primary anthropic/claude-sonnet-4-20250514` | Switch default model to Sonnet |
| `openclaw config set models.anthropic.cacheControlTtl 300` | Enable 5-minute prompt caching |
| `openclaw config set contextPruning.maxMessages 20` | Limit context to last 20 messages |

