Reduce API Costs — Save Money on AI Usage
Slash your OpenClaw API bill by 50-80%. Learn model selection, caching, prompt optimization, and smart fallback strategies.
⚠️ The Problem
Your OpenClaw API bill is higher than expected:
- Claude API costs adding up fast:
  - "$50+ bill last month — not sure why it's so expensive"
  - "Just chatting casually but spending $5/day on API"
- Using expensive models for simple tasks:
  - "Using Claude Opus for everything including 'hello'"
  - "No fallback models configured"
- Context windows exploding:
  - "Long conversations hitting 200k tokens"
  - "Paying for the same context repeatedly"
- No visibility into spend:
  - "Don't know which conversations cost the most"
  - "No way to set spending limits"

🔍 Why This Happens
High API costs usually come from:
- Using the wrong model for the job — Claude Opus ($15/M input tokens) for tasks that Sonnet ($3/M) or Haiku ($0.25/M) handle perfectly. Most casual chat doesn't need the smartest model.
- No caching — Anthropic offers prompt caching that can cut the cost of repeated context by up to 90%. Without it, you pay full price every message.
- Context bloat — long conversations accumulate tokens. A 100k-token context costs roughly 10x more than a 10k context, even for a simple "yes" answer.
- No fallback strategy — when your primary model hits rate limits, you pay for retries. Smart fallbacks to cheaper models save money AND improve reliability.
- Tools generating large outputs — file reads, web scrapes, and code execution can dump thousands of tokens into context.
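To see how context bloat compounds, here's a back-of-envelope calculation in Python. It assumes Sonnet-class pricing of $3/M input and $15/M output tokens, with a short 50-token reply:

```python
# Rough cost of one assistant turn at different context sizes,
# assuming $3 per million input tokens and $15 per million output tokens.
INPUT_PRICE = 3 / 1_000_000    # $ per input token
OUTPUT_PRICE = 15 / 1_000_000  # $ per output token

def turn_cost(context_tokens: int, output_tokens: int = 50) -> float:
    """Dollar cost of a single message given its context size."""
    return context_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

small = turn_cost(10_000)   # ~10k-token context
large = turn_cost(100_000)  # ~100k-token context

print(f"10k context:  ${small:.4f} per turn")
print(f"100k context: ${large:.4f} per turn")
print(f"ratio: {large / small:.1f}x")
```

Every message re-sends the whole context, so that 10x per-turn gap multiplies across the entire conversation.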
✅ The Fix
Quick Wins (Immediate Savings)
1. Switch to Sonnet for Most Tasks
Claude Sonnet is roughly 5x cheaper than Opus and handles 90% of tasks equally well. Set it as your default:
```yaml
# ~/.openclaw/config.yaml
models:
  primary: anthropic/claude-sonnet-4-20250514
  fallbacks:
    - anthropic/claude-3-5-haiku-20241022
    - openai/gpt-4o-mini
```

Cost comparison per 1M tokens:
- Opus: $15 input / $75 output
- Sonnet: $3 input / $15 output
- Haiku: $0.25 input / $1.25 output
Use Opus only when you explicitly need it (complex reasoning, long documents).
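As a rough illustration of how those rates compound, here's a sketch that estimates monthly cost per model for an assumed workload of 50 messages a day, each sending ~2,000 input tokens and receiving ~500 output tokens (the per-1M prices are the figures above):

```python
# Monthly cost estimate per model for an assumed workload.
PRICES = {  # (input $/M tokens, output $/M tokens)
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku": (0.25, 1.25),
}

def monthly_cost(model, msgs_per_day=50, in_tok=2_000, out_tok=500, days=30):
    """Dollar cost per month for a fixed daily message volume."""
    p_in, p_out = PRICES[model]
    per_msg = (in_tok * p_in + out_tok * p_out) / 1e6
    return per_msg * msgs_per_day * days

for m in PRICES:
    print(f"{m:>6}: ${monthly_cost(m):.2f}/month")
```

At this workload Opus comes out 5x the price of Sonnet for the same traffic, before any caching or pruning.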
2. Enable Prompt Caching
Anthropic's prompt caching stores your system prompt and conversation context, billing cache reads at just 10% of the normal input price:
```yaml
models:
  anthropic:
    cacheControlTtl: 300  # Cache for 5 minutes
```

This alone can reduce costs by 50-80% for conversational use.
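A back-of-envelope look at why caching helps so much, assuming cache reads bill at 10% of the normal input price (cache writes, which cost slightly more than uncached input, are ignored here for simplicity):

```python
# Effect of prompt caching on input cost, assuming cached tokens
# are billed at 10% of the normal input price.
def input_cost(total_tokens: int, cached_tokens: int, price_per_m: float = 3.0) -> float:
    """Dollar cost of one request's input, with `cached_tokens` served from cache."""
    fresh = total_tokens - cached_tokens
    return (fresh + 0.1 * cached_tokens) * price_per_m / 1e6

no_cache = input_cost(50_000, 0)
cached = input_cost(50_000, 45_000)  # 90% of the context is a cache hit
print(f"uncached: ${no_cache:.4f}, cached: ${cached:.4f}")
print(f"savings: {1 - cached / no_cache:.0%}")
```

With 90% of a 50k-token context cached, the input bill for that turn drops by roughly 80%, which is where the 50-80% savings figure comes from.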
3. Configure Aggressive Context Pruning
Don't pay for old messages you don't need:
```yaml
contextPruning:
  mode: sliding
  maxMessages: 20   # Keep only last 20 messages
  maxTokens: 50000  # Cap at 50k tokens
```

For most conversations, 20 messages of context is plenty.
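Sliding-window pruning is easy to picture in code. This is a minimal sketch of the idea, not OpenClaw's actual implementation; `count_tokens` here is a crude stand-in for a real tokenizer:

```python
# Minimal sliding-window pruning: keep the most recent messages
# that fit under both a message cap and a token cap.
def count_tokens(msg: dict) -> int:
    return len(msg["content"]) // 4  # crude ~4 chars/token heuristic

def prune(messages, max_messages=20, max_tokens=50_000):
    """Return the most recent messages within the message and token caps."""
    kept, total = [], 0
    for msg in reversed(messages):  # walk newest -> oldest
        t = count_tokens(msg)
        if len(kept) >= max_messages or total + t > max_tokens:
            break
        kept.append(msg)
        total += t
    return list(reversed(kept))     # restore chronological order

history = [{"role": "user", "content": "x" * 400}] * 30  # 30 msgs, ~100 tokens each
print(len(prune(history)))  # capped at 20 messages
```

Whichever cap bites first wins, so a few huge tool outputs can crowd out many small messages.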
4. Use Haiku for Heartbeats & Cron
Automated tasks don't need the smartest model:
```yaml
agents:
  main:
    heartbeat:
      model: anthropic/claude-3-5-haiku-20241022
```

Haiku is 60x cheaper than Opus — perfect for scheduled checks.
Advanced Strategies
5. Set Up Model Routing
Route different tasks to appropriate models:
```yaml
# Use Haiku for simple queries, Sonnet for code, Opus for analysis
models:
  primary: anthropic/claude-sonnet-4-20250514
  routing:
    simple: anthropic/claude-3-5-haiku-20241022
    code: anthropic/claude-sonnet-4-20250514
    analysis: anthropic/claude-opus-4-20250514
```

6. Use Local Models for Drafts
Run Ollama locally for first drafts, only use paid APIs for final output:
```yaml
models:
  local: ollama/qwen2.5:7b
  primary: anthropic/claude-sonnet-4-20250514
```

Local models cost $0. Use them for brainstorming, then polish with Claude.
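The draft-then-polish pattern can be sketched as below; `call_model` is a hypothetical placeholder for your actual Ollama and Anthropic client calls, not a real API:

```python
# Hypothetical two-stage pipeline: draft with a free local model,
# then spend paid tokens only once, on the polish pass.
def call_model(model: str, prompt: str) -> str:
    # placeholder: real code would call Ollama or the Anthropic API here
    return f"[{model}] {prompt[:40]}"

def draft_then_polish(prompt: str) -> str:
    draft = call_model("ollama/qwen2.5:7b", prompt)          # $0, runs locally
    return call_model("anthropic/claude-sonnet-4-20250514",  # paid, single call
                      f"Polish this draft:\n{draft}")
```

The point is the shape: the expensive model sees one request per task instead of every brainstorming iteration.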
7. Truncate Tool Outputs
Large file reads and web scrapes bloat context. Limit tool output size:
```yaml
tools:
  read:
    maxChars: 10000  # Cap file reads at 10k chars
  web:
    maxChars: 5000   # Cap web fetches
```

8. Monitor with Session Status
Check your token usage regularly:
```bash
# In chat
/status

# From the CLI
openclaw status --usage
```

This shows tokens used and estimated cost per session.
9. Set Budget Alerts
Configure alerts in your provider dashboard:
- Anthropic Console: console.anthropic.com → Usage → Set alerts
- OpenAI: platform.openai.com → Settings → Limits
Set alerts at 50% and 80% of your monthly budget.
10. Use Batch Processing for Bulk Work
If you're processing many items (emails, documents), batch them instead of going one-by-one. Anthropic offers a batch API with a 50% discount.
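The arithmetic behind the batch discount, assuming a flat 50% reduction on token prices and a hypothetical workload of 1,000 items at ~1,500 input tokens each:

```python
# Savings from batch processing, assuming the batch API bills
# at 50% of the normal per-token price.
def bulk_cost(n_items, tok_per_item=1_500, price_per_m=3.0, batch=False):
    """Input-token cost in dollars of processing n_items, optionally at the batch rate."""
    rate = 0.5 if batch else 1.0
    return n_items * tok_per_item * price_per_m / 1e6 * rate

one_by_one = bulk_cost(1_000)
batched = bulk_cost(1_000, batch=True)
print(f"interactive: ${one_by_one:.2f}, batch: ${batched:.2f}")
```

The trade-off is latency: batch jobs complete asynchronously, so this only fits work that doesn't need an immediate answer.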
Cost Estimation Cheat Sheet
Typical monthly costs by usage pattern:
| Usage | Model | Est. Monthly Cost |
|---|---|---|
| Light (10 msgs/day) | Haiku | $1-3 |
| Medium (50 msgs/day) | Sonnet | $5-15 |
| Heavy (200+ msgs/day) | Sonnet + caching | $15-40 |
| Power user | Opus + Sonnet mix | $30-80 |
With caching enabled, expect 50-80% reduction from these estimates.
Example: Optimized Config
```yaml
# Optimized for cost
models:
  primary: anthropic/claude-sonnet-4-20250514
  fallbacks:
    - anthropic/claude-3-5-haiku-20241022
    - minimax/MiniMax-M2.1
  anthropic:
    cacheControlTtl: 300

contextPruning:
  mode: sliding
  maxMessages: 25
  maxTokens: 60000

agents:
  main:
    heartbeat:
      model: anthropic/claude-3-5-haiku-20241022

tools:
  read:
    maxChars: 15000
  web:
    maxChars: 8000
```

This config uses Sonnet by default, Haiku for automated tasks, aggressive caching, and limits on context size. Expected savings: 60-80% vs a naive Opus-only config.
📋 Quick Commands
| Command | Description |
|---|---|
| /status | Check current session token usage and cost |
| openclaw status --usage | View usage statistics from CLI |
| openclaw config set models.primary anthropic/claude-sonnet-4-20250514 | Switch default model to Sonnet |
| openclaw config set models.anthropic.cacheControlTtl 300 | Enable 5-minute prompt caching |
| openclaw config set contextPruning.maxMessages 20 | Limit context to last 20 messages |