Managing Token Usage & Costs in OpenClaw
Burning through millions of tokens unexpectedly? Learn how to diagnose runaway token consumption, configure context limits, and avoid bill shock with Gemini and other models.
⚠️ The Problem
Users report consuming 1-3 million tokens within minutes of normal use: hitting API quotas instantly, receiving rate-limit errors such as 'You exceeded your current quota for generate_content_paid_tier_inp', or watching sessions grow to multi-megabyte sizes that trigger exponential token growth on every message.
🔍 Why This Happens
Several factors cause runaway token consumption:

- **Gemini 2.5 Pro's token-hungry architecture**: This model is documented to consume 1.9M+ input tokens in just a few dozen API calls, even for simple tasks. This is a known model behavior, not an OpenClaw bug.
- **Thinking/reasoning mode enabled**: When `thinking: { type: "enabled" }` is set, models generate extensive internal reasoning chains that dramatically increase token usage per request.
- **Context compounding bug**: Large tool outputs (especially `gateway config.schema` returning 396KB+ of JSON) get permanently stored in `session.jsonl` files. Once this happens, every subsequent message drags that entire blob forward, causing exponential growth. Sessions with ~35 messages have grown to 2.9MB.
- **Discord summarization pulling too many messages**: The `readMessages` tool can fetch up to Discord's API limit (50-100 messages) per call, each adding to context.
- **No context token limits configured**: Without explicit limits, the agent will use the full context window of your model, reloading everything on each request.
✅ The Fix
Step 1: Check Current Token Usage
First, identify if you're actually hitting limits and where:
For Google/Gemini:
```shell
# Check real-time usage stats
open "https://aistudio.google.com/usage?timeRange=last-28-days&tab=rate-limit"
```

For Anthropic:

```shell
# Check your Claude usage
open "https://console.anthropic.com/settings/usage"
```

Step 2: Switch to a More Efficient Model
Gemini 2.5 Pro is extremely token-hungry. Switching to Flash can reduce consumption by 10x:
```json5
// In ~/.config/openclaw/config.json5
{
  "models": {
    "provider": "google",
    "model": "google/gemini-2.5-flash-preview" // Much cheaper than Pro
  }
}
```

Or via CLI:

```shell
openclaw model set google/gemini-2.5-flash-preview
```

Step 3: Disable Thinking Mode
Thinking/reasoning mode can explode token usage by 10-50x. Ensure it's disabled:
```json5
// In ~/.config/openclaw/config.json5
{
  "agents": {
    "defaults": {
      "models": {
        "google/gemini-2.5-flash-preview": {
          "params": {
            "thinking": { "type": "disabled" } // Critical!
          }
        },
        "google/gemini-2.5-pro-preview": {
          "params": {
            "thinking": { "type": "disabled" }
          }
        }
      }
    }
  }
}
```

Step 4: Set Context Token Limits
Prevent over-fetching by limiting the context window:
```json5
// In ~/.config/openclaw/config.json5
{
  "agents": {
    "defaults": {
      "contextTokens": 50000 // Limit to 50k tokens
    }
  }
}
```

Step 5: Fix Bloated Sessions (Context Compounding Bug)
If your sessions have already grown massive, you need to clean them:
```shell
# List your sessions and their sizes
ls -lah ~/.openclaw/agents/main/sessions/

# Check for bloated sessions (anything over 500KB is suspicious)
du -h ~/.openclaw/agents/main/sessions/*.jsonl | sort -h
```

If you find multi-megabyte sessions:

```shell
# Back up the bloated session
mv ~/.openclaw/agents/main/sessions/SESSION_ID.jsonl ~/.openclaw/agents/main/sessions/SESSION_ID.jsonl.bak

# Start fresh
openclaw session new
```

Step 6: Use Subagents for Heavy Tasks
For token-intensive operations, spawn subagents with isolated context:
```
/spawn Summarize Discord messages from the last 24 hours
/spawn Configure Samba with these specs: [detailed specs]
```

Benefits:
- Isolated context (only loads AGENTS.md + TOOLS.md, not full chat history)
- Can use cheaper models for subtasks
- Results are announced back to your main chat
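The savings from isolated context are easy to see with back-of-the-envelope arithmetic. The sketch below (all numbers illustrative) models a conversation where each request re-sends the full prior history, and compares a bloated main session against a subagent that starts nearly empty:

```python
def cumulative_tokens(per_message_tokens, n_messages, base_context=0):
    """Total input tokens over a conversation where each request
    re-sends the entire prior history (context compounding)."""
    total = 0
    history = base_context
    for _ in range(n_messages):
        history += per_message_tokens
        total += history  # each request pays for the full history so far
    return total

# Main session: 35 messages of ~2k tokens each, on top of an already
# bloated ~400k-token context (e.g. a stored config.schema blob).
main = cumulative_tokens(2_000, 35, base_context=400_000)

# Subagent: same 35 messages, but starting from a near-empty context
# (only AGENTS.md + TOOLS.md, say ~5k tokens).
sub = cumulative_tokens(2_000, 35, base_context=5_000)

print(main, sub)  # main: 15,260,000 tokens vs sub: 1,435,000 tokens
```

With these illustrative numbers, the same 35-message task costs over 10x more input tokens when it runs inside a bloated main session.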
Step 7: Request Higher Quotas (Google)
If you legitimately need higher limits:
```shell
# Check your current tier
open "https://aistudio.google.com/app/apikey"

# Request a rate limit increase (no guarantee, but they review)
open "https://forms.gle/ETzX94k8jf7iSotH9"
```

Tier levels:
- Tier 1 (default): ~1M tokens/min limit
- Tier 2 (requires >$250 spend): Higher limits
- Tier 3 (enterprise): Highest limits
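While you wait on a quota increase, a standard client-side mitigation is to retry rate-limited calls with exponential backoff and jitter. A minimal sketch follows; the error-matching convention (looking for "429" or "quota" in the message) is illustrative, so adapt it to whatever exception type your client library actually raises:

```python
import random
import time

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry a rate-limited API call with exponential backoff plus jitter.

    `request_fn` is any zero-argument callable; here we assume (purely as
    an illustrative convention) that rate-limit failures raise exceptions
    whose message contains "429" or "quota".
    """
    for attempt in range(max_retries):
        try:
            return request_fn()
        except Exception as exc:
            msg = str(exc).lower()
            if "429" not in msg and "quota" not in msg:
                raise  # not a rate-limit error: fail immediately
            # Exponential backoff (1s, 2s, 4s, ...) with random jitter
            # so concurrent clients don't retry in lockstep.
            delay = base_delay * (2 ** attempt + random.random())
            time.sleep(delay)
    raise RuntimeError("still rate limited after %d retries" % max_retries)
```

This smooths out per-minute limit spikes, but it cannot rescue a session whose context has already compounded; fix the bloated session first.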
📋 Quick Commands
| Command | Description |
|---|---|
| `openclaw model set google/gemini-2.5-flash-preview` | Switch to the more efficient Gemini Flash model |
| `openclaw model list` | Check current model configuration |
| `openclaw session new` | Start a fresh session with clean context |
| `ls -lah ~/.openclaw/agents/main/sessions/` | List session files and their sizes to find bloated ones |
| `du -h ~/.openclaw/agents/main/sessions/*.jsonl \| sort -h` | Sort sessions by size to identify the largest |
| `/status` | Check current session status and token usage |
| `/reasoning off` | Disable reasoning/thinking mode in the current session |