Managing Token Usage & Costs in OpenClaw
Burning through millions of tokens unexpectedly? Learn how to diagnose runaway token consumption, configure context limits, and avoid bill shock with Gemini and other models.
⚠️ The Problem
🔍 Why This Happens
2. Thinking mode enabled — When `thinking: { type: "enabled" }` is set, models generate extensive internal reasoning chains that dramatically increase token usage per request.
3. Context compounding bug — Large tool outputs (especially gateway config.schema returning 396KB+ of JSON) get permanently stored in session .jsonl files. Once this happens, every subsequent message drags that entire blob forward, so context balloons with every turn. Sessions with ~35 messages have grown to 2.9MB.
4. Discord summarization pulling too many messages — The readMessages tool can fetch up to Discord's API limit (50-100 messages) per call, each adding to context.
5. No context token limits configured — Without explicit limits, the agent will use the full context window of your model, reloading everything on each request.

✅ The Fix
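Before diving into the fixes, a quick back-of-envelope check shows the scale of the compounding problem described above. This sketch assumes the common ~4 bytes/token heuristic (not an exact count); the 396KB and 35-message figures come from the bug description:

```shell
# Rough cost of re-sending one 396KB tool-output blob on every request,
# assuming ~4 bytes per token.
blob_tokens=$(( 396 * 1024 / 4 ))      # tokens added to every request
messages=35                            # requests in the bloated session
total=$(( blob_tokens * messages ))
echo "extra tokens per request: $blob_tokens"   # → 101376
echo "extra tokens across $messages requests: $total"  # → 3548160
```

One stray blob can quietly cost millions of tokens over a single session, which is why the fixes below focus on limiting and cleaning context.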
## Step 1: Check Current Token Usage

First, identify if you're actually hitting limits and where. For Google/Gemini:
```bash
# Check real-time usage stats
open "https://aistudio.google.com/usage?timeRange=last-28-days&tab=rate-limit"
```

For Anthropic:
```bash
# Check your Claude usage
open "https://console.anthropic.com/settings/usage"
```

## Step 2: Switch to a More Efficient Model

Gemini 2.5 Pro is extremely token-hungry. Switching to Flash can reduce consumption by 10x:
```json5
// In ~/.config/openclaw/config.json5
{
  "models": {
    "provider": "google",
    "model": "google/gemini-2.5-flash-preview" // Much cheaper than Pro
  }
}
```

Or via CLI:
```bash
openclaw model set google/gemini-2.5-flash-preview
```

## Step 3: Disable Thinking Mode

Thinking/reasoning mode can explode token usage by 10-50x. Ensure it's disabled:
```json5
// In ~/.config/openclaw/config.json5
{
  "agents": {
    "defaults": {
      "models": {
        "google/gemini-2.5-flash-preview": {
          "params": {
            "thinking": { "type": "disabled" } // Critical!
          }
        },
        "google/gemini-2.5-pro-preview": {
          "params": {
            "thinking": { "type": "disabled" }
          }
        }
      }
    }
  }
}
```

## Step 4: Set Context Token Limits

Prevent over-fetching by limiting the context window:
```json5
// In ~/.config/openclaw/config.json5
{
  "agents": {
    "defaults": {
      "contextTokens": 50000 // Limit to 50k tokens
    }
  }
}
```

## Step 5: Fix Bloated Sessions (Context Compounding Bug)

If your sessions have already grown massive, you need to clean them:
```bash
# List your sessions and their sizes
ls -lah ~/.openclaw/agents/main/sessions/

# Check for bloated sessions (anything over 500KB is suspicious)
du -h ~/.openclaw/agents/main/sessions/*.jsonl | sort -h
```

If you find multi-megabyte sessions:
```bash
# Back up the bloated session
mv ~/.openclaw/agents/main/sessions/SESSION_ID.jsonl ~/.openclaw/agents/main/sessions/SESSION_ID.jsonl.bak

# Start fresh
openclaw session new
```

## Step 6: Use Subagents for Heavy Tasks

For token-intensive operations, spawn subagents with isolated context:
```
/spawn Summarize Discord messages from the last 24 hours
/spawn Configure Samba with these specs: [detailed specs]
```

Benefits:

- Isolated context (only loads AGENTS.md + TOOLS.md, not full chat history)
- Can use cheaper models for subtasks
- Results announce back to your main chat
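To gauge the benefit of that isolated context, compare what each approach loads per request. The 2.9MB figure is the bloated session from earlier; the AGENTS.md + TOOLS.md size is a hypothetical ~20KB, and ~4 bytes/token is a rough heuristic:

```shell
# Approximate context loaded per request, at ~4 bytes per token.
full_session_bytes=2900000         # bloated main session (from above)
subagent_bytes=$(( 20 * 1024 ))    # assumed size of AGENTS.md + TOOLS.md
echo "main session: ~$(( full_session_bytes / 4 )) tokens per request"  # → ~725000
echo "subagent:     ~$(( subagent_bytes / 4 )) tokens per request"      # → ~5120
```

Under these assumptions a subagent request is two orders of magnitude cheaper, which is why heavy summarization and setup tasks belong in subagents rather than the main chat.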
## Step 7: Request Higher Quotas (Google)

If you legitimately need higher limits:
```bash
# Check your current tier
open "https://aistudio.google.com/app/apikey"

# Request a rate limit increase (no guarantee, but they review)
open "https://forms.gle/ETzX94k8jf7iSotH9"
```

Tier levels:

- Tier 1 (default): ~1M tokens/min limit
- Tier 2 (requires >$250 spend): Higher limits
- Tier 3 (enterprise): Highest limits
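These limits also explain the bill-shock failure mode. Using the same ~4 bytes/token heuristic, a single request that drags along the 2.9MB session from Step 5 consumes most of a Tier 1 minute by itself:

```shell
# One bloated request vs. the approximate Tier 1 per-minute quota.
session_tokens=$(( 2900000 / 4 ))   # ~725000 tokens in a single request
tier1_per_min=1000000               # ~1M tokens/min (Tier 1)
pct=$(( session_tokens * 100 / tier1_per_min ))
echo "one request: ${session_tokens} tokens (~${pct}% of the per-minute quota)"
```

So before requesting a higher tier, it's worth confirming you aren't simply paying for a compounding bug.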
📋 Quick Commands
| Command | Description |
|---|---|
| `openclaw model set google/gemini-2.5-flash-preview` | Switch to the more efficient Gemini Flash model |
| `openclaw model list` | Check current model configuration |
| `openclaw session new` | Start a fresh session with clean context |
| `ls -lah ~/.openclaw/agents/main/sessions/` | List session files and their sizes to find bloated ones |
| `du -h ~/.openclaw/agents/main/sessions/*.jsonl \| sort -h` | Sort sessions by size to identify the largest |
| `/status` | Check current session status and token usage |
| `/reasoning off` | Disable reasoning/thinking mode in the current session |
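The filesystem commands above can be combined into a small check script. This is a sketch: the paths are the ones used throughout this guide, and the session directory may not exist on a fresh install, so the script degrades gracefully:

```shell
#!/bin/sh
# Quick token-health check using the paths from the table above.
SESSIONS="$HOME/.openclaw/agents/main/sessions"

echo "== Five largest sessions =="
if [ -d "$SESSIONS" ]; then
  du -h "$SESSIONS"/*.jsonl 2>/dev/null | sort -h | tail -n 5
else
  echo "no session directory at $SESSIONS"
fi

echo "== Sessions over 500KB (suspicious) =="
find "$SESSIONS" -name '*.jsonl' -size +500k 2>/dev/null
```

Running this periodically (or from a cron job) catches a bloating session before it starts burning millions of tokens.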