OpenClaw Prompt Caching: Get a 90% Discount on Content You're Already Sending
Target keywords: openclaw prompt caching | openclaw claude caching
You're Sending the Same 5KB on Every Single API Call
Every time you send a message through OpenClaw, Claude receives more than just your message. It receives your entire system context: SOUL.md, USER.md, IDENTITY.md, TOOLS.md. All of it, every time.
If your SOUL.md is 2KB and your USER.md is 1KB and your system prompt adds another 2KB, you're sending 5KB of identical, static content with every single API call. Call 2 gets it. Call 50 gets it. Call 100 gets it.
Without prompt caching, you're paying full price for that identical 5KB every single time.
With prompt caching, calls 2 through 100 cost 90% less on that static content.
The setup takes about 5 minutes. Here's exactly how it works and how to configure it.
How Claude's Prompt Caching Works
Claude's prompt caching (available on Claude 3.5 Sonnet and all newer models) operates on a straightforward model:
First request: Content is sent and processed at full price. Claude stores it in a temporary cache tied to your session. This is a "cache write": it costs 25% more than a normal input token.
Subsequent requests (within 5 minutes): Claude detects that the same content is already in cache and reads it from there instead of reprocessing it. This is a "cache hit": it costs 90% less than the original price.
After 5 minutes: Cache expires by default. Next request is a full price call (another cache write).
The math on what this means:
| Scenario | Per 1K tokens | 100 calls × 5KB |
|---|---|---|
| No caching | $0.003 | $1.50 |
| Cache write (25% premium) | $0.00375 | $0.01875 (1 call) |
| Cache hits (90% discount) | $0.0003 | $0.1485 (99 calls) |
| Total with caching | | $0.167 |
| Savings | | $1.33 (89% reduction) |
That's on system prompts alone. Add workspace files and reference docs, and the savings compound quickly.
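The arithmetic behind that table is easy to reproduce. Here's a small sketch that recomputes the savings; the rates ($0.003 per 1K tokens) and the rough 5KB-to-5K-tokens conversion are this article's assumptions, not live pricing.

```python
# Sketch: recompute the caching savings from the table above.
# Assumes $0.003 per 1K input tokens and treats the 5KB system
# context as ~5K tokens (rough 1 byte ~= 1 token).
BASE_RATE = 0.003               # $ per 1K tokens, uncached
WRITE_RATE = BASE_RATE * 1.25   # cache write: 25% premium
HIT_RATE = BASE_RATE * 0.10     # cache hit: 90% discount

def caching_cost(calls: int, ktokens: float) -> dict:
    """Cost of `calls` requests that each resend `ktokens` (thousands
    of tokens) of identical static context."""
    no_cache = calls * ktokens * BASE_RATE
    # One write on the first call, cache hits on every call after that.
    cached = ktokens * WRITE_RATE + (calls - 1) * ktokens * HIT_RATE
    return {
        "no_cache": round(no_cache, 4),
        "cached": round(cached, 4),
        "savings": round(no_cache - cached, 4),
        "reduction": round(1 - cached / no_cache, 3),
    }

print(caching_cost(calls=100, ktokens=5))
```

Running it confirms the table: roughly $1.50 without caching, about $0.17 with it.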
What to Cache vs. What Not to Cache
Not everything is worth caching. The rule is simple: cache what's stable, skip what changes.
| ✅ Cache These | ❌ Don't Cache These |
|---|---|
| System prompts (rarely change) | Daily memory files (change every day) |
| SOUL.md (operator principles) | Recent user messages (fresh each session) |
| USER.md (goals and context) | Tool outputs (change per task) |
| TOOLS.md (tool documentation) | Session notes (dynamic by definition) |
| Reference materials (pricing, docs, specs) | Debugging context (temporary) |
| Project templates (standard structures) | |
| REFERENCE.md files (stable project docs) | |
The underlying principle: if the content changes more often than once per day, the cache invalidation cost starts eating into your savings. Static files, the ones that stay consistent across sessions, are where caching pays off most.
The Workspace Folder Structure for Caching
To maximize cache hit rate, keep static and dynamic content separated. This matters because when dynamic content gets mixed into a cached block, any update to the dynamic content invalidates the cache for everything in that block.
The recommended folder structure:
```
/workspace/
├── SOUL.md                ← Cache this (stable)
├── USER.md                ← Cache this (stable)
├── TOOLS.md               ← Cache this (stable)
├── memory/
│   ├── MEMORY.md          ← Don't cache (frequently updated)
│   └── 2026-02-03.md      ← Don't cache (daily notes)
└── projects/
    └── [PROJECT]/REFERENCE.md   ← Cache this (stable docs)
```
The key separation: your memory/ folder contains content that changes constantly (daily notes, session logs, running context). That content should load dynamically, on demand, without being bundled into the cached layer.
Your SOUL.md, USER.md, and TOOLS.md are structural files that almost never change. Those belong in the cached layer.
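The split above can be expressed as a simple rule. This sketch is illustrative only: the file names mirror the structure shown, but the helper is not an OpenClaw API.

```python
# Sketch: decide which workspace files belong in the cached layer.
# Mirrors the folder structure above: structural files are stable
# (cache them), anything under memory/ is dynamic (don't).
from pathlib import PurePosixPath

STABLE_FILES = {"SOUL.md", "USER.md", "TOOLS.md", "IDENTITY.md", "REFERENCE.md"}

def is_cacheable(path: str) -> bool:
    p = PurePosixPath(path)
    if "memory" in p.parts:        # daily notes, session logs: always dynamic
        return False
    return p.name in STABLE_FILES  # structural / reference docs: stable

files = ["SOUL.md", "memory/2026-02-03.md", "projects/acme/REFERENCE.md"]
cached = [f for f in files if is_cacheable(f)]      # SOUL.md, REFERENCE.md
dynamic = [f for f in files if not is_cacheable(f)] # the daily note
```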
The Exact Config JSON
Enable caching by updating ~/.openclaw/openclaw-config.json:
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-haiku-4-5"
      },
      "cache": {
        "enabled": true,
        "ttl": "5m",
        "priority": "high"
      },
      "models": {
        "anthropic/claude-sonnet-4-5": {
          "alias": "sonnet",
          "cache": true
        },
        "anthropic/claude-haiku-4-5": {
          "alias": "haiku",
          "cache": false
        }
      }
    }
  }
}
```
Configuration options explained:
| Option | What It Does |
|---|---|
| `cache.enabled` | `true`/`false`: global toggle for prompt caching |
| `cache.ttl` | Time-to-live: `"5m"` (default), `"30m"` (longer sessions), `"24h"` (persistent reference docs) |
| `cache.priority` | `"high"` prioritizes caching aggressively; `"low"` balances cost vs. speed |
| `cache: true` (per model) | Enables caching for that model; Sonnet recommended, Haiku optional |
Why Haiku has cache: false:
Haiku is already extremely cheap ($0.00025 per 1K tokens). The caching overhead (the 25% premium on cache writes) can actually exceed the savings for short sessions with Haiku. Caching is most effective with Sonnet, where the base cost is high enough that a 90% discount on repeated content makes a significant difference.
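The asymmetry is easy to see in dollars. A quick sketch, using the per-1K rates this article quotes, compares what a single cache hit saves on each model:

```python
# Sketch: absolute savings per cache hit, per 1K tokens, at the rates
# quoted in this article ($0.003/1K Sonnet, $0.00025/1K Haiku).
RATES = {"sonnet": 0.003, "haiku": 0.00025}

def saving_per_hit(model: str, ktokens: float = 1.0) -> float:
    # A cache hit costs 10% of base, so each hit saves 90% of base.
    return RATES[model] * 0.9 * ktokens

sonnet = saving_per_hit("sonnet")  # dollars saved per 1K cached tokens
haiku = saving_per_hit("haiku")
# Each Sonnet hit saves 12x more than a Haiku hit, so the same
# batching effort buys far more when Sonnet is the model in play.
```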
Real-World Example: Outreach Campaign, $102/month to $32/month
You're running 50 outreach email drafts per week using Sonnet. Each draft requires your full system prompt (5KB) plus the specific contact details (~3KB) and your instructions (~2KB).
Without caching:
| Component | Per Draft | 50 Drafts/Week |
|---|---|---|
| System prompt (5KB) | $0.015 | $0.75 |
| Draft context (8KB) | $0.024 | $1.20 |
| Weekly total | | $1.95 |
| Monthly total | | $7.80 |
| × overhead from loops | | ~$102/month |
With caching (batched within 5-minute windows):
| Component | Per Draft | 50 Drafts/Week |
|---|---|---|
| System prompt: 1 cache write | $0.01875 | $0.01875 (once) |
| System prompt: 49 cache hits | $0.0015 | $0.0735 |
| Draft context (~50% cache hits) | $0.012 | $0.60 |
| Weekly total | | $0.69 |
| Monthly total | | $2.77 |
| × realistic overhead | | ~$32/month |
Savings: $70/month. On a single workflow.
The key multiplier here is batching. When you run all 50 drafts in a single session within a 5-minute window, the system prompt gets cached on call 1 and hits cache on calls 2 through 50. That's 49 cache hits instead of 50 full-price calls.
4 Strategies to Maximize Cache Hit Rate
1. Batch Requests Within 5-Minute Windows
The TTL on Claude's cache is 5 minutes by default. If call 1 establishes the cache, call 2 needs to happen within 5 minutes to get the cache hit.
For bulk operations (email drafts, research tasks, data processing), run them back-to-back in a single session. Don't spread them across 8 hours. Batching within a 5-minute window means all 50 calls hit the same cache instead of each one paying for a fresh write.
If your TTL expires mid-batch, the config supports longer windows:
```json
"cache": {
  "ttl": "30m"
}
```
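To see why spacing matters, here's a toy model of the batching effect. It assumes a sliding TTL that each request refreshes; this is a sketch of the idea, not OpenClaw's internals.

```python
# Toy model: count cache writes vs. hits for a sequence of call
# timestamps (seconds). Assumes a sliding TTL: each request that
# finds the cache warm keeps it warm for another TTL.
def writes_and_hits(timestamps: list[float], ttl: float = 300.0) -> tuple[int, int]:
    writes, hits = 0, 0
    last = None
    for t in sorted(timestamps):
        if last is None or t - last > ttl:
            writes += 1   # first call or expired cache: full-price write
        else:
            hits += 1     # within TTL: 90%-discounted read
        last = t
    return writes, hits

batched = [i * 2.0 for i in range(50)]    # 50 drafts, 2 seconds apart
spread = [i * 600.0 for i in range(50)]   # 50 drafts, 10 minutes apart
# batched -> (1, 49): one write, 49 hits
# spread  -> (50, 0): every single call pays for a fresh write
```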
2. Keep System Prompts Stable Mid-Session
Every edit to SOUL.md invalidates the cache for that content block. If you're tweaking your system prompt during a live session, you're paying for a new cache write on every change, and those 25% premium writes add up.
Batch system prompt updates during maintenance windows, not in the middle of active work. Make all your SOUL.md edits at once, then start a fresh session with the updated cached content.
3. Organize Context Hierarchically
Structure your content so the most stable content comes first in the context:
- Core system prompt (highest priority, cached always)
- Stable workspace files (SOUL.md, USER.md, TOOLS.md)
- Project reference docs (REFERENCE.md per project)
- Dynamic daily notes (uncached, loaded on demand)
When the stable layer is consistently first, it's consistently cached. When dynamic content gets mixed in, it disrupts the cache boundaries.
4. Separate Stable from Dynamic Per Project
For each project, maintain two separate files:
- `product-reference.md`: pricing, specs, features, company info (stable, cached)
- `project-notes.md`: current status, open questions, recent decisions (dynamic, uncached)
Loading project-notes.md should never invalidate the cache on product-reference.md. Keep them separate.
When NOT to Use Caching
Caching isn't universally beneficial. Skip it when:
**Haiku tasks (too cheap to benefit).** Haiku at $0.00025/1K tokens is so inexpensive that the overhead of cache writes and the complexity of managing cache TTLs can cost more than you save. Use caching with Sonnet; for Haiku-primary workflows, it's often not worth it.

**Frequent prompt changes.** If you're actively developing and iterating on your system prompt, changing it multiple times per session, you'll pay the 25% cache write premium every time without getting meaningful hits. Wait until the prompt stabilizes before enabling caching.

**Small requests under 1KB.** For tiny API calls where the context is minimal, the caching overhead doesn't meaningfully reduce costs. Caching pays off at scale, on large static context blocks.

**Development and testing.** When you're testing prompts, every variation is a cache miss plus a write. Cache thrashing during development costs more than it saves. Disable caching while iterating; enable it for production runs.
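The skip conditions above fold into one decision helper. Every threshold here is illustrative, pulled from this article's rules of thumb rather than any official OpenClaw setting.

```python
# Sketch: should this context block be cached? Encodes the four
# heuristics above; thresholds are illustrative, not OpenClaw defaults.
def should_cache(model: str, block_kb: float, changes_per_session: int,
                 in_development: bool) -> bool:
    if in_development:               # cache thrashing while iterating
        return False
    if model.startswith("haiku"):    # too cheap to benefit
        return False
    if block_kb < 1:                 # tiny context: overhead beats savings
        return False
    if changes_per_session > 0:      # each edit = a fresh 25%-premium write
        return False
    return True

# A stable 5KB Sonnet system prompt in production passes every check:
# should_cache("sonnet", 5, 0, False) -> True
```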
Monitoring Cache Performance with session_status
Once caching is enabled, track whether it's working:
```shell
# Start a session
openclaw shell

# Check current cache metrics
session_status

# Expected output:
# Cache hits: 45/50 (90%)
# Cache tokens used: 225KB (vs 250KB without cache)
# Cost savings: $0.22 this session
```
Or query usage directly:
```shell
# Check cache usage over 24 hours
curl https://api.anthropic.com/v1/usage \
  -H "Authorization: Bearer $ANTHROPIC_API_KEY" | jq '.usage.cache'
```
Metrics to watch:
| Metric | Target | What It Means If Off |
|---|---|---|
| Cache hit rate | > 80% | Batching isn't tight enough; requests are too spread out |
| Cached tokens as share of input | < 30% | System prompts are too large; trim them |
| Cache write count | Low and stable | System prompt is changing too often; stabilize it |
| Session cost vs. last week | Down ~50% | Caching + model routing combined effect |
If your cache hit rate is below 80%, the most common fix is batching requests more tightly within the 5-minute TTL window.
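If your tooling only surfaces raw counts, the hit-rate check is one line of arithmetic. The 80% target is the one this article recommends; the helper itself is a sketch, not part of OpenClaw.

```python
# Sketch: flag sessions whose cache hit rate falls below target.
def cache_hit_rate(hits: int, total_calls: int) -> float:
    return hits / total_calls if total_calls else 0.0

def needs_tighter_batching(hits: int, total_calls: int,
                           target: float = 0.80) -> bool:
    return cache_hit_rate(hits, total_calls) < target

# The example session above (45 hits out of 50 calls) is at 90%:
# needs_tighter_batching(45, 50) -> False, it's healthy.
```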
The Compounding Effect With Other Optimizations
Caching multiplies the impact of every other optimization:
| Optimization | Per Session (Before) | Per Session (After) | With Cache Added |
|---|---|---|---|
| Session Init (lean context) | $0.40 | $0.05 | $0.005 |
| Model Routing (Haiku default) | $0.05 | $0.02 | $0.002 |
| Heartbeat to Ollama | $0.02 | $0 | $0 |
| Prompt Caching | | | −$0.015 |
| Combined Total | $0.47 | $0.07 | $0.012 |
Combined, these four optimizations take a typical $0.47/session cost to $0.012/session ā a 97% reduction.
Caching alone gets you from $0.07 to $0.012 on top of the other optimizations already in place. It's the final multiplier.
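The headline percentages follow directly from the session costs in the table; a two-line check, using the article's own numbers:

```python
# Sketch: recompute the headline reductions from the table's numbers.
def reduction(before: float, after: float) -> float:
    return 1 - after / before

overall = reduction(0.47, 0.012)     # all four optimizations combined
cache_only = reduction(0.07, 0.012)  # caching on top of the other three
# overall    ~ 0.974 -> the quoted 97% total reduction
# cache_only ~ 0.829 -> caching's own contribution at that stage
```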
Key Takeaways
- You're paying full price for static content on every API call. SOUL.md, USER.md, system prompts: they're resent and reprocessed every time without caching.
- Cache hits cost 90% less than uncached reads. On 100 API calls with a 5KB system prompt, that's ~$1.33 saved on system prompts alone.
- Cache writes cost 25% more; factor this in for infrequent operations where hits won't outweigh the write cost.
- Enable caching in config: `cache.enabled: true`, `cache.ttl: "5m"` for Sonnet; skip it for Haiku.
- Batch requests within 5-minute windows to maximize hit rate on the same cached content.
- Separate stable from dynamic content in your file structure; never mix them in the same cached block.
- Outreach campaign example: $102/month down to $32/month with batching + caching.
- Monitor with `session_status`; target an 80%+ cache hit rate.
- Don't cache during active development; cache thrashing costs more than it saves.
- Combined with the other optimizations, you reach 97% total cost reduction.