šŸ¦žOpenClaw Guide

OpenClaw Prompt Caching: Get a 90% Discount on Content You're Already Sending

2026-03-17 • 10 min read




You're Sending the Same 5KB on Every Single API Call

Every time you send a message through OpenClaw, Claude receives more than just your message. It receives your entire system context: SOUL.md, USER.md, IDENTITY.md, TOOLS.md — all of it, every time.

If your SOUL.md is 2KB and your USER.md is 1KB and your system prompt adds another 2KB, you're sending 5KB of identical, static content with every single API call. Call 2 gets it. Call 50 gets it. Call 100 gets it.

Without prompt caching, you're paying full price for that identical 5KB every single time.

With prompt caching, calls 2 through 100 cost 90% less on that static content.

The setup takes about 5 minutes. Here's exactly how it works and how to configure it.


How Claude's Prompt Caching Works

Claude's prompt caching (available on Claude 3.5 Sonnet and all newer models) operates on a straightforward model:

First request: Content is sent and processed at full price. Claude stores it in a temporary cache tied to your session. This is a "cache write" — it costs 25% more than a normal token read.

Subsequent requests (within 5 minutes): Claude detects the same content is already in cache. Instead of reprocessing it, it reads from cache. This is a "cache hit" — it costs 90% less than the original price.

After 5 minutes: Cache expires by default. Next request is a full price call (another cache write).
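Under the hood, a cache-enabled request marks the static prefix with Anthropic's cache_control field. The sketch below shows the payload shape only; it is not OpenClaw's actual code, and the model id and prompt text are placeholders:

```python
# Illustrative sketch of a Messages API payload with a cacheable system prefix.
# The cache_control marker is Anthropic's documented mechanism; everything
# else here (model id, prompt text) is a placeholder.

def build_request(system_text: str, user_text: str) -> dict:
    """Build a Messages API payload whose system prompt is marked cacheable."""
    return {
        "model": "claude-sonnet-4-5",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_text,
                # Marks everything up to this point as a cacheable prefix.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }

req = build_request("You are a helpful assistant. <SOUL.md contents>", "Draft an email.")
print(req["system"][0]["cache_control"])  # {'type': 'ephemeral'}
```

The first request that carries this prefix pays the cache-write premium; identical prefixes within the TTL read from cache instead.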

The math on what this means:

| Scenario | Per 1K tokens | 100 calls Ɨ 5KB |
|---|---|---|
| No caching | $0.003 | $1.50 |
| Cache writes (25% premium) | $0.00375 | $0.01875 (1 call) |
| Cache hits (90% discount) | $0.0003 | $0.1485 (99 calls) |
| Total with caching | — | $0.167 |
| Savings | — | $1.33 (89% reduction) |

That's on system prompts alone. Add workspace files and reference docs, and the savings compound quickly.
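A quick sanity check of that arithmetic, treating 1KB of prompt as roughly 1K tokens (as the scenario does) and using Sonnet's $0.003/1K input rate:

```python
# Recompute the 100-call scenario from the stated per-token rates.
BASE = 0.003          # $ per 1K input tokens, uncached
WRITE = BASE * 1.25   # cache write: 25% premium
HIT = BASE * 0.10     # cache hit: 90% discount
K_TOKENS = 5          # static system context per call (~5KB)
CALLS = 100

no_cache = CALLS * K_TOKENS * BASE
with_cache = 1 * K_TOKENS * WRITE + (CALLS - 1) * K_TOKENS * HIT
print(f"no cache:   ${no_cache:.2f}")    # $1.50
print(f"with cache: ${with_cache:.3f}")  # $0.167
print(f"savings:    ${no_cache - with_cache:.2f} ({1 - with_cache / no_cache:.0%})")
```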


What to Cache vs. What Not to Cache

Not everything is worth caching. The rule is simple: cache what's stable, skip what changes.

| āœ… Cache These | āŒ Don't Cache These |
|---|---|
| System prompts (rarely change) | Daily memory files (change every day) |
| SOUL.md (operator principles) | Recent user messages (fresh each session) |
| USER.md (goals and context) | Tool outputs (change per task) |
| TOOLS.md (tool documentation) | Session notes (dynamic by definition) |
| Reference materials (pricing, docs, specs) | Debugging context (temporary) |
| Project templates (standard structures) | — |
| REFERENCE.md files (stable project docs) | — |

The underlying principle: if the content changes more often than once per day, the cache invalidation cost starts eating into your savings. Static files — the ones that stay consistent across sessions — are where caching pays off most.


The Workspace Folder Structure for Caching

To maximize cache hit rate, keep static and dynamic content separated. This matters because when dynamic content gets mixed into a cached block, any update to the dynamic content invalidates the cache for everything in that block.

The recommended folder structure:

/workspace/
ā”œā”€ā”€ SOUL.md                      ← Cache this (stable)
ā”œā”€ā”€ USER.md                      ← Cache this (stable)
ā”œā”€ā”€ TOOLS.md                     ← Cache this (stable)
ā”œā”€ā”€ memory/
│   ā”œā”€ā”€ MEMORY.md                ← Don't cache (frequently updated)
│   └── 2026-02-03.md            ← Don't cache (daily notes)
└── projects/
      └── [PROJECT]/REFERENCE.md  ← Cache this (stable docs)

The key separation: your memory/ folder contains content that changes constantly — daily notes, session logs, running context. That content should load dynamically, on demand, without being bundled into the cached layer.

Your SOUL.md, USER.md, and TOOLS.md are structural files that almost never change. Those belong in the cached layer.


The Exact Config JSON

Enable caching by updating ~/.openclaw/openclaw-config.json:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-haiku-4-5"
      },
      "cache": {
        "enabled": true,
        "ttl": "5m",
        "priority": "high"
      },
      "models": {
        "anthropic/claude-sonnet-4-5": {
          "alias": "sonnet",
          "cache": true
        },
        "anthropic/claude-haiku-4-5": {
          "alias": "haiku",
          "cache": false
        }
      }
    }
  }
}

Configuration options explained:

| Option | What It Does |
|---|---|
| cache.enabled | true/false — global toggle for prompt caching |
| cache.ttl | Time-to-live: "5m" (default), "30m" (longer sessions), "24h" (persistent reference docs) |
| cache.priority | "high" prioritizes caching aggressively; "low" balances cost vs. speed |
| models.<id>.cache | Per-model toggle; true recommended for Sonnet, optional for Haiku |

Why Haiku has cache: false:

Haiku is already extremely cheap — $0.00025 per 1K tokens. The caching overhead (the 25% premium on cache writes) can actually exceed the savings for short sessions with Haiku. Caching is most effective with Sonnet, where the base cost is high enough that a 90% discount on repeated content makes a significant difference.
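The absolute dollars saved scale with the base rate, which is why the same caching setup matters far less for Haiku. A sketch using the rates quoted in this post, for a 50-call session with a 5K-token prompt:

```python
# Compare absolute caching savings at Sonnet vs. Haiku input rates.
def session_savings(rate_per_1k: float, k_tokens: float = 5, calls: int = 50) -> float:
    """Dollars saved by caching a static prompt across one session."""
    uncached = calls * k_tokens * rate_per_1k
    cached = (k_tokens * rate_per_1k * 1.25            # one cache write
              + (calls - 1) * k_tokens * rate_per_1k * 0.10)  # remaining hits
    return uncached - cached

print(f"Sonnet ($0.003/1K):   ${session_savings(0.003):.4f} saved")
print(f"Haiku ($0.00025/1K):  ${session_savings(0.00025):.4f} saved")
```

The session saves roughly 66 cents with Sonnet but only about 5 cents with Haiku, which is easily eaten by cache-management overhead.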


Real-World Example: Outreach Campaign, $102/month → $32/month

You're running 50 outreach email drafts per week using Sonnet. Each draft requires your full system prompt (5KB) plus per-draft context — the specific contact details and your instructions — totaling about 8KB.

Without caching:

| Component | Per Draft | 50 Drafts/Week |
|---|---|---|
| System prompt (5KB) | $0.015 | $0.75 |
| Draft context (8KB) | $0.024 | $1.20 |
| Weekly total | — | $1.95 |
| Monthly total | — | $7.80 |
| Ɨ overhead from loops | — | ~$102/month |

With caching (batched within 5-minute windows):

| Component | Per Draft | 50 Drafts/Week |
|---|---|---|
| System prompt — 1 cache write | $0.01875 | $0.01875 (once) |
| System prompt — 49 cache hits | $0.0015 | $0.0735 |
| Draft context (~50% cache hits) | $0.012 | $0.60 |
| Weekly total | — | $0.69 |
| Monthly total | — | $2.77 |
| Ɨ realistic overhead | — | ~$32/month |

Savings: $70/month. On a single workflow.

The key multiplier here is batching. When you run all 50 drafts in a single session within a 5-minute window, the system prompt gets cached on call 1 and hits cache on calls 2–50. That's 49 cache hits instead of 50 full-price calls.


4 Strategies to Maximize Cache Hit Rate

1. Batch Requests Within 5-Minute Windows

The TTL on Claude's cache is 5 minutes by default. If call 1 establishes the cache, call 2 needs to happen within 5 minutes to get the cache hit.

For bulk operations (email drafts, research tasks, data processing), run them back-to-back in a single session. Don't spread them across 8 hours. Batching within a 5-minute window means all 50 calls hit the same cache instead of each one paying for a fresh write.

If your TTL expires mid-batch, the config supports longer windows:

"cache": {
  "ttl": "30m"
}

2. Keep System Prompts Stable Mid-Session

Every edit to SOUL.md invalidates the cache for that content block. If you're tweaking your system prompt during a live session, you're paying for a new cache write on every change — and those 25% premium writes add up.

Batch system prompt updates during maintenance windows, not in the middle of active work. Make all your SOUL.md edits at once, then start a fresh session with the updated cached content.

3. Organize Context Hierarchically

Structure your content so the most stable content comes first in the context:

  1. Core system prompt (highest priority, cached always)
  2. Stable workspace files (SOUL.md, USER.md, TOOLS.md)
  3. Project reference docs (REFERENCE.md per project)
  4. Dynamic daily notes (uncached, loaded on demand)

When the stable layer is consistently first, it's consistently cached. When dynamic content gets mixed in, it disrupts the cache boundaries.
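A minimal sketch of stable-first assembly. The layer list and file names mirror the structure above but the function itself is illustrative, not an OpenClaw API:

```python
from pathlib import Path

# Stable layers, in cache-priority order; dynamic content always goes last,
# outside the cacheable prefix.
STABLE_LAYERS = ["SOUL.md", "USER.md", "TOOLS.md"]

def assemble_context(workspace: Path, dynamic_note: str = "") -> str:
    """Concatenate stable files first so the cacheable prefix never moves."""
    parts = [
        (workspace / name).read_text()
        for name in STABLE_LAYERS
        if (workspace / name).exists()
    ]
    if dynamic_note:
        parts.append(dynamic_note)  # appended after the stable prefix
    return "\n\n".join(parts)
```

Because the stable prefix is byte-identical across calls, it keeps hitting the same cache entry no matter what the trailing dynamic note says.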

4. Separate Stable from Dynamic Per Project

For each project, maintain two separate files:

  • product-reference.md — pricing, specs, features, company info (stable, cached)
  • project-notes.md — current status, open questions, recent decisions (dynamic, uncached)

Loading project-notes.md should never invalidate the cache on product-reference.md. Keep them separate.


When NOT to Use Caching

Caching isn't universally beneficial. Skip it when:

Haiku tasks (too cheap to benefit): Haiku at $0.00025/1K tokens is so inexpensive that the overhead of cache writes and the complexity of managing cache TTLs can cost more than you save. Use caching with Sonnet; for Haiku-primary workflows, it's often not worth it.

Frequent prompt changes: If you're actively developing and iterating on your system prompt — changing it multiple times per session — you'll pay the 25% cache write premium every time without getting meaningful hits. Wait until the prompt stabilizes before enabling caching.

Small requests under 1KB: For tiny API calls where the context is minimal, the caching overhead doesn't meaningfully reduce costs. Caching pays off at scale, on large static context blocks.

Development and testing: When you're testing prompts, every variation is a cache miss plus a cache write. Cache thrashing during development costs more than it saves. Disable caching while iterating; enable it for production runs.


Monitoring Cache Performance with session_status

Once caching is enabled, track whether it's working:

# Start a session
openclaw shell

# Check current cache metrics
session_status

# Expected output:
# Cache hits: 45/50 (90%)
# Cache tokens used: 225KB (vs 250KB without cache)
# Cost savings: $0.22 this session

Or query usage directly:

# Check cache usage over 24 hours
curl https://api.anthropic.com/v1/usage \
  -H "Authorization: Bearer $ANTHROPIC_API_KEY" | jq '.usage.cache'

Metrics to watch:

| Metric | Target | What It Means If Off |
|---|---|---|
| Cache hit rate | > 80% | Batching isn't tight enough; requests too spread out |
| Cached tokens | < 30% of input (good ratio) | System prompts too large — trim them |
| Cache writes | Low/stable | System prompt changing too often — stabilize it |
| Session cost | āˆ’50% vs. last week (positive trend) | Caching + model routing combined effect |

If your cache hit rate is below 80%, the most common fix is batching requests more tightly within the 5-minute TTL window.
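That 80% threshold is easy to wire into a script. A sketch, assuming you can pull hit/total counts from wherever your metrics live (the function names here are hypothetical):

```python
def cache_hit_rate(hits: int, total: int) -> float:
    """Fraction of calls served from cache; 0.0 for an empty session."""
    return hits / total if total else 0.0

def diagnose(hits: int, total: int) -> str:
    """Mirror the guidance above: below 80% usually means loose batching."""
    rate = cache_hit_rate(hits, total)
    if rate >= 0.80:
        return f"ok ({rate:.0%})"
    return f"low ({rate:.0%}) - batch requests more tightly within the TTL window"

print(diagnose(45, 50))  # ok (90%)
```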


The Compounding Effect With Other Optimizations

Caching multiplies the impact of every other optimization:

| Optimization | Per Session (Before) | Per Session (After) | With Cache Added |
|---|---|---|---|
| Session Init (lean context) | $0.40 | $0.05 | $0.005 |
| Model Routing (Haiku default) | $0.05 | $0.02 | $0.002 |
| Heartbeat to Ollama | $0.02 | $0 | $0 |
| Prompt Caching | — | — | āˆ’$0.015 |
| Combined Total | $0.47 | $0.07 | $0.012 |

Combined, these four optimizations take a typical $0.47/session cost to $0.012/session — a 97% reduction.

Caching alone gets you from $0.07 to $0.012 on top of the other optimizations already in place. It's the final multiplier.


Key Takeaways

  • You're paying full price for static content on every API call. SOUL.md, USER.md, system prompts — they're resent and reprocessed every time without caching.
  • Cache hits cost 90% less than uncached reads. On 100 API calls with a 5KB system prompt, that's ~$1.33 saved on system prompts alone.
  • Cache writes cost 25% more — factor this in for infrequent operations where hits won't outweigh the write cost.
  • Enable caching in config: cache.enabled: true, cache.ttl: "5m" for Sonnet; skip for Haiku.
  • Batch requests within 5-minute windows to maximize hit rate on the same cached content.
  • Separate stable from dynamic content in your file structure — never mix them in the same cached block.
  • Outreach campaign example: $102/month → $32/month with batching + caching.
  • Monitor with session_status — target 80%+ cache hit rate.
  • Don't cache during active development — cache thrashing costs more than it saves.
  • Combined with the other optimizations, you reach 97% total cost reduction.

Learn alongside 1,000+ operators

Ask questions, share workflows, and get help from people running OpenClaw every day.