Multi-Agent Architecture — Fleet Setup & Best Practices

⚠️ The Problem

Users running multi-agent fleets (e.g., coordinator + developer + content agents) encounter several challenges:

1. **Context overflow** - Long-running sessions exceed model context limits
2. **Memory persistence** - Agent state doesn't survive session resets
3. **OAuth token errors** - `OAuth token refresh failed for anthropic: Failed to refresh OAuth token`
4. **Exec session timeouts** - CLI commands work in the terminal but time out when the agent runs them
5. **Agent misbehavior after upgrade** - Crons not firing, soul not updating, unintended skill publishing
6. **Multiple model configuration** - Using different models for different tasks

🔍 Why This Happens

**Context Overflow**: Running agents 24/7 in a single session accumulates context until it exceeds model limits (typically 100k-200k tokens). Without compaction, every message adds to the context.

**OAuth Token Errors**: The Anthropic credential (API key or OAuth token) has expired, been revoked, or reached rate limits. This can happen even with a Claude Max subscription if API access is configured incorrectly.

**Exec Timeouts**: The Gateway binds to 127.0.0.1 by default, but exec sessions may run in isolated network namespaces where localhost routing differs. Commands fail with:

```
Error: gateway timeout after 10000ms
```

**Post-Upgrade Issues**: The `clawdbot → openclaw` migration may leave orphaned configs, cron schedules, or session states that reference old paths.

**Memory Loss**: Without explicit file-based memory (`MEMORY.md`, daily notes), agent state exists only in the current session context.

✅ The Fix

## Production Architecture for 24/7 Agent Fleet

The recommended architecture uses OpenClaw as the control plane with Markdown files as source-of-truth memory:

### 1. Memory Layer Setup

Create a structured memory hierarchy in your workspace:

```
~/clawd/
├── MEMORY.md          # Curated durable facts, principles, "how we work"
├── IDENTITY.md        # Agent personality and instructions
├── memory/
│   └── YYYY-MM-DD.md  # Daily logs (auto-created)
└── projects/
    ├── project-a/
    └── project-b/
```

Each agent (Jarvis, CLU, Cortana) gets its own workspace:

```bash
mkdir -p ~/agents/jarvis ~/agents/clu ~/agents/cortana
```
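As a minimal sketch, the memory layout could be scaffolded inside each agent workspace with a small shell helper. The function names `scaffold_agent` and `append_note` and the placeholder file contents are assumptions made for illustration; they are not OpenClaw commands. Daily logs are described as auto-created, so `append_note` is only a manual fallback.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of OpenClaw): scaffold one workspace per agent.

# scaffold_agent ROOT NAME
# Creates ROOT/NAME with MEMORY.md, IDENTITY.md, memory/ and projects/.
scaffold_agent() {
  local root="$1" name="$2"
  local ws="${root}/${name}"
  mkdir -p "${ws}/memory" "${ws}/projects"
  # Create the top-level files only if missing, so reruns never overwrite them.
  [ -f "${ws}/MEMORY.md" ]   || printf '# MEMORY\n' > "${ws}/MEMORY.md"
  [ -f "${ws}/IDENTITY.md" ] || printf '# IDENTITY: %s\n' "${name}" > "${ws}/IDENTITY.md"
}

# append_note WORKSPACE TEXT
# Appends a timestamped bullet to today's daily log (memory/YYYY-MM-DD.md).
append_note() {
  local ws="$1" text="$2"
  local file="${ws}/memory/$(date +%F).md"
  mkdir -p "${ws}/memory"
  printf -- '- %s %s\n' "$(date +%H:%M)" "${text}" >> "${file}"
}
```

For example, `for a in jarvis clu cortana; do scaffold_agent ~/agents "$a"; done` would create the three workspaces, and `append_note ~/agents/jarvis "deployed v2"` would add a line to today's log.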

### 2. Per-Task Sessions (Not Forever-Sessions)

Don't run one eternal session. Instead:

- **Coordinator (Jarvis)**: Long-lived main session for human chat
- **Workers (CLU, Cortana)**: Spawned as subagents per gig/task
- **Crons/Hooks**: Isolated sessions that complete and exit

### 3. Enable Auto-Compaction

Prevent context overflow by enabling compaction:

```bash
openclaw config set agent.compaction.enabled true
openclaw config set agent.compaction.threshold 50000
```

This summarizes old context when approaching limits. See: https://docs.openclaw.ai/concepts/compaction

---

## Fix OAuth Token Refresh Errors

When you see:

```
⚠️ Agent failed before reply: OAuth token refresh failed for anthropic: Failed to refresh OAuth token
```

### For API Key Users

1. Get a fresh API key from https://console.anthropic.com/
2. Update your config:

   ```bash
   openclaw configure anthropic
   ```

   Or manually:

   ```bash
   openclaw config set anthropic.apiKey "sk-ant-api03-..."
   ```

3. Restart the gateway:

   ```bash
   openclaw gateway restart
   ```

### For Claude Max Subscribers

Claude Max (subscription) uses OAuth, not API keys. The stored OAuth tokens may be stale:

1. Re-authenticate:

   ```bash
   openclaw auth anthropic
   ```

2. Follow the browser OAuth flow
3. Restart:

   ```bash
   openclaw gateway restart
   ```

**Note**: If the bot still responds despite the error, it's using cached context. The error means the token refresh fails on each request while the cached session still works. Fix it anyway to prevent future failures.

---

## Fix CLI Timeout in Exec Sessions

When terminal commands work but agent exec fails:

**Terminal (works)**:

```bash
$ openclaw cron list
# Returns immediately
```

**Agent exec (fails)**:

```
Error: gateway timeout after 10000ms
```

### Diagnosis

Check what the gateway is listening on:

```bash
ss -tulpn | grep 18789
```

If you see `127.0.0.1:18789`, that's the problem.
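The check can also be scripted. The sketch below reads `ss -tulpn` output on stdin and reports whether the port is bound to loopback or to all interfaces. The function name `classify_bind` and its output labels are assumptions for illustration, and the parsing assumes the usual `ss` column order, where the fifth field is `Local Address:Port`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify how a port is bound, from `ss -tulpn` output on stdin.

# classify_bind PORT
# Prints one of: loopback, all-interfaces, other:<addr>, not-listening
classify_bind() {
  local port="$1"
  awk -v port="${port}" '
    # Field 5 is "Local Address:Port"; match lines whose local port is PORT.
    $5 ~ (":" port "$") {
      addr = $5
      sub(":" port "$", "", addr)   # strip the ":PORT" suffix, leaving the address
      found = 1
      if (addr == "127.0.0.1" || addr == "[::1]")                   print "loopback"
      else if (addr == "0.0.0.0" || addr == "*" || addr == "[::]")  print "all-interfaces"
      else                                                           print "other:" addr
      exit
    }
    END { if (!found) print "not-listening" }
  '
}
```

Running `ss -tulpn | classify_bind 18789` on the gateway host would print `loopback` in the failing case described above.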

### Solution: Bind to All Interfaces

Edit `~/.openclaw/config.json5`:

```json5
{
  "gateway": {
    "host": "0.0.0.0", // Changed from 127.0.0.1
    "port": 18789
  }
}
```

Or use the bind shortcut:

```bash
openclaw config set gateway.bind "lan"
```

Restart:

```bash
openclaw gateway restart
```

Verify:

```bash
ss -tulpn | grep 18789
# Should show: tcp LISTEN 0 511 0.0.0.0:18789
```

### Why This Happens

Exec sessions may run in isolated network namespaces (containers, different PID namespaces). Even though both use 127.0.0.1, socket routing differs. Binding to 0.0.0.0 makes the Gateway reachable from all network contexts.
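One way to confirm that the exec context really cannot reach the gateway is to probe the port from inside that context. The sketch below uses bash's `/dev/tcp` redirection and the coreutils `timeout` command. The function name `probe_port` and the 2-second limit are assumptions for illustration, and the approach assumes a bash build with network redirections enabled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: probe a TCP port from the current network context.

# probe_port HOST PORT
# Prints "reachable" if a TCP connection opens within 2 seconds, else "unreachable".
probe_port() {
  local host="$1" port="$2"
  # /dev/tcp/HOST/PORT is a bash redirection feature, not a real file on disk.
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}
```

If `probe_port 127.0.0.1 18789` prints `reachable` in a normal terminal but `unreachable` when the agent runs it, the two are likely in different network contexts.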

---

## Configure Multiple Models

Use different models for different tasks:

### Interactive Model Selection

```bash
openclaw config
```

Choose "models", then press Space to select multiple models, Enter to confirm.

### Recommended Multi-Model Setup

```json5
{
  "models": {
    "default": "anthropic/claude-sonnet-4-20250514",
    "coding": "anthropic/claude-sonnet-4-20250514",
    "creative": "anthropic/claude-opus-4-0",
    "fast": "moonshotai/kimi-k2.5"
  }
}
```

### Free/Cheap Model Options

- **Kimi K2.5** - Free tier available via Kilo Gateway: https://blog.kilo.ai/p/kilo-gateway-supercharges-moltbot-fka-clawdbot
- **Gemini** - Google's models with generous free tier
- **Groq** - Fast inference, free tier available

---

## Factory Reset (Preserve Key Configs)

If your agent is misbehaving after upgrade:

### 1. Backup Critical Files

```bash
mkdir -p ~/openclaw-backup
cp -r ~/.openclaw/config.json5 ~/openclaw-backup/
cp -r ~/clawd/MEMORY.md ~/openclaw-backup/
cp -r ~/clawd/IDENTITY.md ~/openclaw-backup/
cp -r ~/clawd/memory/ ~/openclaw-backup/
```
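If you reset more than once, a timestamped backup avoids overwriting an earlier copy. The following is a minimal sketch under that assumption. The function name `backup_paths` and the timestamp format are hypothetical choices, not part of OpenClaw; missing paths are skipped with a warning rather than aborting the backup.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy existing files/directories into a timestamped backup dir.

# backup_paths DEST_ROOT PATH...
# Prints the backup directory it created on stdout; warnings go to stderr.
backup_paths() {
  local dest_root="$1"; shift
  local dest="${dest_root}/$(date +%Y%m%d-%H%M%S)"
  mkdir -p "${dest}"
  local p
  for p in "$@"; do
    if [ -e "${p}" ]; then
      cp -r "${p}" "${dest}/"
    else
      echo "skip (missing): ${p}" >&2
    fi
  done
  echo "${dest}"
}
```

For example: `backup_paths ~/openclaw-backup ~/.openclaw/config.json5 ~/clawd/MEMORY.md ~/clawd/IDENTITY.md ~/clawd/memory`.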

### 2. Clean State Reset

```bash
openclaw gateway stop
rm -rf ~/.openclaw/sessions/
rm -rf ~/.openclaw/cache/
```

### 3. Reset Crons

```bash
openclaw cron clear
openclaw cron list
# Should be empty
```

### 4. Restore and Restart

```bash
openclaw gateway start
```

### 5. Re-add Crons

Manually re-add your scheduled tasks:

```bash
openclaw cron add "0 9 * * *" "Good morning check-in"
```

### Full Nuclear Reset (Start Fresh)

```bash
openclaw gateway stop
rm -rf ~/.openclaw/
rm -rf ~/.config/openclaw/
openclaw gateway start
# Re-run initial setup
openclaw configure
```


📋 Quick Commands

| Command | Description |
| --- | --- |
| `openclaw config set agent.compaction.enabled true` | Enable auto-compaction to prevent context overflow |
| `openclaw configure anthropic` | Reconfigure Anthropic API key interactively |
| `openclaw auth anthropic` | Re-authenticate OAuth for Claude Max |
| `openclaw config set gateway.bind "lan"` | Bind gateway to all interfaces (fixes exec timeouts) |
| `ss -tulpn \| grep 18789` | Check what interface the gateway is listening on |
| `openclaw gateway restart` | Restart the gateway after config changes |
| `openclaw cron list` | List all scheduled cron jobs |
| `openclaw cron clear` | Remove all cron jobs (for reset) |
| `openclaw config` | Interactive configuration menu (select multiple models) |
| `rm -rf ~/.openclaw/sessions/` | Clear session cache (soft reset) |
