Voice & Text-to-Speech Setup

⚠️ The Problem

Users encounter issues setting up Text-to-Speech (TTS) with providers like ElevenLabs. The most common problem is the agent outputting `MEDIA: /path/to/audio.ogg` as plain text instead of attaching the actual voice note. Additionally, users want to configure hands-free, voice-only workflows for scenarios like driving, where touching the phone isn't safe.

🔍 Why This Happens

The `MEDIA:` path issue occurs when the TTS tool generates audio successfully, but the channel adapter (Telegram, Discord, etc.) doesn't recognize or process the `MEDIA:` directive to attach the file. This can happen due to:

1. Model not following TTS tool instructions - The model outputs the `MEDIA:` path literally instead of letting the system handle attachment
2. TTS auto mode not triggering - `messages.tts.auto` is set to `always` but the model still outputs text
3. Channel capability mismatch - The channel may not support voice note attachments in the expected format
4. Provider configuration issues - Invalid `voiceId`, `modelId`, or API credentials

For hands-free voice workflows, the challenge is configuring bidirectional voice: Speech-to-Text (STT) for input and TTS for output, without requiring screen interaction.
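To make the first cause above concrete, here is a minimal Python sketch of the kind of post-processing a channel adapter might perform: pulling `MEDIA:` directives out of a reply so the file can be attached instead of shown. It is an illustration only, not OpenClaw's actual implementation; the function name `split_media_directives` and the directive shape (the word `MEDIA:` followed by a path without spaces) are assumptions based on the example output in this guide.

```python
import re

# Assumed directive shape: "MEDIA:" followed by a file path with no spaces.
MEDIA_PATTERN = re.compile(r"MEDIA:\s*(\S+)")


def split_media_directives(reply: str) -> tuple[str, list[str]]:
    """Return (visible_text, media_paths) for a model reply."""
    media_paths = MEDIA_PATTERN.findall(reply)
    visible_text = MEDIA_PATTERN.sub("", reply)
    # Collapse the whitespace left behind by the removed directive.
    visible_text = " ".join(visible_text.split())
    return visible_text, media_paths


text, paths = split_media_directives(
    "Got it. MEDIA: /Users/you/.openclaw/cache/tts/abc123.ogg"
)
print(text)
print(paths)
```

If a reply reaches the chat with the path still in the visible text, a step like this was either skipped or did not recognize the directive, which is what the fixes below try to narrow down.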

✅ The Fix

## Fix MEDIA: Path Appearing as Text

If you see output like this instead of receiving an actual voice note, try the fixes below:

```
🦞 Got it. MEDIA: /Users/you/.openclaw/cache/tts/abc123.ogg
```

### 1. Verify TTS Configuration

Check your current TTS settings:

```bash
openclaw config get messages.tts
```

You should see something like:

```json5
{
  "provider": "elevenlabs",
  "auto": "always",
  "elevenlabs": {
    "voiceId": "FF7KdobWPaiR0vkcALHF",
    "modelId": "eleven_v3"
  }
}
```
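When comparing your own output against that example, a quick scripted check can help spot empty fields. The following Python sketch is hypothetical: the helper `find_tts_config_problems` is not part of OpenClaw, and the assumption that `voiceId` and `modelId` are both required is taken only from the example above. It also uses the standard `json` module, which parses JSON5 output only when that output happens to be valid JSON.

```python
import json

# Assumption: these keys are needed for ElevenLabs, based on the example config.
REQUIRED_ELEVENLABS_KEYS = ("voiceId", "modelId")


def find_tts_config_problems(config: dict) -> list[str]:
    """Return human-readable problems found in a messages.tts config."""
    problems = []
    provider = config.get("provider")
    if not provider:
        problems.append("provider is not set")
    if "auto" not in config:
        problems.append("auto is not set")
    if provider == "elevenlabs":
        section = config.get("elevenlabs") or {}
        for key in REQUIRED_ELEVENLABS_KEYS:
            if not section.get(key):
                problems.append(f"elevenlabs.{key} is missing or empty")
    return problems


# Sample config with a deliberately empty modelId.
sample = json.loads("""
{
  "provider": "elevenlabs",
  "auto": "always",
  "elevenlabs": {"voiceId": "FF7KdobWPaiR0vkcALHF", "modelId": ""}
}
""")
for problem in find_tts_config_problems(sample):
    print(problem)
```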

### 2. Ensure Provider API Key is Set

ElevenLabs requires a valid API key:

```bash
openclaw config set messages.tts.elevenlabs.apiKey "your-elevenlabs-api-key"
```

Get your API key from: https://elevenlabs.io/app/settings/api-keys
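If you want to rule out a bad key independently of OpenClaw, you can test it against the ElevenLabs API directly. The sketch below is one possible way to do that in Python, assuming the ElevenLabs `GET /v1/voices` endpoint and its `xi-api-key` header; the helper names are made up for this example, and the check says nothing about how OpenClaw itself calls the provider.

```python
import urllib.error
import urllib.request

VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Build a GET request that lists voices, authenticated with the key."""
    return urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})


def key_is_accepted(api_key: str, timeout: float = 10.0) -> bool:
    """Return True if the API answers 200, False on an HTTP error status.

    Network failures (DNS, timeouts) are not caught and will raise.
    """
    request = build_voices_request(api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False
```

Calling `key_is_accepted("your-elevenlabs-api-key")` performs a real network request, so run it only from a machine that is allowed to reach the API.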

### 3. Restart the Gateway

After any TTS configuration change, restart:

```bash
openclaw gateway restart
```

### 4. Check Channel Voice Capability

Not all channels support voice notes the same way. Verify your channel:

- Telegram: Full voice note support via `asVoice: true`
- Discord: Voice attachments work as audio files
- Slack/iMessage: May render as file attachments
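The list above can be read as a simple lookup from channel to expected delivery form. The Python sketch below encodes it that way; the table contents come from this guide, while the function name and the `"unknown"` fallback are assumptions for illustration, not OpenClaw behavior.

```python
# Expected delivery form per channel, as described in this guide.
DELIVERY_BY_CHANNEL = {
    "telegram": "voice note",
    "discord": "audio file",
    "slack": "file attachment",
    "imessage": "file attachment",
}


def expected_delivery(channel: str) -> str:
    """Return how audio is expected to appear on a channel."""
    return DELIVERY_BY_CHANNEL.get(channel.strip().lower(), "unknown")


print(expected_delivery("Telegram"))
```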

### 5. Test TTS Directly

Ask the agent explicitly to speak:

```
Say "hello world" as a voice note
```

Or use the `/tts` command if available:

```
/tts Hello, this is a test
```

### 6. Check for Model Instruction Issues

Some models may not properly handle TTS tool calls. If you are using a non-Anthropic model (like Kimi K2.5), ensure it understands the TTS tool schema. You may need to add explicit instructions in your `IDENTITY.md`:

```markdown
## Voice Output
When TTS is enabled, use the tts tool to convert responses to speech.
Do not output MEDIA: paths as text - the system handles audio attachment automatically.
```

---

## Hands-Free Voice-Only Setup (Car/Bluetooth)

For safe use while driving with Bluetooth, configure a full voice loop:

### 1. Enable Always-On TTS

Make the agent always respond with voice:

```bash
openclaw config set messages.tts.auto "always"
openclaw config set messages.tts.provider "elevenlabs"
```

### 2. Configure Speech-to-Text (STT)

Enable STT so you can speak instead of typing:

```bash
openclaw config set messages.stt.provider "elevenlabs"
openclaw config set messages.stt.auto true
```

Alternative STT providers:

- `whisper` - OpenAI Whisper (local or API)
- `deepgram` - Fast and accurate
- `google` - Google Cloud Speech

### 3. Telegram Voice Message Workflow

For Telegram, the safest hands-free flow is:

1. Send voice messages - Telegram converts your speech to a voice note
2. OpenClaw transcribes - STT converts your voice to text
3. Agent processes - Responds to your request
4. TTS converts - Response becomes a voice note
5. Bluetooth plays - You hear the response through car speakers
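Steps 2 through 4 of the flow above form a small pipeline: audio in, text, reply text, audio out. The Python sketch below shows that shape with the three stages passed in as plain functions. Everything here is hypothetical scaffolding for illustration; the stand-in stages do no real speech processing and none of the names correspond to OpenClaw internals.

```python
from typing import Callable


def handle_voice_message(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one turn of the loop: voice in, voice out."""
    transcript = transcribe(audio)    # step 2: STT turns audio into text
    reply_text = respond(transcript)  # step 3: the agent produces a reply
    return synthesize(reply_text)     # step 4: TTS turns the reply into audio


# Stand-in stages so the sketch runs without any provider.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


voice_note = handle_voice_message(
    b"what is on my calendar", fake_transcribe, fake_respond, fake_synthesize
)
print(voice_note)
```

Keeping the stages separate like this mirrors the troubleshooting approach in this guide: if the loop breaks, you can test STT, the agent, and TTS one at a time.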

### 4. iOS Shortcuts Integration (Advanced)

Create an iOS Shortcut for truly hands-free activation:

1. Create a Shortcut that sends a message to your Telegram bot
2. Use "Hey Siri" to trigger the shortcut
3. Dictate your message
4. The agent responds via voice note

Example Shortcut actions:

- Get input from Spoken Text
- Send message to Telegram chat
- Wait for response
- Play audio

### 5. Safety Configuration

For driving, add guardrails in your `IDENTITY.md`:

```markdown
## Driving Mode
Keep responses brief and clear for audio playback.
No visual elements (tables, code blocks) - voice-friendly only.
Confirm critical actions verbally before executing.
```
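The guardrails above are prompt instructions, so the model may still occasionally include a table or code block. As a belt-and-braces idea, the Python sketch below shows how a reply could be filtered before it is spoken. It is a hypothetical post-filter, not an OpenClaw feature; the function name and the 60-word limit are arbitrary assumptions.

```python
import re

# Fenced code blocks: three backticks, anything, three backticks.
CODE_BLOCK = re.compile(r"`{3}.*?`{3}", re.DOTALL)
# Markdown table rows: lines that start and end with a pipe.
TABLE_ROW = re.compile(r"^[ \t]*\|.*\|[ \t]*$", re.MULTILINE)


def make_voice_friendly(text: str, max_words: int = 60) -> str:
    """Drop code blocks and table rows, then cap the reply length."""
    text = CODE_BLOCK.sub(" ", text)
    text = TABLE_ROW.sub(" ", text)
    words = text.split()
    if len(words) > max_words:
        words = words[:max_words]
    return " ".join(words)


fence = "`" * 3  # built at runtime to keep this example easy to embed
reply = (
    "Done.\n"
    + fence + "bash\nls -la\n" + fence
    + "\n| a | b |\n|---|---|\nTwo files changed."
)
print(make_voice_friendly(reply))
```

Truncating at a word limit is blunt and can cut a sentence short, so treat the limit as a safety net and rely on the `IDENTITY.md` instructions for normal brevity.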


📋 Quick Commands

| Command | Description |
|---|---|
| `openclaw config get messages.tts` | View current TTS configuration |
| `openclaw config set messages.tts.provider "elevenlabs"` | Set ElevenLabs as TTS provider |
| `openclaw config set messages.tts.auto "always"` | Enable automatic TTS for all responses |
| `openclaw config set messages.tts.elevenlabs.apiKey "YOUR_KEY"` | Set ElevenLabs API key |
| `openclaw config set messages.tts.elevenlabs.voiceId "VOICE_ID"` | Set specific ElevenLabs voice |
| `openclaw config set messages.stt.provider "elevenlabs"` | Set STT provider for voice input |
| `openclaw config set messages.stt.auto true` | Enable automatic speech-to-text |
| `openclaw gateway restart` | Restart gateway after config changes |
