Voice & Text-to-Speech Setup
Configure TTS providers like ElevenLabs, fix MEDIA: path output issues, and set up hands-free voice-only workflows for mobile or car use.
The Problem
The agent outputs `MEDIA: /path/to/audio.ogg` as plain text instead of attaching the actual voice note. Additionally, users want to configure hands-free voice-only workflows for scenarios like driving, where touching the phone isn't safe.
Why This Happens
- `messages.tts.auto` is set to `always`, but the model still outputs text
- Channel capability mismatch - The channel may not support voice note attachments in the expected format
- Provider configuration issues - Invalid `voiceId`, `modelId`, or API credentials
For hands-free voice workflows, the challenge is configuring bidirectional voice: Speech-to-Text (STT) for input and TTS for output, without requiring screen interaction.
The Fix
## Fix MEDIA: Path Appearing as Text
When you see output like:
```
Got it. MEDIA: /Users/you/.openclaw/cache/tts/abc123.ogg
```
Instead of receiving an actual voice note, try these fixes:
### 1. Verify TTS Configuration
Check your current TTS settings:
```shell
openclaw config get messages.tts
```
You should see something like:
```json5
{
  "provider": "elevenlabs",
  "auto": "always",
  "elevenlabs": {
    "voiceId": "FF7KdobWPaiR0vkcALHF",
    "modelId": "eleven_v3"
  }
}
```
### 2. Ensure Provider API Key is Set
ElevenLabs requires a valid API key:
```shell
openclaw config set messages.tts.elevenlabs.apiKey "your-elevenlabs-api-key"
```
Get your API key from: https://elevenlabs.io/app/settings/api-keys
### 3. Restart the Gateway
After any TTS configuration change, restart:
```shell
openclaw gateway restart
```
### 4. Check Channel Voice Capability
Not all channels support voice notes the same way. Verify your channel:
- Telegram: Full voice note support via asVoice: true
- Discord: Voice attachments work as audio files
- Slack/iMessage: May render as file attachments
### 5. Test TTS Directly
Ask the agent explicitly to speak:
```
Say "hello world" as a voice note
```
Or use the /tts command if available:
```
/tts Hello, this is a test
```
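To confirm whether a test reply still leaks the path, you can scan the reply text for `MEDIA:` tokens. The sketch below is a hypothetical helper (the name `find_media_leaks` is not part of OpenClaw) and assumes you have copied the reply text into a file or pipe:

```shell
# find_media_leaks (hypothetical helper, not an OpenClaw command):
# reads text on stdin and prints every "MEDIA: <path>" token it finds.
# Paths are assumed to contain no whitespace.
find_media_leaks() {
  grep -Eo 'MEDIA: ?[^[:space:]]+' || true   # no match is not an error
}

# Example: check the sample reply shown earlier in this guide
printf '%s\n' 'Got it. MEDIA: /Users/you/.openclaw/cache/tts/abc123.ogg' | find_media_leaks
```

Empty output means no path leaked into the text; any printed line is a path that was sent as text rather than as an attachment.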
### 6. Check for Model Instruction Issues
Some models may not properly handle TTS tool calls. If using a non-Anthropic model (like Kimi K2.5), ensure it understands the TTS tool schema. You may need to add explicit instructions in your IDENTITY.md:
```markdown
## Voice Output
When TTS is enabled, use the tts tool to convert responses to speech.
Do not output MEDIA: paths as text - the system handles audio attachment automatically.
```
---
## Hands-Free Voice-Only Setup (Car/Bluetooth)
For safe use over Bluetooth while driving, configure a full voice loop:
### 1. Enable Always-On TTS
Make the agent always respond with voice:
```shell
openclaw config set messages.tts.auto "always"
openclaw config set messages.tts.provider "elevenlabs"
```
### 2. Configure Speech-to-Text (STT)
Enable STT so you can speak instead of typing:
```shell
openclaw config set messages.stt.provider "elevenlabs"
openclaw config set messages.stt.auto true
```
Alternative STT providers:
- whisper - OpenAI Whisper (local or API)
- deepgram - Fast and accurate
- google - Google Cloud Speech
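The TTS and STT commands above can be wrapped in one function so the whole voice loop is applied in a single step. This is a minimal sketch: `enable_hands_free` and the `OPENCLAW_BIN` override are hypothetical names introduced here for illustration, and only the `openclaw` subcommands already shown in this guide are used.

```shell
# enable_hands_free [stt_provider]
# Hypothetical wrapper around the config commands from this guide.
# OPENCLAW_BIN is an assumed override (e.g. OPENCLAW_BIN=echo for a dry run);
# it defaults to the real CLI.
enable_hands_free() {
  local stt_provider="${1:-elevenlabs}"
  local bin="${OPENCLAW_BIN:-openclaw}"

  # Accept only the STT providers listed in this guide
  case "$stt_provider" in
    elevenlabs|whisper|deepgram|google) ;;
    *) echo "unknown STT provider: $stt_provider" >&2; return 1 ;;
  esac

  # Stop at the first failing command; restart so the changes take effect
  "$bin" config set messages.tts.auto "always" &&
  "$bin" config set messages.tts.provider "elevenlabs" &&
  "$bin" config set messages.stt.provider "$stt_provider" &&
  "$bin" config set messages.stt.auto true &&
  "$bin" gateway restart
}
```

Running `OPENCLAW_BIN=echo enable_hands_free deepgram` prints the commands without changing anything; `enable_hands_free deepgram` applies them.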
### 3. Telegram Voice Message Workflow
For Telegram, the safest hands-free flow:
1. Send voice messages - Telegram converts your speech to a voice note
2. OpenClaw transcribes - STT converts your voice to text
3. Agent processes - Responds to your request
4. TTS converts - Response becomes a voice note
5. Bluetooth plays - You hear the response through car speakers
### 4. iOS Shortcuts Integration (Advanced)
Create an iOS Shortcut for truly hands-free activation:
1. Create a Shortcut that sends a message to your Telegram bot
2. Use "Hey Siri" to trigger the shortcut
3. Dictate your message
4. The agent responds via voice note

Example Shortcut actions:
- Get input from Spoken Text
- Send message to Telegram chat
- Wait for response
- Play audio
### 5. Safety Configuration
For driving, add guardrails in your IDENTITY.md:
```markdown
## Driving Mode
Keep responses brief and clear for audio playback.
No visual elements (tables, code blocks) - voice-friendly only.
Confirm critical actions verbally before executing.
```
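A model may not always follow these instructions, so it can help to check a reply before it is spoken. The sketch below is a hypothetical filter (the function name and the 60-word default are assumptions, not OpenClaw settings) that rejects replies containing code fences or table rows, or that run too long for audio:

```shell
# is_voice_friendly [max_words]
# Hypothetical check: exits 0 if the text on stdin has no code fences,
# no markdown table rows, and at most max_words words (default 60, an
# arbitrary value chosen for this sketch).
is_voice_friendly() {
  local max_words="${1:-60}"
  local text words
  text=$(cat)

  # Reject fenced code blocks (three consecutive backticks)
  if printf '%s\n' "$text" | grep -Eq '`{3}'; then return 1; fi

  # Reject markdown table rows (lines that start with a pipe)
  if printf '%s\n' "$text" | grep -Eq '^[[:space:]]*\|'; then return 1; fi

  # Reject replies that are too long to listen to comfortably
  words=$(( $(printf '%s\n' "$text" | wc -w) ))
  [ "$words" -le "$max_words" ]
}

# Example: a short spoken-style reply passes the check
printf '%s\n' 'Turn left in 200 meters.' | is_voice_friendly && echo "ok to speak"
```

How you obtain the reply text to feed into the check depends on your setup and is not covered here.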
Quick Commands
| Command | Description |
|---|---|
| openclaw config get messages.tts | View current TTS configuration |
| openclaw config set messages.tts.provider "elevenlabs" | Set ElevenLabs as TTS provider |
| openclaw config set messages.tts.auto "always" | Enable automatic TTS for all responses |
| openclaw config set messages.tts.elevenlabs.apiKey "YOUR_KEY" | Set ElevenLabs API key |
| openclaw config set messages.tts.elevenlabs.voiceId "VOICE_ID" | Set specific ElevenLabs voice |
| openclaw config set messages.stt.provider "elevenlabs" | Set STT provider for voice input |
| openclaw config set messages.stt.auto true | Enable automatic speech-to-text |
| openclaw gateway restart | Restart gateway after config changes |
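To keep the literal API key out of the command you type, the key and restart commands from the table can be driven from an environment variable. A minimal sketch, assuming the key is exported as `ELEVENLABS_API_KEY` (a variable name chosen for this example); `set_elevenlabs_key` and `OPENCLAW_BIN` are likewise hypothetical:

```shell
# set_elevenlabs_key
# Hypothetical helper: stores the key from $ELEVENLABS_API_KEY and restarts
# the gateway. OPENCLAW_BIN is an assumed override for dry runs
# (e.g. OPENCLAW_BIN=echo); it defaults to the real CLI.
set_elevenlabs_key() {
  local bin="${OPENCLAW_BIN:-openclaw}"

  # Refuse to write an empty key into the configuration
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY is not set" >&2
    return 1
  fi

  "$bin" config set messages.tts.elevenlabs.apiKey "$ELEVENLABS_API_KEY" &&
  "$bin" gateway restart
}
```

The restart is chained with `&&` so the gateway is only restarted if the key was written successfully.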