# Speech Features

The Home Assistant MCP Server includes speech processing capabilities powered by fast-whisper and custom wake word detection. This guide explains how to set up and use these features effectively.

## Overview

The speech processing system consists of two main components:

1. Wake Word Detection - listens for specific trigger phrases
2. Speech-to-Text - transcribes spoken commands using fast-whisper

## Setup

### Prerequisites

1. Docker environment:

   ```bash
   docker --version  # Should be 20.10.0 or higher
   ```

2. For GPU acceleration:
   - NVIDIA GPU with CUDA support
   - NVIDIA Container Toolkit installed
   - NVIDIA drivers 450.80.02 or higher

### Installation

1. Enable speech features in your `.env`:

   ```bash
   ENABLE_SPEECH_FEATURES=true
   ENABLE_WAKE_WORD=true
   ENABLE_SPEECH_TO_TEXT=true
   ```

2. Configure model settings:

   ```bash
   WHISPER_MODEL_PATH=/models
   WHISPER_MODEL_TYPE=base
   WHISPER_LANGUAGE=en
   WHISPER_TASK=transcribe
   WHISPER_DEVICE=cuda  # or cpu
   ```

3. Start the services:

   ```bash
   docker-compose up -d
   ```

## Usage

### Wake Word Detection

The wake word detector continuously listens for configured trigger phrases.

Default wake words:

- "hey jarvis"
- "ok google"
- "alexa"

Custom wake words can be configured:

```bash
WAKE_WORDS=computer,jarvis,assistant
```

When a wake word is detected:

1. The system starts recording audio
2. The audio is processed through the speech-to-text pipeline
3. The resulting command is processed by the server

### Speech-to-Text

#### Automatic Transcription

After wake word detection:

1. Audio is automatically captured (default: 5 seconds)
2. The audio is transcribed using the configured whisper model
3. The transcribed text is processed as a command
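The configuration and matching logic described above can be sketched in TypeScript. This is a minimal illustration, not the server's actual implementation: the function names (`parseWakeWords`, `containsWakeWord`) are hypothetical, and only the `WAKE_WORDS` comma-separated format and the default phrases come from this guide.

```typescript
// Hypothetical sketch of wake word configuration and matching.
// Only the WAKE_WORDS format and the default phrases are from the docs;
// the function names are illustrative.
const DEFAULT_WAKE_WORDS = ["hey jarvis", "ok google", "alexa"];

// Parse a WAKE_WORDS-style value ("computer,jarvis,assistant"),
// falling back to the defaults when the variable is unset or empty.
function parseWakeWords(env: string | undefined): string[] {
  if (!env || env.trim() === "") return DEFAULT_WAKE_WORDS;
  return env
    .split(",")
    .map((w) => w.trim().toLowerCase())
    .filter((w) => w.length > 0);
}

// Case-insensitive check for any configured phrase in a transcript.
function containsWakeWord(transcript: string, wakeWords: string[]): boolean {
  const text = transcript.toLowerCase();
  return wakeWords.some((w) => text.includes(w));
}
```

For example, `containsWakeWord("Hey Jarvis, turn on the lights", parseWakeWords(process.env.WAKE_WORDS))` would return `true` under the default configuration.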
#### Manual Transcription

You can also transcribe audio manually using the API:

```typescript
// Using the TypeScript client
import { SpeechService } from '@ha-mcp/client';

const speech = new SpeechService();

// Transcribe from an audio buffer
const buffer = await getAudioBuffer();
const bufferText = await speech.transcribe(buffer);

// Transcribe from a file
const fileText = await speech.transcribeFile('command.wav');
```

```http
POST /api/speech/transcribe
Content-Type: multipart/form-data

file: <audio file data>
```
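The REST request above can also be issued from code. The sketch below uses the standard `fetch` and `FormData` APIs; the helper names and the assumed `{ text: string }` response shape are illustrative, since only the endpoint path and the `file` field come from this guide.

```typescript
// Hypothetical REST usage sketch. Only the endpoint path and the "file"
// multipart field are from the docs; the { text } response shape is an assumption.

// Build the transcription endpoint URL, tolerating a trailing slash on the base.
function transcribeEndpoint(serverUrl: string): string {
  return serverUrl.replace(/\/+$/, "") + "/api/speech/transcribe";
}

// POST an audio blob as multipart/form-data and return the transcript.
async function transcribeViaRest(serverUrl: string, audio: Blob): Promise<string> {
  const form = new FormData();
  form.append("file", audio, "command.wav");

  const res = await fetch(transcribeEndpoint(serverUrl), {
    method: "POST",
    body: form, // fetch sets the multipart boundary header automatically
  });
  if (!res.ok) throw new Error(`Transcription failed: HTTP ${res.status}`);

  const data = (await res.json()) as { text: string };
  return data.text;
}
```

Note that when a `FormData` body is passed to `fetch`, the `Content-Type: multipart/form-data` header (including the boundary) is set automatically and should not be set by hand.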