Speech-to-Text Examples
This directory contains examples demonstrating how to use the speech-to-text integration with wake word detection.
Prerequisites
- Make sure you have Docker installed and running
- Build and start the services:
docker-compose up -d
Running the Example
- Install dependencies:
npm install
- Run the example:
npm run example:speech
Or using ts-node directly:
npx ts-node examples/speech-to-text-example.ts
Features Demonstrated
- Wake Word Detection
  - Listens for wake words: "hey jarvis", "ok google", "alexa"
  - Automatically saves audio when wake word is detected
  - Transcribes the detected speech
- Manual Transcription
  - Example of how to transcribe audio files manually
  - Supports different models and configurations
- Event Handling
  - Wake word detection events
  - Transcription results
  - Progress updates
  - Error handling
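The four event categories above can be wired up roughly as follows. This is a minimal sketch, not the example script's actual code: it assumes the integration exposes a Node.js EventEmitter, and the event names (wake_word, transcription, progress, error) and payload fields are hypothetical placeholders. Check examples/speech-to-text-example.ts for the real names.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical payload shape, modeled on the fields shown in the sample output.
interface WakeWordEvent {
  timestamp: string;
  audioFile: string;
  metadataFile: string;
}

// Attach one handler per event category and collect what was seen.
function attachHandlers(speech: EventEmitter): string[] {
  const log: string[] = [];
  speech.on("wake_word", (e: WakeWordEvent) => {
    log.push(`wake word at ${e.timestamp}: ${e.audioFile}`);
  });
  speech.on("transcription", (r: { text: string }) => {
    log.push(`transcription: ${r.text}`);
  });
  speech.on("progress", (p: { message: string }) => {
    log.push(`progress: ${p.message}`);
  });
  // Without an "error" listener, EventEmitter throws when "error" is emitted.
  speech.on("error", (err: Error) => {
    log.push(`error: ${err.message}`);
  });
  return log;
}

// Stand-in emitter so the sketch runs without the Docker services.
const speech = new EventEmitter();
const seen = attachHandlers(speech);
speech.emit("wake_word", {
  timestamp: "20240203_123456",
  audioFile: "/path/to/audio/wake_word_20240203_123456.wav",
  metadataFile: "/path/to/audio/wake_word_20240203_123456.wav.json",
});
speech.emit("progress", { message: "transcribing" });
speech.emit("error", new Error("container not reachable"));
console.log(seen.join("\n"));
```

Registering the error handler first is a reasonable habit, since an unhandled error event would otherwise terminate the process.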
Example Output
When a wake word is detected, you'll see output like this:
🎤 Wake word detected!
Timestamp: 20240203_123456
Audio file: /path/to/audio/wake_word_20240203_123456.wav
Metadata file: /path/to/audio/wake_word_20240203_123456.wav.json
📝 Transcription result:
Full text: This is what was said after the wake word.
Segments:
1. [0.00s - 1.52s] (95.5% confidence)
"This is what was said"
2. [1.52s - 2.34s] (98.2% confidence)
"after the wake word."
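For readers who want to produce the same layout from their own handler, the transcription part of the output above could be rendered as sketched below. The TranscriptionResult and TranscriptionSegment shapes are assumptions made for illustration (confidence is taken to be a value between 0 and 1); the types used by the example script may differ.

```typescript
// Hypothetical shapes; the real result type in the example script may differ.
interface TranscriptionSegment {
  text: string;
  start: number;      // seconds
  end: number;        // seconds
  confidence: number; // assumed to be in the range 0..1
}

interface TranscriptionResult {
  text: string;
  segments: TranscriptionSegment[];
}

// Render a result in the layout shown in the sample output.
function formatTranscription(result: TranscriptionResult): string {
  const lines: string[] = [
    "📝 Transcription result:",
    `Full text: ${result.text}`,
    "Segments:",
  ];
  result.segments.forEach((seg, i) => {
    const range = `[${seg.start.toFixed(2)}s - ${seg.end.toFixed(2)}s]`;
    const pct = (seg.confidence * 100).toFixed(1);
    lines.push(`${i + 1}. ${range} (${pct}% confidence)`);
    lines.push(`"${seg.text}"`);
  });
  return lines.join("\n");
}

console.log(
  formatTranscription({
    text: "This is what was said after the wake word.",
    segments: [
      { text: "This is what was said", start: 0, end: 1.52, confidence: 0.955 },
      { text: "after the wake word.", start: 1.52, end: 2.34, confidence: 0.982 },
    ],
  })
);
```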
Customization
You can customize the behavior by:
- Changing the wake word models in docker/speech/Dockerfile
- Modifying transcription options in the example file
- Adding your own event handlers
- Implementing different audio processing logic
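As an illustration of the second point, transcription options might be kept in one typed object and overridden per call. Everything named here is an assumption for the sake of the sketch: the TranscriptionOptions interface, its field names, and the default values are hypothetical and may not match what the example file actually accepts.

```typescript
// Hypothetical option names; the example file's actual options may differ.
interface TranscriptionOptions {
  model: string;    // e.g. "tiny.en" or "base.en"
  beamSize: number; // wider beams cost more time
  patience: number;
}

// Assumed defaults, chosen only for illustration.
const DEFAULT_OPTIONS: TranscriptionOptions = {
  model: "base.en",
  beamSize: 5,
  patience: 1,
};

// Merge caller overrides onto the defaults and reject obviously invalid values.
function withOverrides(
  overrides: Partial<TranscriptionOptions>
): TranscriptionOptions {
  const merged: TranscriptionOptions = { ...DEFAULT_OPTIONS, ...overrides };
  if (merged.beamSize < 1 || Math.floor(merged.beamSize) !== merged.beamSize) {
    throw new RangeError("beamSize must be a positive integer");
  }
  if (!(merged.patience > 0)) {
    throw new RangeError("patience must be greater than zero");
  }
  return merged;
}

// A lighter configuration for slower machines.
console.log(withOverrides({ model: "tiny.en", beamSize: 1 }));
```

Validating at the merge step keeps a bad value from reaching the transcription service, where the resulting error would be harder to trace.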
Troubleshooting
- Docker Issues
  - Make sure Docker is running
  - Check container logs: docker-compose logs fast-whisper
  - Verify container is up: docker ps
- Audio Issues
  - Check audio device permissions
  - Verify audio file format (WAV files recommended)
  - Check audio file permissions
- Performance Issues
  - Try using a smaller model (tiny.en or base.en)
  - Adjust beam size and patience parameters
  - Consider using GPU acceleration if available
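For the audio format check mentioned under Audio Issues, a quick header test can rule out files that are not WAV at all. The sketch below relies only on the RIFF container layout (bytes 0 to 3 spell "RIFF" and bytes 8 to 11 spell "WAVE"); the looksLikeWav helper is a hypothetical name and is not part of this project. It does not validate sample rate, channel count, or encoding.

```typescript
// Read four bytes starting at offset as an ASCII tag.
function readTag(bytes: Uint8Array, offset: number): string {
  let tag = "";
  for (let i = offset; i < offset + 4; i++) {
    tag += String.fromCharCode(bytes[i]);
  }
  return tag;
}

// True when the data starts with a RIFF container header marked as WAVE.
function looksLikeWav(bytes: Uint8Array): boolean {
  if (bytes.length < 12) {
    return false; // too short to hold the 12-byte RIFF/WAVE header
  }
  return readTag(bytes, 0) === "RIFF" && readTag(bytes, 8) === "WAVE";
}

// Minimal header: "RIFF", a 4-byte size field (zeroed here), then "WAVE".
const sampleHeader = new Uint8Array([
  0x52, 0x49, 0x46, 0x46,
  0x00, 0x00, 0x00, 0x00,
  0x57, 0x41, 0x56, 0x45,
]);
console.log(looksLikeWav(sampleHeader));
```

In Node.js the bytes could come from fs.readFileSync(path), since a Buffer is a Uint8Array.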