docs: Add comprehensive speech features documentation and configuration

- Introduce detailed documentation for speech processing capabilities
- Add new speech features documentation in `docs/features/speech.md`
- Update README with speech feature highlights and prerequisites
- Expand configuration documentation with speech-related settings
- Include model selection, GPU acceleration, and best practices guidance
2025-02-06 04:30:20 +01:00
parent f0ff3d5e5a
commit cc9eede856
3 changed files with 425 additions and 26 deletions
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -34,6 +34,14 @@ JWT_SECRET=your_secret_key
  - `MAX_CLIENTS`: Maximum concurrent clients (default: 1000)
  - `PING_INTERVAL`: Keep-alive ping interval in ms (default: 30000)

+### Speech Features (Optional)
+- `ENABLE_SPEECH_FEATURES`: Enable speech processing features (default: false)
+- `ENABLE_WAKE_WORD`: Enable wake word detection (default: false)
+- `ENABLE_SPEECH_TO_TEXT`: Enable speech-to-text conversion (default: false)
+- `WHISPER_MODEL_PATH`: Path to Whisper models directory (default: /models)
+- `WHISPER_MODEL_TYPE`: Whisper model type (default: base)
+  - Available models: tiny.en, base.en, small.en, medium.en, large-v2
+
 ## Environment Variables

 All configuration is managed through environment variables:
@@ -57,6 +65,13 @@ LOG_MAX_SIZE=20m
 LOG_MAX_DAYS=14d
 LOG_COMPRESS=true
 LOG_REQUESTS=true
+
+# Speech Features (Optional)
+ENABLE_SPEECH_FEATURES=false
+ENABLE_WAKE_WORD=false
+ENABLE_SPEECH_TO_TEXT=false
+WHISPER_MODEL_PATH=/models
+WHISPER_MODEL_TYPE=base
 ```

 ## Advanced Configuration
@@ -86,6 +101,26 @@ LOGGING: {
 }
 ```

+### Speech-to-Text Configuration
+When speech features are enabled, you can configure the following options:
+
+```typescript
+SPEECH: {
+  ENABLED: false,  // Master switch for all speech features
+  WAKE_WORD_ENABLED: false,  // Enable wake word detection
+  SPEECH_TO_TEXT_ENABLED: false,  // Enable speech-to-text
+  WHISPER_MODEL_PATH: "/models",  // Path to Whisper models
+  WHISPER_MODEL_TYPE: "base",  // Model type to use
+}
+```
+
+Available Whisper models:
+- `tiny.en`: Fastest, lowest accuracy
+- `base.en`: Good balance of speed and accuracy
+- `small.en`: Better accuracy, slower
+- `medium.en`: High accuracy, much slower
+- `large-v2`: Best accuracy, very slow
+
 For production deployments, we recommend using system tools like `logrotate` for log management.

 Example logrotate configuration (`/etc/logrotate.d/mcp-server`):
@@ -109,13 +144,15 @@ Example logrotate configuration (`/etc/logrotate.d/mcp-server`):
 4. Enable SSL/TLS in production (preferably via reverse proxy)
 5. Monitor log files for issues
 6. Regularly rotate logs in production
+7. Start with smaller Whisper models and upgrade if needed
+8. Consider GPU acceleration for larger Whisper models

 ## Validation

 The server validates configuration on startup using Zod schemas:
 - Required fields are checked (e.g., HASS_TOKEN)
 - Value types are verified
- Enums are validated (e.g., LOG_LEVEL)
+- Enums are validated (e.g., LOG_LEVEL, WHISPER_MODEL_TYPE)
 - Default values are applied when not specified

 ## Troubleshooting
@@ -125,5 +162,109 @@ Common configuration issues:
 2. Invalid environment variable values
 3. Permission issues with log directories
 4. Rate limiting too restrictive
+5. Speech model loading failures
+6. Docker not available for speech features
+7. Insufficient system resources for larger models

-See the [Troubleshooting Guide](troubleshooting.md) for solutions. 
+See the [Troubleshooting Guide](troubleshooting.md) for solutions.
+
+# Configuration Guide
+
+This document describes all available configuration options for the Home Assistant MCP Server.
+
+## Environment Variables
+
+### Required Settings
+
+```bash
+# Server Configuration
+PORT=3000                      # Server port
+HOST=localhost                 # Server host
+
+# Home Assistant
+HASS_URL=http://localhost:8123 # Home Assistant URL
+HASS_TOKEN=your_token         # Long-lived access token
+
+# Security
+JWT_SECRET=your_secret        # JWT signing secret
+```
+
+### Optional Settings
+
+```bash
+# Rate Limiting
+RATE_LIMIT_WINDOW=60000       # Time window in ms (default: 60000)
+RATE_LIMIT_MAX=100           # Max requests per window (default: 100)
+
+# Logging
+LOG_LEVEL=info               # debug, info, warn, error (default: info)
+LOG_DIR=logs                 # Log directory (default: logs)
+LOG_MAX_SIZE=10m            # Max log file size (default: 10m)
+LOG_MAX_FILES=5             # Max number of log files (default: 5)
+
+# WebSocket/SSE
+WS_HEARTBEAT=30000          # WebSocket heartbeat interval in ms (default: 30000)
+SSE_RETRY=3000             # SSE retry interval in ms (default: 3000)
+
+# Speech Features
+ENABLE_SPEECH_FEATURES=false # Enable speech processing (default: false)
+ENABLE_WAKE_WORD=false      # Enable wake word detection (default: false)
+ENABLE_SPEECH_TO_TEXT=false # Enable speech-to-text (default: false)
+
+# Speech Model Configuration
+WHISPER_MODEL_PATH=/models  # Path to whisper models (default: /models)
+WHISPER_MODEL_TYPE=base     # Model type: tiny|base|small|medium|large-v2 (default: base)
+WHISPER_LANGUAGE=en        # Primary language (default: en)
+WHISPER_TASK=transcribe    # Task type: transcribe|translate (default: transcribe)
+WHISPER_DEVICE=cuda        # Processing device: cpu|cuda (default: cuda if available, else cpu)
+
+# Wake Word Configuration
+WAKE_WORDS=hey jarvis,ok google,alexa  # Comma-separated wake words (default: hey jarvis)
+WAKE_WORD_SENSITIVITY=0.5   # Detection sensitivity 0-1 (default: 0.5)
+```
+
+## Speech Features
+
+### Model Selection
+
+Choose a model based on your needs:
+
+| Model      | Size  | Memory Required | Speed   | Accuracy |
+|------------|-------|-----------------|---------|----------|
+| tiny.en    | 75MB  | 1GB             | Fastest | Basic    |
+| base.en    | 150MB | 2GB             | Fast    | Good     |
+| small.en   | 500MB | 4GB             | Medium  | Better   |
+| medium.en  | 1.5GB | 8GB             | Slow    | High     |
+| large-v2   | 3GB   | 16GB            | Slowest | Best     |
+
+### GPU Acceleration
+
+When `WHISPER_DEVICE=cuda`:
+- NVIDIA GPU with CUDA support required
+- Significantly faster processing
+- Higher memory requirements
+
+### Wake Word Detection
+
+- Multiple wake words supported via comma-separated list
+- Adjustable sensitivity (0-1):
+  - Lower values: Fewer false positives, may miss some triggers
+  - Higher values: More responsive, may have false triggers
+  - Default (0.5): Balanced detection
+
+### Best Practices
+
+1. Model Selection:
+   - Start with the `base.en` model
+   - Upgrade to a larger model if you need better accuracy
+   - Downgrade to a smaller model if you run into performance issues
+
+2. Resource Management:
+   - Monitor memory usage
+   - Use GPU acceleration when available
+   - Consider model size vs available resources
+
+3. Wake Word Configuration:
+   - Use distinct wake words
+   - Adjust sensitivity based on environment
+   - Limit the number of wake words for better performance
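
The speech settings documented in this diff (boolean feature flags, an enumerated model type, a comma-separated wake word list, and a sensitivity between 0 and 1) could be read and validated roughly as follows. This is a minimal sketch in plain TypeScript, not the project's actual Zod schema: `parseSpeechConfig`, `SpeechConfig`, `readVar`, and `readBool` are hypothetical names introduced for illustration. It assumes that both the bare (`base`) and `.en` (`base.en`) model spellings are valid, since the documentation uses both, and that the master switch gates the two sub-features.

```typescript
// Illustrative sketch only: every identifier here is hypothetical and not
// taken from the project's code. Only the variable names and defaults come
// from the documentation in the diff above.

type Env = { [key: string]: string | undefined };

// Assumption: accept both the bare and ".en" spellings, because the
// documentation lists both forms for WHISPER_MODEL_TYPE.
const WHISPER_MODELS: string[] = [
  "tiny", "base", "small", "medium", "large-v2",
  "tiny.en", "base.en", "small.en", "medium.en",
];

interface SpeechConfig {
  enabled: boolean;
  wakeWordEnabled: boolean;
  speechToTextEnabled: boolean;
  modelPath: string;
  modelType: string;
  wakeWords: string[];
  wakeWordSensitivity: number;
}

// Return the trimmed value of a variable, or the fallback when unset or blank.
function readVar(env: Env, key: string, fallback: string): string {
  const raw = env[key];
  return raw === undefined || raw.trim() === "" ? fallback : raw.trim();
}

// Only the literal string "true" (case-insensitive) enables a flag.
function readBool(env: Env, key: string): boolean {
  return readVar(env, key, "false").toLowerCase() === "true";
}

function parseSpeechConfig(env: Env): SpeechConfig {
  const modelType = readVar(env, "WHISPER_MODEL_TYPE", "base");
  if (WHISPER_MODELS.indexOf(modelType) === -1) {
    throw new Error("Invalid WHISPER_MODEL_TYPE: " + modelType);
  }

  const sensitivity = Number(readVar(env, "WAKE_WORD_SENSITIVITY", "0.5"));
  if (isNaN(sensitivity) || sensitivity < 0 || sensitivity > 1) {
    throw new Error("WAKE_WORD_SENSITIVITY must be a number between 0 and 1");
  }

  // Split the comma-separated list, trimming blanks and dropping empty entries.
  const wakeWords = readVar(env, "WAKE_WORDS", "hey jarvis")
    .split(",")
    .map((word) => word.trim())
    .filter((word) => word.length > 0);

  // Assumption: the master switch must be on for either sub-feature to be active.
  const enabled = readBool(env, "ENABLE_SPEECH_FEATURES");
  return {
    enabled: enabled,
    wakeWordEnabled: enabled && readBool(env, "ENABLE_WAKE_WORD"),
    speechToTextEnabled: enabled && readBool(env, "ENABLE_SPEECH_TO_TEXT"),
    modelPath: readVar(env, "WHISPER_MODEL_PATH", "/models"),
    modelType: modelType,
    wakeWords: wakeWords,
    wakeWordSensitivity: sensitivity,
  };
}
```

In a Node.js process this would typically be called once at startup as `parseSpeechConfig(process.env)`, so that an invalid model type or sensitivity fails fast instead of surfacing later as a model loading error.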