ci: Enhance documentation deployment workflow with debugging and manual trigger

- Add manual workflow dispatch trigger
- Include diagnostic logging steps for mkdocs build process
- Modify artifact upload path to match project structure
- Add verbose output for build configuration and directory contents
ci: Modernize GitHub Actions workflow for documentation deployment
2025-02-06 05:43:24 +01:00 · 2025-02-06 04:49:42 +01:00 · 2025-02-06 04:30:20 +01:00
4 changed files with 477 additions and 36 deletions
--- a/.github/workflows/deploy-docs.yml
+++ b/.github/workflows/deploy-docs.yml
@@ -1,4 +1,5 @@
 name: Deploy Documentation
 on:
  push:
    branches:
@@ -6,29 +7,70 @@ on:
    paths:
      - 'docs/**'
      - 'mkdocs.yml'
  # Allow manual trigger
  workflow_dispatch:
 # Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
 permissions:
  contents: read
  pages: write
  id-token: write
 # Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
 concurrency:
  group: "pages"
  cancel-in-progress: false
 jobs:
-  deploy:
+  build:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
-      - uses: actions/checkout@v4
+      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
-      - uses: actions/setup-python@v5
+
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.x'
          cache: 'pip'
      - name: Setup Pages
        uses: actions/configure-pages@v4
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r docs/requirements.txt
-      - name: Configure Git
+
      - name: List mkdocs configuration
        run: |
-          git config --global user.name "github-actions[bot]"
+          echo "Current directory contents:"
-          git config --global user.email "github-actions[bot]@users.noreply.github.com"
+          ls -la
-      - name: Build and Deploy
+          echo "MkDocs version:"
          mkdocs --version
          echo "MkDocs configuration:"
          cat mkdocs.yml
      - name: Build documentation
        run: |
          mkdocs build --strict
-          mkdocs gh-deploy --force --clean 
+          echo "Build output contents:"
          ls -la site/advanced-homeassistant-mcp
      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: ./site/advanced-homeassistant-mcp
  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    needs: build
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4 
--- a/README.md
+++ b/README.md
@@ -12,12 +12,22 @@ MCP (Model Context Protocol) Server is a lightweight integration tool for Home A
 - 📡 WebSocket/Server-Sent Events (SSE) for state updates
 - 🤖 Simple automation rule management
 - 🔐 JWT-based authentication
 - 🎤 Real-time device control and monitoring
 - 🎤 Server-Sent Events (SSE) for live updates
 - 🎤 Comprehensive logging
 - 🎤 Optional speech features:
  - 🎤 Wake word detection ("hey jarvis", "ok google", "alexa")
  - 🎤 Speech-to-text using fast-whisper
  - 🎤 Multiple language support
  - 🎤 GPU acceleration support
 ## Prerequisites 📋
 - 🚀 Bun runtime (v1.0.26+)
 - 🏡 Home Assistant instance
- 🐳 Docker (optional, recommended for deployment)
+- 🐳 Docker (optional, recommended for deployment and speech features)
 - 🖥️ Node.js 18+ (optional, for speech features)
 - 🖥️ NVIDIA GPU with CUDA support (optional, for faster speech processing)
 ## Installation 🛠️
@@ -30,7 +40,7 @@ cd homeassistant-mcp
 # Copy and edit environment configuration
 cp .env.example .env
-# Edit .env with your Home Assistant credentials
+# Edit .env with your Home Assistant credentials and speech features settings
 # Build and start containers
 docker compose up -d --build
@@ -79,33 +89,69 @@ ws.onmessage = (event) => {
 };
 ```
-## Current Limitations ⚠️
+## Speech Features (Optional)
- 🎙️ Basic voice command support (work in progress)
+The MCP Server includes optional speech processing capabilities:
 - 🧠 Limited advanced NLP capabilities
 - 🔗 Minimal third-party device integration
 - 🐛 Early-stage error handling
-## Contributing 🤝
+### Prerequisites
 1. Docker installed and running
 2. NVIDIA GPU with CUDA support (optional)
 3. At least 4GB RAM (8GB+ recommended for larger models)
-1. Fork the repository
+### Setup
-2. Create a feature branch:
+
 1. Enable speech features in your .env:
 ```bash
-   git checkout -b feature/your-feature
+ENABLE_SPEECH_FEATURES=true
 ENABLE_WAKE_WORD=true
 ENABLE_SPEECH_TO_TEXT=true
 WHISPER_MODEL_PATH=/models
 WHISPER_MODEL_TYPE=base
 ```
-3. Make your changes
+
-4. Run tests:
+2. Start the speech services:
 ```bash
-   bun test
+docker-compose up -d
 ```
 5. Submit a pull request
-## Roadmap 🗺️
+### Available Models
- 🎤 Enhance voice command processing
+Choose a model based on your needs:
- 🔌 Improve device compatibility
+- `tiny.en`: Fastest, basic accuracy
- 🤖 Expand automation capabilities
+- `base.en`: Good balance (recommended)
- 🛡️ Implement more robust error handling
+- `small.en`: Better accuracy, slower
 - `medium.en`: High accuracy, resource intensive
 - `large-v2`: Best accuracy, very resource intensive
 ### Usage
 1. Wake word detection listens for:
   - "hey jarvis"
   - "ok google"
   - "alexa"
 2. After wake word detection:
   - Audio is automatically captured
   - Speech is transcribed
   - Commands are processed
 3. Manual transcription is also available:
 ```typescript
 const speech = speechService.getSpeechToText();
 const text = await speech.transcribe(audioBuffer);
 ```
 ## Configuration
 See [Configuration Guide](docs/configuration.md) for detailed settings.
 ## API Documentation
 See [API Documentation](docs/api/index.md) for available endpoints.
 ## Development
 See [Development Guide](docs/development/index.md) for contribution guidelines.
 ## License 📄
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -34,6 +34,14 @@ JWT_SECRET=your_secret_key
  - `MAX_CLIENTS`: Maximum concurrent clients (default: 1000)
  - `PING_INTERVAL`: Keep-alive ping interval in ms (default: 30000)
 ### Speech Features (Optional)
 - `ENABLE_SPEECH_FEATURES`: Enable speech processing features (default: false)
 - `ENABLE_WAKE_WORD`: Enable wake word detection (default: false)
 - `ENABLE_SPEECH_TO_TEXT`: Enable speech-to-text conversion (default: false)
 - `WHISPER_MODEL_PATH`: Path to Whisper models directory (default: /models)
 - `WHISPER_MODEL_TYPE`: Whisper model type (default: base)
  - Available models: tiny.en, base.en, small.en, medium.en, large-v2
 ## Environment Variables
 All configuration is managed through environment variables:
@@ -57,6 +65,13 @@ LOG_MAX_SIZE=20m
 LOG_MAX_DAYS=14d
 LOG_COMPRESS=true
 LOG_REQUESTS=true
 # Speech Features (Optional)
 ENABLE_SPEECH_FEATURES=false
 ENABLE_WAKE_WORD=false
 ENABLE_SPEECH_TO_TEXT=false
 WHISPER_MODEL_PATH=/models
 WHISPER_MODEL_TYPE=base
 ```
 ## Advanced Configuration
@@ -86,6 +101,26 @@ LOGGING: {
 }
 ```
 ### Speech-to-Text Configuration
 When speech features are enabled, you can configure the following options:
 ```typescript
 SPEECH: {
  ENABLED: false,  // Master switch for all speech features
  WAKE_WORD_ENABLED: false,  // Enable wake word detection
  SPEECH_TO_TEXT_ENABLED: false,  // Enable speech-to-text
  WHISPER_MODEL_PATH: "/models",  // Path to Whisper models
  WHISPER_MODEL_TYPE: "base",  // Model type to use
 }
 ```
 Available Whisper models:
 - `tiny.en`: Fastest, lowest accuracy
 - `base.en`: Good balance of speed and accuracy
 - `small.en`: Better accuracy, slower
 - `medium.en`: High accuracy, much slower
 - `large-v2`: Best accuracy, very slow
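As a rough illustration of how the environment variables could map onto the `SPEECH` block above, here is a minimal sketch. The helper names (`parseFlag`, `loadSpeechConfig`) are hypothetical and not taken from the server's source, and the rule that the master switch gates the two sub-features is an assumption based on the "Master switch" comment.

```typescript
// Hypothetical helper names; this is not the server's actual loader.
type Env = Record<string, string | undefined>;

interface SpeechConfig {
  ENABLED: boolean;
  WAKE_WORD_ENABLED: boolean;
  SPEECH_TO_TEXT_ENABLED: boolean;
  WHISPER_MODEL_PATH: string;
  WHISPER_MODEL_TYPE: string;
}

// Treat only the string "true" (case-insensitive) as enabled.
function parseFlag(value: string | undefined): boolean {
  return (value ?? "").trim().toLowerCase() === "true";
}

function loadSpeechConfig(env: Env): SpeechConfig {
  const enabled = parseFlag(env["ENABLE_SPEECH_FEATURES"]);
  return {
    ENABLED: enabled,
    // Assumption: sub-features are only active when the master switch is on.
    WAKE_WORD_ENABLED: enabled && parseFlag(env["ENABLE_WAKE_WORD"]),
    SPEECH_TO_TEXT_ENABLED: enabled && parseFlag(env["ENABLE_SPEECH_TO_TEXT"]),
    // Defaults match the documented ones.
    WHISPER_MODEL_PATH: env["WHISPER_MODEL_PATH"] ?? "/models",
    WHISPER_MODEL_TYPE: env["WHISPER_MODEL_TYPE"] ?? "base",
  };
}
```

Passing the environment in as a plain object, rather than reading a global, keeps the sketch easy to test with different settings.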
 For production deployments, we recommend using system tools like `logrotate` for log management.
 Example logrotate configuration (`/etc/logrotate.d/mcp-server`):
@@ -109,13 +144,15 @@ Example logrotate configuration (`/etc/logrotate.d/mcp-server`):
 4. Enable SSL/TLS in production (preferably via reverse proxy)
 5. Monitor log files for issues
 6. Regularly rotate logs in production
 7. Start with smaller Whisper models and upgrade if needed
 8. Consider GPU acceleration for larger Whisper models
 ## Validation
 The server validates configuration on startup using Zod schemas:
 - Required fields are checked (e.g., HASS_TOKEN)
 - Value types are verified
- Enums are validated (e.g., LOG_LEVEL)
+- Enums are validated (e.g., LOG_LEVEL, WHISPER_MODEL_TYPE)
 - Default values are applied when not specified
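The checks listed above can be sketched without any dependencies. The server itself uses Zod schemas; the hand-written function below is only a stand-in that shows the same three ideas (required field, enum check, default value). The name `validateConfig` is hypothetical, and the `LOG_LEVEL` values are the ones listed in this guide.

```typescript
// Dependency-free stand-in for a schema check; the real server uses Zod.
type Env = Record<string, string | undefined>;

const LOG_LEVELS = ["debug", "info", "warn", "error"] as const;
type LogLevel = (typeof LOG_LEVELS)[number];

interface ValidatedConfig {
  HASS_TOKEN: string;
  LOG_LEVEL: LogLevel;
}

function validateConfig(env: Env): ValidatedConfig {
  const errors: string[] = [];

  // Required field: fail when missing or empty.
  const token = env["HASS_TOKEN"];
  if (!token) errors.push("HASS_TOKEN is required");

  // Enum with a default: fall back to "info" when unset.
  const level = env["LOG_LEVEL"] ?? "info";
  if ((LOG_LEVELS as readonly string[]).indexOf(level) === -1) {
    errors.push(`LOG_LEVEL must be one of ${LOG_LEVELS.join(", ")}`);
  }

  // Report every problem at once rather than stopping at the first.
  if (errors.length > 0) {
    throw new Error(`Invalid configuration: ${errors.join("; ")}`);
  }
  return { HASS_TOKEN: token as string, LOG_LEVEL: level as LogLevel };
}
```

Collecting all errors before throwing means a misconfigured deployment surfaces every issue in a single startup failure.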
 ## Troubleshooting
@@ -125,5 +162,109 @@ Common configuration issues:
 2. Invalid environment variable values
 3. Permission issues with log directories
 4. Rate limiting too restrictive
 5. Speech model loading failures
 6. Docker not available for speech features
 7. Insufficient system resources for larger models
 See the [Troubleshooting Guide](troubleshooting.md) for solutions.
 # Configuration Guide
 This document describes all available configuration options for the Home Assistant MCP Server.
 ## Environment Variables
 ### Required Settings
 ```bash
 # Server Configuration
 PORT=3000                      # Server port
 HOST=localhost                 # Server host
 # Home Assistant
 HASS_URL=http://localhost:8123 # Home Assistant URL
 HASS_TOKEN=your_token         # Long-lived access token
 # Security
 JWT_SECRET=your_secret        # JWT signing secret
 ```
 ### Optional Settings
 ```bash
 # Rate Limiting
 RATE_LIMIT_WINDOW=60000       # Time window in ms (default: 60000)
 RATE_LIMIT_MAX=100           # Max requests per window (default: 100)
 # Logging
 LOG_LEVEL=info               # debug, info, warn, error (default: info)
 LOG_DIR=logs                 # Log directory (default: logs)
 LOG_MAX_SIZE=10m            # Max log file size (default: 10m)
 LOG_MAX_FILES=5             # Max number of log files (default: 5)
 # WebSocket/SSE
 WS_HEARTBEAT=30000          # WebSocket heartbeat interval in ms (default: 30000)
 SSE_RETRY=3000             # SSE retry interval in ms (default: 3000)
 # Speech Features
 ENABLE_SPEECH_FEATURES=false # Enable speech processing (default: false)
 ENABLE_WAKE_WORD=false      # Enable wake word detection (default: false)
 ENABLE_SPEECH_TO_TEXT=false # Enable speech-to-text (default: false)
 # Speech Model Configuration
 WHISPER_MODEL_PATH=/models  # Path to whisper models (default: /models)
 WHISPER_MODEL_TYPE=base     # Model type: tiny|base|small|medium|large-v2 (default: base)
 WHISPER_LANGUAGE=en        # Primary language (default: en)
 WHISPER_TASK=transcribe    # Task type: transcribe|translate (default: transcribe)
 WHISPER_DEVICE=cuda        # Processing device: cpu|cuda (default: cuda if available, else cpu)
 # Wake Word Configuration
 WAKE_WORDS=hey jarvis,ok google,alexa  # Comma-separated wake words (default: hey jarvis)
 WAKE_WORD_SENSITIVITY=0.5   # Detection sensitivity 0-1 (default: 0.5)
 ```
 ## Speech Features
 ### Model Selection
 Choose a model based on your needs:
 | Model      | Size  | Memory Required | Speed | Accuracy |
 |------------|-------|-----------------|-------|----------|
 | tiny.en    | 75MB  | 1GB            | Fast  | Basic    |
 | base.en    | 150MB | 2GB            | Good  | Good     |
 | small.en   | 500MB | 4GB            | Med   | Better   |
 | medium.en  | 1.5GB | 8GB            | Slow  | High     |
 | large-v2   | 3GB   | 16GB           | Slow  | Best     |
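One way to read the table is "pick the largest model whose memory requirement fits". The sketch below encodes that rule; the function name `largestModelThatFits` is hypothetical, and the memory figures are copied from the table rather than measured.

```typescript
interface ModelSpec {
  name: string;
  memoryGB: number;
}

// Memory requirements from the table above, ordered smallest to largest.
const MODELS: ModelSpec[] = [
  { name: "tiny.en", memoryGB: 1 },
  { name: "base.en", memoryGB: 2 },
  { name: "small.en", memoryGB: 4 },
  { name: "medium.en", memoryGB: 8 },
  { name: "large-v2", memoryGB: 16 },
];

// Returns the largest model that fits, or undefined if even tiny.en does not.
function largestModelThatFits(availableGB: number): string | undefined {
  let best: string | undefined;
  for (const model of MODELS) {
    if (model.memoryGB <= availableGB) best = model.name;
  }
  return best;
}
```

This only accounts for memory; speed requirements may still argue for a smaller model than the one returned.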
 ### GPU Acceleration
 When `WHISPER_DEVICE=cuda`:
 - NVIDIA GPU with CUDA support required
 - Significantly faster processing
 - Higher memory requirements
 ### Wake Word Detection
 - Multiple wake words supported via comma-separated list
 - Adjustable sensitivity (0-1):
  - Lower values: Fewer false positives, may miss some triggers
  - Higher values: More responsive, may have false triggers
  - Default (0.5): Balanced detection
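A minimal sketch of how `WAKE_WORDS` and `WAKE_WORD_SENSITIVITY` might be parsed follows. The function names are hypothetical, and lower-casing, trimming, and de-duplicating the words are assumptions rather than documented behavior; the defaults are the ones given in this guide.

```typescript
// Split a comma-separated list into clean, unique wake words.
function parseWakeWords(raw: string | undefined): string[] {
  const words = (raw ?? "hey jarvis")
    .split(",")
    .map((word) => word.trim().toLowerCase())
    .filter((word) => word.length > 0);
  // Keep only the first occurrence of each word.
  return words.filter((word, index) => words.indexOf(word) === index);
}

// Parse the sensitivity and reject anything outside the 0-1 range.
function parseSensitivity(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return 0.5;
  const value = Number(raw);
  if (!isFinite(value) || value < 0 || value > 1) {
    throw new Error(`WAKE_WORD_SENSITIVITY must be between 0 and 1, got "${raw}"`);
  }
  return value;
}
```

Rejecting out-of-range values at startup is one option; clamping them into the 0-1 range would be an equally reasonable design.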
 ### Best Practices
 1. Model Selection:
   - Start with `base.en` model
   - Upgrade if better accuracy needed
   - Downgrade if performance issues
 2. Resource Management:
   - Monitor memory usage
   - Use GPU acceleration when available
   - Consider model size vs available resources
 3. Wake Word Configuration:
   - Use distinct wake words
   - Adjust sensitivity based on environment
   - Limit number of wake words for better performance 
--- a/docs/features/speech.md
+++ b/docs/features/speech.md
@@ -0,0 +1,212 @@
 # Speech Features
 The Home Assistant MCP Server includes speech processing capabilities built on fast-whisper and custom wake word detection. This guide explains how to set up and use them.
 ## Overview
 The speech processing system consists of two main components:
 1. Wake Word Detection - Listens for specific trigger phrases
 2. Speech-to-Text - Transcribes spoken commands using fast-whisper
 ## Setup
 ### Prerequisites
 1. Docker environment:
 ```bash
 docker --version  # Should be 20.10.0 or higher
 ```
 2. For GPU acceleration:
 - NVIDIA GPU with CUDA support
 - NVIDIA Container Toolkit installed
 - NVIDIA drivers 450.80.02 or higher
 ### Installation
 1. Enable speech features in your `.env`:
 ```bash
 ENABLE_SPEECH_FEATURES=true
 ENABLE_WAKE_WORD=true
 ENABLE_SPEECH_TO_TEXT=true
 ```
 2. Configure model settings:
 ```bash
 WHISPER_MODEL_PATH=/models
 WHISPER_MODEL_TYPE=base
 WHISPER_LANGUAGE=en
 WHISPER_TASK=transcribe
 WHISPER_DEVICE=cuda  # or cpu
 ```
 3. Start the services:
 ```bash
 docker-compose up -d
 ```
 ## Usage
 ### Wake Word Detection
 The wake word detector continuously listens for configured trigger phrases. Default wake words:
 - "hey jarvis"
 - "ok google"
 - "alexa"
 Custom wake words can be configured:
 ```bash
 WAKE_WORDS=computer,jarvis,assistant
 ```
 When a wake word is detected:
 1. The system starts recording audio
 2. Audio is processed through the speech-to-text pipeline
 3. The resulting command is processed by the server
 ### Speech-to-Text
 #### Automatic Transcription
 After wake word detection:
 1. Audio is automatically captured (default: 5 seconds)
 2. The audio is transcribed using the configured whisper model
 3. The transcribed text is processed as a command
 #### Manual Transcription
 You can also manually transcribe audio using the API:
 ```typescript
 // Using the TypeScript client
 import { SpeechService } from '@ha-mcp/client';
 const speech = new SpeechService();
 // Transcribe from audio buffer
 const buffer = await getAudioBuffer();
 const text = await speech.transcribe(buffer);
 // Transcribe from file
 const fileText = await speech.transcribeFile('command.wav');
 ```
 Using the REST API:
 ```http
 POST /api/speech/transcribe
 Content-Type: multipart/form-data
 file: <audio file>
 ```
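For callers that do not use the client library, the REST request above could be assembled roughly as follows. This is a sketch under assumptions: `buildTranscribeRequest` is a hypothetical helper, and sending the JWT as a `Bearer` token is a guess, since the docs state only that speech endpoints require authentication. The `file` field name and the endpoint path come from the request outline above.

```typescript
// Hypothetical helper: builds (but does not send) a multipart transcription request.
interface TranscribeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: FormData;
}

function buildTranscribeRequest(
  baseUrl: string,
  token: string,
  audio: Blob,
  filename: string,
): TranscribeRequest {
  const body = new FormData();
  // "file" is the form field named in the request outline above.
  body.append("file", audio, filename);
  return {
    url: new URL("/api/speech/transcribe", baseUrl).toString(),
    method: "POST",
    // Assumption: the JWT is sent as a Bearer token.
    // Content-Type is left unset so the runtime adds the multipart boundary itself.
    headers: { Authorization: `Bearer ${token}` },
    body,
  };
}
```

The result can be sent with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. The response format is not documented here, so inspect it before relying on a particular shape.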
 ### Event Handling
 The system emits various events during speech processing:
 ```typescript
 speech.on('wakeWord', (word: string) => {
  console.log(`Wake word detected: ${word}`);
 });
 speech.on('listening', () => {
  console.log('Listening for command...');
 });
 speech.on('transcribing', () => {
  console.log('Processing speech...');
 });
 speech.on('transcribed', (text: string) => {
  console.log(`Transcribed text: ${text}`);
 });
 speech.on('error', (error: Error) => {
  console.error('Speech processing error:', error);
 });
 ```
 ## Performance Optimization
 ### Model Selection
 Choose an appropriate model based on your needs:
 1. Resource-constrained environments:
   - Use `tiny.en` or `base.en`
   - Run on CPU if GPU unavailable
   - Limit concurrent processing
 2. High-accuracy requirements:
   - Use `small.en` or `medium.en`
   - Enable GPU acceleration
   - Increase audio quality
 3. Production environments:
   - Use `base.en` or `small.en`
   - Enable GPU acceleration
   - Configure appropriate timeouts
 ### GPU Acceleration
 When using GPU acceleration:
 1. Monitor GPU memory usage:
 ```bash
 nvidia-smi -l 1
 ```
 2. Adjust model size if needed:
 ```bash
 WHISPER_MODEL_TYPE=small  # Decrease if GPU memory limited
 ```
 3. Configure processing device:
 ```bash
 WHISPER_DEVICE=cuda      # Use GPU
 WHISPER_DEVICE=cpu      # Use CPU if GPU unavailable
 ```
 ## Troubleshooting
 ### Common Issues
 1. Wake word detection not working:
   - Check microphone permissions
   - Adjust `WAKE_WORD_SENSITIVITY`
   - Verify wake words configuration
 2. Poor transcription quality:
   - Check audio input quality
   - Try a larger model
   - Verify language settings
 3. Performance issues:
   - Monitor resource usage
   - Consider smaller model
   - Check GPU acceleration status
 ### Logging
 Enable debug logging for detailed information:
 ```bash
 LOG_LEVEL=debug
 ```
 Speech-specific logs are tagged with the `[SPEECH]` prefix.
 ## Security Considerations
 1. Audio Privacy:
   - Audio is processed locally
   - No data sent to external services
   - Temporary files automatically cleaned
 2. Access Control:
   - Speech endpoints require authentication
   - Rate limiting applies to transcription
   - Configurable command restrictions
 3. Resource Protection:
   - Timeouts prevent hanging
   - Memory limits enforced
   - Graceful error handling