Compare commits
3 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | fca193b5b2 | | |
| | cc9eede856 | | |
| | f0ff3d5e5a | | |
48
.github/workflows/deploy-docs.yml
vendored
@@ -1,4 +1,5 @@
|
|||||||
name: Deploy Documentation
|
name: Deploy Documentation
|
||||||
|
|
||||||
on:
|
on:
|
||||||
push:
|
push:
|
||||||
branches:
|
branches:
|
||||||
@@ -7,28 +8,53 @@ on:
|
|||||||
- 'docs/**'
|
- 'docs/**'
|
||||||
- 'mkdocs.yml'
|
- 'mkdocs.yml'
|
||||||
|
|
||||||
|
# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
|
||||||
|
permissions:
|
||||||
|
contents: read
|
||||||
|
pages: write
|
||||||
|
id-token: write
|
||||||
|
|
||||||
|
# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
|
||||||
|
concurrency:
|
||||||
|
group: "pages"
|
||||||
|
cancel-in-progress: false
|
||||||
|
|
||||||
jobs:
|
jobs:
|
||||||
deploy:
|
build:
|
||||||
runs-on: ubuntu-latest
|
runs-on: ubuntu-latest
|
||||||
permissions:
|
|
||||||
contents: write
|
|
||||||
steps:
|
steps:
|
||||||
- uses: actions/checkout@v4
|
- uses: actions/checkout@v4
|
||||||
with:
|
with:
|
||||||
fetch-depth: 0
|
fetch-depth: 0
|
||||||
|
|
||||||
- uses: actions/setup-python@v5
|
- uses: actions/setup-python@v5
|
||||||
with:
|
with:
|
||||||
python-version: '3.x'
|
python-version: '3.x'
|
||||||
cache: 'pip'
|
cache: 'pip'
|
||||||
|
|
||||||
|
- name: Setup Pages
|
||||||
|
uses: actions/configure-pages@v4
|
||||||
|
|
||||||
- name: Install dependencies
|
- name: Install dependencies
|
||||||
run: |
|
run: |
|
||||||
python -m pip install --upgrade pip
|
python -m pip install --upgrade pip
|
||||||
pip install -r docs/requirements.txt
|
pip install -r docs/requirements.txt
|
||||||
- name: Configure Git
|
|
||||||
run: |
|
- name: Build documentation
|
||||||
git config --global user.name "github-actions[bot]"
|
run: mkdocs build --strict
|
||||||
git config --global user.email "github-actions[bot]@users.noreply.github.com"
|
|
||||||
- name: Build and Deploy
|
- name: Upload artifact
|
||||||
run: |
|
uses: actions/upload-pages-artifact@v3
|
||||||
mkdocs build --strict
|
with:
|
||||||
mkdocs gh-deploy --force --clean
|
path: ./site
|
||||||
|
|
||||||
|
deploy:
|
||||||
|
environment:
|
||||||
|
name: github-pages
|
||||||
|
url: ${{ steps.deployment.outputs.page_url }}
|
||||||
|
needs: build
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
steps:
|
||||||
|
- name: Deploy to GitHub Pages
|
||||||
|
id: deployment
|
||||||
|
uses: actions/deploy-pages@v4
|
||||||
94
README.md
@@ -12,12 +12,22 @@ MCP (Model Context Protocol) Server is a lightweight integration tool for Home A
|
|||||||
- 📡 WebSocket/Server-Sent Events (SSE) for state updates
|
- 📡 WebSocket/Server-Sent Events (SSE) for state updates
|
||||||
- 🤖 Simple automation rule management
|
- 🤖 Simple automation rule management
|
||||||
- 🔐 JWT-based authentication
|
- 🔐 JWT-based authentication
|
||||||
|
- 🎤 Real-time device control and monitoring
|
||||||
|
- 🎤 Server-Sent Events (SSE) for live updates
|
||||||
|
- 🎤 Comprehensive logging
|
||||||
|
- 🎤 Optional speech features:
|
||||||
|
- 🎤 Wake word detection ("hey jarvis", "ok google", "alexa")
|
||||||
|
- 🎤 Speech-to-text using fast-whisper
|
||||||
|
- 🎤 Multiple language support
|
||||||
|
- 🎤 GPU acceleration support
|
||||||
|
|
||||||
## Prerequisites 📋
|
## Prerequisites 📋
|
||||||
|
|
||||||
- 🚀 Bun runtime (v1.0.26+)
|
- 🚀 Bun runtime (v1.0.26+)
|
||||||
- 🏡 Home Assistant instance
|
- 🏡 Home Assistant instance
|
||||||
- 🐳 Docker (optional, recommended for deployment)
|
- 🐳 Docker (optional, recommended for deployment and speech features)
|
||||||
|
- 🖥️ Node.js 18+ (optional, for speech features)
|
||||||
|
- 🖥️ NVIDIA GPU with CUDA support (optional, for faster speech processing)
|
||||||
|
|
||||||
## Installation 🛠️
|
## Installation 🛠️
|
||||||
|
|
||||||
@@ -30,7 +40,7 @@ cd homeassistant-mcp
|
|||||||
|
|
||||||
# Copy and edit environment configuration
|
# Copy and edit environment configuration
|
||||||
cp .env.example .env
|
cp .env.example .env
|
||||||
# Edit .env with your Home Assistant credentials
|
# Edit .env with your Home Assistant credentials and speech features settings
|
||||||
|
|
||||||
# Build and start containers
|
# Build and start containers
|
||||||
docker compose up -d --build
|
docker compose up -d --build
|
||||||
@@ -79,33 +89,69 @@ ws.onmessage = (event) => {
|
|||||||
};
|
};
|
||||||
```
|
```
|
||||||
|
|
||||||
## Current Limitations ⚠️
|
## Speech Features (Optional)
|
||||||
|
|
||||||
- 🎙️ Basic voice command support (work in progress)
|
The MCP Server includes optional speech processing capabilities:
|
||||||
- 🧠 Limited advanced NLP capabilities
|
|
||||||
- 🔗 Minimal third-party device integration
|
|
||||||
- 🐛 Early-stage error handling
|
|
||||||
|
|
||||||
## Contributing 🤝
|
### Prerequisites
|
||||||
|
1. Docker installed and running
|
||||||
|
2. NVIDIA GPU with CUDA support (optional)
|
||||||
|
3. At least 4GB RAM (8GB+ recommended for larger models)
|
||||||
|
|
||||||
1. Fork the repository
|
### Setup
|
||||||
2. Create a feature branch:
|
|
||||||
```bash
|
|
||||||
git checkout -b feature/your-feature
|
|
||||||
```
|
|
||||||
3. Make your changes
|
|
||||||
4. Run tests:
|
|
||||||
```bash
|
|
||||||
bun test
|
|
||||||
```
|
|
||||||
5. Submit a pull request
|
|
||||||
|
|
||||||
## Roadmap 🗺️
|
1. Enable speech features in your .env:
|
||||||
|
```bash
|
||||||
|
ENABLE_SPEECH_FEATURES=true
|
||||||
|
ENABLE_WAKE_WORD=true
|
||||||
|
ENABLE_SPEECH_TO_TEXT=true
|
||||||
|
WHISPER_MODEL_PATH=/models
|
||||||
|
WHISPER_MODEL_TYPE=base
|
||||||
|
```
|
||||||
|
|
||||||
- 🎤 Enhance voice command processing
|
2. Start the speech services:
|
||||||
- 🔌 Improve device compatibility
|
```bash
|
||||||
- 🤖 Expand automation capabilities
|
docker-compose up -d
|
||||||
- 🛡️ Implement more robust error handling
|
```
|
||||||
|
|
||||||
|
### Available Models
|
||||||
|
|
||||||
|
Choose a model based on your needs:
|
||||||
|
- `tiny.en`: Fastest, basic accuracy
|
||||||
|
- `base.en`: Good balance (recommended)
|
||||||
|
- `small.en`: Better accuracy, slower
|
||||||
|
- `medium.en`: High accuracy, resource intensive
|
||||||
|
- `large-v2`: Best accuracy, very resource intensive
|
||||||
|
|
||||||
|
### Usage
|
||||||
|
|
||||||
|
1. Wake word detection listens for:
|
||||||
|
- "hey jarvis"
|
||||||
|
- "ok google"
|
||||||
|
- "alexa"
|
||||||
|
|
||||||
|
2. After wake word detection:
|
||||||
|
- Audio is automatically captured
|
||||||
|
- Speech is transcribed
|
||||||
|
- Commands are processed
|
||||||
|
|
||||||
|
3. Manual transcription is also available:
|
||||||
|
```typescript
|
||||||
|
const speech = speechService.getSpeechToText();
|
||||||
|
const text = await speech.transcribe(audioBuffer);
|
||||||
|
```
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
See [Configuration Guide](docs/configuration.md) for detailed settings.
|
||||||
|
|
||||||
|
## API Documentation
|
||||||
|
|
||||||
|
See [API Documentation](docs/api/index.md) for available endpoints.
|
||||||
|
|
||||||
|
## Development
|
||||||
|
|
||||||
|
See [Development Guide](docs/development/index.md) for contribution guidelines.
|
||||||
|
|
||||||
## License 📄
|
## License 📄
|
||||||
|
|
||||||
|
|||||||
@@ -4,103 +4,267 @@ This document provides detailed information about configuring the Home Assistant
|
|||||||
|
|
||||||
## Configuration File Structure
|
## Configuration File Structure
|
||||||
|
|
||||||
The MCP Server uses a hierarchical configuration structure:
|
The MCP Server uses environment variables for configuration, with support for different environments (development, test, production):
|
||||||
|
|
||||||
```yaml
|
```bash
|
||||||
server:
|
# .env, .env.development, or .env.test
|
||||||
host: 0.0.0.0
|
PORT=4000
|
||||||
port: 8123
|
NODE_ENV=development
|
||||||
log_level: INFO
|
HASS_HOST=http://192.168.178.63:8123
|
||||||
|
HASS_TOKEN=your_token_here
|
||||||
security:
|
JWT_SECRET=your_secret_key
|
||||||
jwt_secret: YOUR_SECRET_KEY
|
|
||||||
allowed_origins:
|
|
||||||
- http://localhost:3000
|
|
||||||
- https://your-domain.com
|
|
||||||
|
|
||||||
devices:
|
|
||||||
scan_interval: 30
|
|
||||||
default_timeout: 10
|
|
||||||
```
|
```
|
||||||
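One way to read the `# .env, .env.development, or .env.test` comment is that the file is picked from `NODE_ENV`. The sketch below illustrates that reading only. `envFileFor` is a hypothetical helper, not part of the server, and falling back to plain `.env` for production is an assumption.

```typescript
// Hypothetical helper: maps NODE_ENV to the environment file to load.
// Assumption: production (and any unknown value) falls back to ".env".
function envFileFor(nodeEnv: string | undefined): string {
  switch (nodeEnv) {
    case "development":
      return ".env.development";
    case "test":
      return ".env.test";
    default:
      return ".env";
  }
}
```

Keeping the mapping in one small function makes it easy to see which file a given environment reads before any values are parsed.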
|
|
||||||
## Server Settings
|
## Server Settings
|
||||||
|
|
||||||
### Basic Server Configuration
|
### Basic Server Configuration
|
||||||
- `host`: Server binding address (default: 0.0.0.0)
|
- `PORT`: Server port number (default: 4000)
|
||||||
- `port`: Server port number (default: 8123)
|
- `NODE_ENV`: Environment mode (development, production, test)
|
||||||
- `log_level`: Logging level (INFO, DEBUG, WARNING, ERROR)
|
- `HASS_HOST`: Home Assistant instance URL
|
||||||
|
- `HASS_TOKEN`: Home Assistant long-lived access token
|
||||||
|
|
||||||
### Security Settings
|
### Security Settings
|
||||||
- `jwt_secret`: Secret key for JWT token generation
|
- `JWT_SECRET`: Secret key for JWT token generation
|
||||||
- `allowed_origins`: CORS allowed origins list
|
- `RATE_LIMIT`: Rate limiting configuration
|
||||||
- `ssl_cert`: Path to SSL certificate (optional)
|
- `windowMs`: Time window in milliseconds (default: 15 minutes)
|
||||||
- `ssl_key`: Path to SSL private key (optional)
|
- `max`: Maximum requests per window (default: 100)
|
||||||
|
|
||||||
### Device Management
|
### WebSocket Settings
|
||||||
- `scan_interval`: Device state scan interval in seconds
|
- `SSE`: Server-Sent Events configuration
|
||||||
- `default_timeout`: Default device command timeout
|
- `MAX_CLIENTS`: Maximum concurrent clients (default: 1000)
|
||||||
- `retry_attempts`: Number of retry attempts for failed commands
|
- `PING_INTERVAL`: Keep-alive ping interval in ms (default: 30000)
|
||||||
|
|
||||||
|
### Speech Features (Optional)
|
||||||
|
- `ENABLE_SPEECH_FEATURES`: Enable speech processing features (default: false)
|
||||||
|
- `ENABLE_WAKE_WORD`: Enable wake word detection (default: false)
|
||||||
|
- `ENABLE_SPEECH_TO_TEXT`: Enable speech-to-text conversion (default: false)
|
||||||
|
- `WHISPER_MODEL_PATH`: Path to Whisper models directory (default: /models)
|
||||||
|
- `WHISPER_MODEL_TYPE`: Whisper model type (default: base)
|
||||||
|
- Available models: tiny.en, base.en, small.en, medium.en, large-v2
|
||||||
|
|
||||||
## Environment Variables
|
## Environment Variables
|
||||||
|
|
||||||
Environment variables override configuration file settings:
|
All configuration is managed through environment variables:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
MCP_HOST=0.0.0.0
|
# Server
|
||||||
MCP_PORT=8123
|
PORT=4000
|
||||||
MCP_LOG_LEVEL=INFO
|
NODE_ENV=development
|
||||||
MCP_JWT_SECRET=your-secret-key
|
|
||||||
|
# Home Assistant
|
||||||
|
HASS_HOST=http://your-hass-instance:8123
|
||||||
|
HASS_TOKEN=your_token_here
|
||||||
|
|
||||||
|
# Security
|
||||||
|
JWT_SECRET=your-secret-key
|
||||||
|
|
||||||
|
# Logging
|
||||||
|
LOG_LEVEL=info
|
||||||
|
LOG_DIR=logs
|
||||||
|
LOG_MAX_SIZE=20m
|
||||||
|
LOG_MAX_DAYS=14d
|
||||||
|
LOG_COMPRESS=true
|
||||||
|
LOG_REQUESTS=true
|
||||||
|
|
||||||
|
# Speech Features (Optional)
|
||||||
|
ENABLE_SPEECH_FEATURES=false
|
||||||
|
ENABLE_WAKE_WORD=false
|
||||||
|
ENABLE_SPEECH_TO_TEXT=false
|
||||||
|
WHISPER_MODEL_PATH=/models
|
||||||
|
WHISPER_MODEL_TYPE=base
|
||||||
```
|
```
|
||||||
|
|
||||||
## Advanced Configuration
|
## Advanced Configuration
|
||||||
|
|
||||||
### Rate Limiting
|
### Security Rate Limiting
|
||||||
```yaml
|
Rate limiting is enabled by default to protect against brute force attacks:
|
||||||
rate_limit:
|
|
||||||
enabled: true
|
|
||||||
requests_per_minute: 100
|
|
||||||
burst: 20
|
|
||||||
```
|
|
||||||
|
|
||||||
### Caching
|
```typescript
|
||||||
```yaml
|
RATE_LIMIT: {
|
||||||
cache:
|
windowMs: 15 * 60 * 1000, // 15 minutes
|
||||||
enabled: true
|
max: 100 // limit each IP to 100 requests per window
|
||||||
ttl: 300 # seconds
|
}
|
||||||
max_size: 1000 # entries
|
|
||||||
```
|
```
|
||||||
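To show how `windowMs` and `max` interact, here is a minimal fixed-window limiter. It is a sketch, not the server's implementation: `FixedWindowLimiter` and `allow` are hypothetical names, and the server may use a different strategy (for example a sliding window).

```typescript
// Hypothetical fixed-window limiter; names are illustrative, not the server's API.
interface RateLimitConfig {
  windowMs: number; // length of one window in milliseconds
  max: number; // allowed requests per client per window
}

const RATE_LIMIT: RateLimitConfig = {
  windowMs: 15 * 60 * 1000, // 15 minutes, the documented default
  max: 100,
};

class FixedWindowLimiter {
  private readonly config: RateLimitConfig;
  private readonly windows = new Map<string, { windowStart: number; count: number }>();

  constructor(config: RateLimitConfig) {
    this.config = config;
  }

  // Returns true if the request is allowed, false if the client is over the limit.
  allow(clientIp: string, now: number = Date.now()): boolean {
    const entry = this.windows.get(clientIp);
    if (entry === undefined || now - entry.windowStart >= this.config.windowMs) {
      // First request, or the previous window has expired: start a new window.
      this.windows.set(clientIp, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.config.max) {
      return false;
    }
    entry.count += 1;
    return true;
  }
}
```

With the defaults, a client that sends its 101st request inside the same 15-minute window is rejected until that window expires.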
|
|
||||||
### Logging
|
### Logging
|
||||||
```yaml
|
The server uses Bun's built-in logging capabilities with additional configuration:
|
||||||
logging:
|
|
||||||
file: /var/log/mcp-server.log
|
```typescript
|
||||||
max_size: 10MB
|
LOGGING: {
|
||||||
backup_count: 5
|
LEVEL: "info", // debug, info, warn, error
|
||||||
format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
|
DIR: "logs",
|
||||||
|
MAX_SIZE: "20m",
|
||||||
|
MAX_DAYS: "14d",
|
||||||
|
COMPRESS: true,
|
||||||
|
TIMESTAMP_FORMAT: "YYYY-MM-DD HH:mm:ss:ms",
|
||||||
|
LOG_REQUESTS: true
|
||||||
|
}
|
||||||
|
```
|
||||||
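Values such as `MAX_SIZE: "20m"` and `MAX_DAYS: "14d"` need to be turned into numbers before they can be compared against file sizes or ages. The sketch below assumes that `k`, `m`, and `g` mean binary kilobytes, megabytes, and gigabytes and that `d` means days. The diff does not spell this out, and `parseSize` and `parseDays` are hypothetical helpers.

```typescript
// Assumed unit meanings: k/m/g are binary size units. Not confirmed by the docs.
function unitMultiplier(unit: string): number {
  switch (unit.toLowerCase()) {
    case "k":
      return 1024;
    case "m":
      return 1024 * 1024;
    case "g":
      return 1024 * 1024 * 1024;
    default:
      throw new Error(`Unknown size unit: ${unit}`);
  }
}

// Parses strings like "20m" into a byte count.
function parseSize(value: string): number {
  const match = /^(\d+)([kmg])$/i.exec(value.trim());
  if (match === null) {
    throw new Error(`Invalid size: ${value}`);
  }
  const [, digits = "", unit = ""] = match;
  return Number(digits) * unitMultiplier(unit);
}

// Parses strings like "14d" into a number of days.
function parseDays(value: string): number {
  const match = /^(\d+)d$/i.exec(value.trim());
  if (match === null) {
    throw new Error(`Invalid retention period: ${value}`);
  }
  const [, digits = ""] = match;
  return Number(digits);
}
```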
|
|
||||||
|
### Speech-to-Text Configuration
|
||||||
|
When speech features are enabled, you can configure the following options:
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
SPEECH: {
|
||||||
|
ENABLED: false, // Master switch for all speech features
|
||||||
|
WAKE_WORD_ENABLED: false, // Enable wake word detection
|
||||||
|
SPEECH_TO_TEXT_ENABLED: false, // Enable speech-to-text
|
||||||
|
WHISPER_MODEL_PATH: "/models", // Path to Whisper models
|
||||||
|
WHISPER_MODEL_TYPE: "base", // Model type to use
|
||||||
|
}
|
||||||
|
```
|
||||||
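The comment on `ENABLED` calls it a master switch, which suggests the two sub-features only run when it is on. A minimal sketch of that gating, assuming the flags arrive as the strings `"true"` or `"false"` in the environment; `flag` and `resolveSpeechFlags` are hypothetical helpers:

```typescript
type Env = Record<string, string | undefined>;

// Treats a missing variable as "false", matching the documented defaults.
function flag(env: Env, name: string): boolean {
  return (env[name] ?? "false").trim().toLowerCase() === "true";
}

interface SpeechFlags {
  wakeWord: boolean;
  speechToText: boolean;
}

// Assumption: the master switch gates both sub-features.
function resolveSpeechFlags(env: Env): SpeechFlags {
  const enabled = flag(env, "ENABLE_SPEECH_FEATURES");
  return {
    wakeWord: enabled && flag(env, "ENABLE_WAKE_WORD"),
    speechToText: enabled && flag(env, "ENABLE_SPEECH_TO_TEXT"),
  };
}
```

Under this reading, setting `ENABLE_WAKE_WORD=true` has no effect on its own while `ENABLE_SPEECH_FEATURES` is unset or `false`.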
|
|
||||||
|
Available Whisper models:
|
||||||
|
- `tiny.en`: Fastest, lowest accuracy
|
||||||
|
- `base.en`: Good balance of speed and accuracy
|
||||||
|
- `small.en`: Better accuracy, slower
|
||||||
|
- `medium.en`: High accuracy, much slower
|
||||||
|
- `large-v2`: Best accuracy, very slow
|
||||||
|
|
||||||
|
For production deployments, we recommend using system tools like `logrotate` for log management.
|
||||||
|
|
||||||
|
Example logrotate configuration (`/etc/logrotate.d/mcp-server`):
|
||||||
|
```
|
||||||
|
/var/log/mcp-server.log {
|
||||||
|
daily
|
||||||
|
rotate 7
|
||||||
|
compress
|
||||||
|
delaycompress
|
||||||
|
missingok
|
||||||
|
notifempty
|
||||||
|
create 644 mcp mcp
|
||||||
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
## Best Practices
|
## Best Practices
|
||||||
|
|
||||||
1. Always use environment variables for sensitive information
|
1. Always use environment variables for sensitive information
|
||||||
2. Keep configuration files in a secure location
|
2. Keep .env files secure and never commit them to version control
|
||||||
3. Regularly backup your configuration
|
3. Use different environment files for development, test, and production
|
||||||
4. Use SSL in production environments
|
4. Enable SSL/TLS in production (preferably via reverse proxy)
|
||||||
5. Monitor log files for issues
|
5. Monitor log files for issues
|
||||||
|
6. Regularly rotate logs in production
|
||||||
|
7. Start with smaller Whisper models and upgrade if needed
|
||||||
|
8. Consider GPU acceleration for larger Whisper models
|
||||||
|
|
||||||
## Validation
|
## Validation
|
||||||
|
|
||||||
The server validates configuration on startup:
|
The server validates configuration on startup using Zod schemas:
|
||||||
- Required fields are checked
|
- Required fields are checked (e.g., HASS_TOKEN)
|
||||||
- Value types are verified
|
- Value types are verified
|
||||||
- Ranges are validated
|
- Enums are validated (e.g., LOG_LEVEL, WHISPER_MODEL_TYPE)
|
||||||
- Security settings are assessed
|
- Default values are applied when not specified
|
||||||
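The checks listed above can be illustrated without any library. The following is a hand-written stand-in, not the server's Zod schema: `loadConfig` and `ServerConfig` are hypothetical, and only three variables are covered. The `PORT` default and the log levels are the ones given earlier in this document.

```typescript
// Illustrative stand-in for schema validation; this is not the server's Zod schema.
type Env = Record<string, string | undefined>;

const LOG_LEVELS = ["debug", "info", "warn", "error"] as const;
type LogLevel = (typeof LOG_LEVELS)[number];

interface ServerConfig {
  port: number;
  hassToken: string;
  logLevel: LogLevel;
}

function loadConfig(env: Env): ServerConfig {
  // Required field: fail fast when it is missing or empty.
  const hassToken = env.HASS_TOKEN;
  if (hassToken === undefined || hassToken === "") {
    throw new Error("HASS_TOKEN is required");
  }

  // Typed value with a default: PORT must be an integer in the valid port range.
  const port = Number(env.PORT ?? "4000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`);
  }

  // Enum with a default: LOG_LEVEL must be one of the documented levels.
  const rawLevel = env.LOG_LEVEL ?? "info";
  const logLevel = LOG_LEVELS.find((level) => level === rawLevel);
  if (logLevel === undefined) {
    throw new Error(`LOG_LEVEL must be one of ${LOG_LEVELS.join(", ")}, got "${rawLevel}"`);
  }

  return { port, hassToken, logLevel };
}
```

Failing at startup with a message that names the offending variable is usually easier to debug than a later runtime error caused by a bad value.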
|
|
||||||
## Troubleshooting
|
## Troubleshooting
|
||||||
|
|
||||||
Common configuration issues:
|
Common configuration issues:
|
||||||
1. Permission denied accessing files
|
1. Missing required environment variables
|
||||||
2. Invalid YAML syntax
|
2. Invalid environment variable values
|
||||||
3. Missing required fields
|
3. Permission issues with log directories
|
||||||
4. Type mismatches in values
|
4. Rate limiting too restrictive
|
||||||
|
5. Speech model loading failures
|
||||||
|
6. Docker not available for speech features
|
||||||
|
7. Insufficient system resources for larger models
|
||||||
|
|
||||||
See the [Troubleshooting Guide](troubleshooting.md) for solutions.
|
See the [Troubleshooting Guide](troubleshooting.md) for solutions.
|
||||||
|
|
||||||
|
# Configuration Guide
|
||||||
|
|
||||||
|
This document describes all available configuration options for the Home Assistant MCP Server.
|
||||||
|
|
||||||
|
## Environment Variables
|
||||||
|
|
||||||
|
### Required Settings
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Server Configuration
|
||||||
|
PORT=3000 # Server port
|
||||||
|
HOST=localhost # Server host
|
||||||
|
|
||||||
|
# Home Assistant
|
||||||
|
HASS_URL=http://localhost:8123 # Home Assistant URL
|
||||||
|
HASS_TOKEN=your_token # Long-lived access token
|
||||||
|
|
||||||
|
# Security
|
||||||
|
JWT_SECRET=your_secret # JWT signing secret
|
||||||
|
```
|
||||||
|
|
||||||
|
### Optional Settings
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Rate Limiting
|
||||||
|
RATE_LIMIT_WINDOW=60000 # Time window in ms (default: 60000)
|
||||||
|
RATE_LIMIT_MAX=100 # Max requests per window (default: 100)
|
||||||
|
|
||||||
|
# Logging
|
||||||
|
LOG_LEVEL=info # debug, info, warn, error (default: info)
|
||||||
|
LOG_DIR=logs # Log directory (default: logs)
|
||||||
|
LOG_MAX_SIZE=10m # Max log file size (default: 10m)
|
||||||
|
LOG_MAX_FILES=5 # Max number of log files (default: 5)
|
||||||
|
|
||||||
|
# WebSocket/SSE
|
||||||
|
WS_HEARTBEAT=30000 # WebSocket heartbeat interval in ms (default: 30000)
|
||||||
|
SSE_RETRY=3000 # SSE retry interval in ms (default: 3000)
|
||||||
|
|
||||||
|
# Speech Features
|
||||||
|
ENABLE_SPEECH_FEATURES=false # Enable speech processing (default: false)
|
||||||
|
ENABLE_WAKE_WORD=false # Enable wake word detection (default: false)
|
||||||
|
ENABLE_SPEECH_TO_TEXT=false # Enable speech-to-text (default: false)
|
||||||
|
|
||||||
|
# Speech Model Configuration
|
||||||
|
WHISPER_MODEL_PATH=/models # Path to whisper models (default: /models)
|
||||||
|
WHISPER_MODEL_TYPE=base # Model type: tiny|base|small|medium|large-v2 (default: base)
|
||||||
|
WHISPER_LANGUAGE=en # Primary language (default: en)
|
||||||
|
WHISPER_TASK=transcribe # Task type: transcribe|translate (default: transcribe)
|
||||||
|
WHISPER_DEVICE=cuda # Processing device: cpu|cuda (default: cuda if available, else cpu)
|
||||||
|
|
||||||
|
# Wake Word Configuration
|
||||||
|
WAKE_WORDS=hey jarvis,ok google,alexa # Comma-separated wake words (default: hey jarvis)
|
||||||
|
WAKE_WORD_SENSITIVITY=0.5 # Detection sensitivity 0-1 (default: 0.5)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Speech Features
|
||||||
|
|
||||||
|
### Model Selection
|
||||||
|
|
||||||
|
Choose a model based on your needs:
|
||||||
|
|
||||||
|
| Model     | Size  | Memory Required | Speed | Accuracy |
|-----------|-------|-----------------|-------|----------|
| tiny.en   | 75MB  | 1GB             | Fast  | Basic    |
| base.en   | 150MB | 2GB             | Good  | Good     |
| small.en  | 500MB | 4GB             | Med   | Better   |
| medium.en | 1.5GB | 8GB             | Slow  | High     |
| large-v2  | 3GB   | 16GB            | Slow  | Best     |
|
||||||
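The memory column can be turned into a simple selection rule: pick the largest model whose requirement fits the memory you can spare. This is only a sketch. `largestModelThatFits` is a hypothetical helper, the figures are copied from the table, and real usage will also depend on the device and on other processes.

```typescript
interface WhisperModel {
  name: string;
  memoryGb: number; // "Memory Required" column from the table
}

// Ordered from smallest to largest.
const MODELS: WhisperModel[] = [
  { name: "tiny.en", memoryGb: 1 },
  { name: "base.en", memoryGb: 2 },
  { name: "small.en", memoryGb: 4 },
  { name: "medium.en", memoryGb: 8 },
  { name: "large-v2", memoryGb: 16 },
];

// Returns the largest model that fits, or undefined if even tiny.en does not.
function largestModelThatFits(availableMemoryGb: number): string | undefined {
  let choice: string | undefined;
  for (const model of MODELS) {
    if (model.memoryGb <= availableMemoryGb) {
      choice = model.name;
    }
  }
  return choice;
}
```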
|
|
||||||
|
### GPU Acceleration
|
||||||
|
|
||||||
|
When `WHISPER_DEVICE=cuda`:
|
||||||
|
- NVIDIA GPU with CUDA support required
|
||||||
|
- Significantly faster processing
|
||||||
|
- Higher memory requirements
|
||||||
|
|
||||||
|
### Wake Word Detection
|
||||||
|
|
||||||
|
- Multiple wake words supported via comma-separated list
|
||||||
|
- Adjustable sensitivity (0-1):
|
||||||
|
- Lower values: Fewer false positives, may miss some triggers
|
||||||
|
- Higher values: More responsive, may have false triggers
|
||||||
|
- Default (0.5): Balanced detection
|
||||||
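A comma-separated `WAKE_WORDS` value and a `WAKE_WORD_SENSITIVITY` between 0 and 1 could be parsed as shown below. This is a sketch under assumptions: `parseWakeWordConfig` is a hypothetical helper, and trimming and lower-casing the phrases is a choice made here, not documented behavior.

```typescript
interface WakeWordConfig {
  wakeWords: string[];
  sensitivity: number;
}

function parseWakeWordConfig(env: Record<string, string | undefined>): WakeWordConfig {
  // Split on commas, normalize each phrase, and drop empty entries.
  const wakeWords = (env.WAKE_WORDS ?? "hey jarvis")
    .split(",")
    .map((word) => word.trim().toLowerCase())
    .filter((word) => word.length > 0);
  if (wakeWords.length === 0) {
    throw new Error("WAKE_WORDS must contain at least one phrase");
  }

  // Sensitivity must be a number in the documented 0-1 range.
  const sensitivity = Number(env.WAKE_WORD_SENSITIVITY ?? "0.5");
  if (!Number.isFinite(sensitivity) || sensitivity < 0 || sensitivity > 1) {
    throw new Error("WAKE_WORD_SENSITIVITY must be a number between 0 and 1");
  }

  return { wakeWords, sensitivity };
}
```

Rejecting out-of-range sensitivity at load time avoids silently running with a detector that never triggers or triggers constantly.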
|
|
||||||
|
### Best Practices
|
||||||
|
|
||||||
|
1. Model Selection:
|
||||||
|
- Start with `base.en` model
|
||||||
|
- Upgrade if you need better accuracy
|
||||||
|
- Downgrade if you run into performance issues
|
||||||
|
|
||||||
|
2. Resource Management:
|
||||||
|
- Monitor memory usage
|
||||||
|
- Use GPU acceleration when available
|
||||||
|
- Consider model size vs available resources
|
||||||
|
|
||||||
|
3. Wake Word Configuration:
|
||||||
|
- Use distinct wake words
|
||||||
|
- Adjust sensitivity based on environment
|
||||||
|
- Limit number of wake words for better performance
|
||||||
212
docs/features/speech.md
Normal file
@@ -0,0 +1,212 @@
|
|||||||
|
# Speech Features
|
||||||
|
|
||||||
|
The Home Assistant MCP Server includes speech processing capabilities built on fast-whisper and custom wake word detection. This guide explains how to set up and use these features.
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
The speech processing system consists of two main components:
|
||||||
|
1. Wake Word Detection - Listens for specific trigger phrases
|
||||||
|
2. Speech-to-Text - Transcribes spoken commands using fast-whisper
|
||||||
|
|
||||||
|
## Setup
|
||||||
|
|
||||||
|
### Prerequisites
|
||||||
|
|
||||||
|
1. Docker environment:
|
||||||
|
```bash
|
||||||
|
docker --version # Should be 20.10.0 or higher
|
||||||
|
```
|
||||||
|
|
||||||
|
2. For GPU acceleration:
|
||||||
|
- NVIDIA GPU with CUDA support
|
||||||
|
- NVIDIA Container Toolkit installed
|
||||||
|
- NVIDIA drivers 450.80.02 or higher
|
||||||
|
|
||||||
|
### Installation
|
||||||
|
|
||||||
|
1. Enable speech features in your `.env`:
|
||||||
|
```bash
|
||||||
|
ENABLE_SPEECH_FEATURES=true
|
||||||
|
ENABLE_WAKE_WORD=true
|
||||||
|
ENABLE_SPEECH_TO_TEXT=true
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Configure model settings:
|
||||||
|
```bash
|
||||||
|
WHISPER_MODEL_PATH=/models
|
||||||
|
WHISPER_MODEL_TYPE=base
|
||||||
|
WHISPER_LANGUAGE=en
|
||||||
|
WHISPER_TASK=transcribe
|
||||||
|
WHISPER_DEVICE=cuda # or cpu
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Start the services:
|
||||||
|
```bash
|
||||||
|
docker-compose up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
## Usage
|
||||||
|
|
||||||
|
### Wake Word Detection
|
||||||
|
|
||||||
|
The wake word detector continuously listens for configured trigger phrases. Default wake words:
|
||||||
|
- "hey jarvis"
|
||||||
|
- "ok google"
|
||||||
|
- "alexa"
|
||||||
|
|
||||||
|
Custom wake words can be configured:
|
||||||
|
```bash
|
||||||
|
WAKE_WORDS=computer,jarvis,assistant
|
||||||
|
```
|
||||||
|
|
||||||
|
When a wake word is detected:
|
||||||
|
1. The system starts recording audio
|
||||||
|
2. Audio is processed through the speech-to-text pipeline
|
||||||
|
3. The resulting command is processed by the server
|
||||||
|
|
||||||
|
### Speech-to-Text
|
||||||
|
|
||||||
|
#### Automatic Transcription
|
||||||
|
|
||||||
|
After wake word detection:
|
||||||
|
1. Audio is automatically captured (default: 5 seconds)
|
||||||
|
2. The audio is transcribed using the configured whisper model
|
||||||
|
3. The transcribed text is processed as a command
|
||||||
|
|
||||||
|
#### Manual Transcription
|
||||||
|
|
||||||
|
You can also manually transcribe audio using the API:
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
// Using the TypeScript client
|
||||||
|
import { SpeechService } from '@ha-mcp/client';
|
||||||
|
|
||||||
|
const speech = new SpeechService();
|
||||||
|
|
||||||
|
// Transcribe from audio buffer
|
||||||
|
const buffer = await getAudioBuffer();
|
||||||
|
const text = await speech.transcribe(buffer);
|
||||||
|
|
||||||
|
// Transcribe from file
|
||||||
|
const fileText = await speech.transcribeFile('command.wav');
|
||||||
|
```
|
||||||
|
|
||||||
|
```text
|
||||||
|
// Using the REST API
|
||||||
|
POST /api/speech/transcribe
|
||||||
|
Content-Type: multipart/form-data
|
||||||
|
|
||||||
|
file: <audio file>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Event Handling
|
||||||
|
|
||||||
|
The system emits various events during speech processing:
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
speech.on('wakeWord', (word: string) => {
|
||||||
|
console.log(`Wake word detected: ${word}`);
|
||||||
|
});
|
||||||
|
|
||||||
|
speech.on('listening', () => {
|
||||||
|
console.log('Listening for command...');
|
||||||
|
});
|
||||||
|
|
||||||
|
speech.on('transcribing', () => {
|
||||||
|
console.log('Processing speech...');
|
||||||
|
});
|
||||||
|
|
||||||
|
speech.on('transcribed', (text: string) => {
|
||||||
|
console.log(`Transcribed text: ${text}`);
|
||||||
|
});
|
||||||
|
|
||||||
|
speech.on('error', (error: Error) => {
|
||||||
|
console.error('Speech processing error:', error);
|
||||||
|
});
|
||||||
|
```
|
||||||
|
|
||||||
|
## Performance Optimization
|
||||||
|
|
||||||
|
### Model Selection
|
||||||
|
|
||||||
|
Choose an appropriate model based on your needs:
|
||||||
|
|
||||||
|
1. Resource-constrained environments:
|
||||||
|
- Use `tiny.en` or `base.en`
|
||||||
|
- Run on CPU if GPU unavailable
|
||||||
|
- Limit concurrent processing
|
||||||
|
|
||||||
|
2. High-accuracy requirements:
|
||||||
|
- Use `small.en` or `medium.en`
|
||||||
|
- Enable GPU acceleration
|
||||||
|
- Increase audio quality
|
||||||
|
|
||||||
|
3. Production environments:
|
||||||
|
- Use `base.en` or `small.en`
|
||||||
|
- Enable GPU acceleration
|
||||||
|
- Configure appropriate timeouts
|
||||||
|
|
||||||
|
### GPU Acceleration
|
||||||
|
|
||||||
|
When using GPU acceleration:
|
||||||
|
|
||||||
|
1. Monitor GPU memory usage:
|
||||||
|
```bash
|
||||||
|
nvidia-smi -l 1
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Adjust model size if needed:
|
||||||
|
```bash
|
||||||
|
WHISPER_MODEL_TYPE=small # Decrease if GPU memory limited
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Configure processing device:
|
||||||
|
```bash
|
||||||
|
WHISPER_DEVICE=cuda # Use GPU
|
||||||
|
WHISPER_DEVICE=cpu # Use CPU if GPU unavailable
|
||||||
|
```
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### Common Issues
|
||||||
|
|
||||||
|
1. Wake word detection not working:
|
||||||
|
- Check microphone permissions
|
||||||
|
- Adjust `WAKE_WORD_SENSITIVITY`
|
||||||
|
- Verify wake words configuration
|
||||||
|
|
||||||
|
2. Poor transcription quality:
|
||||||
|
- Check audio input quality
|
||||||
|
- Try a larger model
|
||||||
|
- Verify language settings
|
||||||
|
|
||||||
|
3. Performance issues:
|
||||||
|
- Monitor resource usage
|
||||||
|
- Consider a smaller model
|
||||||
|
- Check GPU acceleration status
|
||||||
|
|
||||||
|
### Logging
|
||||||
|
|
||||||
|
Enable debug logging for detailed information:
|
||||||
|
```bash
|
||||||
|
LOG_LEVEL=debug
|
||||||
|
```
|
||||||
|
|
||||||
|
Speech-specific logs are tagged with the `[SPEECH]` prefix.
|
||||||
|
|
||||||
|
## Security Considerations
|
||||||
|
|
||||||
|
1. Audio Privacy:
|
||||||
|
- Audio is processed locally
|
||||||
|
- No data sent to external services
|
||||||
|
- Temporary files automatically cleaned
|
||||||
|
|
||||||
|
2. Access Control:
|
||||||
|
- Speech endpoints require authentication
|
||||||
|
- Rate limiting applies to transcription
|
||||||
|
- Configurable command restrictions
|
||||||
|
|
||||||
|
3. Resource Protection:
|
||||||
|
- Timeouts prevent hanging
|
||||||
|
- Memory limits enforced
|
||||||
|
- Graceful error handling
|
||||||