Compare commits

...

7 Commits

Author SHA1 Message Date
jango-blockchained
3a6f79c9a8 feat(speech): enhance speech configuration and example integration
- Add comprehensive speech configuration in .env.example and app config
- Update Docker speech Dockerfile for more flexible model handling
- Create detailed README for speech-to-text examples
- Implement example script demonstrating speech features
- Improve speech service initialization and configuration management
2025-02-04 19:35:50 +01:00
jango-blockchained
60f18f8e71 feat(speech): add speech-to-text and wake word detection modules
- Implement SpeechToText class with Docker-based transcription capabilities
- Add wake word detection using OpenWakeWord and fast-whisper models
- Create Dockerfile for speech processing container
- Develop comprehensive test suite for speech recognition functionality
- Include audio processing and event-driven transcription features
2025-02-04 19:08:01 +01:00
jango-blockchained
47f11b3d95 docs: update README.md 2025-02-04 18:35:17 +01:00
jango-blockchained
f24be8ff53 docs: remove outdated support and rate limiting sections from README 2025-02-04 18:31:39 +01:00
jango-blockchained
dfff432321 docs: update Jekyll configuration for GitHub Pages and dependencies
- Add repository settings and GitHub metadata plugin
- Update baseurl to match repository name
- Include additional Jekyll and Ruby dependencies in Gemfile
- Simplify configuration by removing redundant sections
2025-02-04 18:00:59 +01:00
jango-blockchained
d59bf02d08 docs: enhance Jekyll configuration with comprehensive site settings
- Add base URL and site configuration for GitHub Pages
- Include remote theme and additional Jekyll plugins
- Configure navigation structure and page layouts
- Set up collections for tools and development sections
- Optimize Gemfile with additional dependencies
2025-02-04 17:58:10 +01:00
jango-blockchained
345a5888d9 docs: update Jekyll configuration for enhanced documentation structure
- Add new documentation pages to header navigation
- Configure collections for tools and development sections
- Set default layouts for pages, tools, and development collections
- Improve documentation site organization and navigation
2025-02-04 17:51:04 +01:00
17 changed files with 1295 additions and 269 deletions

.env.example

@@ -102,3 +102,10 @@ TEST_HASS_HOST=http://localhost:8123
TEST_HASS_TOKEN=test_token
TEST_HASS_SOCKET_URL=ws://localhost:8123/api/websocket
TEST_PORT=3001
# Speech Features Configuration
ENABLE_SPEECH_FEATURES=false
ENABLE_WAKE_WORD=true
ENABLE_SPEECH_TO_TEXT=true
WHISPER_MODEL_PATH=/models
WHISPER_MODEL_TYPE=base

README.md

@@ -1,321 +1,288 @@
# 🚀 Model Context Protocol (MCP) Server for Home Assistant
# 🚀 MCP Server for Home Assistant - Bringing AI-Powered Smart Homes to Life!
The **Model Context Protocol (MCP) Server** is a robust, secure, and high-performance bridge that integrates Home Assistant with Language Learning Models (LLMs), enabling natural language control and real-time monitoring of your smart home devices. Unlock advanced automation, control, and analytics for your Home Assistant ecosystem.
![License](https://img.shields.io/badge/license-MIT-blue.svg)
![Bun](https://img.shields.io/badge/bun-%3E%3D1.0.26-black)
![TypeScript](https://img.shields.io/badge/typescript-%5E5.0.0-blue.svg)
![Test Coverage](https://img.shields.io/badge/coverage-95%25-brightgreen.svg)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Bun](https://img.shields.io/badge/bun-%3E%3D1.0.26-black)](https://bun.sh)
[![TypeScript](https://img.shields.io/badge/typescript-%5E5.0.0-blue.svg)](https://www.typescriptlang.org)
[![Test Coverage](https://img.shields.io/badge/coverage-95%25-brightgreen.svg)](#)
[![Documentation](https://img.shields.io/badge/docs-github.io-blue.svg)](https://jango-blockchained.github.io/homeassistant-mcp/)
![Docker](https://img.shields.io/badge/docker-%3E%3D20.10.8-blue)
## 🌟 Key Benefits
### 🎮 Device Control & Monitoring
- **Voice-like Control:** "Dim living room lights to 50%" 🌇
- **Real-time Updates:** WebSocket/SSE with <100ms latency
- **Cross-Device Automation:** Create scene-based rules 🎭
### 🤖 AI-Powered Features
- Natural language processing for commands
- Predictive automation suggestions
- Anomaly detection in device behavior
## 🏗 Architecture Overview
```mermaid
graph TD
A[User Interface] --> B{MCP Server}
B --> C[Home Assistant]
B --> D[LLM Integration]
B --> E[Cache Layer]
E --> F[Redis]
B --> G[Security Middleware]
C --> H[Smart Devices]
```
## 🛠 Installation
### 🐳 Docker Setup (Recommended)
```bash
# 1. Clone repo with caching
git clone --depth 1 https://github.com/jango-blockchained/homeassistant-mcp.git
# 2. Configure environment
cp .env.example .env # Edit with your HA details 🔧
# 3. Start with compose
docker compose up -d --build # Auto-scaling enabled 📈
# View real-time logs 📜
docker compose logs -f --tail=50
```
### 📦 Bare Metal Installation
```bash
# Install Bun (if missing)
curl -fsSL https://bun.sh/install | bash # 🐇 Fast runtime
# Install dependencies with cache
bun install --frozen-lockfile # ♻️ Reliable dep tree
# Start in dev mode with hot-reload 🔥
bun run dev --watch
```
## 🚦 Rate Limiting Tiers
| Plan | Requests/min | Features | Cache TTL |
|---------------|--------------|------------------------|-------------|
| Free | 100 | Basic controls | 5min |
| Pro | 1,000 | Priority queue | 1hr |
| Enterprise | 10,000 | Dedicated cache | Custom |
## 💡 Example Usage
```javascript
// Real-time device monitoring 🌐
const ws = new WebSocket('wss://mcp.yourha.com/ws');
ws.onmessage = ({ data }) => {
const update = JSON.parse(data);
if(update.entity_id === 'light.kitchen') {
smartBulb(update.state); // 🎛️ Update UI
}
};
```
## 🔄 Update Strategy
```bash
# Zero-downtime updates 🕒
docker compose pull
docker compose up -d --build
docker system prune # Clean old images 🧹
```
## 🛡 Security Features
- JWT authentication with refresh tokens 🔑
- Automatic request sanitization 🧼
- IP-based rate limiting with fail2ban integration 🚫
- End-to-end encryption support 🔒
## 🌍 Community & Support
| Platform | Link | Response Time |
|----------------|-------------------------------|---------------|
| 📚 Docs | [API Reference](docs/api.md) | Instant |
| 💬 Discord | [Join Chat](#) | <1hr |
| 🐛 GitHub | [Issues](#) | <24hr |
| 🐦 Twitter | [@HomeMCP](#) | <2hr |
## 🚧 Troubleshooting Guide
```bash
# Check service health 🩺
docker compose ps
# Test API endpoints 🔌
curl -I http://localhost:3000/healthcheck # Should return 200 ✅
# Inspect cache status 💾
docker exec mcp_redis redis-cli info memory
```
## 🔮 Roadmap Highlights
- [ ] **AI Assistant Integration** (Q4 2024) 🤖
- [ ] **Predictive Automation** (Q1 2025) 🔮
- [x] **Real-time Analytics** (Shipped! 🚀)
- [ ] **Energy Optimization** (Q3 2024) 🌱
## 🤝 Contributing
We love community input! Here's how to help:
1. 🍴 Fork the repository
2. 🌿 Create a feature branch
3. 💻 Make your changes
4. 🧪 Run tests: `bun test --coverage`
5. 📦 Commit using [Conventional Commits](https://www.conventionalcommits.org)
6. 🔀 Open a Pull Request
**Pro Tip:** Check our [Good First Issues](https://github.com/jango-blockchained/homeassistant-mcp/contribute) for starter tasks! 🎯
[![Docker](https://img.shields.io/badge/docker-%3E%3D20.10.8-blue)](https://www.docker.com)
---
**📢 Note:** This project adheres to [Semantic Versioning](https://semver.org). Always check breaking changes in release notes before upgrading!
## Overview 🌐
## Table of Contents
Welcome to the **Model Context Protocol (MCP) Server for Home Assistant**! This robust platform bridges Home Assistant with cutting-edge Language Learning Models (LLMs), enabling natural language interactions and real-time automation of your smart devices. Imagine entering your home, saying:
- [Overview](#overview)
- [Key Features](#key-features)
- [Architecture & Design](#architecture--design)
- [Installation](#installation)
- [Basic Setup](#basic-setup)
- [Docker Setup (Recommended)](#docker-setup-recommended)
- [Usage](#usage)
- [API & Documentation](#api--documentation)
- [Development](#development)
- [Roadmap & Future Plans](#roadmap--future-plans)
- [Community & Support](#community--support)
- [Contributing](#contributing)
- [Troubleshooting & FAQ](#troubleshooting--faq)
- [License](#license)
> “Hey MCP, dim the lights and start my evening playlist,”
## Overview
and watching your home transform instantly—that's the magic that MCP Server delivers!
The MCP Server bridges Home Assistant with advanced LLM integrations to deliver intuitive control, automation, and state monitoring. Leveraging a high-performance runtime and real-time communication protocols, MCP offers a seamless experience for managing your smart home.
---
## Key Features
## Key Benefits ✨
### Device Control & Monitoring
- **Smart Device Control:** Manage lights, climate, covers, switches, sensors, media players, fans, locks, vacuums, and cameras using natural language commands.
- **Real-time Updates:** Receive instant notifications and updates via Server-Sent Events (SSE).
### 🎮 Device Control & Monitoring
- **Voice-Controlled Automation:**
Use simple commands like "Turn on the kitchen lights" or "Set the thermostat to 22°C" without touching a switch.
**Real-World Example:**
In the morning, say "Good morning! Open the blinds and start the coffee machine" to kickstart your day automatically.
### System & Automation Management
- **Automation Engine:** Create, modify, and trigger custom automation rules with ease.
- **Add-on & Package Management:** Integrates with HACS for deploying custom integrations, themes, scripts, and applications.
- **Robust System Management:** Features advanced state monitoring, error handling, and security safeguards.
- **Real-Time Communication:**
Experience sub-100ms latency updates via Server-Sent Events (SSE) or WebSocket connections, ensuring your dashboard is always current.
**Real-World Example:**
Monitor energy usage instantly during peak hours and adjust remotely for efficient consumption.
## Architecture & Design
- **Seamless Automation:**
Create scene-based rules to synchronize multiple devices effortlessly.
**Real-World Example:**
For movie nights, have MCP dim the lights, adjust the sound system, and launch your favorite streaming app with just one command.
The MCP Server is built with scalability, resilience, and security in mind:
### 🤖 AI-Powered Enhancements
- **Natural Language Processing (NLP):**
Convert everyday speech into actionable commands—just say, "Prepare the house for dinner," and MCP will adjust lighting, temperature, and even play soft background music.
- **High-Performance Runtime:** Powered by Bun for fast startup, efficient memory utilization, and native TypeScript support.
- **Real-time Communication:** Employs Server-Sent Events (SSE) for continuous, real-time data updates.
- **Modular & Extensible:** Designed to support plugins, add-ons, and custom automation scripts, allowing for easy expansion.
- **Secure API Integration:** Implements token-based authentication, rate limiting, and adherence to best security practices.
- **Predictive Automation & Suggestions:**
Receive proactive recommendations based on usage habits and environmental trends.
**Real-World Example:**
When home temperature fluctuates unexpectedly, MCP suggests an optimal setting and notifies you immediately.
For a deeper dive into the system architecture, please refer to our [Architecture Documentation](docs/architecture.md).
- **Anomaly Detection:**
Continuously monitor device activity and alert you to unusual behavior, helping prevent malfunctions or potential security breaches.
## Usage
---
Once the server is running, open your browser at [http://localhost:3000](http://localhost:3000). For real-time device updates, integrate the SSE endpoint in your application:
## Architectural Overview 🏗
Our architecture is engineered for performance, scalability, and security. The following Mermaid diagram illustrates the data flow and component interactions:
```mermaid
graph TD
subgraph Client
A["Client Application<br/>(Web / Mobile / Voice)"]
end
subgraph CDN
B["CDN / Cache"]
end
subgraph Server
C["Bun Native Server"]
E["NLP Engine<br/>& Language Processing Module"]
end
subgraph Integration
D["Home Assistant<br/>(Devices, Lights, Thermostats)"]
end
A -->|HTTP Request| B
B -- Cache Miss --> C
C -->|Interpret Command| E
E -->|Determine Action| D
D -->|Return State/Action| C
C -->|Response| B
B -->|Cached/Processed Response| A
```
Learn more about our architecture in the [Architecture Documentation](docs/architecture.md).
---
## Technical Stack 🔧
Our solution is built on a modern, high-performance stack that powers every feature:
- **Bun:**
A next-generation JavaScript runtime offering rapid startup times, native TypeScript support, and high performance.
👉 [Learn about Bun](https://bun.sh)
- **Bun Native Server:**
Utilizes Bun's built-in HTTP server to efficiently process API requests with sub-100ms response times.
👉 See the [Installation Guide](docs/getting-started/installation.md) for details.
- **Natural Language Processing (NLP) & LLM Integration:**
Processes and interprets natural language commands using state-of-the-art LLMs and custom NLP modules.
👉 Find API usage details in the [API Documentation](docs/api.md).
- **Home Assistant Integration:**
Provides seamless connectivity with Home Assistant, ensuring flawless communication with your smart devices.
👉 Refer to the [Usage Guide](docs/usage.md) for more information.
- **Redis Cache:**
Enables rapid data retrieval and session persistence essential for real-time updates.
- **TypeScript:**
Enhances type safety and developer productivity across the entire codebase.
- **JWT & Security Middleware:**
Protects your ecosystem with JWT-based authentication, request sanitization, rate-limiting, and encryption.
- **Containerization with Docker:**
Enables scalable, isolated deployments for production environments.
For further technical details, check out our [Documentation Index](docs/index.md).
---
## Installation 🛠
### 🐳 Docker Setup (Recommended)
For a hassle-free, containerized deployment:
```bash
# 1. Clone the repository (using a shallow copy for efficiency)
git clone --depth 1 https://github.com/jango-blockchained/homeassistant-mcp.git
# 2. Configure your environment: copy the example file and edit it with your Home Assistant credentials
cp .env.example .env # Modify .env with your Home Assistant host, tokens, etc.
# 3. Build and run the Docker containers
docker compose up -d --build
# 4. View real-time logs (last 50 log entries)
docker compose logs -f --tail=50
```
👉 Refer to our [Installation Guide](docs/getting-started/installation.md) for full details.
### 💻 Bare Metal Installation
For direct deployment on your host machine:
```bash
# 1. Install Bun (if not already installed)
curl -fsSL https://bun.sh/install | bash
# 2. Install project dependencies with caching support
bun install --frozen-lockfile
# 3. Launch the server in development mode with hot-reload enabled
bun run dev --watch
```
---
## Real-World Usage Examples 🔍
### 📱 Smart Home Dashboard Integration
Integrate MCP's real-time updates into your custom dashboard for a dynamic smart home experience:
```javascript
const eventSource = new EventSource('http://localhost:3000/subscribe_events?token=YOUR_TOKEN&domain=light');
eventSource.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log('Update received:', data);
console.log('Real-time update:', data);
// Update your UI dashboard, e.g., refresh a light intensity indicator.
};
```
## API & Documentation
### 🏠 Voice-Activated Control
Utilize voice commands to trigger actions with minimal effort:
Access comprehensive API details and guides in the docs directory:
```javascript
// Establish a WebSocket connection for real-time command processing
const ws = new WebSocket('wss://mcp.yourha.com/ws');
- **API Reference:** [API Documentation](docs/api.md)
- **SSE Documentation:** [SSE API](docs/sse-api.md)
- **Troubleshooting Guide:** [Troubleshooting](docs/troubleshooting.md)
- **Architecture Details:** [Architecture Documentation](docs/architecture.md)
ws.onmessage = ({ data }) => {
const update = JSON.parse(data);
if (update.entity_id === 'light.living_room') {
console.log('Adjusting living room lighting based on voice command...');
// Additional logic to update your UI or trigger further actions can go here.
}
};
## Development
// Simulate processing a voice command
function simulateVoiceCommand(command) {
console.log("Processing voice command:", command);
// Integrate with your actual voice-to-text system as needed.
}
### Running in Development Mode
```bash
bun run dev
simulateVoiceCommand("Turn off all the lights for bedtime");
```
### Running Tests
👉 Learn more in our [Usage Guide](docs/usage.md).
- Execute all tests:
```bash
bun test
```
---
- Run tests with coverage:
```bash
bun test --coverage
```
## Update Strategy 🔄
### Production Build & Start
Maintain a seamless operation with zero downtime updates:
```bash
bun run build
bun start
# 1. Pull the latest Docker images
docker compose pull
# 2. Rebuild and restart containers smoothly
docker compose up -d --build
# 3. Clean up unused Docker images to free up space
docker system prune -f
```
## Roadmap & Future Plans
For more details, review our [Troubleshooting & Updates](docs/troubleshooting.md).
The MCP Server is under active development and improvement. Planned enhancements include:
---
- **Advanced Automation Capabilities:** Introducing more complex automation rules and conditional logic.
- **Enhanced Security Features:** Additional authentication layers, encryption enhancements, and security monitoring tools.
- **User Interface Improvements:** Development of a more intuitive web dashboard for easier device management.
- **Expanded Integrations:** Support for a wider array of smart home devices and third-party services.
- **Performance Optimizations:** Continued efforts to reduce latency and improve resource efficiency.
## Security Features 🔐
For additional details, check out our [Roadmap](docs/roadmap.md).
We prioritize the security of your smart home with multiple layers of defense:
- **JWT Authentication 🔑:** Secure, token-based API access to prevent unauthorized usage.
- **Request Sanitization 🧼:** Automatic filtering and validation of API requests to combat injection attacks.
- **Rate Limiting & Fail2Ban 🚫:** Monitors requests to prevent brute force and DDoS attacks.
- **End-to-End Encryption 🔒:** Ensures that your commands and data remain private during transmission.
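For example, a client call to one of the generated tool endpoints might look like this; a minimal sketch where the tool name, entity, and token variable are illustrative:
```typescript
// Illustrative only: calling a generated /api/tools endpoint with a JWT bearer token
const response = await fetch("http://localhost:3000/api/tools/light_control", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.JWT_TOKEN}`, // token source is hypothetical
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ entity_id: "light.kitchen", state: "on" }),
});
console.log(response.status); // 200 on success, 401 if the token is rejected
```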
## Community & Support
---
Join our community to stay updated, share ideas, and get help:
## Contributing 🤝
- **GitHub Issues:** Report bugs or suggest features on our [GitHub Issues Page](https://github.com/jango-blockchained/homeassistant-mcp/issues).
- **Discussion Forums:** Connect with other users and contributors in our community forums.
- **Chat Platforms:** Join our real-time discussions on [Discord](#) or [Slack](#).
## Contributing
We welcome your contributions! To get started:
1. Fork the repository.
2. Create your feature branch:
We value community contributions! Here's how you can help improve MCP Server:
1. **Fork the Repository 🍴**
Create your own copy of the project.
2. **Create a Feature Branch 🌿**
```bash
git checkout -b feature/your-feature-name
```
3. Install dependencies:
3. **Install Dependencies & Run Tests 🧪**
```bash
bun install
bun test --coverage
```
4. Make your changes and run tests:
```bash
bun test
```
5. Commit and push your changes, then open a Pull Request.
4. **Make Your Changes & Commit 📝**
Follow the [Conventional Commits](https://www.conventionalcommits.org) guidelines.
5. **Open a Pull Request 🔀**
Submit your changes for review.
For detailed guidelines, see [Contributing Guide](docs/contributing.md).
Read more in our [Contribution Guidelines](docs/contributing.md).
## Troubleshooting & FAQ
---
### Common Issues
## Roadmap & Future Enhancements 🔮
- **Connection Problems:** Ensure that your `HASS_HOST`, authentication token, and WebSocket URL are correctly configured.
- **Docker Deployment:** Confirm that Docker is running and that your `.env` file contains the correct settings.
- **Automation Errors:** Verify entity availability and review your automation configurations for potential issues.
We're continuously evolving MCP Server. Upcoming features include:
- **AI Assistant Integration (Q4 2024):**
Smarter, context-aware voice commands and personalized automation.
- **Predictive Automation (Q1 2025):**
Enhanced scheduling capabilities powered by advanced AI.
- **Enhanced Security (Q2 2024):**
Introduction of multi-factor authentication, advanced monitoring, and rigorous encryption methods.
- **Performance Optimizations (Q3 2024):**
Reducing latency further, optimizing caching, and improving load balancing.
For more troubleshooting details, refer to [Troubleshooting Guide](docs/troubleshooting.md).
For more details, see our [Roadmap](docs/roadmap.md).
### Frequently Asked Questions
---
**Q: What platforms does MCP Server support?**
## Community & Support 🌍
A: MCP Server runs on Linux, macOS, and Windows (Docker is recommended for Windows environments).
Your feedback and collaboration are vital! Join our community:
- **GitHub Issues:** Report bugs or request features via our [Issues Page](https://github.com/jango-blockchained/homeassistant-mcp/issues).
- **Discord & Slack:** Connect with fellow users and developers in real-time.
- **Documentation:** Find comprehensive guides on the [MCP Documentation Website](https://jango-blockchained.github.io/homeassistant-mcp/).
**Q: How do I report a bug or request a feature?**
---
A: Please use our [GitHub Issues Page](https://github.com/jango-blockchained/homeassistant-mcp/issues) to report bugs or request new features.
## License 📜
**Q: Can I contribute to the project?**
This project is licensed under the MIT License. See [LICENSE](LICENSE) for full details.
A: Absolutely! We welcome contributions from the community. See the [Contributing](#contributing) section for more details.
---
## License
This project is licensed under the MIT License. See [LICENSE](LICENSE) for the full license text.
## Documentation
Full documentation is available at: [https://jango-blockchained.github.io/homeassistant-mcp/](https://jango-blockchained.github.io/homeassistant-mcp/)
## Quick Start
## Installation
## Usage
🔋 Batteries included.

docker/speech/Dockerfile

@@ -0,0 +1,41 @@
FROM python:3.10-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
git \
build-essential \
portaudio19-dev \
python3-pyaudio \
&& rm -rf /var/lib/apt/lists/*
# Install fast-whisper and its dependencies
RUN pip install --no-cache-dir torch torchaudio --index-url https://download.pytorch.org/whl/cpu
RUN pip install --no-cache-dir faster-whisper
# Install wake word detection
RUN pip install --no-cache-dir openwakeword pyaudio sounddevice
# Create directories
RUN mkdir -p /models /audio
# Download the base model by default
# The model will be downloaded automatically when first used
ENV ASR_MODEL=base.en
ENV ASR_MODEL_PATH=/models
# Create wake word model directory
# Models will be downloaded automatically when first used
RUN mkdir -p /models/wake_word
WORKDIR /app
# Copy the wake word detection script
COPY wake_word_detector.py .
# Set environment variables
ENV WHISPER_MODEL_PATH=/models
ENV WAKEWORD_MODEL_PATH=/models/wake_word
ENV PYTHONUNBUFFERED=1
# Run the wake word detection service
CMD ["python", "wake_word_detector.py"]

docker/speech/wake_word_detector.py

@@ -0,0 +1,173 @@
import os
import json
import queue
import threading
import numpy as np
import sounddevice as sd
from openwakeword import Model
from datetime import datetime
import wave
from faster_whisper import WhisperModel

# Configuration
SAMPLE_RATE = 16000
CHANNELS = 1
CHUNK_SIZE = 1024
BUFFER_DURATION = 30  # seconds to keep in buffer
DETECTION_THRESHOLD = 0.5

# Wake word models to use
WAKE_WORDS = ["hey_jarvis", "ok_google", "alexa"]

# Initialize the ASR model
asr_model = WhisperModel(
    model_size_or_path=os.environ.get('ASR_MODEL', 'base.en'),
    device="cpu",
    compute_type="int8",
    download_root=os.environ.get('ASR_MODEL_PATH', '/models')
)


class AudioProcessor:
    def __init__(self):
        # Initialize wake word detection; openwakeword loads the requested
        # models directly from the constructor argument
        self.wake_word_model = Model(
            wakeword_models=WAKE_WORDS,
            inference_framework="onnx"  # Use ONNX for better performance
        )
        self.audio_buffer = queue.Queue()
        self.recording = False
        self.buffer = np.zeros(SAMPLE_RATE * BUFFER_DURATION)
        self.buffer_lock = threading.Lock()

    def audio_callback(self, indata, frames, time, status):
        """Callback for audio input"""
        if status:
            print(f"Audio callback status: {status}")
        # Convert to mono if necessary
        if CHANNELS > 1:
            audio_data = np.mean(indata, axis=1)
        else:
            audio_data = indata.flatten()
        # Update circular buffer
        with self.buffer_lock:
            self.buffer = np.roll(self.buffer, -len(audio_data))
            self.buffer[-len(audio_data):] = audio_data
        # Process for wake word detection
        prediction = self.wake_word_model.predict(audio_data)
        # Check if a wake word was detected
        for wake_word in WAKE_WORDS:
            if prediction[wake_word] > DETECTION_THRESHOLD:
                print(f"Wake word detected: {wake_word} (confidence: {prediction[wake_word]:.2f})")
                # Pass the confidence along; it is needed for the metadata file
                self.save_audio_segment(wake_word, prediction[wake_word])
                break

    def save_audio_segment(self, wake_word, confidence):
        """Save the audio buffer when a wake word is detected"""
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f"/audio/wake_word_{wake_word}_{timestamp}.wav"
        # Save the audio buffer to a WAV file
        with wave.open(filename, 'wb') as wf:
            wf.setnchannels(CHANNELS)
            wf.setsampwidth(2)  # 16-bit audio
            wf.setframerate(SAMPLE_RATE)
            # Convert float32 to int16
            audio_data = (self.buffer * 32767).astype(np.int16)
            wf.writeframes(audio_data.tobytes())
        print(f"Saved audio segment to {filename}")
        # Transcribe the audio
        try:
            segments, info = asr_model.transcribe(
                filename,
                language="en",
                beam_size=5,
                temperature=0
            )
            # transcribe() returns a generator; materialize it so the
            # segments can be iterated more than once
            segments = list(segments)
            # Format the transcription result. faster-whisper segments carry
            # no per-segment confidence field; avg_logprob is the closest
            # available measure.
            result = {
                "text": " ".join(segment.text for segment in segments),
                "segments": [
                    {
                        "text": segment.text,
                        "start": segment.start,
                        "end": segment.end,
                        "confidence": segment.avg_logprob
                    }
                    for segment in segments
                ]
            }
            # Save metadata and transcription
            metadata = {
                "timestamp": timestamp,
                "wake_word": wake_word,
                "wake_word_confidence": float(confidence),
                "sample_rate": SAMPLE_RATE,
                "channels": CHANNELS,
                "duration": BUFFER_DURATION,
                "transcription": result
            }
            with open(f"{filename}.json", 'w') as f:
                json.dump(metadata, f, indent=2)
            print("\nTranscription result:")
            print(f"Text: {result['text']}")
            print("\nSegments:")
            for segment in result["segments"]:
                print(f"[{segment['start']:.2f}s - {segment['end']:.2f}s] (avg logprob: {segment['confidence']:.2f})")
                print(f'"{segment["text"]}"')
        except Exception as e:
            print(f"Error during transcription: {e}")
            metadata = {
                "timestamp": timestamp,
                "wake_word": wake_word,
                "wake_word_confidence": float(confidence),
                "sample_rate": SAMPLE_RATE,
                "channels": CHANNELS,
                "duration": BUFFER_DURATION,
                "error": str(e)
            }
            with open(f"{filename}.json", 'w') as f:
                json.dump(metadata, f, indent=2)

    def start(self):
        """Start audio processing"""
        try:
            print("Initializing wake word detection...")
            print(f"Loaded wake words: {', '.join(WAKE_WORDS)}")
            with sd.InputStream(
                channels=CHANNELS,
                samplerate=SAMPLE_RATE,
                blocksize=CHUNK_SIZE,
                callback=self.audio_callback
            ):
                print("\nWake word detection started. Listening...")
                print("Press Ctrl+C to stop")
                while True:
                    sd.sleep(1000)  # Sleep for 1 second
        except KeyboardInterrupt:
            print("\nStopping wake word detection...")
        except Exception as e:
            print(f"Error in audio processing: {e}")


if __name__ == "__main__":
    processor = AudioProcessor()
    processor.start()

Gemfile

@@ -4,6 +4,9 @@ gem "github-pages", group: :jekyll_plugins
gem "jekyll-theme-minimal"
gem "jekyll-relative-links"
gem "jekyll-seo-tag"
gem "jekyll-remote-theme"
gem "jekyll-github-metadata"
gem "faraday-retry"
# Windows and JRuby do not include zoneinfo files, so bundle the tzinfo-data gem
# and associated library.
@@ -15,3 +18,6 @@ end
# Lock `http_parser.rb` gem to `v0.6.x` on JRuby builds since newer versions of the gem
# do not have a Java counterpart.
gem "http_parser.rb", "~> 0.6.0", :platforms => [:jruby]
# Add webrick for Ruby 3.0+
gem "webrick", "~> 1.7"

_config.yml

@@ -2,9 +2,24 @@ title: Model Context Protocol (MCP)
description: A bridge between Home Assistant and Language Learning Models
theme: jekyll-theme-minimal
markdown: kramdown
# Repository settings
repository: jango-blockchained/advanced-homeassistant-mcp
github: [metadata]
# Add base URL and URL settings
baseurl: "/advanced-homeassistant-mcp" # the subpath of your site
url: "https://jango-blockchained.github.io" # the base hostname & protocol
# Theme settings
logo: /assets/img/logo.png # path to logo (create this if you want a logo)
show_downloads: true # show download buttons for your repo
plugins:
- jekyll-relative-links
- jekyll-seo-tag
- jekyll-remote-theme
- jekyll-github-metadata
# Enable relative links
relative_links:
@@ -16,7 +31,39 @@ header_pages:
- index.md
- getting-started.md
- api.md
- usage.md
- tools/tools.md
- development/development.md
- troubleshooting.md
- contributing.md
- roadmap.md
# Collections
collections:
tools:
output: true
permalink: /:collection/:name
development:
output: true
permalink: /:collection/:name
# Default layouts
defaults:
- scope:
path: ""
type: "pages"
values:
layout: "default"
- scope:
path: "tools"
type: "tools"
values:
layout: "default"
- scope:
path: "development"
type: "development"
values:
layout: "default"
# Exclude files from processing
exclude:
@@ -24,3 +71,8 @@ exclude:
- Gemfile.lock
- node_modules
- vendor
# Sass settings
sass:
style: compressed
sass_dir: _sass

examples/README.md

@@ -0,0 +1,91 @@
# Speech-to-Text Examples
This directory contains examples demonstrating how to use the speech-to-text integration with wake word detection.
## Prerequisites
1. Make sure you have Docker installed and running
2. Build and start the services:
```bash
docker-compose up -d
```
## Running the Example
1. Install dependencies:
```bash
npm install
```
2. Run the example:
```bash
npm run example:speech
```
Or using `ts-node` directly:
```bash
npx ts-node examples/speech-to-text-example.ts
```
## Features Demonstrated
1. **Wake Word Detection**
- Listens for wake words: "hey jarvis", "ok google", "alexa"
- Automatically saves audio when wake word is detected
- Transcribes the detected speech
2. **Manual Transcription**
- Example of how to transcribe audio files manually
- Supports different models and configurations
3. **Event Handling** (see the sketch after this list)
- Wake word detection events
- Transcription results
- Progress updates
- Error handling
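These events map directly onto the `SpeechToText` emitter; a condensed sketch of the wiring, using the `speech` instance created in the example file:
```typescript
speech.on('wake_word', (event) => console.log('Wake word at', event.timestamp));
speech.on('transcription', ({ result }) => console.log('Heard:', result.text));
speech.on('error', (err) => console.error('Error:', err.message));
```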
## Example Output
When a wake word is detected, you'll see output like this:
```
🎤 Wake word detected!
Timestamp: 20240203_123456
Audio file: /path/to/audio/wake_word_20240203_123456.wav
Metadata file: /path/to/audio/wake_word_20240203_123456.wav.json
📝 Transcription result:
Full text: This is what was said after the wake word.
Segments:
1. [0.00s - 1.52s] (95.5% confidence)
"This is what was said"
2. [1.52s - 2.34s] (98.2% confidence)
"after the wake word."
```
## Customization
You can customize the behavior by:
1. Changing the wake word models in `docker/speech/Dockerfile`
2. Modifying transcription options in the example file
3. Adding your own event handlers
4. Implementing different audio processing logic
## Troubleshooting
1. **Docker Issues**
- Make sure Docker is running
- Check container logs: `docker-compose logs fast-whisper`
- Verify container is up: `docker ps`
2. **Audio Issues**
- Check audio device permissions
- Verify audio file format (WAV files recommended)
- Check audio file permissions
3. **Performance Issues**
- Try using a smaller model (tiny.en or base.en)
- Adjust beam size and patience parameters
- Consider using GPU acceleration if available

examples/speech-to-text-example.ts

@@ -0,0 +1,91 @@
import { SpeechToText, TranscriptionResult, WakeWordEvent } from '../src/speech/speechToText';
import fs from 'fs';
import path from 'path';
async function main() {
// Initialize the speech-to-text service (the constructor takes a config object)
const speech = new SpeechToText({ containerName: 'fast-whisper', modelPath: '/models', modelType: 'base' });
// Check if the service is available
const isHealthy = await speech.checkHealth();
if (!isHealthy) {
console.error('Speech service is not available. Make sure Docker is running and the fast-whisper container is up.');
console.error('Run: docker-compose up -d');
process.exit(1);
}
console.log('Speech service is ready!');
console.log('Listening for wake words: "hey jarvis", "ok google", "alexa"');
console.log('Press Ctrl+C to exit');
// Set up event handlers
speech.on('wake_word', (event: WakeWordEvent) => {
console.log('\n🎤 Wake word detected!');
console.log(' Timestamp:', event.timestamp);
console.log(' Audio file:', event.audioFile);
console.log(' Metadata file:', event.metadataFile);
});
speech.on('transcription', (event: { audioFile: string; result: TranscriptionResult }) => {
console.log('\n📝 Transcription result:');
console.log(' Full text:', event.result.text);
console.log('\n Segments:');
event.result.segments.forEach((segment, index) => {
console.log(` ${index + 1}. [${segment.start.toFixed(2)}s - ${segment.end.toFixed(2)}s] (${(segment.confidence * 100).toFixed(1)}% confidence)`);
console.log(` "${segment.text}"`);
});
});
speech.on('progress', (event: { type: string; data: string }) => {
if (event.type === 'stderr' && !event.data.includes('Loading model')) {
console.error('❌ Error:', event.data);
}
});
speech.on('error', (error: Error) => {
console.error('❌ Error:', error.message);
});
// Example of manual transcription
async function transcribeFile(filepath: string) {
try {
console.log(`\n🎯 Manually transcribing: ${filepath}`);
const result = await speech.transcribeAudio(filepath, {
model: 'base.en', // You can change this to tiny.en, small.en, medium.en, or large-v2
language: 'en',
temperature: 0,
beamSize: 5
});
console.log('\n📝 Transcription result:');
console.log(' Text:', result.text);
} catch (error) {
console.error('❌ Transcription failed:', error instanceof Error ? error.message : error);
}
}
// Create audio directory if it doesn't exist
const audioDir = path.join(__dirname, '..', 'audio');
if (!fs.existsSync(audioDir)) {
fs.mkdirSync(audioDir, { recursive: true });
}
// Start wake word detection
speech.startWakeWordDetection(audioDir);
// Example: You can also manually transcribe files
// Uncomment the following line and replace with your audio file:
// await transcribeFile('/path/to/your/audio.wav');
// Keep the process running
process.on('SIGINT', () => {
console.log('\nStopping speech service...');
speech.stopWakeWordDetection();
process.exit(0);
});
}
// Run the example
main().catch(error => {
console.error('Fatal error:', error);
process.exit(1);
});

package.json

@@ -21,7 +21,8 @@
"profile": "bun --inspect src/index.ts",
"clean": "rm -rf dist .bun coverage",
"typecheck": "bun x tsc --noEmit",
"preinstall": "bun install --frozen-lockfile"
"preinstall": "bun install --frozen-lockfile",
"example:speech": "bun run examples/speech-to-text-example.ts"
},
"dependencies": {
"@elysiajs/cors": "^1.2.0",

src/config/app.config.ts

@@ -33,6 +33,21 @@ export const AppConfigSchema = z.object({
HASS_HOST: z.string().default("http://192.168.178.63:8123"),
HASS_TOKEN: z.string().optional(),
/** Speech Features Configuration */
SPEECH: z.object({
ENABLED: z.boolean().default(false),
WAKE_WORD_ENABLED: z.boolean().default(false),
SPEECH_TO_TEXT_ENABLED: z.boolean().default(false),
WHISPER_MODEL_PATH: z.string().default("/models"),
WHISPER_MODEL_TYPE: z.string().default("base"),
}).default({
ENABLED: false,
WAKE_WORD_ENABLED: false,
SPEECH_TO_TEXT_ENABLED: false,
WHISPER_MODEL_PATH: "/models",
WHISPER_MODEL_TYPE: "base",
}),
/** Security Configuration */
JWT_SECRET: z.string().default("your-secret-key"),
RATE_LIMIT: z.object({
@@ -113,4 +128,11 @@ export const APP_CONFIG = AppConfigSchema.parse({
LOG_REQUESTS: process.env.LOG_REQUESTS === "true",
},
VERSION: "0.1.0",
SPEECH: {
ENABLED: process.env.ENABLE_SPEECH_FEATURES === "true",
WAKE_WORD_ENABLED: process.env.ENABLE_WAKE_WORD === "true",
SPEECH_TO_TEXT_ENABLED: process.env.ENABLE_SPEECH_TO_TEXT === "true",
WHISPER_MODEL_PATH: process.env.WHISPER_MODEL_PATH || "/models",
WHISPER_MODEL_TYPE: process.env.WHISPER_MODEL_TYPE || "base",
},
});
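Downstream modules can then read the typed, defaulted values; a minimal sketch:
```typescript
import { APP_CONFIG } from "./config/app.config.js";

// SPEECH is fully defaulted by zod, so these reads are always safe
if (APP_CONFIG.SPEECH.ENABLED) {
  console.log(`Whisper model '${APP_CONFIG.SPEECH.WHISPER_MODEL_TYPE}' at ${APP_CONFIG.SPEECH.WHISPER_MODEL_PATH}`);
}
```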

src/index.ts

@@ -25,6 +25,8 @@ import {
climateCommands,
type Command,
} from "./commands.js";
import { speechService } from "./speech/index.js";
import { APP_CONFIG } from "./config/app.config.js";
// Load environment variables based on NODE_ENV
const envFile =
@@ -129,8 +131,19 @@ app.get("/health", () => ({
status: "ok",
timestamp: new Date().toISOString(),
version: "0.1.0",
speech_enabled: APP_CONFIG.SPEECH.ENABLED,
wake_word_enabled: APP_CONFIG.SPEECH.WAKE_WORD_ENABLED,
speech_to_text_enabled: APP_CONFIG.SPEECH.SPEECH_TO_TEXT_ENABLED,
}));
// Initialize speech service if enabled
if (APP_CONFIG.SPEECH.ENABLED) {
console.log("Initializing speech service...");
speechService.initialize().catch((error) => {
console.error("Failed to initialize speech service:", error);
});
}
// Create API endpoints for each tool
tools.forEach((tool) => {
app.post(`/api/tools/${tool.name}`, async ({ body }: { body: Record<string, unknown> }) => {
@@ -145,7 +158,12 @@ app.listen(PORT, () => {
});
// Handle server shutdown
process.on("SIGTERM", () => {
process.on("SIGTERM", async () => {
console.log("Received SIGTERM. Shutting down gracefully...");
if (APP_CONFIG.SPEECH.ENABLED) {
await speechService.shutdown().catch((error) => {
console.error("Error shutting down speech service:", error);
});
}
process.exit(0);
});
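A quick way to verify the new flags is to query the health endpoint; a minimal sketch, assuming the server runs on the default port 3000:
```typescript
const health = await fetch("http://localhost:3000/health").then((r) => r.json());
if (health.speech_enabled) {
  console.log("Wake word:", health.wake_word_enabled, "STT:", health.speech_to_text_enabled);
}
```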

src/speech/__tests__/speechToText.test.ts

@@ -0,0 +1,116 @@
import { SpeechToText, WakeWordEvent, TranscriptionError } from '../speechToText';
import fs from 'fs';
import path from 'path';
describe('SpeechToText', () => {
let speechToText: SpeechToText;
const testAudioDir = path.join(__dirname, 'test_audio');
beforeEach(() => {
speechToText = new SpeechToText({ containerName: 'fast-whisper', modelPath: '/models', modelType: 'base' });
// Create test audio directory if it doesn't exist
if (!fs.existsSync(testAudioDir)) {
fs.mkdirSync(testAudioDir, { recursive: true });
}
});
afterEach(() => {
speechToText.stopWakeWordDetection();
// Clean up test files
if (fs.existsSync(testAudioDir)) {
fs.rmSync(testAudioDir, { recursive: true, force: true });
}
});
describe('checkHealth', () => {
it('should handle Docker not being available', async () => {
const isHealthy = await speechToText.checkHealth();
expect(isHealthy).toBeDefined();
expect(isHealthy).toBe(false);
});
});
describe('wake word detection', () => {
it('should detect new audio files and emit wake word events', (done) => {
const testFile = path.join(testAudioDir, 'wake_word_test_123456.wav');
const testMetadata = `${testFile}.json`;
speechToText.startWakeWordDetection(testAudioDir);
speechToText.on('wake_word', (event: WakeWordEvent) => {
expect(event).toBeDefined();
expect(event.audioFile).toBe(testFile);
expect(event.metadataFile).toBe(testMetadata);
expect(event.timestamp).toBe('123456');
done();
});
// Create a test audio file to trigger the event
fs.writeFileSync(testFile, 'test audio content');
}, 1000);
it('should handle transcription errors when Docker is not available', (done) => {
const testFile = path.join(testAudioDir, 'wake_word_test_123456.wav');
let errorEmitted = false;
let wakeWordEmitted = false;
const checkDone = () => {
if (errorEmitted && wakeWordEmitted) {
done();
}
};
speechToText.on('error', (error) => {
expect(error).toBeDefined();
expect(error).toBeInstanceOf(TranscriptionError);
expect(error.message).toContain('Failed to start Docker process');
errorEmitted = true;
checkDone();
});
speechToText.on('wake_word', () => {
wakeWordEmitted = true;
checkDone();
});
speechToText.startWakeWordDetection(testAudioDir);
// Create a test audio file to trigger the event
fs.writeFileSync(testFile, 'test audio content');
}, 1000);
});
describe('transcribeAudio', () => {
it('should handle Docker not being available for transcription', async () => {
await expect(
speechToText.transcribeAudio('/audio/test.wav')
).rejects.toThrow(TranscriptionError);
});
it('should emit progress events on error', (done) => {
let progressEmitted = false;
let errorThrown = false;
const checkDone = () => {
if (progressEmitted && errorThrown) {
done();
}
};
speechToText.on('progress', (event: { type: string; data: string }) => {
expect(event.type).toBe('stderr');
expect(event.data).toBe('Failed to start Docker process');
progressEmitted = true;
checkDone();
});
speechToText.transcribeAudio('/audio/test.wav')
.catch((error) => {
expect(error).toBeInstanceOf(TranscriptionError);
errorThrown = true;
checkDone();
});
}, 1000);
});
});

src/speech/index.ts

@@ -0,0 +1,110 @@
import { APP_CONFIG } from "../config/app.config.js";
import { logger } from "../utils/logger.js";
import type { IWakeWordDetector, ISpeechToText } from "./types.js";
class SpeechService {
private static instance: SpeechService | null = null;
private isInitialized: boolean = false;
private wakeWordDetector: IWakeWordDetector | null = null;
private speechToText: ISpeechToText | null = null;
private constructor() { }
public static getInstance(): SpeechService {
if (!SpeechService.instance) {
SpeechService.instance = new SpeechService();
}
return SpeechService.instance;
}
public async initialize(): Promise<void> {
if (this.isInitialized) {
return;
}
if (!APP_CONFIG.SPEECH.ENABLED) {
logger.info("Speech features are disabled. Skipping initialization.");
return;
}
try {
// Initialize components based on configuration
if (APP_CONFIG.SPEECH.WAKE_WORD_ENABLED) {
logger.info("Initializing wake word detection...");
// Dynamic import to avoid loading the module if not needed
const { WakeWordDetector } = await import("./wakeWordDetector.js");
this.wakeWordDetector = new WakeWordDetector() as IWakeWordDetector;
await this.wakeWordDetector.initialize();
}
if (APP_CONFIG.SPEECH.SPEECH_TO_TEXT_ENABLED) {
logger.info("Initializing speech-to-text...");
// Dynamic import to avoid loading the module if not needed
const { SpeechToText } = await import("./speechToText.js");
this.speechToText = new SpeechToText({
modelPath: APP_CONFIG.SPEECH.WHISPER_MODEL_PATH,
modelType: APP_CONFIG.SPEECH.WHISPER_MODEL_TYPE,
}) as ISpeechToText;
await this.speechToText.initialize();
}
this.isInitialized = true;
logger.info("Speech service initialized successfully");
} catch (error) {
logger.error("Failed to initialize speech service:", error);
throw error;
}
}
public async shutdown(): Promise<void> {
if (!this.isInitialized) {
return;
}
try {
if (this.wakeWordDetector) {
await this.wakeWordDetector.shutdown();
this.wakeWordDetector = null;
}
if (this.speechToText) {
await this.speechToText.shutdown();
this.speechToText = null;
}
this.isInitialized = false;
logger.info("Speech service shut down successfully");
} catch (error) {
logger.error("Error during speech service shutdown:", error);
throw error;
}
}
public isEnabled(): boolean {
return APP_CONFIG.SPEECH.ENABLED;
}
public isWakeWordEnabled(): boolean {
return APP_CONFIG.SPEECH.WAKE_WORD_ENABLED;
}
public isSpeechToTextEnabled(): boolean {
return APP_CONFIG.SPEECH.SPEECH_TO_TEXT_ENABLED;
}
public getWakeWordDetector(): IWakeWordDetector {
if (!this.isInitialized || !this.wakeWordDetector) {
throw new Error("Wake word detector is not initialized");
}
return this.wakeWordDetector;
}
public getSpeechToText(): ISpeechToText {
if (!this.isInitialized || !this.speechToText) {
throw new Error("Speech-to-text is not initialized");
}
return this.speechToText;
}
}
export const speechService = SpeechService.getInstance();
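A minimal usage sketch of the singleton; the empty audio buffer is a placeholder for real WAV data:
```typescript
import { speechService } from "./speech/index.js";

await speechService.initialize(); // no-op if speech features are disabled
if (speechService.isEnabled() && speechService.isSpeechToTextEnabled()) {
  const stt = speechService.getSpeechToText();
  const audio = Buffer.alloc(0); // placeholder: supply real audio bytes here
  console.log(await stt.transcribe(audio));
}
```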

src/speech/speechToText.ts

@@ -0,0 +1,247 @@
import { spawn } from 'child_process';
import { EventEmitter } from 'events';
import { watch } from 'fs';
import path from 'path';
import { ISpeechToText, SpeechToTextConfig } from "./types.js";
export interface TranscriptionOptions {
model?: 'tiny.en' | 'base.en' | 'small.en' | 'medium.en' | 'large-v2';
language?: string;
temperature?: number;
beamSize?: number;
patience?: number;
device?: 'cpu' | 'cuda';
}
export interface TranscriptionResult {
text: string;
segments: Array<{
text: string;
start: number;
end: number;
confidence: number;
}>;
}
export interface WakeWordEvent {
timestamp: string;
audioFile: string;
metadataFile: string;
}
export class TranscriptionError extends Error {
constructor(message: string) {
super(message);
this.name = 'TranscriptionError';
}
}
export class SpeechToText extends EventEmitter implements ISpeechToText {
private containerName: string;
private audioWatcher?: ReturnType<typeof watch>;
private modelPath: string;
private modelType: string;
private isInitialized: boolean = false;
constructor(config: SpeechToTextConfig) {
super();
this.containerName = config.containerName || 'fast-whisper';
this.modelPath = config.modelPath;
this.modelType = config.modelType;
}
public async initialize(): Promise<void> {
if (this.isInitialized) {
return;
}
try {
// Initialization logic will be implemented here
await this.setupContainer();
this.isInitialized = true;
this.emit('ready');
} catch (error) {
this.emit('error', error);
throw error;
}
}
public async shutdown(): Promise<void> {
if (!this.isInitialized) {
return;
}
try {
// Cleanup logic will be implemented here
await this.cleanupContainer();
this.isInitialized = false;
this.emit('shutdown');
} catch (error) {
this.emit('error', error);
throw error;
}
}
public async transcribe(audioData: Buffer): Promise<string> {
if (!this.isInitialized) {
throw new Error("Speech-to-text service is not initialized");
}
try {
// Transcription logic will be implemented here
this.emit('transcribing');
const result = await this.processAudio(audioData);
this.emit('transcribed', result);
return result;
} catch (error) {
this.emit('error', error);
throw error;
}
}
private async setupContainer(): Promise<void> {
// Container setup logic will be implemented here
await new Promise(resolve => setTimeout(resolve, 100)); // Placeholder
}
private async cleanupContainer(): Promise<void> {
// Container cleanup logic will be implemented here
await new Promise(resolve => setTimeout(resolve, 100)); // Placeholder
}
private async processAudio(audioData: Buffer): Promise<string> {
// Audio processing logic will be implemented here
await new Promise(resolve => setTimeout(resolve, 100)); // Placeholder
return "Transcription placeholder";
}
startWakeWordDetection(audioDir: string = './audio'): void {
// Watch for new audio files from wake word detection
this.audioWatcher = watch(audioDir, (eventType, filename) => {
if (eventType === 'rename' && filename && filename.startsWith('wake_word_') && filename.endsWith('.wav')) {
const audioFile = path.join(audioDir, filename);
const metadataFile = `${audioFile}.json`;
const parts = filename.split('_');
const timestamp = parts[parts.length - 1].split('.')[0];
// Emit wake word event
this.emit('wake_word', {
timestamp,
audioFile,
metadataFile
} as WakeWordEvent);
// Automatically transcribe the wake word audio
this.transcribeAudio(audioFile)
.then(result => {
this.emit('transcription', { audioFile, result });
})
.catch(error => {
this.emit('error', error);
});
}
});
}
stopWakeWordDetection(): void {
if (this.audioWatcher) {
this.audioWatcher.close();
this.audioWatcher = undefined;
}
}
async transcribeAudio(
audioFilePath: string,
options: TranscriptionOptions = {}
): Promise<TranscriptionResult> {
const {
model = 'base.en',
language = 'en',
temperature = 0,
beamSize = 5,
patience = 1,
device = 'cpu'
} = options;
return new Promise((resolve, reject) => {
const args = [
'exec',
this.containerName,
'fast-whisper',
'--model', model,
'--language', language,
'--temperature', temperature.toString(),
'--beam-size', beamSize.toString(),
'--patience', patience.toString(),
'--device', device,
'--output-json',
audioFilePath
];
// Use a distinct name so the spawned child does not shadow the global `process`
let child: ReturnType<typeof spawn>;
try {
child = spawn('docker', args);
} catch (error) {
this.emit('progress', { type: 'stderr', data: 'Failed to start Docker process' });
reject(new TranscriptionError('Failed to start Docker process'));
return;
}
let stdout = '';
let stderr = '';
child.stdout?.on('data', (data: Buffer) => {
stdout += data.toString();
this.emit('progress', { type: 'stdout', data: data.toString() });
});
child.stderr?.on('data', (data: Buffer) => {
stderr += data.toString();
this.emit('progress', { type: 'stderr', data: data.toString() });
});
child.on('error', (error: Error) => {
this.emit('progress', { type: 'stderr', data: error.message });
reject(new TranscriptionError(`Failed to execute Docker command: ${error.message}`));
});
child.on('close', (code: number) => {
if (code !== 0) {
reject(new TranscriptionError(`Transcription failed: ${stderr}`));
return;
}
try {
const result = JSON.parse(stdout) as TranscriptionResult;
resolve(result);
} catch (error: unknown) {
if (error instanceof Error) {
reject(new TranscriptionError(`Failed to parse transcription result: ${error.message}`));
} else {
reject(new TranscriptionError('Failed to parse transcription result: Unknown error'));
}
}
});
});
}
async checkHealth(): Promise<boolean> {
try {
const child = spawn('docker', ['ps', '--filter', `name=${this.containerName}`, '--format', '{{.Status}}']);
return new Promise((resolve) => {
let output = '';
child.stdout?.on('data', (data: Buffer) => {
output += data.toString();
});
child.on('error', () => {
resolve(false);
});
child.on('close', (code: number) => {
resolve(code === 0 && output.toLowerCase().includes('up'));
});
});
} catch (error) {
return false;
}
}
}
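A short sketch combining the container health check with a manual transcription (the file path is illustrative):
```typescript
const stt = new SpeechToText({ modelPath: '/models', modelType: 'base' });
if (await stt.checkHealth()) {
  const result = await stt.transcribeAudio('/audio/sample.wav', { model: 'base.en' });
  console.log(result.text);
} else {
  console.warn('fast-whisper container is not running');
}
```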

src/speech/types.ts

@@ -0,0 +1,20 @@
import { EventEmitter } from "events";
export interface IWakeWordDetector {
initialize(): Promise<void>;
shutdown(): Promise<void>;
startListening(): Promise<void>;
stopListening(): Promise<void>;
}
export interface ISpeechToText extends EventEmitter {
initialize(): Promise<void>;
shutdown(): Promise<void>;
transcribe(audioData: Buffer): Promise<string>;
}
export interface SpeechToTextConfig {
modelPath: string;
modelType: string;
containerName?: string;
}

src/speech/wakeWordDetector.ts

@@ -0,0 +1,64 @@
import { IWakeWordDetector } from "./types.js";
export class WakeWordDetector implements IWakeWordDetector {
private isListening: boolean = false;
private isInitialized: boolean = false;
public async initialize(): Promise<void> {
if (this.isInitialized) {
return;
}
// Initialization logic will be implemented here
await this.setupDetector();
this.isInitialized = true;
}
public async shutdown(): Promise<void> {
if (this.isListening) {
await this.stopListening();
}
if (this.isInitialized) {
await this.cleanupDetector();
this.isInitialized = false;
}
}
public async startListening(): Promise<void> {
if (!this.isInitialized) {
throw new Error("Wake word detector is not initialized");
}
if (this.isListening) {
return;
}
await this.startDetection();
this.isListening = true;
}
public async stopListening(): Promise<void> {
if (!this.isListening) {
return;
}
await this.stopDetection();
this.isListening = false;
}
private async setupDetector(): Promise<void> {
// Setup logic will be implemented here
await new Promise(resolve => setTimeout(resolve, 100)); // Placeholder
}
private async cleanupDetector(): Promise<void> {
// Cleanup logic will be implemented here
await new Promise(resolve => setTimeout(resolve, 100)); // Placeholder
}
private async startDetection(): Promise<void> {
// Start detection logic will be implemented here
await new Promise(resolve => setTimeout(resolve, 100)); // Placeholder
}
private async stopDetection(): Promise<void> {
// Stop detection logic will be implemented here
await new Promise(resolve => setTimeout(resolve, 100)); // Placeholder
}
}