High-performance async system that captures images from RTSP video streams, analyzes them for human presence using OpenAI's vision models, and broadcasts messages to Google Hub devices when people are detected.
- Async/await architecture for 3x better performance
- RTSP stream capture with automatic resource cleanup
- Two-stage detection - YOLO for fast screening, then LLM for detailed analysis
- Cost optimization - Only processes images with LLM when YOLO detects people
- Flexible LLM support - OpenAI API or local Ollama (llama3.2-vision) for zero cost
- Advanced notification system with threading, duplicate filtering, and optimized TTS
- Cross-platform TTS - Local speakers with pyttsx3 and system fallbacks
- Google Hub/Chromecast broadcasting with device discovery
- Non-blocking notifications - Threaded and async dispatch options
- Intelligent duplicate filtering - Prevents repetitive announcements
- Health checks for external dependencies on startup
- Input validation and structured logging throughout
- Automatic image cleanup to prevent disk space issues
- Context managers for proper resource management
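The image-cleanup pattern behind the last two bullets can be sketched with a small context manager. This is a hypothetical illustration (the `captured_image` helper is not from the project's API): the frame file is removed on exit even if analysis raises.

```python
import contextlib
from pathlib import Path

@contextlib.contextmanager
def captured_image(images_dir: str, name: str):
    """Yield a path for a captured frame; remove the file on exit."""
    path = Path(images_dir) / name
    try:
        yield path
    finally:
        path.unlink(missing_ok=True)  # cleanup runs even if processing raised
```

Wrapping each capture in such a context manager is what keeps old frames from accumulating on disk when processing fails mid-pipeline.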
- Python 3.11+
- RTSP-compatible camera or stream
- Google Hub or Chromecast device on the same network
- Local speakers for TTS notifications (optional)
- LLM Provider (choose one):
  - OpenAI API key for cloud analysis
  - Ollama with `llama3.2-vision:latest` for local processing
Install all dependencies with:

```bash
pip install -r requirements.txt
```
Key dependencies:
- `pyttsx3` — Cross-platform text-to-speech engine
- `opencv-python` — Image processing and RTSP capture
- `ultralytics` — YOLOv8 object detection
- `openai` — Vision API for image analysis
- `pychromecast` — Google Hub/Chromecast communication
Unit tests are provided in the `tests/` directory and use `pytest`.

To run all tests:

```bash
pytest
```

To run a specific test file:

```bash
pytest tests/test_process_image.py
```

Make sure all dependencies are installed before running tests.
Copy `.env.example` to `.env` and configure:

```env
# Required
RTSP_URL=rtsp://username:[email protected]/stream
GOOGLE_DEVICE_IP=192.168.1.200

# LLM Provider (choose one)
OPENAI_API_KEY=your_openai_api_key_here  # For cloud analysis
DEFAULT_LLM_PROVIDER=ollama              # For local processing

# Optional
IMAGES_DIR=images
MAX_IMAGES=100
CAPTURE_INTERVAL=10
LLM_TIMEOUT=30
```
All settings are centralized in `src/config.py` with validation and defaults.
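A centralized config with validation and defaults can be sketched as below. This is a stdlib-only illustration, not the actual contents of `src/config.py`; the field names mirror the `.env` variables above, and the `rtsp://` check is one plausible validation.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    rtsp_url: str
    google_device_ip: str
    max_images: int = 100
    capture_interval: int = 10

    @classmethod
    def from_env(cls) -> "Config":
        """Read settings from the environment, validating required values."""
        rtsp_url = os.environ.get("RTSP_URL", "")
        if not rtsp_url.startswith("rtsp://"):
            raise ValueError("RTSP_URL must start with rtsp://")
        return cls(
            rtsp_url=rtsp_url,
            google_device_ip=os.environ.get("GOOGLE_DEVICE_IP", ""),
            max_images=int(os.environ.get("MAX_IMAGES", "100")),
            capture_interval=int(os.environ.get("CAPTURE_INTERVAL", "10")),
        )
```

Failing fast here is what lets a misconfigured `.env` surface at startup rather than mid-run.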
```bash
python -m src.app
```
What it does:
- Runs health checks for RTSP stream and OpenAI API
- Captures images from RTSP stream (configurable interval)
- Processes multiple images concurrently using async/await
- Uses YOLO for fast person detection, then OpenAI for detailed analysis
- Broadcasts to Google Hub when person confirmed
- Automatically cleans up old images
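The capture-then-process-concurrently flow described above can be sketched with plain `asyncio`. The `capture_image` and `process_frame` stubs below are placeholders for the real RTSP capture and YOLO/LLM pipeline; the point is that each frame is handed to `asyncio.create_task` so the next capture is not blocked by analysis.

```python
import asyncio

async def capture_image() -> bytes:
    """Stand-in for RTSP capture; returns one frame."""
    await asyncio.sleep(0)
    return b"frame"

async def process_frame(frame: bytes) -> bool:
    """Stand-in for YOLO screening plus LLM analysis."""
    await asyncio.sleep(0)
    return True

async def main_loop(iterations: int, interval: float) -> list[bool]:
    tasks = []
    for _ in range(iterations):
        frame = await capture_image()
        # Schedule analysis without waiting for it to finish
        tasks.append(asyncio.create_task(process_frame(frame)))
        await asyncio.sleep(interval)  # the configured capture interval
    return await asyncio.gather(*tasks)
```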
The system includes an advanced notification dispatcher with multiple performance optimizations:
```python
from src.notification_dispatcher import NotificationDispatcher, NotificationTarget

# Initialize with Google Hub (optional)
dispatcher = NotificationDispatcher(
    google_device_ip="192.168.1.200",
    google_device_name="Kitchen Display",
)

# Send notifications to different targets
dispatcher.dispatch("Person detected at front door", NotificationTarget.LOCAL_SPEAKER)
dispatcher.dispatch("Security alert", NotificationTarget.GOOGLE_HUB)
dispatcher.dispatch("Important message", NotificationTarget.BOTH)

# Non-blocking notifications (recommended for real-time processing)
dispatcher.dispatch_threaded("Person walking by")  # Fire-and-forget

# Async notifications with result checking
future = dispatcher.dispatch_async("Motion detected")
# Continue processing...
success = future.result()  # Check result when needed

# Duplicate filtering (automatic)
dispatcher.dispatch("Same message")  # First time: sent
dispatcher.dispatch("Same message")  # Within 5 seconds: skipped
```
- Faster speech rate: 200 WPM (33% faster than default)
- Cross-platform support: Windows (pyttsx3), macOS (say), Linux (espeak)
- Automatic fallbacks: System commands if pyttsx3 unavailable
- Voice optimization: Uses best available voice on Windows
```bash
python -m src.notification_dispatcher
```
List all Google Hub/Chromecast devices on your network:

```bash
python -m src.google_devices
```

Capture a single image from an RTSP stream:

```bash
python -m src.image_capture
```

Send a custom message to a Google Hub:

```bash
python -m src.google_broadcast
```
```mermaid
sequenceDiagram
    participant HealthCheck
    participant MainLoop
    participant RTSP
    participant YOLOv8
    participant OpenAI
    participant GoogleHub

    HealthCheck->>RTSP: Check stream connectivity
    HealthCheck->>OpenAI: Validate API access
    MainLoop->>RTSP: capture_image_from_rtsp()
    MainLoop->>MainLoop: asyncio.create_task(process_frame)
    par Async Processing
        MainLoop->>YOLOv8: person_detected_yolov8(image)
        alt Person detected
            MainLoop->>OpenAI: analyze_image_async(image)
            OpenAI-->>MainLoop: {person_present, description}
            MainLoop->>GoogleHub: send_message_to_google_hub()
        else No person
            MainLoop->>MainLoop: cleanup_image()
        end
    end
```
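The cost-saving gate in the flow above (LLM only after YOLO) can be sketched as a small two-stage function. The detector and analyzer are passed in as callables here purely for illustration; the real pipeline wires in `person_detected_yolov8` and `analyze_image_async`.

```python
async def process_image(image_path: str, yolo_detects, llm_analyze):
    """Two-stage pipeline: only pay for an LLM call when YOLO sees a person."""
    if not yolo_detects(image_path):      # fast, free local screening
        return None                       # skip the LLM entirely
    return await llm_analyze(image_path)  # detailed (and costly) analysis
```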
Key Improvements:
- 3x faster processing with concurrent image analysis
- Health checks prevent runtime failures
- Context managers ensure proper resource cleanup
- Retry logic with exponential backoff for network calls
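Retry with exponential backoff for network calls can be sketched like this. It is a generic illustration (the `with_retries` helper is not the project's actual function): each failed attempt doubles the delay, with a little jitter to avoid synchronized retries.

```python
import asyncio
import random

async def with_retries(coro_factory, attempts: int = 3, base_delay: float = 0.5):
    """Retry an async call with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return await coro_factory()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            delay = base_delay * (2 ** attempt) * (1 + random.random() * 0.1)
            await asyncio.sleep(delay)
```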
- `src/app.py` — Async main loop with health checks
- `src/services.py` — AsyncRTSPProcessingService for business logic
- `src/image_capture.py` — RTSP capture with context managers
- `src/image_analysis.py` — Async OpenAI vision analysis
- `src/computer_vision.py` — YOLOv8 person detection
- `src/notification_dispatcher.py` — Advanced notification system with threading and TTS
- `src/config.py` — Centralized configuration with validation
- `src/health_checks.py` — Startup dependency validation
- `src/context_managers.py` — Resource cleanup automation
- `src/google_broadcast.py` — Chromecast/Google Hub messaging
- `src/google_devices.py` — Network device discovery
- `src/llm_factory.py` — LangChain model factory (legacy)
- `requirements.txt` — Python dependencies (includes aiohttp)
- `.env.example` — Environment configuration template
To debug, set `logging.basicConfig(level=logging.DEBUG)` in `src/app.py` and run the app with `PYTHONPATH` set:

```bash
export PYTHONPATH=.
python -m src.app
```
- Processing Speed: 3x faster than synchronous version
- Concurrent Processing: Multiple images analyzed simultaneously
- Non-blocking Notifications: Threaded dispatch prevents processing delays
- TTS Optimization: 33% faster speech (200 WPM vs 150 WPM)
- Duplicate Filtering: Intelligent suppression of repetitive messages
- Resource Management: Automatic cleanup prevents memory/disk leaks
- Error Recovery: Retry logic with exponential backoff
- Health Monitoring: Startup validation of all dependencies
Contributions are welcome! Please open an issue or submit a pull request on GitHub. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (`git checkout -b feature/YourFeature`)
- Commit your changes (`git commit -am 'Add new feature'`)
- Push to the branch (`git push origin feature/YourFeature`)
- Open a pull request
- OpenAI: Cloud-based; requires an API key and internet connectivity
- Ollama: Local processing with `llama3.2-vision:latest`, zero API costs
- The RTSP stream must be accessible from the application
- Async/await: Non-blocking I/O for better performance
- Health checks: Early detection of configuration issues
- Input validation: Comprehensive validation prevents runtime errors
- Context managers: Automatic resource cleanup
- Structured logging: Better debugging and monitoring
This project is licensed under the MIT License.