Production-grade implementation of an advanced acoustic projection microphone system with real-time translation capabilities.
- Features
- 🌍 Local Translation (100% Private)
- Architecture
- 🚀 Quick Start
- 🌐 REST API with Global Node Access
- 📊 Production Launcher Features
- 🧪 System Validation
- 📁 Project Structure Overview
- 🐳 Docker Deployment
- 🛠️ Troubleshooting
- 📊 Performance
- 💻 API Documentation
- 🧪 Testing
- ⚙️ Configuration
- 🤝 Contributing
- 📜 System Requirements
- 🎯 What's Next
- License
- 🙏 Acknowledgments
- 📚 References
- 📖 Citation
- 📧 Support
- Advanced Beamforming: Delay-and-sum, superdirective, and adaptive null-steering algorithms
- Deep Noise Suppression: LSTM-based neural network for speech enhancement
- Acoustic Echo Cancellation: NLMS adaptive filter with double-talk detection
- Voice Activity Detection: Energy and zero-crossing rate based VAD with hangover mechanism
- Real-time Translation: Local Whisper + NLLB pipeline (200+ languages, fully offline); TensorFlow Lite engine also supported
- Directional Audio Projection: Phased array synthesis for targeted audio delivery
- End-to-End Encryption: ChaCha20-Poly1305 + X25519 key exchange via libsodium; Argon2id password-based key derivation and file encryption
- Push-to-Talk (PTT) Controller: Hardware/software PTT with keyboard, mouse, external pedal, and software-controlled modes
- Call Signaling: UDP-based call setup, teardown, and session management with peer discovery
- Real Audio I/O: PortAudio integration for live microphone capture and speaker playback (when PortAudio is installed)
- WAV File I/O: Read and write WAV files for offline processing and testing
- High Performance: FFTW-optimized FFT with STFT support, multi-threaded processing, SIMD-ready
- Production Launcher: Enterprise-grade startup system with automatic health checks and monitoring
- REST API with Global Node Access: FastAPI-based REST API for peer discovery and session management across all network nodes
- Auto-Calibration Mode: Streamlines initial setup by automatically tuning key system parameters.
- Real-Time Monitoring Dashboard: Expands live visibility into system status and processing behavior.
- Adaptive Feedback Suppression: Reduces acoustic feedback dynamically during operation.
- Preset Profiles: Adds ready-to-use configuration profiles for common deployment scenarios.
- Diagnostics: Improves troubleshooting with clearer runtime checks and diagnostic reporting.
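The energy and zero-crossing-rate VAD with hangover listed above can be illustrated with a minimal sketch. The `SimpleVad` type, its field names, and its threshold values are assumptions chosen for illustration; they are not taken from the project's VAD engine.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: names and thresholds are illustrative, not the project's API.
struct SimpleVad {
    float energy_threshold_db = -40.0f;  // frames louder than this may be speech
    float max_zcr = 0.25f;               // noise-like frames tend to exceed this rate
    int hangover_frames = 5;             // frames kept as "speech" after activity ends
    int hangover_left = 0;

    // Mean-square energy of the frame in dB relative to full scale (1.0).
    static float energy_db(const std::vector<float>& x) {
        if (x.empty()) return -120.0f;
        double sum = 0.0;
        for (float s : x) sum += static_cast<double>(s) * s;
        const double mean = sum / static_cast<double>(x.size());
        return static_cast<float>(10.0 * std::log10(mean + 1e-12));
    }

    // Fraction of adjacent sample pairs whose signs differ.
    static float zero_crossing_rate(const std::vector<float>& x) {
        if (x.size() < 2) return 0.0f;
        std::size_t crossings = 0;
        for (std::size_t i = 1; i < x.size(); ++i)
            if ((x[i - 1] >= 0.0f) != (x[i] >= 0.0f)) ++crossings;
        return static_cast<float>(crossings) / static_cast<float>(x.size() - 1);
    }

    // Returns true while speech is active or the hangover has not yet expired.
    bool detect(const std::vector<float>& frame) {
        assert(hangover_frames >= 0);
        const bool active = energy_db(frame) > energy_threshold_db &&
                            zero_crossing_rate(frame) < max_zcr;
        if (active) { hangover_left = hangover_frames; return true; }
        if (hangover_left > 0) { --hangover_left; return true; }
        return false;
    }
};
```

The hangover counter keeps the decision at "speech" for a few frames after the energy drops, which avoids clipping word endings and short pauses.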
APM System includes fully local speech recognition and translation using state-of-the-art AI models. Your conversations never leave your device.
- 🔒 100% Private - No cloud APIs, all processing on-device
- 🌐 200+ Languages - Powered by Meta's NLLB translation model
- 🎤 Accurate Speech Recognition - OpenAI Whisper for transcription
- ⚡ Real-time Performance - 2-4 seconds per sentence (GPU) or 5-8 seconds (CPU)
- 🚫 No Internet Required - Works completely offline after initial setup
# One-command setup
./scripts/setup.sh --full
# Or step-by-step: activate venv and run translation bridge
source venv/bin/activate
python3 scripts/translation_bridge.py audio.wav --source en --target es
Platform Notes
- Supported on Linux and macOS (Intel & Apple Silicon)
- Text translation fallback uses portable <cctype> classification (no locale-dependent behavior) for cross-platform correctness
- CI and Docker validate Linux builds; macOS builds are verified locally
Supported languages include English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, Arabic, Hindi, and 180+ more.
See docs/translation/ for complete translation documentation.
┌─────────────────────────────────────────────────────────────┐
│ APM System Pipeline │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────┐ ┌──────────────────┐ ┌─────────────┐
│ Microphone │───▶│ Beamforming │───▶│ Echo │
│ Array (4-16) │ │ Engine │ │ Cancellation│
└─────────────────┘ └──────────────────┘ └─────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌──────────────────┐ ┌─────────────┐
│ Directional │◀───│ Translation │◀───│ Noise │
│ Projector │ │ Engine │ │ Suppression │
└─────────────────┘ └──────────────────┘ └─────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌─────────────┐
│ Speaker Array │ │ VAD │
│ (3-8) │ │ Engine │
└─────────────────┘ └─────────────┘
The codebase is intentionally split into two layers:
- APMCore (src/apm_core.cpp): lightweight DSP + text fallback suitable for embedding
- APMSystem (src/core/apm_system.cpp): full real-time pipeline (beamforming, NS, AEC, VAD, translation, projection)
Each component is defined exactly once to ensure clean builds across Linux and macOS.
- Node.js 14+ (for launcher) - Download
- CMake 3.18+ - Download
- C++20 Compiler - GCC 10+, Clang 11+, or MSVC 2019+
- FFTW3 (optional) - sudo apt-get install libfftw3-dev (Linux) or brew install fftw (Mac)
- libsodium (optional, for encryption) - sudo apt-get install libsodium-dev (Linux) or brew install libsodium (Mac)
- PortAudio (optional, for live audio I/O) - sudo apt-get install portaudio19-dev (Linux) or brew install portaudio (Mac)
# Linux/Mac
./scripts/start-apm.sh
# Windows
scripts\start-apm.bat
That's it! The launcher will:
- ✅ Validate your environment
- ✅ Install Node.js dependencies (if needed)
- ✅ Build the C++ backend (if needed)
- ✅ Start the APM system
- ✅ Open the dashboard in your browser
If you prefer step-by-step control:
# 1. Install launcher dependencies
cd launcher
npm install
# 2. Build C++ backend
cd ..
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
# 3. Start the system
cd launcher
npm start
The system starts on:
- Backend API: http://localhost:8080
- Dashboard UI: http://localhost:4173
# Use different ports
APM_BACKEND_PORT=9000 APM_UI_PORT=5000 npm start
# Enable debug logging
DEBUG=1 npm start
# Combined
APM_BACKEND_PORT=9000 DEBUG=1 npm start
The APM system includes a FastAPI-based REST API server that enables peer discovery and session management across all nodes in the network.
Start the REST API server independently:
# Linux/Mac
./scripts/start-api.sh
# Windows
scripts\start-api.bat
# With custom configuration
python3 backend/main.py --host 0.0.0.0 --port 8080
By default, the API binds to 0.0.0.0 (all network interfaces), making it reachable from any node that can route to this host. This enables:
- Peer Discovery: Other nodes can query this node's peer list
- Session Management: Remote nodes can initiate and manage sessions
- Health Monitoring: Network-wide health checks
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /api/peers | GET | List all discovered peers |
| /api/peers/{peer_id} | GET | Get specific peer details |
| /api/status | POST | Update local peer status |
| /api/session | POST | Create new session |
| /api/session/{session_id} | GET | Get session status |
| /api/session/{session_id}/accept | POST | Accept session |
| /api/session/{session_id}/end | POST | End session |
Once running, visit:
- Swagger UI: http://localhost:8080/docs
- ReDoc: http://localhost:8080/redoc
# Environment variables
export APM_API_HOST=0.0.0.0 # Global access (default)
export APM_API_PORT=8080 # API port (default)
# Command-line arguments
python3 backend/main.py --host 0.0.0.0 --port 8080 --reload
For complete REST API documentation, see backend/README.md
The APM launcher provides the following:
- Checks for required executables and files
- Validates port availability
- Verifies build artifacts
- Ensures proper file permissions
- Automatic backend health checks every 300ms
- 60-second timeout with informative error messages
- Real-time process monitoring
- Captures backend stdout/stderr for debugging
- Graceful shutdown on SIGINT/SIGTERM
- Force-kill after 5-second timeout
- Port conflict detection
- Detailed error messages with solutions
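The health-monitoring behavior described above (a check every 300ms, giving up after 60 seconds) amounts to a poll-with-deadline loop. The launcher itself is Node.js; the C++ sketch below only illustrates the logic, and `wait_until_healthy` and its `probe` callback are hypothetical names rather than part of the launcher.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: calls `probe` every `interval` until it returns true
// or `timeout` elapses. Returns the number of checks performed on success,
// or -1 if the deadline passed without a healthy response.
int wait_until_healthy(
    const std::function<bool()>& probe,
    std::chrono::milliseconds interval = std::chrono::milliseconds(300),
    std::chrono::milliseconds timeout = std::chrono::seconds(60)) {
    assert(interval.count() > 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    int checks = 0;
    while (std::chrono::steady_clock::now() < deadline) {
        ++checks;
        if (probe()) return checks;          // backend answered: stop polling
        std::this_thread::sleep_for(interval);
    }
    return -1;                               // timed out
}
```

In a real launcher the probe would issue a request to the backend's health endpoint; here it is left as a callback so the loop can be exercised with a stub.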
[2025-01-15T10:30:45.123Z] [INFO] Validating environment...
[2025-01-15T10:30:45.456Z] [SUCCESS] Environment validation passed
[2025-01-15T10:30:45.789Z] [INFO] Starting C++ backend...
[2025-01-15T10:30:46.012Z] [Backend] Server listening on port 8080
[2025-01-15T10:30:47.345Z] [SUCCESS] Backend healthy after 3 checks (1234ms)
[2025-01-15T10:30:47.678Z] [SUCCESS] UI server listening on http://localhost:4173
[2025-01-15T10:30:48.901Z] [SUCCESS] APM System is fully operational! 🚀
- UI server binds to 127.0.0.1 only (localhost)
- Security headers enabled (X-Frame-Options, X-XSS-Protection, X-Content-Type-Options)
- No external file system access from UI server
- 404 for all non-root paths
- Fast startup: < 5 seconds typical
- Minimal overhead: ~30MB RAM for launcher
- Automatic process cleanup
- Multi-platform support (Windows/Mac/Linux)
# Validate your entire setup
node scripts/healthcheck.js
Checks:
- ✅ Node.js version (14+)
- ✅ CMake installation
- ✅ C++ compiler availability
- ✅ File structure integrity
- ✅ Backend binary exists
- ✅ Dependencies installed
- ✅ Runtime status (if running)
- ✅ Port availability
# Run full integration test suite
node tests/integration/integration.test.js
Tests include:
- Backend health endpoint
- Response time benchmarks
- Concurrent request handling
- UI server functionality
- Security headers
- Load testing (100 sequential, 50 concurrent requests)
Acoustic-Projection-Microphone-System/
├── backend/ # Python FastAPI backend and signaling/control modules
├── cmake/ # CMake package/config helpers
├── config/ # Optional/runtime dependency configuration
├── docker/ # Container build assets
├── docs/ # Architecture, build, deployment, security, and user docs
├── examples/ # C++ and Python usage examples
├── frontend/ # Frontend adapters/hooks/components
├── include/apm/ # Public C++ API headers
├── installers/ # Installer assets (e.g., Inno Setup)
├── launcher/ # Node.js launcher/orchestration entrypoint
├── scripts/ # Setup, startup, and utility scripts
├── src/ # Core C++ implementation (DSP, signaling, translation, tools)
├── tests/ # C++ and Node integration tests
├── tools/ # Utility tools and scripts
├── ui/ # React + Vite web UI
├── CMakeLists.txt # Root CMake build configuration
├── README.md # Project overview and usage
├── TESTING_AND_VALIDATION.md # Validation matrix and test guidance
└── SYSTEM_MANIFEST.md # Repository/system manifest
# Build the image
docker build -t apm-system .
# Run example
docker run --rm apm-system
# Development environment
docker run -it --rm -v $(pwd):/workspace/apm apm-system:development
FROM node:18-alpine AS launcher
WORKDIR /app
COPY launcher/package*.json ./
RUN npm ci --production
FROM gcc:11 AS backend
WORKDIR /app
COPY . .
RUN cmake -B build -DCMAKE_BUILD_TYPE=Release && \
cmake --build build --config Release
FROM node:18-alpine
WORKDIR /app
COPY --from=launcher /app/node_modules ./launcher/node_modules
COPY --from=backend /app/build/apm_backend ./apm_backend
COPY launcher/apm_launcher.js ./launcher/
COPY ui/apm-dashboard.html ./ui/
EXPOSE 8080 4173
CMD ["node", "launcher/apm_launcher.js"]
Error: Backend executable not found
# Rebuild the backend
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
Error: Backend port 8080 is already in use
# Find and kill the process
lsof -i :8080 # Linux/Mac
netstat -ano | findstr :8080 # Windows
# Or use a different port
APM_BACKEND_PORT=8081 npm start
Error: Backend health check timed out
# Run backend directly to see errors
./apm_backend # or ./build/apm_backend
# Check for:
# - Firewall blocking localhost
# - Missing dependencies
# - Port conflicts
Error: UI file not found
# Verify file exists in parent directory
ls -la ../apm-dashboard.html
# File must be at: apm/apm-dashboard.html
# Launcher must be at: apm/launcher/apm_launcher.js
Q: Build fails with "fftw3.h not found"
A: Install FFTW:
sudo apt-get install libfftw3-dev # Ubuntu/Debian
brew install fftw # macOS
vcpkg install fftw3 # Windows
Q: Tests fail with "Segmentation fault"
A: Check audio frame sizes match across components. Ensure FFT size ≤ frame size.
Q: Poor beamforming performance
A: Verify that the configured microphone spacing matches the physical array, since steering delays are derived from the spacing and the speed of sound. Calibrate microphone positions.
Q: High CPU usage
A: If lower audio quality is acceptable, reduce the sample rate from 48kHz to 16kHz.
Q: Echo cancellation not working
A: Ensure speaker reference signal is provided. Check for timing synchronization.
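To show why the speaker reference signal matters for echo cancellation, here is a minimal single-channel NLMS sketch. `NlmsFilter`, its step size, and its regularization constant are assumptions for illustration; this is not the project's EchoCancellationEngine and it omits double-talk detection.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel NLMS echo canceller (illustrative only).
class NlmsFilter {
public:
    explicit NlmsFilter(std::size_t taps, float step = 0.5f, float eps = 1e-6f)
        : w_(taps, 0.0f), x_(taps, 0.0f), mu_(step), eps_(eps) {
        assert(taps > 0);
    }

    // Processes one sample. `ref` is the speaker reference, `mic` the
    // microphone sample. Returns the echo-reduced (error) sample.
    float process(float ref, float mic) {
        // Shift the reference history and store the newest sample first.
        for (std::size_t i = x_.size() - 1; i > 0; --i) x_[i] = x_[i - 1];
        x_[0] = ref;

        float estimate = 0.0f;  // predicted echo at the microphone
        float power = eps_;     // reference power, regularized to avoid /0
        for (std::size_t i = 0; i < w_.size(); ++i) {
            estimate += w_[i] * x_[i];
            power += x_[i] * x_[i];
        }
        const float err = mic - estimate;
        const float gain = mu_ * err / power;  // step normalized by power
        for (std::size_t i = 0; i < w_.size(); ++i) w_[i] += gain * x_[i];
        return err;
    }

private:
    std::vector<float> w_;  // adaptive filter weights (echo path estimate)
    std::vector<float> x_;  // most recent reference samples
    float mu_;
    float eps_;
};
```

The filter can only model the echo path from samples it sees in `ref`; if the reference is missing or misaligned in time with the microphone signal, the weights have nothing consistent to adapt to and the echo passes through.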
- Check logs: Enable debug mode with DEBUG=1 npm start
- Run health check: node scripts/healthcheck.js
- Verify prerequisites: Node.js 14+, CMake 3.18+, C++20 compiler
- Check ports: Ensure 8080 and 4173 are available
Benchmarked on Intel i7-12700K, 32GB RAM, Ubuntu 22.04:
| Component | Processing Time (20ms frame) | Throughput |
|---|---|---|
| Beamforming (4 mics) | 0.8ms | 25x real-time |
| Noise Suppression | 2.1ms | 9.5x real-time |
| Echo Cancellation | 0.5ms | 40x real-time |
| VAD | 0.1ms | 200x real-time |
| Full Pipeline | 4.2ms | 4.8x real-time |
| Launcher Overhead | < 50ms | N/A |
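The throughput column follows from dividing the frame duration by the processing time. A small sketch of that arithmetic, using hypothetical helper names:

```cpp
// Real-time factor: seconds of audio processed per second of compute.
// For a 20 ms frame processed in 0.8 ms this is 20 / 0.8 = 25.
constexpr double realtime_factor(double frame_ms, double processing_ms) {
    return frame_ms / processing_ms;
}

// Share of each frame's time budget that a stage consumes.
// For the full pipeline: 4.2 / 20 = 0.21, i.e. about 21% of one core.
constexpr double budget_fraction(double frame_ms, double processing_ms) {
    return processing_ms / frame_ms;
}
```

Any stage with a real-time factor above 1 keeps up with live audio; the margin above 1 is the headroom left for other work on the same core.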
Memory usage:
- Backend: ~15MB (without TFLite models)
- Launcher: ~30MB
- Total: ~45MB baseline
Lightweight public facade for initializing the system and performing audio processing and text translation. Suitable for embedding.
APMCore core;
core.initialize(48000, 1); // sample_rate, num_channels
core.set_source_language("en");
core.set_target_language("es");
std::vector<float> out = core.process(input_samples);
APMCore::TextTranslationResult r = core.translate_text("Hello");
// r.success, r.translated_text, r.processing_time_ms
Note: The C++ text fallback in APMCore supports only en→es and en→fr. Full 200+ language support requires the Python translation bridge (scripts/translation_bridge.py).
Encapsulates audio data with metadata.
AudioFrame(size_t samples, int sample_rate, int channels);
std::span<float> samples(); // Access audio data
void compute_metadata(); // Calculate peak, RMS, clipping
std::vector<float> channel(int ch); // Extract single channel
Spatial filtering for directional audio capture.
BeamformingEngine(int num_mics, float spacing_m);
AudioFrame delay_and_sum(
const std::vector<AudioFrame>& mic_array,
float azimuth_rad,
float elevation_rad
);
AudioFrame superdirective(
const std::vector<AudioFrame>& mic_array,
float azimuth_rad
);
Deep learning-based noise reduction.
AudioFrame suppress(const AudioFrame& noisy);
void reset_state(); // Reset LSTM state
Adaptive echo cancellation with NLMS.
EchoCancellationEngine(int filter_length = 2048);
AudioFrame cancel_echo(
const AudioFrame& microphone,
const AudioFrame& speaker_reference
);
bool detect_double_talk(const AudioFrame& mic, const AudioFrame& ref);
void reset(); // Reset adaptive filter weights and reference buffer
Speech/non-speech classification.
struct VadResult {
bool speech_detected;
float confidence; // 0.0 to 1.0
float snr_db;
float energy_db;
};
VadResult detect(const AudioFrame& frame);
void adapt_threshold(float ambient_noise_db);
void reset(); // Reset hangover counter
High-performance FFT using FFTW (only available when built with FFTW3; throws at runtime otherwise).
FFTProcessor(int size);
void forward(const std::vector<float>& input,
std::vector<std::complex<float>>& output);
void inverse(const std::vector<std::complex<float>>& input,
std::vector<float>& output);
static void apply_window(std::vector<float>& data, WindowType type);
// WindowType: Hann, Hamming, Blackman, Kaiser
Complete processing pipeline.
struct Config {
int num_microphones = 4;
float mic_spacing_m = 0.012f;
int num_speakers = 3;
float speaker_spacing_m = 0.015f;
int sample_rate = 48000;
std::string source_language = "en-US";
std::string target_language = "es-ES";
};
APMSystem(const Config& config);
APMSystem(); // Default constructor (uses Config defaults)
std::vector<AudioFrame> process(
const std::vector<AudioFrame>& microphone_array,
const AudioFrame& speaker_reference,
float target_direction_rad
);
std::future<std::vector<AudioFrame>> process_async(...);
void reset_all(); // Reset echo canceller, noise suppressor, and VAD state
# Run all tests
cd build && ctest --output-on-failure
# Run specific test suite
./apm_test --gtest_filter=BeamformingTest.*
# Run with detailed output
./apm_test --gtest_output=xml:test_results.xml
# Memory leak check
valgrind --leak-check=full ./apm_test
# Performance profiling
perf record ./apm_bench
perf report
# Integration tests
node tests/integration/integration.test.js
Test coverage: 87% (lines), 92% (functions)
Microphone Array:
- Linear array: 4-8 microphones
- Spacing: 10-15mm (15mm is λ/2 at about 11kHz; 10mm is λ/2 at about 17kHz)
- Recommended: omnidirectional electret or MEMS
Speaker Array:
- Linear array: 3-6 speakers
- Spacing: 15-20mm
- Recommended: full-range drivers, 85dB+ SPL
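The spacing recommendations above follow from array geometry. The sketch below computes the highest frequency a uniform linear array can steer without spatial aliasing, f = c / (2d), and the far-field steering delays a delay-and-sum beamformer would apply. The function names, the broadside angle convention, and the 343 m/s speed of sound are assumptions for illustration, not the project's BeamformingEngine.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

constexpr double kSpeedOfSound = 343.0;  // m/s in air at about 20 °C (assumed)

// Highest frequency free of spatial aliasing for element spacing d: c / (2 d).
constexpr double max_unaliased_hz(double spacing_m) {
    return kSpeedOfSound / (2.0 * spacing_m);
}

// Steering delays in samples for a far-field source at `azimuth_rad`,
// measured from broadside; microphone 0 is the reference element.
std::vector<double> steering_delays(int num_mics, double spacing_m,
                                    double azimuth_rad, double sample_rate_hz) {
    assert(num_mics > 0 && spacing_m > 0.0);
    std::vector<double> delays(static_cast<std::size_t>(num_mics));
    for (int m = 0; m < num_mics; ++m) {
        const double seconds = m * spacing_m * std::sin(azimuth_rad) / kSpeedOfSound;
        delays[static_cast<std::size_t>(m)] = seconds * sample_rate_hz;
    }
    return delays;
}
```

With 12mm spacing at 48kHz the delay between adjacent microphones is under two samples even at endfire, so a practical beamformer needs fractional-delay interpolation or frequency-domain phase shifts rather than whole-sample shifts.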
// Low-latency configuration
config.num_microphones = 4;
config.mic_spacing_m = 0.012f;
// High-quality configuration
config.num_microphones = 8;
config.mic_spacing_m = 0.010f;
// Language support
config.source_language = "en-US"; // English
config.target_language = "es-ES"; // Spanish
// APMSystem accepts any IETF language tag; 200+ languages available via Python bridge
// C++ text fallback (APMCore) supports: en→es, en→fr
# Launcher configuration
export APM_BACKEND_PORT=8080 # Backend API port
export APM_UI_PORT=4173 # Dashboard UI port
export DEBUG=1 # Enable debug logging
# Backend configuration
export APM_NUM_MICS=4 # Number of microphones
export APM_MIC_SPACING=0.012 # Microphone spacing (meters)
export APM_NUM_SPEAKERS=3 # Number of speakers
Contributions welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Follow the C++ Core Guidelines, targeting C++20
- Format with clang-format (Google style)
- Add unit tests for new features
- Update documentation
- Run node scripts/healthcheck.js before submitting
Minimum:
- CPU: 2 cores
- RAM: 512MB
- Disk: 100MB
- Node.js: 14.0.0+
- CMake: 3.18+
Recommended:
- CPU: 4+ cores with AVX2
- RAM: 2GB+
- Disk: 1GB+
- Node.js: 18.0.0+
- CMake: 3.20+
- OS: Linux or macOS (Windows supported via MSYS2/vcpkg)
Now that the full APM pipeline builds cleanly and launches reliably, the next phase begins. This document outlines the upcoming milestones that will take the system from a validated prototype to a production‑ready acoustic intelligence engine.
- Full DSP pipeline integrated (beamforming → echo cancellation → noise suppression → VAD → translation → projection)
- APMSystem orchestrator implemented and validated
- Clean Docker build with reproducible environment
- CI pipeline green across build and lint stages
- Production launcher with health monitoring
- Automated testing and validation
- Cross-platform startup scripts
- End-to-end encryption (ChaCha20-Poly1305 + X25519 via libsodium)
- Push-to-Talk (PTT) controller with keyboard/mouse/external/software modes
- UDP call signaling with peer discovery and session management
- PortAudio real audio I/O (live microphone and speaker support)
- Local Whisper + NLLB translation (200+ languages, fully offline)
See docs/ROADMAP.md for detailed roadmap including:
- Ring buffers & low-latency audio pipeline - Continuous real-time capture with PortAudio
- TTS integration - Complete ASR → NMT → TTS speech-to-speech chain
- DSP Optimization - SIMD acceleration, FFT-based beamforming
- Developer Experience - CLI tools, config presets, documentation
This project is available for non‑commercial use only under the terms of the included LICENSE file.
Commercial use requires a separate paid license — contact [email protected].
- Author: Don Michael Feeney Jr.
- Dedicated to: Marcel Krüger
- Enhanced with: Claude (Anthropic), Google (Jules), Microsoft (Copilot)
- FFT: FFTW library by Matteo Frigo and Steven G. Johnson
- ML Framework: TensorFlow Lite by Google
I would like to acknowledge Microsoft Copilot, Anthropic Claude, Google Jules, and OpenAI ChatGPT for their meaningful assistance in refining concepts, improving clarity, and strengthening the overall quality of this code.
- Van Trees, H. L. (2002). Optimum Array Processing. Wiley-Interscience.
- Benesty, J., et al. (2007). Springer Handbook of Speech Processing. Springer.
- Paliwal, K. K., et al. (2010). "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Modulation Magnitude Estimator." Speech Communication.
- Valin, J. M. (2018). "A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement." IEEE MMSP.
If you use this work in research, please cite:
@software{feeney2026apm,
author = {Feeney, Don Michael Jr.},
title = {Acoustic Projection Microphone System},
year = {2026},
publisher = {GitHub},
url = {https://github.com/dfeen87/Acoustic-Projection-Microphone-System}
}
- Email: [email protected]
- GitHub: Discussion Board
- Health Check: node scripts/healthcheck.js
- Logs: Enable with DEBUG=1 npm start
Status: Production Ready | Version: 7.0.0 | Last Updated: April 2026