An open-source platform for building and deploying real-time, low-latency AI voice agents for marketing call automation.

OpenVoiceX/Voice-Marketing-Agent

Voice-Marketing-Agent

This project is now OFFICIALLY accepted for:

GSSOC
Hacktoberfest

🎉 Participating in GSSoC'25 & Hacktoberfest 2025! 🎉

Voice Marketing Agents 🤖

An open-source framework to build and deploy intelligent AI agents that can handle real-world phone calls using cutting-edge cloud APIs.

🚀 Get Started · 🐛 Report a Bug · ✨ Request a Feature


🌟 The Mission: Cloud-Powered Voice AI

Voice Marketing Agents leverages the power of Google Gemini, Groq, and ElevenLabs to deliver production-ready voice AI capabilities. No local models, no GPU infrastructure - just powerful cloud APIs.

🔥 Core Features

  • Lightning-Fast Responses: Groq's ultra-fast inference + ElevenLabs' low-latency TTS = natural conversations
  • Cloud-Powered AI: Gemini for intelligence, Groq for speed, ElevenLabs for studio-quality voice
  • Developer-First: Fully containerized with Docker - one command to start everything
  • Simple Management UI: Clean React dashboard for agent configuration
  • Extensible: Built with modern tech stack for easy customization
  • No Infrastructure Hassle: Everything via cloud APIs - no model management needed

🚀 The Tech Stack

| Component | Technology | Why |
| --- | --- | --- |
| Frontend | React & Vite | Fast, modern UI development |
| Backend | Python & FastAPI | Async performance for AI tasks |
| STT | Google Gemini Voice API | High-accuracy speech recognition |
| LLM | Gemini & Groq | Smart + Fast conversation engine |
| TTS | ElevenLabs | Studio-quality voice synthesis |
| Database | PostgreSQL | Reliable data storage |
| Deploy | Docker Compose | One-command deployment |

🛠️ Quick Start (Under 5 Minutes)

Prerequisites

  1. Docker & Docker Compose
  2. API keys or account credentials for the services configured in .env below: Google Gemini, Groq, ElevenLabs, and Twilio

Setup

  1. Clone:

    git clone https://github.com/OpenVoiceX/Voice-Marketing-Agent.git
    cd Voice-Marketing-Agent
  2. Configure .env:

    # Database
    DATABASE_URL=postgresql://user:password@db:5432/voicegenie_db
    
    # Gemini
    GEMINI_API_KEY=your_gemini_key
    GEMINI_MODEL=gemini-1.5-flash
    GEMINI_VOICE_MODEL=gemini-1.5-flash
    
    # Groq
    GROQ_API_KEY=your_groq_key
    GROQ_MODEL=llama-3.1-70b-versatile
    
    # LLM Provider (gemini or groq)
    LLM_PROVIDER=gemini
    
    # ElevenLabs
    ELEVENLABS_API_KEY=your_elevenlabs_key
    ELEVENLABS_VOICE_ID=21m00Tcm4TlvDq8ikWAM
    ELEVENLABS_MODEL_ID=eleven_monolingual_v1
    
    # Twilio
    TWILIO_ACCOUNT_SID=your_sid
    TWILIO_AUTH_TOKEN=your_token
    TWILIO_PHONE_NUMBER=your_number
    
    # App
    SECRET_KEY=your_secret_key
    AUDIO_DIR=/app/audio_files
    PUBLIC_URL=http://your-server:8000
  3. Launch:

    docker compose up --build -d
  4. Access:

    • Dashboard: http://localhost:3000
    • API Docs: http://localhost:8000/docs
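
Before launching, it can help to check that `.env` is complete. The following is a minimal sketch and not part of the repository: the helper name `missing_settings` is hypothetical, and the rule that the Groq key is only required when `LLM_PROVIDER=groq` is an assumption made for illustration. The variable names themselves come from the `.env` listing above.

```python
import os
from typing import Mapping

# Variable names are taken from the .env example above.
ALWAYS_REQUIRED = (
    "DATABASE_URL",
    "GEMINI_API_KEY",  # assumed always needed, since STT uses Gemini
    "ELEVENLABS_API_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
    "SECRET_KEY",
    "PUBLIC_URL",
)

# Assumption: each provider needs only its own key for the LLM step.
PROVIDER_KEYS = {"gemini": "GEMINI_API_KEY", "groq": "GROQ_API_KEY"}


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return the names of settings that are unset or blank (hypothetical helper)."""
    provider = env.get("LLM_PROVIDER", "gemini").strip().lower()
    if provider not in PROVIDER_KEYS:
        raise ValueError(f"LLM_PROVIDER must be 'gemini' or 'groq', got {provider!r}")
    required = list(ALWAYS_REQUIRED)
    if PROVIDER_KEYS[provider] not in required:
        required.append(PROVIDER_KEYS[provider])
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    problems = missing_settings(os.environ)
    print("Missing settings:", ", ".join(problems) if problems else "none")
```

Run inside the backend container, or with the `.env` values exported, this prints the names of any settings that still need a value.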

🎯 Choosing Your LLM Provider

Gemini (LLM_PROVIDER=gemini)

  • Advanced reasoning & multimodal
  • ~100 tokens/sec
  • Free tier available

Groq (LLM_PROVIDER=groq)

  • Ultra-fast (up to 750 tokens/sec)
  • Perfect for real-time conversations
  • Free tier available
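
To see what these throughput figures mean on a phone call, the arithmetic can be worked for a single reply. This is a rough sketch under stated assumptions: the 60-token reply length is hypothetical, the rates are the approximate figures listed above (Groq's is an "up to" peak), and network latency and time-to-first-token are ignored.

```python
# Approximate generation rates quoted above, in tokens per second.
TOKENS_PER_SECOND = {"gemini": 100.0, "groq": 750.0}


def generation_seconds(provider: str, reply_tokens: int) -> float:
    """Estimate pure generation time for one reply (ignores network overhead)."""
    return reply_tokens / TOKENS_PER_SECOND[provider]


# Hypothetical reply length: a short spoken sentence or two.
REPLY_TOKENS = 60

for provider in TOKENS_PER_SECOND:
    seconds = generation_seconds(provider, REPLY_TOKENS)
    print(f"{provider}: {seconds:.2f} s for {REPLY_TOKENS} tokens")
# gemini: 0.60 s for 60 tokens
# groq: 0.08 s for 60 tokens
```

On these numbers, generating the reply text would take roughly 0.6 s with Gemini and under 0.1 s with Groq. The difference matters most when the reply is long or the per-turn latency target is tight.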

🏗️ System Architecture

Voice Marketing Agents Architecture Diagram

The platform is designed as a set of coordinated microservices, orchestrated by Docker Compose. This modular architecture allows for scalability, maintainability, and clear separation of concerns.

The Life of a Single Conversational Turn

  1. Telephony Gateway (External): A VoIP service handles the actual phone call connection. When it's the AI's turn to speak or listen, the VoIP server makes a webhook call to our backend.

  2. Audio Ingestion: The VoIP server sends the user's speech as a .wav file in a multipart/form-data request to the /webhook endpoint of our FastAPI Backend.

  3. STT Micro-Task (Speech-to-Text):

    • The backend receives the audio file.
    • It calls the STTService, which is powered by Google Gemini Voice API.
    • The API transcribes the audio to text in a few hundred milliseconds.
  4. LLM Micro-Task (Reasoning & Response Generation):

    • The transcribed text is passed to the LLMService.
    • This service constructs a prompt and sends it to either Gemini or Groq.
    • The LLM generates the text for the agent's response.
  5. TTS Micro-Task (Text-to-Speech):

    • The LLM's text response is sent to the TTSService.
    • ElevenLabs synthesizes this text into high-quality audio.
    • The resulting audio is saved as a temporary file.
  6. Webhook Response: The FastAPI backend responds to the initial webhook request from the Telephony Gateway, providing a URL to the newly generated audio file. The gateway then plays this audio to the user over the phone.

This entire end-to-end process is optimized to complete in under 2 seconds, which is crucial for maintaining a natural conversational rhythm.
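
The six steps above can be sketched as a single function. This is an illustrative outline only, not the project's code: `transcribe`, `generate_reply`, and `synthesize` are hypothetical stand-ins for `STTService`, `LLMService`, and `TTSService`, whose real interfaces are not shown here, and the `/audio/` URL path is an assumption. The stubs return canned values so the flow can be run without any API keys.

```python
import tempfile
import time
import uuid
from pathlib import Path

PUBLIC_URL = "http://localhost:8000"  # corresponds to PUBLIC_URL in .env
TURN_BUDGET_SECONDS = 2.0  # end-to-end target stated above


def transcribe(wav_bytes: bytes) -> str:
    """Stand-in for STTService (Gemini in the real system)."""
    return "hello, who is calling?"


def generate_reply(transcript: str) -> str:
    """Stand-in for LLMService (Gemini or Groq in the real system)."""
    return f"You said: {transcript}"


def synthesize(text: str) -> bytes:
    """Stand-in for TTSService (ElevenLabs in the real system)."""
    return text.encode("utf-8")  # placeholder bytes, not real audio


def handle_turn(wav_bytes: bytes, audio_dir: Path) -> dict:
    """Run STT -> LLM -> TTS and return what the webhook would answer with."""
    started = time.perf_counter()
    transcript = transcribe(wav_bytes)
    reply_text = generate_reply(transcript)
    audio = synthesize(reply_text)

    # Save the synthesized audio so the telephony gateway can fetch it by URL.
    filename = f"{uuid.uuid4().hex}.wav"
    (audio_dir / filename).write_bytes(audio)

    elapsed = time.perf_counter() - started
    return {
        "audio_url": f"{PUBLIC_URL}/audio/{filename}",  # assumed URL layout
        "reply_text": reply_text,
        "elapsed_seconds": elapsed,
        "within_budget": elapsed < TURN_BUDGET_SECONDS,
    }


with tempfile.TemporaryDirectory() as tmp:
    result = handle_turn(b"RIFF....", Path(tmp))
    print(result["audio_url"], result["within_budget"])
```

Timing the whole turn in one place makes it easy to log which calls exceed the 2-second target once the stubs are replaced with real API calls.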


💖 Contributing

We love contributions! Check our open issues and see the Contribution Guide.


🗺️ Roadmap

  • Visual call flow builder
  • Campaign management UI
  • Multi-language support
  • Voice cloning
  • CRM integrations
  • Kubernetes deployment
  • Analytics dashboard

🌟 Contributors

Thanks to these wonderful people:


📜 License

MIT License - See LICENSE file.


Built with ❤️ and powered by ☁️ cloud AI for GSSoC'25

Let's democratize voice AI! 🚀
