Voice-Marketing-Agent

This project is now OFFICIALLY accepted for:

🎉 Participating in GSSOC'25 & Hacktoberfest 2025! 🎉

Voice Marketing Agents 🤖

An open-source framework to build and deploy intelligent AI agents that can handle real-world phone calls using cutting-edge cloud APIs.

🚀 Get Started · 🐛 Report a Bug · ✨ Request a Feature

🌟 Stars	🍴 Forks	🐛 Issues	🔔 Open PRs	🔕 Closed PRs	⏱️ Last Commit	🛠️ Languages	👥 Contributors

🌟 The Mission: Cloud-Powered Voice AI

Voice Marketing Agents leverages the power of Google Gemini, Groq, and ElevenLabs to deliver production-ready voice AI capabilities. No local models, no GPU infrastructure - just powerful cloud APIs.

🔥 Core Features

Lightning-Fast Responses: Groq's ultra-fast inference + ElevenLabs' low-latency TTS = natural conversations
Cloud-Powered AI: Gemini for intelligence, Groq for speed, ElevenLabs for studio-quality voice
Developer-First: Fully containerized with Docker - one command to start everything
Simple Management UI: Clean React dashboard for agent configuration
Extensible: Built with modern tech stack for easy customization
No Infrastructure Hassle: Everything via cloud APIs - no model management needed

🚀 The Tech Stack

Component	Technology	Why
Frontend	React & Vite	Fast, modern UI development
Backend	Python & FastAPI	Async performance for AI tasks
STT	Google Gemini Voice API	High-accuracy speech recognition
LLM	Gemini & Groq	Smart + Fast conversation engine
TTS	ElevenLabs	Studio-quality voice synthesis
Database	PostgreSQL	Reliable data storage
Deploy	Docker Compose	One-command deployment

🛠️ Quick Start (Under 5 Minutes)

Prerequisites

Docker & Docker Compose - Get it here
API Keys from:
- Google Gemini
- Groq
- ElevenLabs
- Twilio

Setup

Clone:

git clone https://github.com/OpenVoiceX/Voice-Marketing-Agent.git
cd Voice-Marketing-Agent

Configure .env:

# Database
DATABASE_URL=postgresql://user:password@db:5432/voicegenie_db

# Gemini
GEMINI_API_KEY=your_gemini_key
GEMINI_MODEL=gemini-1.5-flash
GEMINI_VOICE_MODEL=gemini-1.5-flash

# Groq
GROQ_API_KEY=your_groq_key
GROQ_MODEL=llama-3.1-70b-versatile

# LLM Provider (gemini or groq)
LLM_PROVIDER=gemini

# ElevenLabs
ELEVENLABS_API_KEY=your_elevenlabs_key
ELEVENLABS_VOICE_ID=21m00Tcm4TlvDq8ikWAM
ELEVENLABS_MODEL_ID=eleven_monolingual_v1

# Twilio
TWILIO_ACCOUNT_SID=your_sid
TWILIO_AUTH_TOKEN=your_token
TWILIO_PHONE_NUMBER=your_number

# App
SECRET_KEY=your_secret_key
AUDIO_DIR=/app/audio_files
PUBLIC_URL=http://your-server:8000

Launch:
```
docker compose up --build -d
```
Access:
- Dashboard: http://localhost:3000
- API Docs: http://localhost:8000/docs

🎯 Choosing Your LLM Provider

Gemini (LLM_PROVIDER=gemini)

Advanced reasoning & multimodal
~100 tokens/sec
Free tier available

Groq (LLM_PROVIDER=groq)

Ultra-fast (up to 750 tokens/sec)
Perfect for real-time conversations
Free tier available

🏗️ System Architecture

The platform is designed as a set of coordinated microservices, orchestrated by Docker Compose. This modular architecture allows for scalability, maintainability, and clear separation of concerns.

The Life of a Single Conversational Turn

Telephony Gateway (External): A VoIP service handles the actual phone call connection. When it's the AI's turn to speak or listen, the VoIP server makes a webhook call to our backend.
Audio Ingestion: The VoIP server sends the user's speech as a .wav file in a multipart/form-data request to the /webhook endpoint of our FastAPI Backend.
STT Micro-Task (Speech-to-Text):
- The backend receives the audio file.
- It calls the STTService, which is powered by Google Gemini Voice API.
- The API transcribes the audio to text in a few hundred milliseconds.
LLM Micro-Task (Reasoning & Response Generation):
- The transcribed text is passed to the LLMService.
- This service constructs a prompt and sends it to either Gemini or Groq.
- The LLM generates the text for the agent's response.
TTS Micro-Task (Text-to-Speech):
- The LLM's text response is sent to the TTSService.
- ElevenLabs synthesizes this text into high-quality audio.
- The resulting audio is saved as a temporary file.
Webhook Response: The FastAPI backend responds to the initial webhook request from the Telephony Gateway, providing a URL to the newly generated audio file. The gateway then plays this audio to the user over the phone.

This entire end-to-end process is optimized to complete in under 2 seconds, which is crucial for maintaining a natural conversational rhythm.

💖 Contributing

We love contributions! Check our open issues and see the Contribution Guide.

🗺️ Roadmap

🌟 Contributors

Thanks to these wonderful people:

📜 License

MIT License - See LICENSE file.

Built with ❤️ and powered by ☁️ cloud AI for GSSoC'25

Let's democratize voice AI! 🚀

Name		Name	Last commit message	Last commit date
Latest commit History 57 Commits
.github		.github
backend		backend
docs		docs
frontend		frontend
scripts		scripts
.env.example		.env.example
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
app.yml		app.yml
contacts.csv		contacts.csv
docker-compose.yml		docker-compose.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Voice-Marketing-Agent

Voice Marketing Agents 🤖

🌟 The Mission: Cloud-Powered Voice AI

🔥 Core Features

🚀 The Tech Stack

🛠️ Quick Start (Under 5 Minutes)

Prerequisites

Setup

🎯 Choosing Your LLM Provider

🏗️ System Architecture

The Life of a Single Conversational Turn

💖 Contributing

🗺️ Roadmap

🌟 Contributors

📜 License

About

Uh oh!

Releases

Packages

Contributors 6

Uh oh!

Languages

License

OpenVoiceX/Voice-Marketing-Agent

Folders and files

Latest commit

History

Repository files navigation

Voice-Marketing-Agent

Voice Marketing Agents 🤖

🌟 The Mission: Cloud-Powered Voice AI

🔥 Core Features

🚀 The Tech Stack

🛠️ Quick Start (Under 5 Minutes)

Prerequisites

Setup

🎯 Choosing Your LLM Provider

🏗️ System Architecture

The Life of a Single Conversational Turn

💖 Contributing

🗺️ Roadmap

🌟 Contributors

📜 License

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 6

Uh oh!

Languages

Packages