VocalFlow 🎙️

Empowering Everyone to Build with Voice

This is technology as it should be: intuitive, accessible, and empowering. Your voice is your code, your ideas are your blueprint, and your imagination is the only limit.

VocalFlow is a voice-first AI assistant that helps users design and build web applications through natural conversation. Describe your app idea by speaking, and the AI generates a complete, professionally designed web application while showing real-time visual progress feedback.

🌟 Latest Enhancements

🧠 Enhanced AI Interaction

Problem-Focused AI: The AI now acts as an innovative problem-solving partner that challenges your assumptions and pushes for breakthrough solutions
Provocative Questions: AI asks challenging questions like "What assumption is everyone making that you could prove wrong?" and "What would make this 10x better than existing solutions?"
Faster Conversations: Reduced from 7-8 exchanges to just 3-4 focused exchanges
Smart Responsiveness: AI detects readiness signals like "let's build this" and "I'm ready" to move forward quickly
No Repetition: AI never repeats questions, keeping conversations fresh and efficient
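
As a rough illustration of the readiness detection described above, the sketch below checks a transcript for the two phrases this README mentions. The function names and the normalization steps are assumptions made for this example and are not taken from the VocalFlow source.

```typescript
// Hypothetical phrase list: the README names these two readiness signals.
const READINESS_PHRASES: readonly string[] = ["let's build this", "i'm ready"];

// Lowercase, normalize curly apostrophes, and collapse whitespace so that
// speech-to-text transcripts compare reliably against the phrase list.
function normalizeUtterance(utterance: string): string {
  return utterance
    .toLowerCase()
    .replace(/[\u2018\u2019]/g, "'")
    .replace(/\s+/g, " ")
    .trim();
}

// Returns true when the utterance contains any readiness phrase.
function isReadySignal(utterance: string): boolean {
  const text = normalizeUtterance(utterance);
  return READINESS_PHRASES.some((phrase) => text.includes(phrase));
}
```

A plain substring match like this is easy to reason about, but a real agent would probably also rely on the language model itself to judge intent.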

📊 Visual Progress System

Real-Time Progress Bar: Elegant oval progress indicator (200px x 20px) positioned next to the VocalFlow branding
Dynamic Status Words: Rotating status words that change every 1.5 seconds per phase:
- Ideation: "Discovering" → "Exploring" → "Analyzing" → "Investigating"
- Prompt Review: "Designing" → "Planning" → "Structuring" → "Crafting"
- Code Generation: "Building" → "Creating" → "Generating" → "Coding"
- Voice Editing: "Refining" → "Polishing" → "Enhancing" → "Perfecting"
Accelerated Progress: 10x message multiplier with 30% maximum progress within each phase
Always Visible: Minimum 10% progress shown, floating below the header for constant visibility
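
The rotation and clamping rules above could be sketched as follows. The status words and the 1.5 second interval come from this README; the type names, function names, and the reading of "10x multiplier, 30% maximum per phase" as "10 points per message, capped at 30" are assumptions for illustration only.

```typescript
type Phase = "ideation" | "promptReview" | "codeGeneration" | "voiceEditing";

// Status words per phase, as listed in this README.
const STATUS_WORDS: Record<Phase, readonly string[]> = {
  ideation: ["Discovering", "Exploring", "Analyzing", "Investigating"],
  promptReview: ["Designing", "Planning", "Structuring", "Crafting"],
  codeGeneration: ["Building", "Creating", "Generating", "Coding"],
  voiceEditing: ["Refining", "Polishing", "Enhancing", "Perfecting"],
};

const ROTATION_MS = 1500; // words change every 1.5 seconds

// Picks the status word for a phase given the time spent in that phase.
function statusWordAt(phase: Phase, elapsedMs: number): string {
  const words = STATUS_WORDS[phase];
  const index = Math.floor(Math.max(0, elapsedMs) / ROTATION_MS) % words.length;
  return words[index];
}

// Assumed reading: each message adds 10 points, capped at 30 per phase.
function phaseProgress(messageCount: number): number {
  return Math.min(30, Math.max(0, messageCount) * 10);
}

// The bar never shows less than 10% and never more than 100%.
function visibleProgress(percent: number): number {
  return Math.min(100, Math.max(10, percent));
}
```

In a React component, `statusWordAt` would typically be driven by a timer that tracks how long the current phase has been active.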

🔒 Advanced YAML Handling

Intelligent YAML Processing: System processes YAML specifications silently in the background
Clean User Experience: YAML content is completely hidden from users while still being processed by the system
Aggressive Filtering: Advanced detection patterns block any YAML content from being displayed or spoken
Seamless Flow: Users experience smooth transitions without technical interruptions
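
One way to picture the filtering step is a heuristic that drops runs of YAML-looking lines before a message is displayed or spoken. This is only a sketch: the regular expression, the two-line threshold, and the function name are assumptions, and the project's actual detection patterns may differ.

```typescript
// Matches a document marker (---), a list item (- item), or a "key: value" line.
const YAML_LINE = /^\s*(---|-\s+\S.*|[A-Za-z_][\w-]*:(\s.*)?)$/;

// Removes runs of two or more consecutive YAML-looking lines.
// A single "Note: ..." style line is kept, since ordinary prose can contain colons.
function stripYaml(message: string): string {
  const lines = message.split("\n");
  const kept: string[] = [];
  let i = 0;
  while (i < lines.length) {
    let j = i;
    while (j < lines.length && YAML_LINE.test(lines[j])) j++;
    if (j - i >= 2) {
      i = j; // drop the whole YAML-looking run
    } else {
      kept.push(lines[i]);
      i++;
    }
  }
  return kept.join("\n").trim();
}
```

Requiring at least two consecutive matches trades a little leakage for fewer false positives on normal sentences.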

🛠️ Complete Technical Stack

Core Technologies

Runtime: Node.js 18+ (JavaScript/TypeScript execution environment)
Frontend Framework: Next.js 14 (React-based full-stack framework)
UI Library: React 18 (Component-based user interface)
Language: TypeScript (Type-safe JavaScript)
Styling: Tailwind CSS (Utility-first CSS framework)
Icons: Lucide React (Modern icon library)

AI & Voice Processing

Language Model: OpenAI GPT-5 (Primary for conversation and code generation)
Speech-to-Text: Deepgram Nova-3 (Real-time speech recognition)
Text-to-Speech: Deepgram Aura-2 (Natural voice synthesis)
Audio Processing: Web Audio API (Browser-based audio handling)
Sample Rate: 24kHz (High-quality audio streaming)

Backend & Communication

WebSocket Server: ws (Real-time bidirectional communication)
HTTP Server: Node.js built-in (Static file serving)
Code Generation: OpenAI API (Dynamic application creation via GPT-5)
Local Preview: Child process spawning (Development server management)

Development Tools

Build System: Next.js SWC (Fast TypeScript/JavaScript compiler)
Linting: ESLint (Code quality and consistency)
CSS Processing: PostCSS + Autoprefixer (CSS optimization)
Package Manager: npm (Dependency management)

Critical OS Packages & Dependencies

System Requirements:

Operating System: macOS, Linux, or Windows 10+
Node.js: Version 18.0.0 or higher
npm: Version 8.0.0 or higher (comes with Node.js)
Memory: Minimum 4GB RAM (8GB recommended)
Storage: 500MB free space for project files

Browser Requirements:

Modern Browser: Chrome 88+, Firefox 85+, Safari 14+, Edge 88+
Microphone Access: Required for voice input
JavaScript: Must be enabled
WebSocket Support: Required for real-time communication

Network Requirements:

Internet Connection: Required for AI API calls
Firewall: Allow connections to OpenAI and Deepgram APIs
Ports: 3000, 3001, and dynamic ports 4000+ for generated apps

🔄 Enhanced Workflow

Phase 1: Voice Ideation (Human ↔ AI Conversation)

User speaks → Deepgram STT → GPT-5 Processing → Deepgram TTS → User hears response

Audio Capture: Browser captures microphone input at 24kHz
Speech Recognition: Deepgram converts speech to text in real-time
AI Processing: GPT-5 challenges assumptions and drives innovation
Speech Synthesis: Deepgram converts AI responses back to speech
Accelerated Flow: 3-4 focused exchanges with smart readiness detection
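
The Web Audio API delivers samples as 32-bit floats in the range -1 to 1. A common preparation step before streaming audio to a speech-to-text service is converting those samples to 16-bit PCM. The sketch below shows that conversion under the assumption that 16-bit linear PCM is the wire format; the README does not state which encoding VocalFlow sends, and the function names are illustrative.

```typescript
const SAMPLE_RATE_HZ = 24000; // sample rate stated in this README

// Converts Web Audio float samples (-1..1) to signed 16-bit PCM.
// Out-of-range input is clamped rather than allowed to wrap around.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
  }
  return pcm;
}

// Duration of a chunk in milliseconds at the configured sample rate.
function chunkDurationMs(sampleCount: number): number {
  return (sampleCount * 1000) / SAMPLE_RATE_HZ;
}
```

At 24kHz, a chunk of 2400 samples covers 100 ms of audio, which is a reasonable granularity for real-time streaming.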

Phase 2: Specification Generation (AI → Silent YAML)

Conversation Context → GPT-5 Analysis → Silent YAML Generation → Seamless Transition

Context Analysis: AI reviews entire conversation history
Requirement Extraction: Identifies key features, users, and technical needs
Silent YAML Creation: Generates structured specification document in background
Seamless Transition: Moves directly to code generation without user interruption

Phase 3: Code Generation (AI → Full Application)

YAML Specification → OpenAI GPT-5 Code Generation → File System Creation → Local Preview

Specification Processing: AI analyzes YAML requirements silently
Architecture Planning: Determines optimal file structure and components
Code Generation: Creates complete Next.js application with TypeScript
Visual Progress: Real-time progress bar with rotating status words
File System Setup: Writes all files to local directory structure
Development Server: Spawns local preview server for immediate testing
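
The file system step could look roughly like the sketch below, which writes a map of relative paths to file contents under an output directory. The `GeneratedFiles` shape and both function names are assumptions for this example; the path check is included because model-generated paths should not be trusted to stay inside the target directory.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical shape: relative file path -> file contents.
type GeneratedFiles = Record<string, string>;

// Resolves each relative path against rootDir and rejects anything
// that would land outside of it (for example "../../etc/passwd").
function planWrites(rootDir: string, files: GeneratedFiles): Array<{ absPath: string; contents: string }> {
  const root = path.resolve(rootDir);
  return Object.entries(files).map(([relPath, contents]) => {
    const absPath = path.resolve(root, relPath);
    if (!absPath.startsWith(root + path.sep)) {
      throw new Error(`Refusing to write outside ${root}: ${relPath}`);
    }
    return { absPath, contents };
  });
}

// Creates parent directories as needed and writes every planned file.
function writeProject(rootDir: string, files: GeneratedFiles): string[] {
  const writes = planWrites(rootDir, files);
  for (const { absPath, contents } of writes) {
    fs.mkdirSync(path.dirname(absPath), { recursive: true });
    fs.writeFileSync(absPath, contents, "utf8");
  }
  return writes.map((w) => w.absPath);
}
```

Validating all paths before writing any file means a single bad path aborts the run without leaving a half-written project behind.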

Phase 4: Preview & Iteration (Application → User)

Generated App → Local Server → Browser Preview → User Feedback → Refinements

Server Startup: Launches Next.js development server
Live Preview: Opens generated application in browser
Real-time Updates: Hot reloading for any changes
User Testing: Full interaction with generated application
Voice Refinements: Natural language modifications and improvements

📊 Enhanced System Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Browser UI    │    │  Voice Agent    │    │   AI Services   │
│ (React/Next.js) │◄──►│  (Node.js/WS)   │◄──►│(OpenAI/Deepgram)│
│  Progress Bar   │    │  YAML Filter    │    │  GPT-5          │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Audio Stream   │    │  Code Generator │    │  Generated App  │
│ (WebAudio API)  │    │   (OpenAI API)  │    │  (Next.js App)  │
│ Status Updates  │    │ Progress Events │    │  Live Preview   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
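
To make the browser-to-agent link in the diagram more concrete, here is a sketch of how WebSocket messages from the voice agent might be typed and validated on the client. The event names and fields are hypothetical and chosen for illustration; the actual message format is defined by the project source.

```typescript
// Hypothetical events the voice agent might push to the browser.
type ServerEvent =
  | { type: "transcript"; text: string }
  | { type: "progress"; phase: string; percent: number }
  | { type: "previewReady"; url: string };

// Parses a raw WebSocket text frame and returns a typed event,
// or null when the payload is malformed or of an unknown type.
function parseServerEvent(raw: string): ServerEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const msg = data as Record<string, unknown>;
  switch (msg["type"]) {
    case "transcript":
      return typeof msg["text"] === "string" ? { type: "transcript", text: msg["text"] } : null;
    case "progress":
      return typeof msg["phase"] === "string" && typeof msg["percent"] === "number"
        ? { type: "progress", phase: msg["phase"], percent: msg["percent"] }
        : null;
    case "previewReady":
      return typeof msg["url"] === "string" ? { type: "previewReady", url: msg["url"] } : null;
    default:
      return null;
  }
}
```

Returning null for anything unexpected lets the UI ignore unknown frames instead of crashing on them.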

✨ Features

🗣️ Voice-First Interface: Speak naturally to describe your app idea
🧠 Enhanced AI: GPT-5 conversation and code generation
📊 Visual Progress: Real-time progress bar with dynamic status updates
🤖 AI-Powered Generation: Uses OpenAI GPT-5 for understanding and code generation (Claude optional)
🎨 Beautiful Design: Creates professionally designed applications, not plain templates
⚡ Real-Time Preview: See your app come to life instantly with live preview
📱 Responsive Design: Generated apps work perfectly on all devices
🔧 Full-Stack: Generates complete Next.js applications with TypeScript
🔄 Seamless Flow: Intelligent YAML processing without user interruption
📥 Code Download: Download your generated applications as ZIP files

🚀 How It Works

VocalFlow follows an enhanced 4-phase workflow:

💡 Ideation Phase: Have a focused conversation with problem-solving AI (3-4 exchanges)
📝 Silent Processing: AI processes specifications in the background seamlessly
⚡ Code Generation: Watch real-time progress as AI generates your complete application
🎙️ Voice Refinement: Make natural language improvements and refinements

🛠️ Tech Stack

Frontend: Next.js 14, React 18, TypeScript, Tailwind CSS
Backend: Node.js, WebSocket (ws)
AI: OpenAI GPT-5 (Primary), Deepgram (STT & TTS), Anthropic Claude (optional)
Voice Processing: Real-time audio streaming with 24kHz sampling
Progress System: Custom React components with WebSocket event streaming

📋 Prerequisites

Node.js 18+
npm or yarn
OpenAI API key (GPT-5 access)
Deepgram API key
(Optional) Anthropic API key (for Claude fallback)

🔧 Setup

Clone the repository

git clone <your-repo-url>
cd VoiceCreation

Install dependencies
```
npm install
```

Set up environment variables. Create a .env file in the root directory:

OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-5
OPENAI_CODEGEN_MODEL=gpt-5
DEEPGRAM_API_KEY=your_deepgram_api_key_here
# Optional, only if using Claude for codegen fallback
ANTHROPIC_API_KEY=your_anthropic_api_key_here
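
A startup check along these lines can catch a missing key before the first API call fails. The variable names match the .env example above; the `loadConfig` function, the config shape, and the fallback to `gpt-5` when a model variable is unset are assumptions for this sketch.

```typescript
interface VocalFlowConfig {
  openaiApiKey: string;
  openaiModel: string;
  openaiCodegenModel: string;
  deepgramApiKey: string;
  anthropicApiKey?: string; // optional, only needed for the Claude fallback
}

// Reads settings from an environment-like object and fails fast
// when a required key is missing or blank.
function loadConfig(env: Record<string, string | undefined>): VocalFlowConfig {
  const required = (name: string): string => {
    const value = env[name]?.trim();
    if (!value) throw new Error(`Missing required environment variable: ${name}`);
    return value;
  };
  return {
    openaiApiKey: required("OPENAI_API_KEY"),
    openaiModel: env["OPENAI_MODEL"]?.trim() || "gpt-5", // assumed default
    openaiCodegenModel: env["OPENAI_CODEGEN_MODEL"]?.trim() || "gpt-5", // assumed default
    deepgramApiKey: required("DEEPGRAM_API_KEY"),
    anthropicApiKey: env["ANTHROPIC_API_KEY"]?.trim() || undefined,
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)` once the .env file has been loaded.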

Start the development server
```
npm run dev
```
This will start:
- Voice agent server on http://localhost:3000
- Frontend interface on http://localhost:3001

Enable Code Download Feature (Optional)
```
npm install archiver @types/archiver
```
This adds the ability to download your generated applications as ZIP files for local development or deployment.

🎯 Usage

Open your browser and navigate to http://localhost:3001
Click "Start Recording" to begin voice interaction
Describe your app idea naturally
Signal readiness by saying "Let's build this!"
Watch real-time progress as the AI generates your complete application

📁 Project Structure

VoiceCreation/
├── agents/                 # AI agents for different phases
│   ├── ideation.ts        # Enhanced voice conversation agent with GPT-5
│   └── codeGen.ts         # OpenAI (GPT-5) code generation orchestrator
├── pages/                 # Next.js frontend pages
│   ├── _app.tsx          # App wrapper with global styles
│   └── index.tsx         # Main interface with visual progress system
├── utils/                 # Utility functions
│   ├── claudeCodegen.ts  # Claude code generation (optional)
│   ├── openaiCodegen.ts  # OpenAI GPT-5 code generation (default)
│   └── localPreview.ts   # Local development server
├── test/                  # Test files
│   └── testCodeGen.ts    # Code generation tests
├── generated/             # Generated project files (gitignored)
├── styles/               # Global styles
│   └── globals.css       # Tailwind CSS with progress bar styles
└── package.json          # Project dependencies

🧪 Testing

Run code generation tests:

npm run test:codegen

Run with custom YAML:

npm run test:codegen:custom

🎨 Generated App Features

Every generated application includes:

🏠 Beautiful Landing Page: Professional hero section, features, testimonials
📱 Responsive Design: Mobile-first approach with modern UI
🎯 Conversion Focused: Clear CTAs and user journey
⚡ Modern Tech Stack: Next.js 14, TypeScript, Tailwind CSS
🔧 Ready to Deploy: Complete with package.json and config files
🎭 Custom Design: Tailored to your app's target audience and purpose

🔧 Available Scripts

npm run dev - Start both agent and frontend in development mode
npm run agent - Start only the voice agent server
npm run frontend - Start only the frontend development server
npm run build - Build the TypeScript project
npm run start - Start the production server
npm run test:codegen - Test code generation functionality

🌟 Example Generated Apps

The system can generate various types of applications:

📚 Educational Platforms: Learning management systems, tutoring marketplaces
💼 Business Tools: CRM systems, project management, analytics dashboards
🛒 E-commerce: Online stores, marketplaces, booking systems
🎮 Entertainment: Gaming platforms, social apps, tools for content creators
🏥 Healthcare: Appointment booking, health tracking, telemedicine
💰 Fintech: Payment systems, expense trackers, investment platforms

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Troubleshooting

Voice not working?

Check microphone permissions in your browser
Ensure the page is served over HTTPS or from localhost (browsers only grant microphone access in secure contexts)
Verify Deepgram API key is set correctly

AI not challenging assumptions?

Ensure OpenAI API key is valid and has GPT-5 access
Check that the enhanced system prompts are loaded correctly
Verify the ideation agent is using the GPT-5 configuration

Progress bar not updating?

Check WebSocket connection in browser developer tools
Verify progress events are being sent from the backend
Ensure the progress multiplier is configured correctly (10x)

Code generation failing?

Verify OpenAI API key is valid and has sufficient credits
Check console logs for detailed error messages
Ensure all dependencies are installed
Verify YAML processing is working in the background

Preview not loading?

Check that a port in the 4000+ range is available for generated apps
Look for build errors in the generation logs
Verify the generated package.json has correct dependencies

🔄 Alternative Setup: Using Claude Codegen

By default, VocalFlow uses OpenAI GPT-5 for code generation. If you prefer Anthropic Claude instead, switch as follows:

Prerequisites for Claude Setup

Anthropic API key with Claude Sonnet access

Environment Variables

Add to your .env file:

ANTHROPIC_API_KEY=your_anthropic_api_key_here

Code Generation Changes

Change the import and call in agents/codeGen.ts:

// Change from:
import { runOpenAICodegen } from '../utils/openaiCodegen';

// To:
import { runClaudeCodegen } from '../utils/claudeCodegen';

Then update the function call:

// Change from:
const result = await runOpenAICodegen(yamlPrompt, sessionId, events);

// To:
const result = await runClaudeCodegen(yamlPrompt, sessionId, events);
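
As an alternative to editing imports by hand, the provider could be chosen at runtime. The sketch below assumes a hypothetical `CODEGEN_PROVIDER` environment variable and a simplified event type; only the `(yamlPrompt, sessionId, events)` argument order is taken from the calls shown above.

```typescript
// Simplified stand-ins for the real types, which this README does not show.
type CodegenEvents = { emit(event: string, payload?: unknown): void };
type CodegenFn = (yamlPrompt: string, sessionId: string, events: CodegenEvents) => Promise<unknown>;
type Provider = "openai" | "claude";

// Maps a (hypothetical) CODEGEN_PROVIDER value to a provider,
// defaulting to OpenAI as the README describes.
function resolveProvider(value: string | undefined): Provider {
  return value?.trim().toLowerCase() === "claude" ? "claude" : "openai";
}

// Selects the code generation function for the chosen provider.
function pickCodegen(provider: Provider, impls: Record<Provider, CodegenFn>): CodegenFn {
  return impls[provider];
}

// Usage inside agents/codeGen.ts might then look like:
//   const run = pickCodegen(resolveProvider(process.env.CODEGEN_PROVIDER), {
//     openai: runOpenAICodegen,
//     claude: runClaudeCodegen,
//   });
//   const result = await run(yamlPrompt, sessionId, events);
```

This keeps both imports in place and moves the choice into configuration, at the cost of loading both modules even when only one is used.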

Performance Notes

Using GPT-5 for both the voice conversation and code generation keeps the whole pipeline on a single model
Claude Sonnet remains a solid alternative depending on preferences and cost

VocalFlow - Built with ❤️ using AI and voice technology

Transforming ideas into reality, one voice at a time.
