A real-time audio transcription application built with FastAPI and WebSockets. It captures audio from your microphone, transcribes it using the Whisper model, and provides analysis capabilities using GPT-3.5.
- Real-time audio transcription in multiple languages
- Modern, responsive UI with Dark/Light theme support
- Transcription history management
- Download transcription history as text file
- Clear history
- AI-powered analysis of transcription history using GPT-3.5
- Support for 12 languages:
  - English, Spanish, French, German
  - Italian, Portuguese, Dutch, Polish
  - Russian, Japanese, Korean, Chinese
- WebSocket-based communication for low-latency interaction
- Clean, modular codebase with separated concerns
- Python 3.8+
- CUDA-capable GPU (recommended for optimal performance)
- OpenAI API key for analysis features
- Clone the repository:

```bash
git clone https://github.com/zhaxal/live-transcription.git
cd live-transcription
```

- Create a virtual environment and activate it:

```bash
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

- Install the required dependencies:

```bash
pip install -r requirements.txt
```

- Set up your OpenAI API key:

```bash
export OPENAI_API_KEY='your-api-key-here'  # Linux/Mac
set OPENAI_API_KEY=your-api-key-here       # Windows
```

- Start the FastAPI server:

```bash
uvicorn app:app --reload
```
- Open your browser and navigate to `http://localhost:8000`
- Select your desired language from the dropdown menu
- Click "Start Transcription" to begin capturing and transcribing audio
```
└── ./
    ├── static/
    │   ├── css/
    │   │   └── styles.css      # Application styling
    │   ├── js/
    │   │   └── app.js          # Frontend JavaScript
    │   └── index.html          # Main HTML template
    ├── app.py                  # FastAPI application
    ├── config.py               # Logging configuration
    ├── llm.py                  # OpenAI client initialization
    ├── models.py               # Whisper model setup
    └── requirements.txt        # Project dependencies
```
- Semantic HTML5 structure
- Clean organization of UI components
- External CSS and JavaScript imports
- CSS variables for theme management
- Responsive design
- Dark/Light theme support
- Modern UI components styling
- Modular code organization
- WebSocket audio streaming
- Real-time UI updates
- Theme management
- History management
- Error handling
- WebSocket endpoint for audio streaming
- Static file serving
- Analysis endpoint using GPT-3.5
- Error handling and logging
- Whisper model initialization
- CUDA acceleration
- Optimized transcription settings
- GPT-3.5 client setup
- Analysis functionality
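The analysis call in `llm.py` can be sketched as building a chat-completion payload from the stored history. The prompt wording and function name here are assumptions, not the project's actual code:

```python
# Hypothetical sketch of the analysis flow; prompt text and the
# build_analysis_messages() name are assumptions, not the real llm.py.
from typing import List

def build_analysis_messages(history: List[str]) -> list:
    """Turn the transcription history into a chat-completion payload."""
    transcript = "\n".join(history)
    return [
        {"role": "system", "content": "Summarise and analyse the transcript below."},
        {"role": "user", "content": transcript},
    ]

# The actual call (requires the openai package and OPENAI_API_KEY) would be:
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-3.5-turbo",
#     messages=build_analysis_messages(history),
# )
# analysis = resp.choices[0].message.content
```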
- Sample Rate: 16kHz
- Buffer Size: 4096 samples
- Chunk Size: 3 seconds
- Format: 16-bit PCM
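The settings above fix the size of each transcription unit; a quick back-of-the-envelope check:

```python
# Derived from the audio configuration above (values from this README).
SAMPLE_RATE = 16_000   # Hz
SAMPLE_WIDTH = 2       # bytes per sample (16-bit PCM)
BUFFER_SIZE = 4096     # samples per capture buffer
CHUNK_SECONDS = 3      # seconds of audio per transcription chunk

samples_per_chunk = SAMPLE_RATE * CHUNK_SECONDS      # 48,000 samples
bytes_per_chunk = samples_per_chunk * SAMPLE_WIDTH   # 96,000 bytes (~94 KiB)
buffers_per_chunk = samples_per_chunk / BUFFER_SIZE  # ≈ 11.7 buffers
```

So each WebSocket transcription round trip carries roughly 94 KiB of raw audio, assembled from about a dozen capture buffers.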
- Binary audio data streaming
- JSON response format
- Automatic reconnection handling
- Error recovery
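Reconnection handling is typically implemented as exponential backoff; the delays and helper name below are illustrative, not the project's actual values:

```python
# Illustrative backoff helper; the base delay and cap are assumptions.
def backoff_delay(attempt: int, base: float = 0.5, cap: float = 10.0) -> float:
    """Delay in seconds before reconnect attempt `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))

# A client loop would sleep backoff_delay(n) between failed WebSocket
# connection attempts and reset n to 0 once a connection succeeds.
```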
- Model: Whisper Medium
- Compute Type: float16
- Device: CUDA (when available)
- VAD Filter: Enabled (500ms silence threshold)
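Loading the model with these settings might look like the sketch below, assuming the `faster-whisper` package; the `whisper_settings()` helper and the CPU fallback values are assumptions for illustration:

```python
# Sketch of models.py: pick device/compute type, then load Whisper.
# The helper name and CPU fallback ("int8") are assumptions.
def whisper_settings(cuda_available: bool) -> dict:
    """Return load settings matching the configuration listed above."""
    return {
        "model_size": "medium",
        "device": "cuda" if cuda_available else "cpu",
        "compute_type": "float16" if cuda_available else "int8",
    }

# Loading and transcribing (requires faster-whisper, and torch for the
# CUDA check) would then be roughly:
# from faster_whisper import WhisperModel
# s = whisper_settings(torch.cuda.is_available())
# model = WhisperModel(s["model_size"], device=s["device"],
#                      compute_type=s["compute_type"])
# segments, info = model.transcribe(
#     audio, vad_filter=True,
#     vad_parameters={"min_silence_duration_ms": 500})
```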
```bash
# Install development dependencies
pip install -r requirements.txt

# Start the development server
uvicorn app:app --reload --port 8000
```

- Follow PEP 8 for Python code
- Use ES6+ features for JavaScript
- Maintain consistent indentation (2 spaces)
- Use meaningful variable and function names
The application includes comprehensive error handling for:
- WebSocket connection issues
- Audio capture problems
- Transcription failures
- Analysis errors
- Browser compatibility issues
Tested and supported on:
- Chrome 80+
- Firefox 75+
- Safari 13+
- Edge 80+
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License.