Jasonwill2004/Speech-to-Text

🎯 Speech-to-Text Search Implementation

🌟 Overview

A real-time speech recognition and search suggestion system built with FastAPI and OpenAI Whisper, with WebSocket support for continuous audio streaming. It is deployed on AWS ECS for scalability and reliability.

🚀 Key Features

  • Real-time speech recognition using OpenAI Whisper
  • WebSocket support for continuous audio streaming
  • Smart search suggestions with AI-based ranking
  • Noise-resilient audio processing
  • Containerized deployment on AWS ECS
  • Auto-scaling and high availability

💡 Technical Choices & Trade-offs

1. Speech Recognition: Whisper vs DeepSpeech

  • Chose Whisper because:
    • Better accuracy on noisy inputs
    • Smaller model size (tiny model: 39M parameters)
    • Faster inference time
    • Multi-language support out of the box
  • Trade-offs:
    • DeepSpeech offers better offline support
    • Whisper requires more RAM (mitigated by using tiny model)
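A minimal sketch of what using the tiny model might look like, assuming the `openai-whisper` package (and its `ffmpeg` dependency) is installed. The README does not show the project's actual loading code, so `transcribe_file` and `clean_transcript` are hypothetical helper names used only for illustration.

```python
import re


def clean_transcript(text: str) -> str:
    """Normalize a raw transcript into a search query (hypothetical helper)."""
    # Collapse runs of whitespace, trim, lowercase, and drop trailing punctuation.
    collapsed = re.sub(r"\s+", " ", text).strip().lower()
    return collapsed.rstrip(".?!,")


def transcribe_file(path: str, model_name: str = "tiny") -> str:
    """Transcribe an audio file with Whisper and return a cleaned query."""
    # Imported lazily so this module loads even when the package is missing.
    import whisper  # provided by the openai-whisper package (assumed installed)

    model = whisper.load_model(model_name)  # "tiny" is the 39M-parameter model
    result = model.transcribe(path)         # returns a dict with a "text" key
    return clean_transcript(result["text"])
```

Loading the model once at startup and reusing it across requests, rather than per call as in this sketch, would likely matter for the response times reported below.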

2. Data Storage: In-Memory vs Redis

  • Chose In-Memory Storage because:
    • Simpler deployment architecture
    • Sufficient for demonstration purposes
    • Lower latency for small datasets
  • Trade-offs:
    • Redis would be better for production scale
    • Missing persistence across container restarts
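To illustrate the trade-off, here is a rough sketch of an in-memory suggestion store built on a plain dictionary. The class name `InMemorySuggestionStore` and its methods are assumptions for illustration, not the project's actual data structure.

```python
from collections import defaultdict


class InMemorySuggestionStore:
    """Prefix index held in process memory (lost on container restart)."""

    def __init__(self) -> None:
        self._by_prefix: dict[str, set[str]] = defaultdict(set)

    def add(self, phrase: str) -> None:
        """Index a phrase under every one of its prefixes."""
        phrase = phrase.strip().lower()
        # Trading memory for speed: each lookup becomes a single dict access.
        for end in range(1, len(phrase) + 1):
            self._by_prefix[phrase[:end]].add(phrase)

    def lookup(self, prefix: str, limit: int = 5) -> list[str]:
        """Return up to `limit` phrases starting with `prefix`, alphabetically."""
        matches = self._by_prefix.get(prefix.strip().lower(), set())
        return sorted(matches)[:limit]
```

Because the index lives in one process, each container holds its own copy and starts empty after a restart, which is the persistence gap noted above.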

3. Deployment: AWS ECS vs Lambda

  • Chose ECS because:
    • WebSocket support required
    • Better for long-running connections
    • More cost-effective for continuous workloads
  • Trade-offs:
    • Lambda would be cheaper for sporadic usage
    • ECS requires more configuration

📊 Performance Metrics

  • Speech recognition accuracy: 95% on clean audio, 92% on noisy audio after noise reduction
  • Average response time: <500ms
  • WebSocket latency: ~100ms
  • Memory usage: ~800MB

🎯 Task Completion Screenshots

Task 1: Speech Recognition API

[Image: Screenshot 2025-03-26 at 10 08 54 AM]
  • Implemented FastAPI endpoint
  • Achieved 95% accuracy on clean audio
  • Response time under 500ms

Task 2: Noisy Audio Handling

[Image: Screenshot 2025-03-26 at 10 11 08 AM]
  • Implemented noise reduction
  • Improved accuracy from 75% to 92% on noisy audio
  • Processing time: 800ms
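The README does not say which noise reduction technique was used. As a stand-in, the sketch below shows one simple approach, an energy-based noise gate, which silences frames whose energy is close to the estimated background level. The function names, the 20 ms frame size (320 samples at an assumed 16 kHz sample rate), and the threshold ratio are all assumptions.

```python
import math


def rms(frame: list[float]) -> float:
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def noise_gate(samples: list[float], frame_size: int = 320, ratio: float = 2.0) -> list[float]:
    """Zero out frames whose energy is not above `ratio` times the noise floor."""
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    if not frames:
        return []
    # Assumption: the quietest frame approximates the background noise level.
    noise_floor = min(rms(f) for f in frames)
    threshold = noise_floor * ratio
    gated: list[float] = []
    for frame in frames:
        keep = rms(frame) > threshold
        gated.extend(frame if keep else [0.0] * len(frame))
    return gated
```

A gate like this only helps when the recording contains at least one quiet frame to estimate the floor from; spectral methods are a common alternative when noise overlaps speech.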

Task 3: Smart Search Autocomplete

[Image: Screenshot 2025-03-26 at 10 12 10 AM]
  • Implemented AI-based ranking
  • Response time: 200ms
  • Top suggestions match user intent
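The ranking model itself is not described in the README. The sketch below is a plain heuristic stand-in, not the AI-based ranking: it scores each candidate by string similarity to the query and adds a bonus for prefix matches. The names `score` and `rank_suggestions` and the 0.5 bonus are assumptions.

```python
from difflib import SequenceMatcher


def score(query: str, candidate: str) -> float:
    """Similarity in [0, 1], plus 0.5 when the candidate starts with the query."""
    q, c = query.lower(), candidate.lower()
    similarity = SequenceMatcher(None, q, c).ratio()
    prefix_bonus = 0.5 if c.startswith(q) else 0.0
    return similarity + prefix_bonus


def rank_suggestions(query: str, candidates: list[str], limit: int = 5) -> list[str]:
    """Return the best-scoring candidates, highest score first."""
    ranked = sorted(candidates, key=lambda c: score(query, c), reverse=True)
    return ranked[:limit]
```

A learned ranker would replace `score` with a model call while keeping the same sort-and-truncate shape.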

Task 4: WebSocket Implementation

[Image: Screenshot 2025-03-26 at 10 20 33 AM]
  • Real-time audio streaming
  • Continuous transcription
  • Dynamic suggestions
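One way continuous transcription over a stream can work is to buffer incoming chunks and transcribe whenever enough audio has accumulated. The sketch below shows that buffering step only. `StreamingTranscriber` is a hypothetical class, and the default threshold assumes 16 kHz, 16-bit mono audio, which the README does not confirm.

```python
from typing import Callable, Optional


class StreamingTranscriber:
    """Buffers incoming audio chunks and transcribes once enough has arrived."""

    def __init__(self, transcribe: Callable[[bytes], str], min_bytes: int = 32000) -> None:
        self._transcribe = transcribe   # e.g. a wrapper around a Whisper model
        self._min_bytes = min_bytes     # 32000 bytes = 1 s of 16 kHz 16-bit mono
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> Optional[str]:
        """Add one chunk; return a transcript when the buffer is full, else None."""
        self._buffer.extend(chunk)
        if len(self._buffer) < self._min_bytes:
            return None
        audio = bytes(self._buffer)
        self._buffer.clear()
        return self._transcribe(audio)
```

In a FastAPI WebSocket handler, each result of `await websocket.receive_bytes()` could be passed to `feed`, and any non-`None` transcript sent back with `await websocket.send_json(...)` along with fresh suggestions.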

Video Explanation

Video explanation

🛠️ API Endpoints

# REST Endpoints
POST /api/voice-to-text
GET /api/autocomplete?q={query}

# WebSocket Endpoint
ws://speech-search-alb-607098999.eu-north-1.elb.amazonaws.com:8000/ws/speech-to-search
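A small client sketch for the autocomplete endpoint, using only the standard library. The base URL is the load balancer address listed above; the response is assumed to be JSON, since the README does not document the schema, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://speech-search-alb-607098999.eu-north-1.elb.amazonaws.com:8000"


def autocomplete_url(query: str, base_url: str = BASE_URL) -> str:
    """Build the GET /api/autocomplete URL with the query safely encoded."""
    return f"{base_url}/api/autocomplete?{urlencode({'q': query})}"


def fetch_suggestions(query: str, timeout: float = 5.0):
    """Call the autocomplete endpoint and decode the JSON body."""
    # Assumption: the endpoint returns JSON; the README does not show the schema.
    with urlopen(autocomplete_url(query), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Encoding the query with `urlencode` keeps spaces and characters such as `&` from breaking the `q` parameter.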

📦 Deployment

  • Region: eu-north-1 (Stockholm)
  • Container Registry: Amazon ECR
  • Compute: AWS ECS Fargate
  • Load Balancer: Application Load Balancer

🧪 Testing Instructions

# Health check
curl http://speech-search-alb-607098999.eu-north-1.elb.amazonaws.com:8000/health

# WebSocket test
wscat -c ws://speech-search-alb-607098999.eu-north-1.elb.amazonaws.com:8000/ws/speech-to-search

📈 Future Improvements

  1. Implement Redis for persistent storage
  2. Add user authentication
  3. Implement SSL/TLS for secure WebSocket (wss://) connections
  4. Add custom domain and CDN
  5. Implement rate limiting

🔗 Resources
