Skip to content

Conversation

@woziii
Copy link

@woziii woziii commented Nov 6, 2024

  • Replace STT with LightningWhisperMLX Medium for Apple Silicon
  • Switch LLM to Llama-3.2-3B-Instruct-8bit MLX format
  • Update TTS to Melo TTS
  • Keep Silero VAD for voice detection

Performance:

  • Optimize latency (~4s end-to-end)
  • Focus on French-to-English translation
  • Add video call platforms support (Teams, Zoom, FaceTime)
  • Test & validate on M2 chip with 22GB RAM

Changes:

  • Modify system prompt for translation tasks
  • Remove CUDA components
  • Streamline pipeline for Apple Silicon
  • Add real-time processing optimizations

Tested on MacBook Air M2, compatible with major video call platforms except Google Meet.

Based on original speech-to-speech project, inspired by Andrés Marafioti's work.

- Replace STT with LightningWhisperMLX Medium for Apple Silicon
- Switch LLM to Llama-3.2-3B-Instruct-8bit MLX format
- Update TTS to Melo TTS
- Keep Silero VAD for voice detection

Performance:
- Optimize latency (~4s end-to-end)
- Focus on French-to-English translation
- Add video call platforms support (Teams, Zoom, FaceTime)
- Test & validate on M2 chip with 22GB RAM

Changes:
- Modify system prompt for translation tasks
- Remove CUDA components
- Streamline pipeline for Apple Silicon
- Add real-time processing optimizations

Tested on MacBook Air M2, compatible with major video call platforms except Google Meet.

Based on original speech-to-speech project, inspired by Andrés Marafioti's work.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants