This is an experiment in combining speech-to-text with text-to-speech and language generation.
You are super welcome to help out. <3
After cloning the repo you need the React dev server, the Python server, and the text-to-speech Docker container all running. Follow the steps for each to get it working on your machine.
npm install
export NODE_OPTIONS=--openssl-legacy-provider
npm start

(The NODE_OPTIONS export works around an OpenSSL error on Node 17+, so set it before running npm start.)
cd python
bash run_local.sh
This spins up a local server running GPT-2 on FastAPI.
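From the React side you can then ask the server for a generated reply. A minimal client sketch; note that the endpoint path (`/generate`), the request payload, and the response field are assumptions, so check run_local.sh and the FastAPI routes for the real ones:

```javascript
// Hypothetical client for the local GPT-2 FastAPI server.
// The URL, route, and payload shape below are assumptions.
const GPT2_URL = 'http://localhost:8000/generate';

// Build the fetch options for a generation request (pure, easy to test).
function buildGenerateRequest(prompt, maxLength = 50) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: prompt, max_length: maxLength }),
  };
}

// Fetch a generated continuation for the given prompt.
async function generateReply(prompt) {
  const res = await fetch(GPT2_URL, buildGenerateRequest(prompt));
  if (!res.ok) throw new Error(`GPT-2 server error: ${res.status}`);
  const data = await res.json();
  return data.text; // assumed response field
}
```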
The text-to-speech server uses synesthesiam's Mozilla TTS Docker image: https://github.com/synesthesiam/docker-mozillatts
docker run -it -p 5002:5002 synesthesiam/mozillatts
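Once the container is up, it serves WAV audio at GET /api/tts?text=... on port 5002 (the same URL the playAudio helper uses). A small sketch for building that URL and sanity-checking the server from Node 18+ (which ships a built-in fetch):

```javascript
// Build the TTS request URL for the local Mozilla TTS container.
// Port 5002 and the /api/tts?text= route come from the docker command above.
function ttsUrl(text, base = 'http://localhost:5002') {
  return `${base}/api/tts?text=${encodeURIComponent(text)}`;
}

// Quick sanity check that the container is responding (requires Node 18+).
async function checkTts() {
  const res = await fetch(ttsUrl('hello world'));
  console.log(res.ok ? 'TTS server is up' : `TTS error: ${res.status}`);
}
```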
Speech recognition in the browser uses react-speech-recognition: https://www.npmjs.com/package/react-speech-recognition
import React from 'react'
import SpeechRecognition, { useSpeechRecognition } from 'react-speech-recognition'
import { playAudio } from './App.js'

const Dictaphone = () => {
  const { transcript, resetTranscript } = useSpeechRecognition()

  if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
    return null
  }

  return (
    <div>
      <button onClick={SpeechRecognition.startListening}>Start</button>
      <button onClick={SpeechRecognition.stopListening}>Stop</button>
      <button onClick={resetTranscript}>Reset</button>
      <button onClick={() => playAudio(transcript)}>Transcribe</button>
      <p>{transcript}</p>
    </div>
  )
}

export default Dictaphone
Playing the TTS audio from inside React:
export async function playAudio(text) {
  // Request a WAV clip from the local Mozilla TTS server and play it.
  const audio = new Audio(`http://localhost:5002/api/tts?text=${encodeURIComponent(text)}`);
  audio.type = 'audio/wav';
  try {
    await audio.play();
    console.log('Playing...');
  } catch (err) {
    console.error('Failed to play: ' + err);
  }
}
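Chaining the three pieces together: feed the transcript to GPT-2, then speak the result. This is a sketch assuming a hypothetical generateReply(prompt) helper that calls the local GPT-2 server (not part of the repo as shown); playAudio is the helper above. GPT-2 typically echoes the prompt at the start of its output, so a small pure helper strips it before playback:

```javascript
// Strip the echoed prompt from the front of generated text so only
// the continuation is spoken (pure helper).
function stripPrompt(generated, prompt) {
  return generated.startsWith(prompt)
    ? generated.slice(prompt.length).trim()
    : generated.trim();
}

// transcript -> GPT-2 -> TTS playback.
// generateReply is a hypothetical client for the FastAPI server.
async function respondTo(transcript) {
  const reply = await generateReply(transcript);
  playAudio(stripPrompt(reply, transcript));
}
```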