I have a working application with real-time transcription feature based on faster-whisper.
However, after adding the diart pipeline to my existing application, I get a transcription with no diarization.
I expect the output of this audio to be as follows:
Expected output:
Speaker 0:
Hi. It's Pat. Can I help you?
Speaker 1:
Well, not really.
Speaker 0:
Okay. And what Is this Brandy?
Speaker 1:
Just say there's somebody on the line that needs help?
Speaker 0:
No. Is this Brandy?
Speaker 1:
Yeah?
Speaker 0:
Yeah. Hi. It's Pat.
Actual output:
hi it's pat can i help you uh well not really okay just say there's somebody on the line that needs help no is this brandy yeah yeah hi it's pat
It looks like diart is not working as expected with faster-whisper: the output is not labeled with speaker information.
Can anybody confirm if this is the case?
Hi @RustX2802, faster-whisper is not supported yet, so I'm assuming you implemented it manually. Could you share the part of the code where you align the transcription and diarization?
I have done this successfully. It is not complicated, since the ASR part is quite independent. However, I did not build it on top of the existing blocks.SpeechRecognition class. To keep it simple, I added a new parallel class, something like FasterWhisperBatchedPlpExt.
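For anyone following along, here is a minimal sketch of the alignment step asked about above. It is not diart's implementation and not the code from the comment above. It assumes you already have transcribed segments with start and end times (faster-whisper segments carry `start`, `end` and `text`) and speaker turns from the diarization side. The `Turn` and `Segment` classes, the function names and all timestamps below are hypothetical, made up for illustration. Each segment gets the speaker whose turns overlap it the longest, and consecutive segments from the same speaker are then merged.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Hypothetical container for one diarization turn (times in seconds)."""
    start: float
    end: float
    speaker: str


@dataclass
class Segment:
    """Hypothetical container for one transcribed segment (times in seconds)."""
    start: float
    end: float
    text: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="unknown"):
    """Label each segment with the speaker that overlaps it the longest."""
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            dur = overlap(seg.start, seg.end, turn.start, turn.end)
            if dur > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + dur
        # Fall back to a placeholder label when no turn overlaps the segment.
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, seg.text))
    return labeled


def merge_consecutive(labeled):
    """Join the text of neighbouring segments that share a speaker."""
    merged = []
    for speaker, text in labeled:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return merged


# Made-up timestamps, only to show the shape of the data.
example_turns = [Turn(0.0, 2.0, "Speaker 0"), Turn(2.0, 3.5, "Speaker 1")]
example_segments = [
    Segment(0.0, 0.8, "Hi."),
    Segment(0.8, 1.9, "It's Pat."),
    Segment(2.1, 3.2, "Well, not really."),
]

for speaker, text in merge_consecutive(assign_speakers(example_segments, example_turns)):
    print(f"{speaker}:\n{text}")
```

If the transcription comes back as one unlabeled block like the actual output above, it is worth checking that a step like this runs at all and that both sides use the same time base. In a streaming setup the segment times are often relative to the current chunk, while the diarization times are relative to the whole stream.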