Splitting Audio and Incomplete Words #15

Acquil · 2020-10-01T09:07:05Z

Context

As per #14 and #7, audio files longer than 4 minutes are split into segments whose length is set by the min_per_split argument (minutes per split). Splitting is required for faster transcription.

Problem

There is a high chance of words being cut off midway when the audio is segmented.
audiosplitter.split_silence(min_per_split) can be used to work around this, but it introduces other problems:
1. All pauses are removed.
2. The transcripts, while more accurate, no longer match the timestamps of the source video/audio.

Audio files should be segmented without changing the overall duration or cutting words midway.
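
One possible direction, sketched below under stated assumptions: instead of removing silence, look for the quietest short window near each intended split boundary and cut there, keeping every sample. Pauses stay in place, so each segment's start offset can be added back to its transcript timestamps. The function names (`window_energy`, `find_cut_points`, `split_keep_timeline`) and the parameters `search_sec` and `window_sec` are hypothetical and not part of the existing audiosplitter module. The sketch assumes mono PCM samples already loaded into a list of integers, and segments may run longer or shorter than the target by up to `search_sec`.

```python
from typing import List, Sequence, Tuple


def window_energy(samples: Sequence[int], start: int, width: int) -> float:
    """Mean absolute amplitude of samples[start:start + width]."""
    window = samples[start:start + width]
    return sum(abs(s) for s in window) / len(window) if window else 0.0


def find_cut_points(
    samples: Sequence[int],
    sample_rate: int,
    target_sec: float,
    search_sec: float = 2.0,   # hypothetical: how far a cut may drift from the ideal boundary
    window_sec: float = 0.05,  # hypothetical: length of the window used to measure loudness
) -> List[int]:
    """Return sample indices at which to cut, each placed in the quietest
    window found within +/- search_sec of the ideal boundary."""
    n = len(samples)
    target = int(target_sec * sample_rate)
    search = int(search_sec * sample_rate)
    width = max(1, int(window_sec * sample_rate))

    cuts: List[int] = []
    prev = 0
    while n - prev > target:
        ideal = prev + target
        lo = max(prev + width, ideal - search)
        hi = min(n - width, ideal + search)
        if lo > hi:
            break  # not enough room left to search for a quiet spot
        # Candidate window starts, stepped by one window width.
        quietest = min(
            range(lo, hi + 1, width),
            key=lambda i: window_energy(samples, i, width),
        )
        cut = quietest + width // 2  # cut in the middle of the quiet window
        cuts.append(cut)
        prev = cut
    return cuts


def split_keep_timeline(
    samples: Sequence[int],
    sample_rate: int,
    target_sec: float,
    search_sec: float = 2.0,
    window_sec: float = 0.05,
) -> List[Tuple[float, Sequence[int]]]:
    """Split without dropping any samples.

    Returns (start_offset_in_seconds, segment_samples) pairs. Adding the
    offset to a segment's transcript timestamps maps them back onto the
    timeline of the source audio.
    """
    cuts = find_cut_points(samples, sample_rate, target_sec, search_sec, window_sec)
    bounds = [0] + cuts + [len(samples)]
    return [
        (start / sample_rate, samples[start:end])
        for start, end in zip(bounds, bounds[1:])
    ]
```

Because the segments are contiguous slices of the original, concatenating them reproduces the source exactly, which addresses both problems listed above. The quietest window near a boundary is not guaranteed to be a real pause, so a word can still be clipped in continuous speech; widening `search_sec` trades segment-length uniformity for a better chance of finding a pause.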

Acquil added the bug and help wanted labels Oct 1, 2020

Acquil added this to the Prototype version 1 milestone Oct 1, 2020

Acquil assigned NAshwinKumar, harish-ganesh and sreedeepack Oct 1, 2020

Acquil pinned this issue Oct 1, 2020

Acquil removed this from the Prototype version 1 milestone Oct 23, 2020

Acquil unpinned this issue Nov 9, 2020
