Splitting Audio and Incomplete Words #15

Open
Acquil opened this issue Oct 1, 2020 · 0 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Acquil (Owner) commented Oct 1, 2020

Context

As per #14 and #7, audio files longer than 4 minutes are split into segments whose length is set by the argument min_per_split (minutes per split). Splitting is required for faster transcription.

Problem

  • Words are likely to be cut off midway when the audio is segmented at fixed intervals.
  • audiosplitter.split_silence(min_per_split) can be used to circumvent this, but it leads to other problems:
    1. All pauses are removed.
    2. The transcripts are more accurate, but because the pauses are gone they no longer match the timestamps of the source video/audio.

Audio files should be segmented without changing the total duration and without cutting words midway.
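One possible direction, sketched below under stated assumptions, is to keep every sample and only move each cut to the quietest short frame near the nominal boundary, instead of deleting the pauses. This is not the project's current code: `quiet_split_points`, `split_keep_timing`, and all of their parameters are hypothetical names, the audio is assumed to be a mono sequence of float samples, and the segment length is given in seconds rather than through `min_per_split`.

```python
from typing import List, Sequence, Tuple


def frame_energy(samples: Sequence[float], start: int, size: int) -> float:
    """Mean squared amplitude of samples[start:start + size]."""
    frame = samples[start:start + size]
    return sum(x * x for x in frame) / max(len(frame), 1)


def quiet_split_points(
    samples: Sequence[float],
    sample_rate: int,
    seconds_per_split: float,
    search_seconds: float = 2.0,   # hypothetical: how far a cut may move
    frame_seconds: float = 0.02,   # hypothetical: energy analysis frame
) -> List[int]:
    """Return sample indices at which to cut, each placed in a quiet frame."""
    step = round(seconds_per_split * sample_rate)
    if step <= 0:
        raise ValueError("seconds_per_split is too small for this sample rate")
    window = round(search_seconds * sample_rate)
    frame = max(round(frame_seconds * sample_rate), 1)
    total = len(samples)

    cuts: List[int] = []
    previous = 0
    nominal = step
    while nominal < total:
        # Search around the nominal boundary, but never before the last cut.
        lo = max(nominal - window, previous + frame)
        hi = min(nominal + window, total - frame)
        best = nominal  # fall back to the fixed boundary if no room to search
        if lo < hi:
            # Lowest energy wins; ties go to the frame nearest the boundary.
            start = min(
                range(lo, hi, frame),
                key=lambda s: (frame_energy(samples, s, frame), abs(s - nominal)),
            )
            best = start + frame // 2  # cut in the middle of that frame
        cuts.append(best)
        previous = best
        nominal = best + step
    return cuts


def split_keep_timing(
    samples: Sequence[float],
    sample_rate: int,
    seconds_per_split: float,
) -> List[Tuple[float, Sequence[float]]]:
    """Split without dropping samples; return (start_time_seconds, segment) pairs."""
    cuts = quiet_split_points(samples, sample_rate, seconds_per_split)
    bounds = [0] + cuts + [len(samples)]
    return [(a / sample_rate, samples[a:b]) for a, b in zip(bounds, bounds[1:])]
```

Because the segments are contiguous slices, their combined length equals the original, and the start time returned with each segment can be added to any timestamp the transcriber reports for that segment to map it back to the source audio. A word can still be cut if no pause falls inside the search window, so the window size would need tuning.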

Acquil added the bug and help wanted labels on Oct 1, 2020
Acquil added this to the Prototype version 1 milestone on Oct 1, 2020
Acquil pinned this issue on Oct 1, 2020
Acquil removed this from the Prototype version 1 milestone on Oct 23, 2020
Acquil unpinned this issue on Nov 9, 2020