MarkZakelj/text_to_audio_alignment

Text to audio alignment

Env creation

  • Install the system packages required by NeMo: apt-get update && apt-get install -y libsndfile1 ffmpeg
  • Create the conda environment: conda env create --file conda_env.yml
  • Inside the new environment, install PyTorch 1.8.1 LTS (CUDA 10 or CPU version) from the PyTorch downloads page
  • Download and build the kenlm binaries from GitHub
  • Install the kenlm Python library: pip install https://github.com/kpu/kenlm/archive/master.zip

Usage

  • Set the options in the config file config.txt
  • Run main_segment_and_asr.py to generate the segments and the model output (CTC)
  • Run main.py to decode the CTC output, align, and evaluate all together, or run main_decode.py, then main_align.py, then eval.py
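
The run order above can be sketched as a small driver script. This is only an illustration: the script names come from the list above, but the helper names (pipeline_scripts, run_pipeline) are hypothetical, and it assumes the scripts take no command-line arguments because their settings are read from config.txt.

```python
import subprocess
import sys
from typing import List

# Script names are taken from the Usage list. Assumption: they need no
# command-line arguments because all settings live in config.txt.
SEGMENT_AND_ASR = "main_segment_and_asr.py"
COMBINED = ["main.py"]
SEPARATE = ["main_decode.py", "main_align.py", "eval.py"]


def pipeline_scripts(separate_steps: bool = False) -> List[str]:
    """Return the scripts to run, in order (hypothetical helper)."""
    tail = SEPARATE if separate_steps else COMBINED
    return [SEGMENT_AND_ASR] + tail


def run_pipeline(separate_steps: bool = False) -> None:
    """Run each script with the current interpreter; stop on the first failure."""
    for script in pipeline_scripts(separate_steps):
        subprocess.run([sys.executable, script], check=True)
```

Calling run_pipeline() from the repository root would run the combined path; run_pipeline(separate_steps=True) would run decoding, alignment, and evaluation as separate steps.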

File structure

audio_samples\  
+-- <sample_name>\  
|   +-- main_text.txt (required)  
|   +-- main_audio.wav (required)  
|   +-- alignment_reference.json (not required for usage, only for evaluation)  
|   +-- results\ (auto generated)  
|       +-- <model_name>\  
|           +-- output.npy  
|           +-- <ctc_mode>\  
|               +-- transcript.txt  
|               +-- char_ms.txt  
|               +-- aligned_phrases.txt  
|               +-- aligned_words.txt  
|               +-- eval_results.txt
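
Before running the pipeline, it may help to check that a sample folder matches the layout above. The following is a minimal sketch, not part of the repository: the function name missing_files and its for_eval parameter are hypothetical, and only the file names are taken from the tree.

```python
from pathlib import Path
from typing import List

# File names come from the tree above.
REQUIRED = ("main_text.txt", "main_audio.wav")
EVAL_ONLY = "alignment_reference.json"  # only needed for evaluation


def missing_files(sample_dir: Path, for_eval: bool = False) -> List[str]:
    """List the expected input files that are absent from one sample folder."""
    names = list(REQUIRED)
    if for_eval:
        names.append(EVAL_ONLY)
    return [name for name in names if not (sample_dir / name).is_file()]
```

For example, missing_files(Path("audio_samples") / "my_sample", for_eval=True) returns an empty list when the sample is ready for both alignment and evaluation; "my_sample" here is a placeholder folder name.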
