# Pretraining wav2vec 2.0 Docker for SageMaker

This Docker image is written with the assumption that it will be run on AWS SageMaker.
Unlabeled data (audio without transcriptions) in your own language is required. A good amount of unlabeled audio (e.g. 500 hours) will significantly reduce the amount of labeled data needed later, and also boost model performance. YouTube and podcasts are great places to collect data for your own language. Prepare an S3 bucket with the audio data in it.
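As a minimal sketch (assuming a local `./audios` folder; the bucket and folder names are placeholders), you could upload the wav files with boto3:

```python
# Hypothetical sketch: upload local wav files to an S3 bucket with boto3.
# "my-audio-bucket" and "data" are placeholders -- use your own names.
import os
import boto3

s3 = boto3.client("s3")
bucket = "my-audio-bucket"
prefix = "data"

for fname in os.listdir("./audios"):
    if fname.endswith(".wav"):
        s3.upload_file(os.path.join("./audios", fname), bucket, f"{prefix}/{fname}")
```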
You can check `pretrain.ipynb`.
- Set `WANDB_API_KEY` in line 72 of `Dockerfile` (you can verify the key with the optional check below).
- Set the wandb project name (`wandb_project`) in `wav2vec2_base_librispeech.yaml`.
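If you want to confirm the key is valid before baking it into the image, a minimal check (assuming `wandb` is installed locally):

```python
# Optional sanity check of the WANDB_API_KEY (requires `pip install wandb`).
import wandb

wandb.login(key="YOUR_WANDB_API_KEY")  # placeholder key; returns True on success
```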
Before uploading the Docker image, you have to set up the AWS CLI. Please check here: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html

After you install the AWS CLI, run `aws configure`.
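To verify the credentials were picked up, any SDK call will do; for example, with boto3:

```python
# Quick check that the AWS credentials from `aws configure` work.
import boto3

print(boto3.client("sts").get_caller_identity()["Account"])  # your 12-digit account ID
```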
The region of the ECR registry where the Docker image will be uploaded must be the same as the region of the S3 bucket where the dataset is prepared.
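A small sketch to double-check this (the bucket name is a placeholder):

```python
# Confirm the S3 bucket lives in the same region as the current session.
import boto3

session_region = boto3.session.Session().region_name
loc = boto3.client("s3").get_bucket_location(Bucket="my-audio-bucket")["LocationConstraint"]
bucket_region = loc or "us-east-1"  # S3 returns None for us-east-1
assert session_region == bucket_region, f"{session_region} != {bucket_region}"
```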
You can upload the Docker image by running `build_and_push.sh`. The first parameter of the shell script is the Docker image name; the script builds the image and pushes it to ECR.

```sh
sh build_and_push.sh wav2vec2-pretrain
```
Get the SageMaker execution role:

```python
from sagemaker import get_execution_role

role = get_execution_role()
```
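Note that `get_execution_role()` only works inside SageMaker (e.g. a SageMaker notebook instance). If you run this from elsewhere, pass your execution role ARN directly; the ARN below is a placeholder:

```python
# Outside SageMaker, get_execution_role() fails; use the role ARN instead.
role = "arn:aws:iam::123456789012:role/MySageMakerExecutionRole"  # placeholder
```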
For example, suppose we have an S3 bucket with the following structure. There is no naming requirement for the folders or wav files.

```
s3_bucket
└── data
    ├── xxxx.wav
    ├── xxxx.wav
    ....
```
Define the path of the S3 bucket you prepared:

```python
data_location = 's3://{bucket_name}/{audio_folder_name}'
```
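To sanity-check the path, you can list a few objects under it (bucket and prefix are the placeholder names from above):

```python
# List a few objects under the prefix to confirm the audio is where we expect.
import boto3

resp = boto3.client("s3").list_objects_v2(Bucket="my-audio-bucket", Prefix="data/", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"])
```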
Create a SageMaker session and build the ECR image URI:

```python
import boto3
import sagemaker as sage
from time import gmtime, strftime

sess = sage.Session()
sess.default_bucket()

# Build the ECR image URI from the current account and region.
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/wav2vec2-pretrain:latest'.format(account, region)
```
Define the estimator:

```python
# https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html
model = sage.estimator.Estimator(
    image,
    role,
    1,                  # instance count
    'ml.p3.16xlarge',   # instance type
    volume_size=1000,
    output_path="s3://{}/output".format(sess.default_bucket()),
    # Checkpoints are synced to S3 so an interrupted spot job can be resumed.
    checkpoint_s3_uri="s3://{}/checkpoints".format(sess.default_bucket()),
    checkpoint_local_path="/opt/ml/checkpoints/",
    # Spot instances cut cost; max_wait must be >= max_run.
    use_spot_instances=True,
    max_run=320000,
    max_wait=400000,
    sagemaker_session=sess,
)
```
Run training!

```python
model.fit(data_location)
```
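`fit()` blocks until the job finishes and streams the container logs to the notebook. Passing a plain S3 URI maps it to the default `training` channel; the equivalent explicit form is:

```python
# Equivalent call with an explicit channel name; inside the container the
# data is mounted under /opt/ml/input/data/training.
model.fit({"training": data_location})
```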
You can check `convert_huggingface_model.ipynb`.
Clone Hugging Face transformers and move to the wav2vec2 model directory:

```sh
git clone https://github.com/huggingface/transformers.git
cd transformers/src/transformers/models/wav2vec2
```
Install the Hugging Face datasets library:

```sh
pip install datasets -U
```
Run the conversion script:

```sh
python convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py \
    --pytorch_dump_folder_path ./output \
    --checkpoint_path ./finetuning/wav2vec_small_960h.pt \
    --dict_path ./dict
```
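Once the script finishes, the dump folder can be loaded with transformers. A minimal sketch (assuming the conversion above wrote a fine-tuned CTC model to `./output`):

```python
# Load the converted checkpoint and run a forward pass on dummy audio.
import torch
from transformers import Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained("./output")
dummy = torch.randn(1, 16000)  # one second of 16 kHz audio
with torch.no_grad():
    logits = model(dummy).logits
print(logits.shape)
```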
## References

- Paper: wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations: https://arxiv.org/abs/2006.11477
- Source code: https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md
- self-supervised-speech-recognition: https://github.com/mailong25/self-supervised-speech-recognition
## License

MIT