Transformers

BERT

To get you warmed up and familiar with some of the libraries, we start out easy with a BERT tutorial by J. Alammar. The tutorial builds a simple sentiment analysis model on top of pretrained BERT models using the HuggingFace library. It will familiarize you with the library and make the next exercise a bit easier. The tutorial's graphics and visualizations will also deepen your general understanding of transformers, and of the BERT model in particular.

Link to the tutorial
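As a taste of what the tutorial covers, here is a minimal sketch of sentiment analysis with the HuggingFace `transformers` library. The `pipeline` helper downloads a default sentiment checkpoint on first use; the tutorial itself works at a lower level with the model and tokenizer directly.

```python
# Minimal sketch: sentiment analysis with a pretrained BERT-family model.
# pipeline("sentiment-analysis") pulls a default checkpoint on first use.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

result = classifier("This tutorial made transformers finally click for me.")[0]
print(result["label"], round(result["score"], 3))  # a label and a confidence
```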

wav2vec 2.0 for keyword recognition

After the warm-up with BERT, this exercise is a bit more advanced and you will be mostly on your own.

The task in this exercise is to build a keyword recognition system based on wav2vec 2.0. You will have to weigh several options and decide which implementation path to follow.

For this exercise, please use the Speech Commands dataset from Google to train and evaluate your keyword recognition system. The data can also be obtained via the HuggingFace API, or through torchaudio.

There are a couple of options that vary in complexity and will lead to different performance on this problem. You should be able to justify the design and implementation choices you made.

Choose one of the options that suits you best, or the one you think might yield the best performance.

  1. Which model will you use? BASE vs. LARGE, and which pretrained weights: ASR vs. BASE, XLSR53 vs. ENGLISH?
  2. HuggingFace or torchaudio.pipelines?
  3. Use a simple neural classification head?
  4. Extract features and use them with some downstream classifier (e.g. SVM, Naive Bayes, etc.)?
    1. Which pooling strategy will you use (mean / statistical ...)?
    2. A convolutional head?
    3. An RNN?
    4. Should you use a dimensionality reduction method?
  5. Or use CTC loss and a greedy decoder? (closed vocabulary!)
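To illustrate one of the paths above, here is a sketch of option 4 with mean pooling: extract wav2vec 2.0 hidden states, average them over time, and train a downstream SVM. The `facebook/wav2vec2-base` checkpoint is one possible choice; the random waveforms stand in for real Speech Commands clips, so this only shows the plumbing, not a working keyword recognizer.

```python
# Sketch of option 4: wav2vec 2.0 features + mean pooling + SVM.
import numpy as np
import torch
from sklearn.svm import SVC
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
model.eval()

def embed(waveform: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Mean-pool the last hidden states into one fixed-size utterance vector."""
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, frames, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Dummy "keyword" clips; in the exercise these are real Speech Commands audio.
rng = np.random.default_rng(0)
X = np.stack([embed(rng.standard_normal(16000).astype(np.float32))
              for _ in range(4)])
y = [0, 0, 1, 1]  # two keyword classes

clf = SVC().fit(X, y)
print(clf.predict(X[:1]))
```

Swapping the mean pooling for statistical pooling, a convolutional head, or an RNN only changes the `embed` step; the downstream classifier stays the same.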

What implementation do you think would work best in a real-world scenario?
