Commit f140ec0

[!165][RELEASE] Speech Recognition and Translation with ConfHyena (LREC-COLING 2024)
# Which work do we release?
How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena (LREC-COLING 2024)
# What changes does this release refer to?
f63e461ac6cecbf1cad4b50e4919b13a61fcdbd2
1 parent d63681e commit f140ec0

2 files changed, +161 -0 lines changed
README.md

+4
@@ -3,6 +3,10 @@
This repository contains the open source code by the MT unit of FBK.
Dedicated README for each work can be found in the `fbk_works` directory.

### 2024

- [[LREC-COLING 2024] **How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena**](fbk_works/HYENA_COLING2024.md)

### 2023

- [[CLiC-IT 2023] **How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation**](fbk_works/MULTIGENDER_CLIC_2023.md)

fbk_works/HYENA_COLING2024.md

+157
@@ -0,0 +1,157 @@
# How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena

This README contains the instructions to replicate the training and evaluation of the models in the paper
[How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena](https://arxiv.org/abs/2402.13208).
In addition, we release the pre-trained models used in the paper.

## Setup
Clone this repository and install it as explained in the original [Fairseq(-py)](https://github.com/pytorch/fairseq).
The experiments use MuST-C, so make sure to [download the corpus](https://mt.fbk.eu/must-c/).
Follow the [preprocessing steps of Speechformer](SPEECHFORMER.md#preprocessing) to preprocess the MuST-C data.

## Pretrained models

Below we release the dictionary/config files and the pre-trained checkpoints
obtained in our experiments.
The dictionary and config files are the same as those used for the Conformer baseline,
whose checkpoints can be found [here](BUGFREE_CONFORMER.md#pretrained-models).

### Common files:
- Source dictionary SentencePiece model and fairseq dictionary:
  [srcdict.model](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EdAgeZdaw5BEjv6PUPEycvoBZHeOMqZ69ciEAIHM0XoBbw?e=t2z5G1),
  [srcdict.txt](https://fbk-my.sharepoint.com/:t:/g/personal/mgaido_fbk_eu/EY6_YCFCDjxBlBvm2_8UQFEB9ehLmFoLiGj2r7GGe_pL0A?e=NhIhkz)
- Target dictionary SentencePiece model and fairseq dictionary:
  - **en (ASR)**: same as srcdict.model and srcdict.txt
  - **en-de**:
    [tgtdict.model](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/Eamb-6DsnklHq-4CZOZA9nYBKZ0XXnz0UdeOb49UXYlLVQ?e=yroKIk),
    [tgtdict.txt](https://fbk-my.sharepoint.com/:t:/g/personal/mgaido_fbk_eu/EVOJ0yFgZZpEqvHUlzhjqOEBkV7U26iryO-bpobz_5q_fQ?e=i2gdi0)
  - **en-es**:
    [tgtdict.model](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EWmh3csXbEVPmBSI7xeemVMBHqlSEDJHl3JmUOXzPRwCAA?e=T53pKl),
    [tgtdict.txt](https://fbk-my.sharepoint.com/:t:/g/personal/mgaido_fbk_eu/EduV9z-HroFOgh2xQjhdShIBmCs-6PmvgqkzPfcQmXsXdQ?e=iehKch)
  - **en-fr**:
    [tgtdict.model](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EXQfn6DYxC1CskMO7lJMaxIB23Wa4xIWOtsX2SIukOOM9A?e=HyvZrB),
    [tgtdict.txt](https://fbk-my.sharepoint.com/:t:/g/personal/mgaido_fbk_eu/ETV367Z8xJ1Egz9E_cKBdykB9iYgDdEj1xLKBLRTANWCUA?e=Y5CUky)
  - **en-it**:
    [tgtdict.model](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EX_w-V-SN1dLkEEJWrXbK_UBxHQL0zJaJuzIM_ZzosICmg?e=Wf0VKk),
    [tgtdict.txt](https://fbk-my.sharepoint.com/:t:/g/personal/mgaido_fbk_eu/ERAhMZjPoJNHkPWih7v0GfoBus4jG0WD3XPRmK5CgaV3wA?e=lG50Ny)
  - **en-nl**:
    [tgtdict.model](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EZ8C2AySmHxLi7qDcf4PcvEBEg5tkVXK9jsB1t8v0F3Maw?e=6VCiwb),
    [tgtdict.txt](https://fbk-my.sharepoint.com/:t:/g/personal/mgaido_fbk_eu/EWvoJ9Lb97RGqaUaFgsWPlMBYgo9uTIxUUY6KidHnZErhw?e=986D7S)
  - **en-pt**:
    [tgtdict.model](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EX9u-0PII8JKpnNensFj5ygBqVZrcPYoE8RWC8VryspzTg?e=2LjDH5),
    [tgtdict.txt](https://fbk-my.sharepoint.com/:t:/g/personal/mgaido_fbk_eu/EZ2TMRgLtudCuvXcsjCzOtkBjWVSdsof1LGmt9bOtQn9gg?e=boCBtQ)
  - **en-ro**:
    [tgtdict.model](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/Ec_zzPD3sTtCkNmibsMUUQUBWQHxinzoNvSRCCx6c_JhzA?e=Q5pDs7),
    [tgtdict.txt](https://fbk-my.sharepoint.com/:t:/g/personal/mgaido_fbk_eu/EbkE3WFxh4lDiR7aB9wA6NoBaQIZnM6MnWscLKD-h5nMTw?e=QgoD95)
- config yaml:
  ```yaml
  bpe_tokenizer:
    bpe: sentencepiece
    sentencepiece_model: tgtdict.model
  bpe_tokenizer_src:
    bpe: sentencepiece
    sentencepiece_model: srcdict.model
  input_channels: 1
  input_feat_per_channel: 80
  sampling_alpha: 1.0
  specaugment:
    freq_mask_F: 27
    freq_mask_N: 1
    time_mask_N: 1
    time_mask_T: 100
    time_mask_p: 1.0
    time_wrap_W: 0
  transforms:
    '*':
    - utterance_cmvn
    _train:
    - utterance_cmvn
    - specaugment
  vocab_filename: tgtdict.txt
  vocab_filename_src: srcdict.txt
  ```
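
The config refers to the dictionaries by bare file name, so a quick check that every file is in place can save a failed run. The following is only a sketch: it assumes that `config.yaml` and the four dictionary files are stored together in the data directory passed to the commands in this README, and the helper name `check_release_files` is hypothetical.

```bash
# Hypothetical helper: list the released files that are missing from a data directory.
# Assumption: config.yaml and the dictionaries all live in that same directory.
check_release_files() {
    local data_dir=$1
    local missing=0
    local f
    for f in config.yaml srcdict.model srcdict.txt tgtdict.model tgtdict.txt; do
        if [ ! -f "${data_dir}/${f}" ]; then
            echo "missing: ${f}"
            missing=1
        fi
    done
    # Non-zero exit status if at least one file is missing
    return ${missing}
}

# Example call (commented out because the path is user-specific):
# check_release_files "${MUSTC_ROOT}"
```

The function prints one line per missing file and returns a non-zero status, so it can be used as a guard before launching training or generation.
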
### Checkpoints
| Model | en (ASR) | en-de | en-es | en-fr | en-it | en-nl | en-pt | en-ro |
|--------------------|------------|------------|------------|-------|-------|-------|-------|-------|
| ConfHyena | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EU6Bhy_jGQxJm9fIS3DsJmwBxd-tBl5HsQBM2OCbvu5gQQ?e=cORIdz) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/ETuTNLx7_hNAooQ_U5yQh1oB3zae2fls2xv-K4enmCBMRw?e=2ENGAV) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EXWYMvNOEINMgeKStlW0peABybfiOIcOpInjpbFw3cRUBw?e=JyPdry) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/Eb3pv7C6zvJIqkH2nPa9w4YBvvO74khSX7s_uo6D_p7fzg?e=t7NypZ) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/Ee7Gvuo2iRJHsr2M_9G4KHQBkgrRkCmwCy5kS9jMlJVP6A?e=lbVxTr) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/ERpXg6Cbe3pDlL1gzGCoe7UBHcpCLw2JQXQKtK1vF05NGg?e=RSsHDJ) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/ETXG5TySDzZLmaWkaPbFMXgBIoxE3n54I-pclaRsmQQedg?e=JNdKaE) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EdR34a0DMMhGiRsIpGCxcQABIbjbICogJaTKZXOtGQa14w?e=MYRU3N) |
| - non-causal Hyena | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/Efpj8KHH9oJDm6bPAJSdDNkB_JRcsmcxXC4ciaPE0U3kgg?e=yfGbhq) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EYHfCm2e4PBAoE-0jHkEm2MB1Wr-qBZAEaeAWJBUXl30Lg?e=ZuKaon) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EVlb_DmkG8VCg2JrddtHGOoB9be1IDpB2Q0aQavIe6hoAw?e=aL3SWY) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EawfiABptKBErrLwYJ5fjdUBAVSVv1gsWU-jwWlgj8qt_A?e=hnU9HB) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EQZSTWx_O8RFhFICdyE8swkBsrCmwkA0LouzRnX4cF7wHQ?e=0Ha4zB) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EXTo4vC_hMtAgijZ1TE7RWABZgwfI4wuXrZvlcHI_ah7Lg?e=3Baczg) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EUf-qg9uf_VBgAZDS5OW3DIBRK8gkxts-Ku067r00bb1VQ?e=2uNgZj) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/Ea8i1u8KIldLreB5531Fno8BCEpg7qiiHG2lCE8cE8qZXA?e=pTXNCC) |
| Hybrid ConfHyena | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EbF2LjOz1MtLnX1gCHTQjsEBgLn_EAhKypyIDhu3Y7nuFQ?e=ZhFRyF) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/ETmOsR9Ie6hOrhM50B6wzioBvWuSLo6g55e_qIp88W13qQ?e=W5eK79) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EcYJBCcrJaNApvOlvWDnRSEBtue-fzMIYpISwMWqdRCPSQ?e=gWXrK5) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EfSjqbL1CwZOkquHsvwZnpQBswt469ymSW3uL_q8ro5xlg?e=GMHKPO) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EQXC8NAnPldElP_0WduGtGYB2lhKCCy-tOQQDBfeQMvC4A?e=0T63hZ) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EZin2xLqqLFBkFYPyh0X1rcBCFbdvB-Dpr567adjGkrpSQ?e=57imQ7) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EVjU7IhkWB5Dq7M09SzWpqABn18U_GbSGdj4biJoNWCaJw?e=vQpsEh) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EYROvhAPNTxEn9WDHgpIgPEBsKhWUWYTpEfydwFV9AXDIw?e=oaet0d) |
| - non-causal Hyena | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EbNqXhyaUGVFheZ3FExAloEBPEZOG2jlpJv8ynnYnYpf2g?e=qe87Zq) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/Ef5HXS1LJvxNvYHv-bp-cNUBZ4DDGdWBAL_iBQNpl6JbcA?e=DX0ItZ) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EWQ_V6szbMdPp149zGa8tuoBLnN-nZ0tVnYc3ymBb9Ddcg?e=BByutz) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EVsXeLu_VkFKndxmUAShl1kB7ANPmdw19QOA87RUBP-TcQ?e=q8Royw) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EbTZHMRnb_BJobUxSK0dFScB3FD1_IvVcLvyfnIWFy6lPg?e=mqh2wK) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/Ef_gZRguYWJEmzyIMn9bzIUBzGgCt-lwb_5FPCSrUHv03A?e=LssC98) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EVhOSMYNNqlJibkYt85laRoBvwNrNzvXCAOX_CJYX13_MQ?e=9KfTZJ) | [ckp.pt](https://fbk-my.sharepoint.com/:u:/g/personal/mgaido_fbk_eu/EZ64nTKeOvhBmNzZgyx7LV8BJKJN0Qx0psoqLYaJ7lzPlg?e=Yq8tAv) |

## Training

For the Conformer baseline, please refer to the [bug-free Conformer README](BUGFREE_CONFORMER.md).

The Hybrid ConfHyena models were trained with the following script.
It takes four positional arguments: the language, the MuST-C data directory,
the task suffix used in the subset names, and the directory where checkpoints and logs are saved.

```bash
# ${FBK_fairseq} is expected to point to the root of this repository
LANG=$1        # language; not referenced in the commands below
MUSTC_ROOT=$2  # data directory passed to train.py
TASK=$3        # task suffix of the subsets (train_${TASK}_src, dev_${TASK}_src)
SAVE_DIR=$4    # output directory for checkpoints and logs

mkdir -p $SAVE_DIR

python ${FBK_fairseq}/train.py ${MUSTC_ROOT} \
        --train-subset train_${TASK}_src --valid-subset dev_${TASK}_src \
        --user-dir examples/speech_to_text --seed 1 \
        --num-workers 2 --max-update 100000 --patience 10 --keep-last-epochs 12 \
        --max-tokens 40000 --update-freq 4 \
        --task speech_to_text_ctc --config-yaml config.yaml  \
        --criterion ctc_multi_loss \
        --underlying-criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
        --arch confhyena --conformer-after-compression --stride 2 \
        --ctc-encoder-layer 8 --ctc-weight 0.5 --ctc-compress-strategy avg \
        --optimizer adam --adam-betas '(0.9, 0.98)' \
        --lr 2e-3 --lr-scheduler inverse_sqrt --warmup-updates 25000 \
        --clip-norm 10.0 \
        --skip-invalid-size-inputs-valid-test \
        --save-dir ${SAVE_DIR} \
        --log-format simple > $SAVE_DIR/train.log 2> $SAVE_DIR/train.err

# Average 5 epoch checkpoints; the upper bound is the epoch number
# of the 5th file listed in $SAVE_DIR
python ${FBK_fairseq}/scripts/average_checkpoints.py \
        --input $SAVE_DIR --num-epoch-checkpoints 5 \
        --checkpoint-upper-bound $(ls $SAVE_DIR | head -n 5 | tail -n 1 | grep -o "[0-9]*") \
        --output $SAVE_DIR/avg5.pt

# Once the averaged model exists, delete the per-epoch checkpoints
if [ -f $SAVE_DIR/avg5.pt ]; then
  rm $SAVE_DIR/checkpoint??.pt
fi
```
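
To make the `--checkpoint-upper-bound` expression in the script above easier to follow, the sketch below runs the same `ls | head | tail | grep` pipeline on a throwaway directory. The directory contents are hypothetical: they mimic a run that stopped after epoch 34 with 12 epoch checkpoints kept. `LC_ALL=C` is added here only to make the order of `ls` deterministic, since the listing order depends on the locale.

```bash
# Hypothetical save directory: epoch checkpoints 23..34 plus the usual extra files.
demo_dir=$(mktemp -d)
for epoch in $(seq 23 34); do
    touch "${demo_dir}/checkpoint${epoch}.pt"
done
touch "${demo_dir}/checkpoint_best.pt" "${demo_dir}/checkpoint_last.pt" \
      "${demo_dir}/train.log" "${demo_dir}/train.err"

# Same pipeline as in the training script: take the 5th listed file
# and keep only the digits of its name.
upper_bound=$(LC_ALL=C ls "${demo_dir}" | head -n 5 | tail -n 1 | grep -o "[0-9]*")
echo "checkpoint upper bound: ${upper_bound}"

rm -r "${demo_dir}"
```

In this example the 5th file is `checkpoint27.pt`, so the upper bound is 27 and, with `--num-epoch-checkpoints 5`, the average would cover epochs 23 to 27.
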

The ConfHyena models can be obtained by removing the `--conformer-after-compression` parameter.

The causal version of the two architectures (`- non-causal Hyena` in the paper and in the checkpoint table above)
can be obtained by adding the parameter `--hyena-causal` to the command.

The command is meant to be executed on 2 A100 GPUs with 40GB VRAM.
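
With a different number of GPUs, one option is to adjust `--update-freq` so that the amount of data per update stays the same. The sketch below assumes the usual fairseq semantics, where `--max-tokens` is a per-GPU limit and the data per update scales with `--max-tokens` × `--update-freq` × number of GPUs. The helper `update_freq_for` is hypothetical, and matching the batch size does not guarantee identical results.

```bash
# Reference setting from the training command: 2 GPUs, --max-tokens 40000, --update-freq 4
REF_MAX_TOKENS=40000
REF_UPDATE_FREQ=4
REF_GPUS=2

# Hypothetical helper: --update-freq that matches the reference data per update.
# $1: number of GPUs, $2 (optional): per-GPU --max-tokens
update_freq_for() {
    local gpus=$1
    local max_tokens=${2:-$REF_MAX_TOKENS}
    local target=$((REF_MAX_TOKENS * REF_UPDATE_FREQ * REF_GPUS))
    local per_step=$((gpus * max_tokens))
    # Round up so that the data per update is never below the reference
    echo $(( (target + per_step - 1) / per_step ))
}

update_freq_for 1          # 1 GPU, same --max-tokens   -> 8
update_freq_for 4          # 4 GPUs, same --max-tokens  -> 2
update_freq_for 1 20000    # 1 GPU, halved --max-tokens -> 16
```
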
## Evaluation
Once you have downloaded the pretrained checkpoints and the related config/dictionaries,
generate the output with:
```bash
python ${FBK_fairseq}/fairseq_cli/generate.py ${MUSTC_ROOT} \
        --user-dir examples/speech_to_text \
        --config-yaml config.yaml --gen-subset tst-COMMON_st_src \
        --max-source-positions 10000 --max-target-positions 1000 \
        --task speech_to_text_ctc \
        --criterion ctc_multi_loss --underlying-criterion label_smoothed_cross_entropy \
        --beam 5 --no-repeat-ngram-size 5 --path ${PRETRAINED_CHECKPOINT} > ${OUTPUT_FILE}
```
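
The redirected output mixes several kinds of lines, and the hypotheses are not necessarily in corpus order. A minimal sketch for extracting them follows. It assumes the standard fairseq `generate` output format, in which detokenized hypotheses are tab-separated lines of the form `D-<id>`, score, text. The function name `extract_hypotheses` is hypothetical.

```bash
# Hypothetical helper: print the detokenized hypotheses sorted by sentence id.
# $1: file containing the redirected output of generate.py
extract_hypotheses() {
    # Keep the D- lines, drop the "D-" prefix, sort numerically by id,
    # then keep only the third tab-separated field (the text).
    grep "^D-" "$1" | cut -c3- | sort -n -k1,1 | cut -f3
}

# Example call (commented out because the paths are user-specific):
# extract_hypotheses "${OUTPUT_FILE}" > hypotheses.txt
```

The resulting file has one hypothesis per line and can be scored against the references with the evaluation tool of your choice.
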

## Citation
```bibtex
@inproceedings{gaido-et-al-2024-hyena,
  title={{How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena}},
  author={Marco Gaido and Sara Papi and Matteo Negri and Luisa Bentivogli},
  year={2024},
  address="Turin, Italy",
  booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
}
```