Cleaning data with moses vs. source.max_seq_len #185

ghost · 2017-04-21T11:59:40Z

Hi

Could someone enlighten me as to what the exact difference is between cleaning your data and the use of the .max_seq_len parameter when starting training?

I'm aware that cleaning also removes sentences that are ill-formed in some way, not only sentences that are too long. But what exactly is the point of cleaning with moses to a length range of 1 to 80 in the wmt16_en_de.sh script if you then use .max_seq_len with a value smaller than 80?
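
To make the question concrete, here is a minimal sketch of the kind of length filtering that "cleaning from 1 to 80" refers to. It is not the moses script itself: the function name `clean_pairs` is hypothetical, tokens are counted by plain whitespace splitting as an assumption, and any further checks the real cleaning step applies are not shown. The point it illustrates is that a pair is dropped as a whole when either side falls outside the range.

```python
def clean_pairs(pairs, min_len=1, max_len=80):
    """Keep only pairs where BOTH sides have between min_len and max_len tokens.

    Hypothetical helper for illustration; tokens are whitespace-separated.
    """
    kept = []
    for src, tgt in pairs:
        n_src = len(src.split())
        n_tgt = len(tgt.split())
        # A pair is removed entirely if either side is out of range.
        if min_len <= n_src <= max_len and min_len <= n_tgt <= max_len:
            kept.append((src, tgt))
    return kept


pairs = [
    ("ein kurzer Satz", "a short sentence"),   # kept: both sides in range
    ("", "empty source side"),                 # dropped: source has 0 tokens
    ("wort " * 90, "word " * 90),              # dropped: 90 tokens > 80
]
cleaned = clean_pairs(pairs)
print(len(cleaned))
```

Whether .max_seq_len behaves the same way at training time, or does something different with over-long sentences, is exactly what is being asked here.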
