This repository has been archived by the owner on Dec 29, 2022. It is now read-only.

Cleaning data with moses vs. source.max_seq_len #185

Open
ghost opened this issue Apr 21, 2017 · 0 comments

ghost commented Apr 21, 2017

Hi

Could someone explain the exact difference between cleaning your data and using the .max_seq_len parameter when starting training?

I'm aware that cleaning also removes sentences that are ill-formed in some way, not only sentences that are too long. But what exactly is the point of cleaning with moses in the wmt16_en_de.sh script, with a length range of 1 to 80, if you then use .max_seq_len with a value smaller than 80?
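To make the two operations being compared concrete, here is a minimal sketch. It rests on assumptions that are not confirmed anywhere in this thread: that the cleaning step drops a whole sentence pair when either side falls outside the 1 to 80 token range, and that .max_seq_len cuts an over-long sequence down to its first N tokens rather than discarding it. The names `clean_by_length` and `truncate` are hypothetical helpers written for illustration, not functions from moses or from this repository.

```python
def clean_by_length(pairs, min_len=1, max_len=80):
    """Keep only sentence pairs where both sides have between
    min_len and max_len whitespace-separated tokens (inclusive)."""
    kept = []
    for src, tgt in pairs:
        n_src = len(src.split())
        n_tgt = len(tgt.split())
        if min_len <= n_src <= max_len and min_len <= n_tgt <= max_len:
            kept.append((src, tgt))
    return kept


def truncate(tokens, max_seq_len):
    """Assumed behaviour of a max_seq_len setting: keep the first
    max_seq_len tokens and silently drop the rest."""
    return tokens[:max_seq_len]


pairs = [
    ("a b c", "x y"),               # both sides within range: kept
    ("", "x"),                      # empty source side: dropped
    (" ".join(["w"] * 81), "x"),    # 81 source tokens: dropped
]
cleaned = clean_by_length(pairs)
shortened = truncate("a b c d".split(), 3)
```

Under these assumptions the two steps are not interchangeable: filtering changes which pairs exist in the corpus, while truncation keeps every pair but may cut off the end of a sentence.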
