In the word language model example, it seems that during evaluation the code starts computing the loss from the second word, thereby skipping the loss of the first word.
examples/word_language_model/main.py, line 136 in 537f697:

```python
data, targets = get_batch(data_source, i)
```
examples/word_language_model/main.py, lines 121 to 125 in 537f697:

```python
def get_batch(source, i):
    seq_len = min(args.bptt, len(source) - 1 - i)
    data = source[i:i+seq_len]
    target = source[i+1:i+1+seq_len].view(-1)
    return data, target
```
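To make the concern concrete, here is a minimal, self-contained sketch (not the repo's code verbatim; `bptt` stands in for `args.bptt`, and the toy tensor is made up) showing that `target` always starts at index `i+1`, so the first row of the batchified data never appears as a target:

```python
import torch

bptt = 3  # stands in for args.bptt

def get_batch(source, i):
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i + seq_len]
    target = source[i + 1:i + 1 + seq_len].view(-1)
    return data, target

# toy "batchified" data: 7 time steps, batch size 2
source = torch.arange(14).view(7, 2)

targets_seen = []
for i in range(0, source.size(0) - 1, bptt):
    _, target = get_batch(source, i)
    targets_seen.append(target)

print(torch.cat(targets_seen).tolist())
# [2, 3, ..., 13]: every token except source[0] = [0, 1] shows up as a
# target, so the loss of the first word in each column is never computed.
```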
Furthermore, the evaluation data is split into 10 parallel streams (eval_batch_size = 10), hence the losses of 10 words are skipped in total.
Am I right, or did I miss something?
examples/word_language_model/main.py, lines 85 to 88 in 537f697:

```python
eval_batch_size = 10
train_data = batchify(corpus.train, args.batch_size)
val_data = batchify(corpus.valid, eval_batch_size)
test_data = batchify(corpus.test, eval_batch_size)
```
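For completeness, here is a rough sketch of what `batchify` does (simplified from the example; the exact code at 537f697 may differ slightly), showing that with `eval_batch_size = 10` the corpus is laid out as 10 parallel columns, each contributing one column-initial word whose loss is never evaluated by `get_batch` above:

```python
import torch

def batchify(data, bsz):
    nbatch = data.size(0) // bsz             # full time steps per column
    data = data.narrow(0, 0, nbatch * bsz)   # drop tokens that don't fit
    return data.view(bsz, -1).t().contiguous()  # shape: (nbatch, bsz)

corpus = torch.arange(26)        # toy "corpus" of 26 token ids
val_data = batchify(corpus, 10)  # eval_batch_size = 10 -> 10 columns
print(val_data.shape)            # torch.Size([2, 10])
print(val_data[0])               # the 10 column-initial tokens; get_batch
                                 # never produces them as targets
```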