Commit 938f8f7

more defensive
1 parent b4c346a commit 938f8f7

1 file changed: +1 -0 lines changed

tokenizer.h

Lines changed: 1 addition & 0 deletions
@@ -59,6 +59,7 @@ void tokenizer_init(Tokenizer *tokenizer, const char *filename) {
     if (version == 1) {
         // version 1 didn't include the EOT token id
         // so we assume it is 50256, the EOT in GPT-2
+        assert(tokenizer->vocab_size == 50257); // let's be defensive here
         tokenizer->eot_token = 50256;
     } else if (version == 2) {
         tokenizer->eot_token = header[3];
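
The check added in this commit can also be written as an explicit runtime test. The standard `assert` macro compiles to nothing when `NDEBUG` is defined, so the guard would disappear in such builds. The following is only a minimal sketch of that alternative, not code from `tokenizer.h`: the helper `resolve_eot_token`, its parameters, the two macros, and the choice to reject header versions other than 1 and 2 are all assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>

#define GPT2_VOCAB_SIZE 50257u /* GPT-2 vocabulary size */
#define GPT2_EOT_TOKEN  50256u /* GPT-2 end-of-text token id */

/* Hypothetical helper (not part of tokenizer.h): chooses the EOT token id
 * from the header version. Returns 0 on success, -1 on failure. */
static int resolve_eot_token(uint32_t version, uint32_t vocab_size,
                             uint32_t header_eot, uint32_t *eot_out) {
    if (version == 1) {
        /* Version 1 headers carry no EOT id, so falling back to the GPT-2
         * default is only safe when the vocabulary is GPT-2's. */
        if (vocab_size != GPT2_VOCAB_SIZE) {
            fprintf(stderr,
                    "version 1 tokenizer has vocab_size %u, expected %u\n",
                    (unsigned)vocab_size, (unsigned)GPT2_VOCAB_SIZE);
            return -1;
        }
        *eot_out = GPT2_EOT_TOKEN;
        return 0;
    }
    if (version == 2) {
        /* Version 2 headers store the EOT id explicitly. */
        *eot_out = header_eot;
        return 0;
    }
    /* Assumption: any other version is treated as unsupported. */
    return -1;
}
```

A caller would pass the version, the vocabulary size, and `header[3]` as `header_eot`, then stop initialization if the helper returns -1. Whether an `assert` or an error return is the better fit depends on how the surrounding code reports failures.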
