
v1.3.3 Minor bugfixes

@Natooz released this 19 Dec 18:20

Changes

  • 4f4e49e Fixed the len magic method for multi-vocabulary tokenizers; len is now also available as a property
  • 925c7ae & 5b4f410 Fixed token type initialization when loading a tokenizer from a params file
  • c873456 Removed hyphens from token type names, for better readability. By convention, token types are all written in CamelCase.
  • 5e51e84 New multi_voc property
  • b3b0cc7 tokenize_dataset: the progress bar now shows the name of the saving directory
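
To illustrate the len and multi_voc entries above, here is a minimal sketch of how a magic method and a property of the same name can coexist. ToyTokenizer and its vocab attribute are hypothetical and are not the library's actual classes; the assumption that the property returns one size per vocabulary is ours and is not confirmed by these notes.

```python
from typing import Dict, List, Union

Vocab = Dict[str, int]


class ToyTokenizer:
    """Hypothetical stand-in for illustration, not the library's tokenizer."""

    def __init__(self, vocab: Union[Vocab, List[Vocab]]) -> None:
        # A single dict, or a list of dicts for a multi-vocabulary tokenizer.
        self.vocab = vocab

    @property
    def multi_voc(self) -> bool:
        # True when the tokenizer holds several vocabularies.
        return isinstance(self.vocab, list)

    def __len__(self) -> int:
        # len() must return a single int, so sum over all vocabularies.
        if self.multi_voc:
            return sum(len(v) for v in self.vocab)
        return len(self.vocab)

    @property
    def len(self) -> Union[int, List[int]]:
        # Assumed behavior: per-vocabulary sizes when multi-vocabulary.
        if self.multi_voc:
            return [len(v) for v in self.vocab]
        return len(self)


single = ToyTokenizer({"Pitch_60": 0, "Pitch_61": 1})
multi = ToyTokenizer([{"Pitch_60": 0, "Pitch_61": 1}, {"Velocity_64": 0}])

print(len(single), single.len)
print(len(multi), multi.len)
```

The magic method has to return an integer, which is why the sketch sums the sizes there and leaves the per-vocabulary breakdown to the property.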

Compatibility

  • All good 🙌