v1.4.0 Data augmentation and optimization
This fairly large update brings data augmentation, bug fixes, and optimizations that let you write more elegant code.
Changes
- 8f201e0 308fb27 Data augmentation methods! 🙌 They can be applied to both MIDI objects and tokens, augmenting data by shifting the pitch, velocity, and duration values.
- 1d8e903 You can perform data augmentation while tokenizing a dataset (`tokenize_midi_dataset` method) with the `data_augment_offsets` argument. This is done at the token level, as it is faster than augmenting MIDI objects.
- 0634ade BPE is now implemented in the main tokenizer class! This means all tokenizers can benefit from it in a much cleaner way!
- 0634ade `bpe` method renamed to `learn_bpe`; it now returns metrics (also shown in the progress bar during learning) on the number of token combinations and the sequence length reduction
- 7b8c977 Backward compatibility when loading tokenizer config files with BPE from older versions
- 3cea9aa @nturusin Example notebook of GPT2 Hugging Face music transformer: fixes in training
- 65afa6b The `tokens_to_midi` and `save_tokens` methods can now receive tokens as Tensors and numpy arrays. PyTorch, TensorFlow and Jax (numpy) tensors are supported. The `convert_tokens_tensors_to_list` decorator converts them to lists; you can use it on your custom methods.
- aab64aa The `__call__` magic method now automatically routes to `midi_to_tokens` or `tokens_to_midi` depending on what you give it. You can now use tokenizers more elegantly as `tokenizer(midi_obj)` or `tokenizer(generated_tokens)`.
- e90b20a Bugfix in `Structured` causing a possible infinite while loop with illegal successions of token types
- 947af8c Big refactor of MuMIDI, which now has fixed vocab / type idx. It is easier to handle and use. (thanks @gonzaloarca)
- 947af8c CPWord "Ignore" tokens are all renamed `Ignore_None` by convention, making operations easier in data augmentation and other methods.
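
To illustrate why augmenting at the token level is cheap, here is a minimal, self-contained sketch of pitch shifting applied directly to a token sequence. It is not the library's implementation: the string token format (`Pitch_60`), the pitch range, and the function names `shift_pitch_tokens` and `augment_with_offsets` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Assumed pitch range for this sketch: min inclusive, max exclusive.
PITCH_RANGE: Tuple[int, int] = (21, 109)


def shift_pitch_tokens(
    tokens: Sequence[str], offset: int, pitch_range: Tuple[int, int] = PITCH_RANGE
) -> List[str]:
    """Return a copy of `tokens` with every Pitch token shifted by `offset`.

    Raises ValueError if a shifted pitch would leave `pitch_range`.
    Tokens are assumed to be strings of the form "<Type>_<value>".
    """
    low, high = pitch_range
    shifted = []
    for token in tokens:
        kind, _, value = token.partition("_")
        if kind == "Pitch":
            new_pitch = int(value) + offset
            if not low <= new_pitch < high:
                raise ValueError(f"pitch {new_pitch} outside {pitch_range}")
            shifted.append(f"Pitch_{new_pitch}")
        else:
            # Velocity, Duration, etc. are left untouched by a pitch shift.
            shifted.append(token)
    return shifted


def augment_with_offsets(
    tokens: Sequence[str], offsets: Sequence[int]
) -> List[Tuple[int, List[str]]]:
    """Build one augmented copy per non-zero offset that stays in range."""
    augmented = []
    for offset in offsets:
        if offset == 0:
            continue  # the original sequence is not an augmentation
        try:
            augmented.append((offset, shift_pitch_tokens(tokens, offset)))
        except ValueError:
            continue  # skip offsets that would push a note out of range
    return augmented


sequence = ["Pitch_60", "Velocity_90", "Duration_1.0.8", "Pitch_107", "Velocity_90"]
copies = augment_with_offsets(sequence, [-12, 0, 1, 2])
```

Because only the pitch tokens are rewritten, no MIDI object has to be rebuilt for each augmented copy. Here the offset `2` is skipped since it would move `Pitch_107` out of the assumed range.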
Compatibility
- Code using BPE has to be updated: remove `bpe(tokenizer)` and just declare tokenizers normally, and rename the `bpe` method to `learn_bpe`
- MuMIDI tokens and tokenizers from previous versions are incompatible with v1.4.0
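
For readers updating BPE code, the toy sketch below shows what a BPE learning step computes and what metrics such as "number of token combinations" and "sequence length reduction" can mean. It is a generic byte pair encoding illustration on integer token ids, not the library's `learn_bpe`: the names `learn_bpe_sketch` and `merge_pair`, the arguments, and the metric keys are assumptions for this example.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def merge_pair(seq: Sequence[int], pair: Pair, new_id: int) -> List[int]:
    """Replace each non-overlapping occurrence of `pair` in `seq` with `new_id`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe_sketch(
    sequences: Sequence[Sequence[int]], base_vocab_size: int, target_vocab_size: int
) -> Tuple[List[List[int]], Dict[Pair, int], Dict[str, float]]:
    """Greedily merge the most frequent adjacent pair until the vocab is full."""
    seqs = [list(s) for s in sequences]
    merges: Dict[Pair, int] = {}
    len_before = sum(len(s) for s in seqs)
    next_id = base_vocab_size
    while next_id < target_vocab_size:
        pair_counts: Counter = Counter()
        for seq in seqs:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        best_pair, count = pair_counts.most_common(1)[0]
        if count < 2:
            break  # nothing left worth merging
        merges[best_pair] = next_id
        seqs = [merge_pair(seq, best_pair, next_id) for seq in seqs]
        next_id += 1
    len_after = sum(len(s) for s in seqs)
    metrics = {
        "nb_merges": len(merges),
        "seq_len_reduction": 1 - len_after / len_before if len_before else 0.0,
    }
    return seqs, merges, metrics


encoded, merges, metrics = learn_bpe_sketch(
    [[0, 1, 2, 0, 1, 2], [0, 1, 3]], base_vocab_size=4, target_vocab_size=6
)
```

In this toy run the pair `(0, 1)` is merged first, then `(4, 2)`, which shortens the two sequences from 9 tokens to 4 in total.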