
[WIP][tests] add precomputation tests #234

Draft · wants to merge 14 commits into main
Conversation

@sayakpaul (Collaborator) commented on Jan 21, 2025

Adds precomputation tests.

Currently, I have changed only the bare minimum to show the approach taken for the tests. Once I have some reviews, I will propagate the changes to the rest of the supported models and mark the PR ready for further review.

Some further comments in-line.

To run the tests from a DGX or any other internal CUDA machine without using CUDA, run:

CUDA_VISIBLE_DEVICES="" pytest tests/trainers/
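
Setting CUDA_VISIBLE_DEVICES to an empty string hides all GPUs from PyTorch, so the tests exercise the CPU path. A minimal sketch of the device-selection pattern this relies on (the helper below is illustrative, not the repo's actual code):

import torch

def get_test_device() -> torch.device:
    # With CUDA_VISIBLE_DEVICES="", torch.cuda.is_available() returns False,
    # so the tests transparently fall back to CPU.
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")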

Just LMK if you want something changed before proceeding with review at this stage of the PR. I will make it happen.

TODOs

  • LTX
  • HunyuanVideo
  • Configure runner and action

Review thread on finetrainers/args.py (outdated, resolved)
Comment on lines 18 to 21
try:
    tokenizer = T5Tokenizer.from_pretrained(model_id, subfolder="tokenizer", revision=revision, cache_dir=cache_dir)
except Exception:
    # The dummy T5 tokenizer checkpoint fails under T5Tokenizer with a sentencepiece error.
    tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="tokenizer", revision=revision, cache_dir=cache_dir)
sayakpaul (Collaborator, Author):

Not super proud of this, but we cannot use T5Tokenizer on the dummy T5 tokenizer checkpoint; it raises a sentencepiece error.
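
A narrower variant of the fallback, as a sketch; the exception types caught below are assumptions about how the sentencepiece failure surfaces and would need to be verified against the dummy checkpoint:

from transformers import AutoTokenizer, T5Tokenizer

def load_tokenizer(model_id: str, revision: str = None, cache_dir: str = None):
    # Prefer the sentencepiece-backed T5Tokenizer; fall back to AutoTokenizer
    # when the (dummy) checkpoint cannot be loaded by it.
    try:
        return T5Tokenizer.from_pretrained(model_id, subfolder="tokenizer", revision=revision, cache_dir=cache_dir)
    except (OSError, ValueError):  # assumed failure modes
        return AutoTokenizer.from_pretrained(model_id, subfolder="tokenizer", revision=revision, cache_dir=cache_dir)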

@sayakpaul requested a review from @a-r-r-o-w on January 21, 2025 at 08:30
Comment on lines +18 to +25
try:
    tokenizer = T5Tokenizer.from_pretrained(
        model_id, subfolder="tokenizer", revision=revision, cache_dir=cache_dir
    )
except Exception:
    # Dummy T5 tokenizer checkpoints fail under T5Tokenizer with a sentencepiece error.
    tokenizer = AutoTokenizer.from_pretrained(
        model_id, subfolder="tokenizer", revision=revision, cache_dir=cache_dir
    )
sayakpaul (Collaborator, Author):

Not proud of the change, but T5Tokenizer cannot be used on a dummy T5 tokenizer checkpoint.

Review thread on finetrainers/trainer.py (outdated, resolved)
sayakpaul (Collaborator, Author):

@a-r-r-o-w LMK what you think of the latest changes.

@a-r-r-o-w (Owner) left a comment:

Thanks Sayak. The changes look good.

Review threads on finetrainers/utils/memory_utils.py and finetrainers/trainer.py (outdated, resolved)
# The repo root is appended to sys.path first, so the import below is not at
# the top of the file and would trip flake8's E402 check without the noqa.
root_dir = current_file.parents[3]
sys.path.append(str(root_dir))

import unittest  # noqa
@a-r-r-o-w (Owner) commented on Jan 26, 2025:

Just curious: why do these imports have to be marked noqa?

sayakpaul (Collaborator, Author):

Resolved them. Should be good now.
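
One common way to drop such noqa markers entirely (a sketch of an assumed approach; the PR does not show how it was resolved here) is to move the sys.path manipulation into a pytest conftest.py, which pytest imports before the test modules:

# tests/conftest.py (hypothetical location) -- pytest imports this before
# collecting the test modules, so the repo root is already on sys.path and
# the tests can keep their imports at the top of the file, without noqa.
import sys
from pathlib import Path

sys.path.append(str(Path(__file__).parents[1]))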

sayakpaul (Collaborator, Author):

@a-r-r-o-w LMK if I can apply the tests to the rest of the models. I have addressed the rest of your feedback.

a-r-r-o-w (Owner):

Yes please, lgtm
