LLM Finetuner Cleanup #376
base: master
Conversation
Few comments but looks good overall!
@@ -194,9 +198,13 @@ spec:
   - name: dataset_downloader_tag
     value: 'cd6408a'
   - name: finetuner_image
-    value: 'gooseai/finetuner'
+    value: 'docker.io/harubaru1/finetuner'
Should we package these images in ml-containers instead of using public domains like docker.io?
Yes, these are just placeholders
Do we want to persist this change for the PR if it's meant to be a placeholder?
After the PR gets merged, I'm thinking about making a PR against ml-containers to build a finetuner image.
Nothing stands out to me other than stuff that Rahul mentioned, looks good!
LGTM, few questions/nits.
@@ -1054,6 +1076,7 @@ def collector(data):
     # At the end of it all, record to a `final` output.
     final_path = os.path.join(output_dir, "final")
     trainer.save_model(final_path)
+    tensorizer_save(final_path)  # Must be invoked manually.
If the number of steps per epoch is a multiple of save_steps, unless I'm unaware of how Trainer saves checkpoints, it will do a double save at the end, or two quick saves at the end if num_steps % num_steps_per_epoch is close to 0. Do we want to possibly skip saving the final checkpoint if this is the case?
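A rough sketch of what that skip could look like, using the trainer, final_path, and tensorizer_save names from the hunk above (the guard is one possible approach and simplifies save_steps handling; it is not what the PR currently does):

state = trainer.state
save_steps = trainer.args.save_steps
# Skip the redundant "final" save if the last optimizer step already wrote a checkpoint.
already_checkpointed = bool(save_steps) and state.global_step % int(save_steps) == 0
if not already_checkpointed:
    trainer.save_model(final_path)
    tensorizer_save(final_path)  # helper added in this PR; still invoked manually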
For predictability, there are pros to not skipping it; then something exists at the checkpoint file path and at the final_path path, and either can be deleted safely or otherwise processed programmatically. Though in a storage location where hardlinking is available, that could be used to reduce data duplication while keeping both paths (if it is safe to propagate modifications made through either path to the other).
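A minimal sketch of the hardlink idea; the paths here are hypothetical, and real code would derive them from the Trainer's output directory:

import os

last_checkpoint_file = "results/checkpoint-1000/model.tensors"  # hypothetical path
final_file = "results/final/model.tensors"                      # hypothetical path

os.makedirs(os.path.dirname(final_file), exist_ok=True)
if os.path.exists(final_file):
    os.remove(final_file)
# Both paths now reference the same data; writes through either name affect the other.
os.link(last_checkpoint_file, final_file)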
The checkpoint callback is not called if trainer.save_model() is invoked manually.
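For reference, a minimal sketch of hooking Tensorizer serialization into the Trainer's periodic checkpoints via a callback (tensorizer_save is the helper from this PR); on_save only fires for the Trainer's own checkpoint saves, which is why the manual call after trainer.save_model(final_path) is still needed:

import os
from transformers import TrainerCallback

class TensorizerSaveCallback(TrainerCallback):
    def on_save(self, args, state, control, **kwargs):
        # Runs after each periodic checkpoint written by the Trainer,
        # but not after a manual trainer.save_model() call.
        checkpoint_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        tensorizer_save(checkpoint_dir)
        return control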
…d for predictability
LGTM. I would love to run this using Argo once merged to main
Looks good, add uint32 padding and you're good to go
--model=${MODEL_PATH} \
serialize \
--serialized-directory /{{workflow.parameters.pvc}} \
--suffix vllm
vllm will already be part of the resulting path: pretty sure the full path created by that script will be /<serialized-directory>/vllm/<model>/<suffix>. Suggest using vLLM's recommended convention of --suffix v1.
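Illustration of the layout described above (the layout itself is the serialization script's behavior as stated in the comment, not verified here; the directory and model names are hypothetical):

import os

serialized_directory = "/my-pvc"   # /{{workflow.parameters.pvc}}
model_name = "results-run/final"   # hypothetical model directory name
suffix = "v1"                      # vLLM's recommended suffix convention

# The script inserts the "vllm" component itself, so the suffix should not repeat it.
print(os.path.join(serialized_directory, "vllm", model_name, suffix))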
If you make a change here, you'll need to also make the same change here: https://github.com/coreweave/kubernetes-cloud/pull/376/files#r1695743927
"-tokenizer-only", "{{inputs.parameters.tokenizer_only}}" ] | ||
"-tokenizer-only={{inputs.parameters.tokenizer_only}}" ] | ||
env: | ||
- name: HF_API_TOKEN |
Getting déjà vu here... this is a parameter specified by the model_downloader, yeah? If so, this suggestion is out of scope for this PR, but we should align this variable name with the environment variable used by huggingface_hub, which is HF_TOKEN. See https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables
I agree! HF changed the env var name from HF_API_TOKEN to HF_TOKEN at some point. This change would have to go under our gpt_bpe repository.
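Until the gpt_bpe side is renamed, one possible stopgap is a small compatibility shim like the following sketch (not part of this PR):

import os

# Prefer huggingface_hub's HF_TOKEN, but fall back to the legacy HF_API_TOKEN,
# and export both names so either consumer finds a value.
token = os.environ.get("HF_TOKEN") or os.environ.get("HF_API_TOKEN")
if token:
    os.environ.setdefault("HF_TOKEN", token)
    os.environ.setdefault("HF_API_TOKEN", token)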
template: model-inference-service
arguments:
  parameters:
    - name: model_uri
      value: "vllm/{{workflow.parameters.pvc}}/finetunes/results-{{workflow.parameters.run_name}}/final/vllm/model.tensors"
Flagging to ensure consistency with https://github.com/coreweave/kubernetes-cloud/pull/376/files#r1695728991
This was discussed a while ago; the final output path being pretty wonky was a consequence of the serialization script used. However, I think the finetuner should have its own vLLM serialization script, which by the looks of things doesn't seem hard to do.
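A minimal sketch of what such an in-repo script could look like using Tensorizer's TensorSerializer; the output filename is illustrative, and whether vLLM's tensorizer loader accepts this layout directly would need checking:

import os
from tensorizer import TensorSerializer
from transformers import AutoModelForCausalLM

def serialize_for_vllm(model_dir: str, output_dir: str) -> None:
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    os.makedirs(output_dir, exist_ok=True)
    serializer = TensorSerializer(os.path.join(output_dir, "model.tensors"))
    serializer.write_module(model)  # stream all module tensors to disk
    serializer.close()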
A couple of minor comments here and there, which you can take or leave. LGTM
Generally LGTM. How have you tested these changes? Do you have unit tests somewhere or did you do an integration test?
@@ -194,9 +198,13 @@ spec:
   - name: dataset_downloader_tag
     value: 'cd6408a'
   - name: finetuner_image
-    value: 'gooseai/finetuner'
+    value: 'docker.io/harubaru1/finetuner'
Do we want to persist this change for the PR if it's meant to be a placeholder?
We should be hosting a vLLM container on GHCR, not on Docker.
-    value: 'b32173f'
+    value: '125'
   - name: inference_image
     value: 'docker.io/rtalaricw/vllm'
We should create vLLM images hosted on GHA rather than do this.
   port: 80
-  initialDelaySeconds: 300
+  initialDelaySeconds: 60
Maybe bump these back up a little higher, since timing out too early would cause more issues than taking a little too long to time out in the error case.
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
    cuda-nvcc-11-8 cuda-nvml-dev-11-8 libcurand-dev-11-8 \
    libcublas-dev-11-8 libcusparse-dev-11-8 \
    libcusolver-dev-11-8 cuda-nvprof-11-8 \
Probably not needed with our torch base images:
- libcurand
- libcublas
- libcusparse
- libcusolver
If you are using torch-extras, then cuda-nvcc and cuda-nvprof (and most likely cuda-nvml, unless we use it for other reasons than for building libraries) are probably all also not needed.
Overall, most of this stuff is not necessary when using our torch-extras images; try using those as a base and we can probably get rid of most of this Dockerfile and the custom compilation steps in here.
These changes aren't relevant to this PR nor needed by it, so can this be reverted / cut out into another branch if we want to keep it?
@@ -436,7 +443,7 @@ def main_process_print(*args, **kwargs):
     if "eos_token" not in tokenizer.special_tokens_map:
         tokens_to_add["eos_token"] = "<|endoftext|>"
     if "pad_token" not in tokenizer.special_tokens_map:
-        tokens_to_add["pad_token"] = "<|endoftext|>"
+        tokens_to_add["pad_token"] = tokenizer.eos_token
This needs to match gpt_bpe's hardcoded default (which should currently be unsigned -1 / technically no text equivalent). This logic breaks if tokenizer.eos_token is not set anyway (i.e. if the if statement right before it was triggered).
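For clarity on "unsigned -1": that pad id would be the maximum value of the token dtype, matching the 0xffff / 0xffffffff constants used in the dataset loader changes later in this review (gpt_bpe's actual default should still be double-checked):

import numpy

assert numpy.iinfo(numpy.uint16).max == 0xFFFF       # "unsigned -1" for uint16 tokens
assert numpy.iinfo(numpy.uint32).max == 0xFFFFFFFF   # "unsigned -1" for uint32 tokens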
checkpoint_model = checkpoint_model if isinstance(checkpoint_model, supported_classes) else unwrap_model(
    checkpoint_model)
unwrap_model should be able to be called unconditionally here.
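That is, assuming the unwrap_model in use behaves like the transformers/accelerate helper (returning models that are not wrapped unchanged), the branch could collapse to:

# unwrap_model() is a no-op for plain, unwrapped models, so no isinstance() check is needed.
checkpoint_model = unwrap_model(checkpoint_model)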
def tensorizer_save(checkpoint_dir: str) -> None:
    checkpoint_model = trainer.model
The model should probably be an argument instead of pulling it from the trainer global.
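For example, the signature could become something like the following sketch; the body would be the existing serialization logic, just operating on the model argument:

import torch

def tensorizer_save(model: torch.nn.Module, checkpoint_dir: str) -> None:
    ...  # existing serialization logic, using `model` instead of the `trainer` global

# Call sites would then pass the (already unwrapped) model explicitly, e.g.:
# tensorizer_save(trainer.model, final_path)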
 use_uint16 = len(tokenizer) < 0xffff
 self._padding_is_ambiguous = tokenizer.pad_token_id == tokenizer.eos_token_id
-self._pad_token_id = tokenizer.pad_token_id
+self._pad_token_id = 0xffff if use_uint16 else 0xffffffff
 self._token_dtype = numpy.uint16 if use_uint16 else numpy.uint32
 if is_main_process():
     logger.info(f"DATASET: {path}")
     logger.info(
         f"DATASET SIZE: {length_mb:,.2f}MiB, {num_tokens:,} tokens, "
         f"{self.length:,} contexts"
     )
     logger.info(f"DATASET PAD ID: {self._pad_token_id}")
If the uint32 functionality isn't working here yet, remove it (or comment it out with an explanation) before merging this PR, so it doesn't confuse people.
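If the uint32 path is kept but disabled for now, one way to make that explicit is a guard like this sketch (the threshold and constants follow the hunk above; the helper name and the hard failure are suggestions, not the PR's behavior):

import numpy

def token_dtype_and_pad_id(vocab_size: int):
    # uint16 covers vocabularies below 0xffff; the uint32 path is not exercised yet.
    if vocab_size < 0xFFFF:
        return numpy.uint16, 0xFFFF
    raise NotImplementedError(
        "uint32 token streams are not supported yet; see the review discussion above"
    )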
The primary motivation behind this PR is not only to clean up our LLM Finetuner example by fixing some issues, but also to integrate more of Tensorizer's functionality into the finetuner, serving as an example of how Tensorizer can be conveniently used outside of inference use cases.
Change Overview
- tensorizer_uri workflow parameter.

Minor TODO
- tensorizer_uri workflow parameter and other changes.
- train_ratio argument.