LLM Finetuner Cleanup #376

Open · harubaru wants to merge 32 commits into master

Conversation

@harubaru (Contributor) commented Mar 21, 2024

The primary motivation behind this PR is not only to clean up our LLM Finetuner example by fixing some issues, but also to integrate more of Tensorizer's functionality into the finetuner, serving as an example of how Tensorizer can be conveniently used outside of inference use cases.

Change Overview

  • Updated dependencies such as PyTorch, HF Transformers, DeepSpeed, and Accelerate.
  • Removed the previous inference service, which was originally developed specifically for the LLM Finetuner, in favor of our more up-to-date LLM inference example.
  • Addressed [finetuner-workflow]: issues encountered at download model step #353 by reworking the behavior of the tensorizer_uri workflow parameter.
  • Added support for checkpointing models with Tensorizer (see the sketch after this list).
  • The HF LLM inference example now also supports loading models from a PVC, falling back to HF's own loading code.
  • Integrated vLLM as the inference server that consumes Tensorizer checkpoints.
  • Added a maximum batch size argument that caps the automatic batch size estimate, which previously produced a very high estimate for smaller models.
  • Added support for Mistral models. Training Mistral models is currently quite slow: the very large context window makes the 7B Mistral model require ~96 GiB of VRAM to train. In the future, we should investigate moving away from our current form of parallelism.
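
For context, this is roughly what a Tensorizer checkpoint save looks like (a minimal sketch using the tensorizer package's TensorSerializer; the model name and output path below are illustrative, not the finetuner's actual configuration):

import os

from tensorizer import TensorSerializer
from transformers import AutoModelForCausalLM

# Illustrative model and output location; the finetuner uses its own
# configured model and PVC paths.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")
output_dir = "/mnt/pvc/finetunes/example-checkpoint"
os.makedirs(output_dir, exist_ok=True)

# Stream the module's weights into a .tensors file that can later be read
# back with TensorDeserializer for inference or further training.
serializer = TensorSerializer(os.path.join(output_dir, "model.tensors"))
serializer.write_module(model)
serializer.close()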

Minor TODO

  • Checkpoint final artifact to object storage to remove reliance on PVC for inference.
  • Serialize trainer state, such as the LR scheduler and optimizer, using Tensorizer (see the sketch after this list).
  • Documentation regarding the tensorizer_uri workflow parameter and other changes.
  • Support for the train_ratio argument.
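
For the trainer-state item, one possible shape (a sketch only, not what this PR implements; it assumes tensorizer's TensorSerializer.write_state_dict and only covers the tensor-valued parts of the optimizer state) could be:

import os

import torch
from tensorizer import TensorSerializer

def save_optimizer_tensors(optimizer: torch.optim.Optimizer, checkpoint_dir: str) -> None:
    # Sketch: keep only tensor-valued optimizer state; scalars such as step
    # counts, param_groups, and the LR scheduler state would still need a
    # separate (e.g. JSON) dump.
    state = optimizer.state_dict()["state"]
    tensors = {
        f"{param_id}.{key}": value
        for param_id, group in state.items()
        for key, value in group.items()
        if isinstance(value, torch.Tensor)
    }
    serializer = TensorSerializer(os.path.join(checkpoint_dir, "optimizer.tensors"))
    serializer.write_state_dict(tensors)  # assumes this tensorizer API is available
    serializer.close()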

@harubaru harubaru requested review from wbrown, dmarx and Eta0 March 21, 2024 01:56
@harubaru harubaru marked this pull request as draft March 22, 2024 20:05
@rtalaricw (Contributor) left a comment

Few comments but looks good overall!

@@ -194,9 +198,13 @@ spec:
   - name: dataset_downloader_tag
     value: 'cd6408a'
   - name: finetuner_image
-    value: 'gooseai/finetuner'
+    value: 'docker.io/harubaru1/finetuner'
Contributor

Should we package these images in ml-containers instead of using public domains like docker.io?

Contributor Author

Yes, these are just placeholders

Contributor

Do we want to persist this change for the PR if it's meant to be a placeholder?

Contributor Author

After the PR gets merged I'm thinking about making a PR against ml-containers to build a finetuner image

@Rexwang8 (Contributor)

Nothing stands out to me other than stuff that Rahul mentioned, looks good!

@sangstar (Contributor) left a comment

LGTM, few questions/nits.

@@ -1054,6 +1076,7 @@ def collector(data):
     # At the end of it all, record to a `final` output.
     final_path = os.path.join(output_dir, "final")
     trainer.save_model(final_path)
+    tensorizer_save(final_path)  # Must be invoked manually.
@sangstar (Contributor) commented Mar 25, 2024

If the number of steps per epoch is a multiple of save_steps, unless I'm unaware of how Trainer saves checkpoints, it will do a double save at the end, or two quick saves at the end if num_steps % num_steps_per_epoch is close to 0. Do we want to possibly skip saving the final checkpoint if this is the case?
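
A minimal sketch of the guard being suggested here, reusing the surrounding script's trainer, final_path, and tensorizer_save names (whether the skip is desirable at all is the open question in this thread):

# Skip the extra final save if the very last step already produced a
# checkpoint via save_steps (illustrative guard, not part of this PR).
last_step = trainer.state.global_step
save_steps = trainer.args.save_steps
finished_on_checkpoint = bool(save_steps) and last_step % save_steps == 0

if not finished_on_checkpoint:
    trainer.save_model(final_path)
    tensorizer_save(final_path)  # Must be invoked manually.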

Contributor

For predictability, there are pros to not skipping it; then something exists both at the checkpoint file path and at final_path, and either can be deleted safely or otherwise processed programmatically. Though in a storage location where hardlinking is available, that could be used to reduce data duplication while keeping both paths (if it is safe to propagate modifications from either file to the other).

Contributor Author

The checkpoint callback is not called if trainer.save_model() is invoked manually.

@harubaru harubaru marked this pull request as ready for review June 11, 2024 18:27
@harubaru harubaru requested a review from rtalaricw July 25, 2024 00:47
rtalaricw previously approved these changes Jul 25, 2024
@rtalaricw (Contributor) left a comment

LGTM. I would love to run this using Argo once merged to main

Rexwang8 previously approved these changes Jul 29, 2024
@Rexwang8 (Contributor) left a comment

Looks good, add uint32 padding and you're good to go

--model=${MODEL_PATH} \
serialize \
--serialized-directory /{{workflow.parameters.pvc}} \
--suffix vllm

vllm will already be part of the resulting path: pretty sure the full path created by that script will be /<serialized-directory>/vllm/<model>/<suffix>. Suggest using vLLM's recommended convention of --suffix v1


If you make a change here, you'll need to also make the same change here: https://github.com/coreweave/kubernetes-cloud/pull/376/files#r1695743927


"-tokenizer-only", "{{inputs.parameters.tokenizer_only}}" ]
"-tokenizer-only={{inputs.parameters.tokenizer_only}}" ]
env:
- name: HF_API_TOKEN
@dmarx commented Jul 29, 2024

Getting deja vu here... this is a parameter specified by the model_downloader, yeah? If so then this suggestion is out of scope for this PR, but we should align this variable name with the environment variable used by the huggingface_hub, which is HF_TOKEN. see https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables
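
As a generic illustration of bridging the rename during a transition (a Python sketch only; as noted in the reply below, the real change would belong in gpt_bpe):

import os

# Prefer huggingface_hub's current HF_TOKEN, but keep accepting the older
# HF_API_TOKEN so existing workflow specs do not break.
hf_token = os.environ.get("HF_TOKEN") or os.environ.get("HF_API_TOKEN")
if hf_token is not None:
    os.environ.setdefault("HF_TOKEN", hf_token)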

Contributor Author

I agree! HF changed the env var name from HF_API_TOKEN to HF_TOKEN at some point. This change would have to go under our gpt_bpe repository.

template: model-inference-service
arguments:
parameters:
- name: model_uri
value: "vllm/{{workflow.parameters.pvc}}/finetunes/results-{{workflow.parameters.run_name}}/final/vllm/model.tensors"

Contributor Author

This was discussed a while ago; the final output path being pretty wonky was a consequence of the serialization script used. However, I think the finetuner should have its own vLLM serialization script, which by the looks of things doesn't seem hard to do.

dmarx previously approved these changes Jul 29, 2024
@dmarx left a comment

a couple of minor comments here and there which you can take or leave. LGTM

sangstar previously approved these changes Jul 30, 2024
@sangstar (Contributor) left a comment

Generally LGTM. How have you tested these changes? Do you have unit tests somewhere or did you do an integration test?


@wbrown (Contributor) left a comment

We should be hosting a vLLM container on GHCR, not on Docker.

value: 'b32173f'
value: '125'
- name: inference_image
value: 'docker.io/rtalaricw/vllm'
Contributor

We should create vLLM images hosted on GHA rather than do this.

   port: 80
-  initialDelaySeconds: 300
+  initialDelaySeconds: 60
Contributor

Maybe bump these back up a little higher, since timing out too early would cause more issues than taking a little too long to time out in the error case.

Comment on lines +11 to 15
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
    cuda-nvcc-11-8 cuda-nvml-dev-11-8 libcurand-dev-11-8 \
    libcublas-dev-11-8 libcusparse-dev-11-8 \
    libcusolver-dev-11-8 cuda-nvprof-11-8 \
Contributor

Probably not needed with our torch base images:

  • libcurand
  • libcublas
  • libcusparse
  • libcusolver

If you are using torch-extras then cuda-nvcc and cuda-nvprof (and most likely cuda-nvml unless we use it for other reasons than for building libraries) are probably all also not needed.

Contributor

Overall, most of this stuff is not necessary when using our torch-extras images; try using those as a base and we can probably get rid of most of this Dockerfile and the custom compilation steps in here.

Contributor

These changes aren't relevant to this PR nor needed by it, so can this be reverted / cut out into another branch if we want to keep it?

@@ -436,7 +443,7 @@ def main_process_print(*args, **kwargs):
     if "eos_token" not in tokenizer.special_tokens_map:
         tokens_to_add["eos_token"] = "<|endoftext|>"
     if "pad_token" not in tokenizer.special_tokens_map:
-        tokens_to_add["pad_token"] = "<|endoftext|>"
+        tokens_to_add["pad_token"] = tokenizer.eos_token
@Eta0 (Contributor) commented Aug 15, 2024

This needs to match gpt_bpe's hardcoded default (which should currently be unsigned -1 / technically no text equivalent). This logic breaks if tokenizer.eos_token is not set anyway (i.e. if the if statement right before it was triggered).
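
To illustrate the second point (a sketch of the ordering problem only, not of the gpt_bpe-matching default): if eos_token was merely queued into tokens_to_add by the branch above, tokenizer.eos_token is still None when the pad_token branch runs, so a fallback is needed.

if "eos_token" not in tokenizer.special_tokens_map:
    tokens_to_add["eos_token"] = "<|endoftext|>"
if "pad_token" not in tokenizer.special_tokens_map:
    # tokenizer.eos_token may still be None here if it was only queued above,
    # so fall back to the queued value (illustrative only).
    tokens_to_add["pad_token"] = (
        tokenizer.eos_token
        or tokens_to_add.get("eos_token", "<|endoftext|>")
    )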

Comment on lines +506 to +507
checkpoint_model = checkpoint_model if isinstance(checkpoint_model, supported_classes) else unwrap_model(
checkpoint_model)
Contributor

unwrap_model should be able to be called unconditionally here.

Comment on lines +503 to +504
def tensorizer_save(checkpoint_dir: str) -> None:
checkpoint_model = trainer.model
Contributor

The model should probably be an argument instead of pulling it from the trainer global.
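
A minimal sketch combining this with the unconditional unwrap_model point above (it assumes tensorizer's TensorSerializer and transformers' unwrap_model helper; the model.tensors filename is illustrative):

import os

from tensorizer import TensorSerializer
from torch.nn import Module
from transformers.modeling_utils import unwrap_model

def tensorizer_save(checkpoint_model: Module, checkpoint_dir: str) -> None:
    # Take the model as an argument instead of reading the trainer global.
    # unwrap_model is a no-op for plain modules, so it can run unconditionally.
    checkpoint_model = unwrap_model(checkpoint_model)
    serializer = TensorSerializer(os.path.join(checkpoint_dir, "model.tensors"))
    serializer.write_module(checkpoint_model)
    serializer.close()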

Comment on lines +677 to +687
use_uint16 = len(tokenizer) < 0xffff
self._padding_is_ambiguous = tokenizer.pad_token_id == tokenizer.eos_token_id
self._pad_token_id = tokenizer.pad_token_id
self._pad_token_id = 0xffff if use_uint16 else 0xffffffff
self._token_dtype = numpy.uint16 if use_uint16 else numpy.uint32
if is_main_process():
    logger.info(f"DATASET: {path}")
    logger.info(
        f"DATASET SIZE: {length_mb:,.2f}MiB, {num_tokens:,} tokens, "
        f"{self.length:,} contexts"
    )
    logger.info(f"DATASET PAD ID: {self._pad_token_id}")
Contributor

If the uint32 functionality isn't working here yet, remove it (or comment it out with an explanation) before merging this PR, so it doesn't confuse people.

@harubaru harubaru dismissed stale reviews from sangstar, dmarx, Rexwang8, and rtalaricw via 6c9f690 March 10, 2025 17:22