LLM Finetuner Cleanup #376

Open · harubaru wants to merge 32 commits into master

Conversation

@harubaru (Contributor) commented Mar 21, 2024

The primary motivation behind this PR is not only to clean up our LLM Finetuner example by fixing some issues, but also to integrate more of Tensorizer's functionality into the finetuner, serving as an example of how Tensorizer can be conveniently used outside of inference use cases.

Change Overview

  • Updated dependencies such as PyTorch, HF Transformers, DeepSpeed, and Accelerate.
  • Removed the previous inference service, which was originally developed specifically for the LLM Finetuner, in favor of our more up-to-date LLM inference example.
  • Addressed [finetuner-workflow]: issues encountered at download model step #353 by reworking the behavior of the tensorizer_uri workflow parameter.
  • Added support for checkpointing models with Tensorizer (see the sketch after this list).
  • The HF LLM inference example now also supports loading models from a PVC, falling back to HF's own loading code.
  • Integrated vLLM as the inference server that consumes Tensorizer checkpoints.
  • Added a maximum batch size argument that caps the automatic batch size estimate, which previously produced a very high estimate for smaller models.
  • Added support for Mistral models. Training Mistral models is currently quite slow: the very large context window makes the 7B Mistral model require ~96 GiB of VRAM to train. In the future, we should investigate moving away from our current form of parallelism.
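
For context, this is roughly what a Tensorizer checkpoint save looks like (a minimal sketch using the tensorizer package's TensorSerializer; the model name and output path below are illustrative, not the finetuner's actual configuration):

import os

from tensorizer import TensorSerializer
from transformers import AutoModelForCausalLM

# Illustrative model and output location; the finetuner uses its own
# configured model and PVC paths.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")
output_dir = "/mnt/pvc/finetunes/example-checkpoint"
os.makedirs(output_dir, exist_ok=True)

# Stream the module's weights into a .tensors file that can later be read
# back with TensorDeserializer for inference or further training.
serializer = TensorSerializer(os.path.join(output_dir, "model.tensors"))
serializer.write_module(model)
serializer.close()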

Minor TODO

  • Checkpoint final artifact to object storage to remove reliance on PVC for inference.
  • Serialize trainer state, such as the LR scheduler and optimizer, using Tensorizer (see the sketch after this list).
  • Documentation regarding the tensorizer_uri workflow parameter and other changes.
  • Support for the train_ratio argument.
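
For the trainer-state item, one possible shape (a sketch only, not what this PR implements; it assumes tensorizer's TensorSerializer.write_state_dict and only covers the tensor-valued parts of the optimizer state) could be:

import os

import torch
from tensorizer import TensorSerializer

def save_optimizer_tensors(optimizer: torch.optim.Optimizer, checkpoint_dir: str) -> None:
    # Sketch: keep only tensor-valued optimizer state; scalars such as step
    # counts, param_groups, and the LR scheduler state would still need a
    # separate (e.g. JSON) dump.
    state = optimizer.state_dict()["state"]
    tensors = {
        f"{param_id}.{key}": value
        for param_id, group in state.items()
        for key, value in group.items()
        if isinstance(value, torch.Tensor)
    }
    serializer = TensorSerializer(os.path.join(checkpoint_dir, "optimizer.tensors"))
    serializer.write_state_dict(tensors)  # assumes this tensorizer API is available
    serializer.close()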

@harubaru harubaru requested review from wbrown, dmarx and Eta0 March 21, 2024 01:56
@harubaru harubaru marked this pull request as draft March 22, 2024 20:05
@rtalaricw (Contributor) left a comment

Few comments but looks good overall!

@@ -194,9 +198,13 @@ spec:
   - name: dataset_downloader_tag
     value: 'cd6408a'
   - name: finetuner_image
-    value: 'gooseai/finetuner'
+    value: 'docker.io/harubaru1/finetuner'
Contributor

Should we package these images in ml-containers instead of using public domains like docker.io?

Contributor Author

Yes, these are just placeholders

Contributor

Do we want to persist this change for the PR if it's meant to be a placeholder?

Contributor Author

After the PR gets merged I'm thinking about making a PR against ml-containers to build a finetuner image

@Rexwang8 (Contributor)

Nothing stands out to me other than stuff that Rahul mentioned, looks good!

@sangstar (Contributor) left a comment

LGTM, few questions/nits.

@@ -1054,6 +1076,7 @@ def collector(data):
     # At the end of it all, record to a `final` output.
     final_path = os.path.join(output_dir, "final")
     trainer.save_model(final_path)
+    tensorizer_save(final_path)  # Must be invoked manually.
@sangstar (Contributor) commented Mar 25, 2024

If the number of steps per epoch is a multiple of save_steps, unless I'm unaware of how Trainer saves checkpoints, it will do a double save at the end, or two quick saves at the end if num_steps % num_steps_per_epoch is close to 0. Do we want to possibly skip saving the final checkpoint if this is the case?
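
A minimal sketch of the guard being suggested here, reusing the surrounding script's trainer, final_path, and tensorizer_save names (whether the skip is desirable at all is the open question in this thread):

# Skip the extra final save if the very last step already produced a
# checkpoint via save_steps (illustrative guard, not part of this PR).
last_step = trainer.state.global_step
save_steps = trainer.args.save_steps
finished_on_checkpoint = bool(save_steps) and last_step % save_steps == 0

if not finished_on_checkpoint:
    trainer.save_model(final_path)
    tensorizer_save(final_path)  # Must be invoked manually.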

Contributor

For predictability, there are pros to not skipping it; then something exists both at the checkpoint file path and at final_path, and either can be deleted safely or otherwise processed programmatically. Though in a storage location where hardlinking is available, that could be used to reduce data duplication while keeping both paths (if it is safe to propagate modifications from either file to the other).

Contributor Author

The checkpoint callback is not called if trainer.save_model() is invoked manually.

@harubaru harubaru marked this pull request as ready for review June 11, 2024 18:27
@harubaru harubaru requested a review from rtalaricw July 25, 2024 00:47
rtalaricw previously approved these changes Jul 25, 2024
@rtalaricw (Contributor) left a comment

LGTM. I would love to run this using Argo once merged to main

Rexwang8 previously approved these changes Jul 29, 2024
@Rexwang8 (Contributor) left a comment

Looks good, add uint32 padding and you're good to go

--model=${MODEL_PATH} \
serialize \
--serialized-directory /{{workflow.parameters.pvc}} \
--suffix vllm

vllm will already be part of the resulting path: pretty sure the full path created by that script will be /<serialized-directory>/vllm/<model>/<suffix>. Suggest using vLLM's recommended convention of --suffix v1


If you make a change here, you'll need to also make the same change here: https://github.com/coreweave/kubernetes-cloud/pull/376/files#r1695743927


"-tokenizer-only", "{{inputs.parameters.tokenizer_only}}" ]
"-tokenizer-only={{inputs.parameters.tokenizer_only}}" ]
env:
- name: HF_API_TOKEN
@dmarx commented Jul 29, 2024

Getting deja vu here... this is a parameter specified by the model_downloader, yeah? If so then this suggestion is out of scope for this PR, but we should align this variable name with the environment variable used by the huggingface_hub, which is HF_TOKEN. see https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables
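
As a generic illustration of bridging the rename during a transition (a Python sketch only; as noted in the reply below, the real change would belong in gpt_bpe):

import os

# Prefer huggingface_hub's current HF_TOKEN, but keep accepting the older
# HF_API_TOKEN so existing workflow specs do not break.
hf_token = os.environ.get("HF_TOKEN") or os.environ.get("HF_API_TOKEN")
if hf_token is not None:
    os.environ.setdefault("HF_TOKEN", hf_token)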

Contributor Author

I agree! HF changed the env var name from HF_API_TOKEN to HF_TOKEN at some point. This change would have to go under our gpt_bpe repository.

template: model-inference-service
arguments:
parameters:
- name: model_uri
value: "vllm/{{workflow.parameters.pvc}}/finetunes/results-{{workflow.parameters.run_name}}/final/vllm/model.tensors"

Contributor Author

This was discussed a while ago; the final output path being pretty wonky was a consequence of the serialization script used. However, I think the finetuner should have its own vLLM serialization script, which by the looks of things doesn't seem hard to do.

dmarx previously approved these changes Jul 29, 2024
@dmarx left a comment

a couple of minor comments here and there which you can take or leave. LGTM

sangstar previously approved these changes Jul 30, 2024
@sangstar (Contributor) left a comment

Generally LGTM. How have you tested these changes? Do you have unit tests somewhere or did you do an integration test?


@wbrown (Contributor) left a comment

We should be hosting a vLLM container on GHCR, not on Docker.

value: 'b32173f'
value: '125'
- name: inference_image
value: 'docker.io/rtalaricw/vllm'
Contributor

We should create vLLM images hosted on GHA rather than do this.

   port: 80
-  initialDelaySeconds: 300
+  initialDelaySeconds: 60
Contributor

Maybe bump these back up a little higher, since timing out too early would cause more issues than taking a little too long to time out in the error case.

Comment on lines +11 to 15
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
    cuda-nvcc-11-8 cuda-nvml-dev-11-8 libcurand-dev-11-8 \
    libcublas-dev-11-8 libcusparse-dev-11-8 \
    libcusolver-dev-11-8 cuda-nvprof-11-8 \
Contributor

Probably not needed with our torch base images:

  • libcurand
  • libcublas
  • libcusparse
  • libcusolver

If you are using torch-extras then cuda-nvcc and cuda-nvprof (and most likely cuda-nvml unless we use it for other reasons than for building libraries) are probably all also not needed.

Contributor

Overall, most of this stuff is not necessary when using our torch-extras images; try using those as a base and we can probably get rid of most of this Dockerfile and the custom compilation steps in here.

Contributor

These changes aren't relevant to this PR nor needed by it, so can this be reverted / cut out into another branch if we want to keep it?

@@ -436,7 +443,7 @@ def main_process_print(*args, **kwargs):
     if "eos_token" not in tokenizer.special_tokens_map:
         tokens_to_add["eos_token"] = "<|endoftext|>"
     if "pad_token" not in tokenizer.special_tokens_map:
-        tokens_to_add["pad_token"] = "<|endoftext|>"
+        tokens_to_add["pad_token"] = tokenizer.eos_token
@Eta0 (Contributor) commented Aug 15, 2024

This needs to match gpt_bpe's hardcoded default (which should currently be unsigned -1 / technically no text equivalent). This logic breaks if tokenizer.eos_token is not set anyway (i.e. if the if statement right before it was triggered).
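
To illustrate the second point (a sketch of the ordering problem only, not of the gpt_bpe-matching default): if eos_token was merely queued into tokens_to_add by the branch above, tokenizer.eos_token is still None when the pad_token branch runs, so a fallback is needed.

if "eos_token" not in tokenizer.special_tokens_map:
    tokens_to_add["eos_token"] = "<|endoftext|>"
if "pad_token" not in tokenizer.special_tokens_map:
    # tokenizer.eos_token may still be None here if it was only queued above,
    # so fall back to the queued value (illustrative only).
    tokens_to_add["pad_token"] = (
        tokenizer.eos_token
        or tokens_to_add.get("eos_token", "<|endoftext|>")
    )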

Comment on lines +506 to +507
checkpoint_model = checkpoint_model if isinstance(checkpoint_model, supported_classes) else unwrap_model(
checkpoint_model)
Contributor

unwrap_model should be able to be called unconditionally here.

Comment on lines +503 to +504
def tensorizer_save(checkpoint_dir: str) -> None:
checkpoint_model = trainer.model
Contributor

The model should probably be an argument instead of pulling it from the trainer global.
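
A minimal sketch combining this with the unconditional unwrap_model point above (it assumes tensorizer's TensorSerializer and transformers' unwrap_model helper; the model.tensors filename is illustrative):

import os

from tensorizer import TensorSerializer
from torch.nn import Module
from transformers.modeling_utils import unwrap_model

def tensorizer_save(checkpoint_model: Module, checkpoint_dir: str) -> None:
    # Take the model as an argument instead of reading the trainer global.
    # unwrap_model is a no-op for plain modules, so it can run unconditionally.
    checkpoint_model = unwrap_model(checkpoint_model)
    serializer = TensorSerializer(os.path.join(checkpoint_dir, "model.tensors"))
    serializer.write_module(checkpoint_model)
    serializer.close()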

Comment on lines +677 to +687
use_uint16 = len(tokenizer) < 0xffff
self._padding_is_ambiguous = tokenizer.pad_token_id == tokenizer.eos_token_id
self._pad_token_id = tokenizer.pad_token_id
self._pad_token_id = 0xffff if use_uint16 else 0xffffffff
self._token_dtype = numpy.uint16 if use_uint16 else numpy.uint32
if is_main_process():
    logger.info(f"DATASET: {path}")
    logger.info(
        f"DATASET SIZE: {length_mb:,.2f}MiB, {num_tokens:,} tokens, "
        f"{self.length:,} contexts"
    )
    logger.info(f"DATASET PAD ID: {self._pad_token_id}")
Contributor

If the uint32 functionality isn't working here yet, remove it (or comment it out with an explanation) before merging this PR, so it doesn't confuse people.

@harubaru harubaru dismissed stale reviews from sangstar, dmarx, Rexwang8, and rtalaricw via 6c9f690 March 10, 2025 17:22