Finetuning code refactor (opea-project#1081)
Signed-off-by: Ye, Xinyu <[email protected]>
XinyuYe-Intel authored Jan 9, 2025
1 parent 2587a29 commit efd9578
Showing 23 changed files with 331 additions and 280 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/docker/compose/finetuning-compose.yaml
@@ -5,9 +5,9 @@
services:
finetuning:
build:
-dockerfile: comps/finetuning/Dockerfile
+dockerfile: comps/finetuning/src/Dockerfile
image: ${REGISTRY:-opea}/finetuning:${TAG:-latest}
finetuning-gaudi:
build:
-dockerfile: comps/finetuning/Dockerfile.intel_hpu
+dockerfile: comps/finetuning/src/Dockerfile.intel_hpu
image: ${REGISTRY:-opea}/finetuning-gaudi:${TAG:-latest}
242 changes: 0 additions & 242 deletions comps/finetuning/handlers.py

This file was deleted.

@@ -24,17 +24,17 @@ RUN python -m pip install --no-cache-dir --upgrade pip && \
python -m pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu && \
python -m pip install --no-cache-dir intel-extension-for-pytorch && \
python -m pip install --no-cache-dir oneccl_bind_pt --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/cn/ && \
-python -m pip install --no-cache-dir -r /home/user/comps/finetuning/requirements.txt
+python -m pip install --no-cache-dir -r /home/user/comps/finetuning/src/requirements.txt

ENV PYTHONPATH=$PYTHONPATH:/home/user

-WORKDIR /home/user/comps/finetuning/
+WORKDIR /home/user/comps/finetuning/src

RUN echo PKGPATH=$(python3 -c "import pkg_resources; print(pkg_resources.get_distribution('oneccl-bind-pt').location)") >> run.sh && \
echo 'export LD_LIBRARY_PATH=$PKGPATH/oneccl_bindings_for_pytorch/opt/mpi/lib/:$LD_LIBRARY_PATH' >> run.sh && \
echo 'source $PKGPATH/oneccl_bindings_for_pytorch/env/setvars.sh' >> run.sh && \
echo ray start --head --dashboard-host=0.0.0.0 >> run.sh && \
echo export RAY_ADDRESS=http://localhost:8265 >> run.sh && \
-echo python finetuning_service.py >> run.sh
+echo python opea_finetuning_microservice.py >> run.sh

CMD bash run.sh
@@ -19,11 +19,11 @@ USER user
ENV PATH=$PATH:/home/user/.local/bin

RUN python -m pip install --no-cache-dir --upgrade pip && \
-python -m pip install --no-cache-dir -r /home/user/comps/finetuning/requirements.txt && \
+python -m pip install --no-cache-dir -r /home/user/comps/finetuning/src/requirements.txt && \
python -m pip install --no-cache-dir optimum-habana

ENV PYTHONPATH=$PYTHONPATH:/home/user

-WORKDIR /home/user/comps/finetuning/
+WORKDIR /home/user/comps/finetuning/src

ENTRYPOINT ["/bin/bash", "launch.sh"]
8 changes: 4 additions & 4 deletions comps/finetuning/README.md → comps/finetuning/src/README.md
@@ -53,7 +53,7 @@ Build docker image with below command:
```bash
export HF_TOKEN=${your_huggingface_token}
cd ../../
-docker build -t opea/finetuning:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy --build-arg HF_TOKEN=$HF_TOKEN -f comps/finetuning/Dockerfile .
+docker build -t opea/finetuning:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy --build-arg HF_TOKEN=$HF_TOKEN -f comps/finetuning/src/Dockerfile .
```

#### 2.1.2 Run Docker with CLI
@@ -72,7 +72,7 @@ Build docker image with below command:

```bash
cd ../../
-docker build -t opea/finetuning-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/finetuning/Dockerfile.intel_hpu .
+docker build -t opea/finetuning-gaudi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/finetuning/src/Dockerfile.intel_hpu .
```

#### 2.2.2 Run Docker with CLI
@@ -244,8 +244,8 @@ curl http://${your_ip}:8015/v1/finetune/list_checkpoints -X POST -H "Content-Typ

### 3.4 Leverage fine-tuned model

-After fine-tuning job is done, fine-tuned model can be chosen from listed checkpoints, then the fine-tuned model can be used in other microservices. For example, fine-tuned reranking model can be used in [rerankings](../rerankings/src/README.md) microservice by assign its path to the environment variable `RERANK_MODEL_ID`, fine-tuned embedding model can be used in [embeddings](../embeddings/src/README.md) microservice by assign its path to the environment variable `model`, LLMs after instruction tuning can be used in [llms](../llms/src/text-generation/README.md) microservice by assign its path to the environment variable `your_hf_llm_model`.
+After fine-tuning job is done, fine-tuned model can be chosen from listed checkpoints, then the fine-tuned model can be used in other microservices. For example, fine-tuned reranking model can be used in [reranks](../../rerankings/src/README.md) microservice by assign its path to the environment variable `RERANK_MODEL_ID`, fine-tuned embedding model can be used in [embeddings](../../embeddings/src/README.md) microservice by assign its path to the environment variable `model`, LLMs after instruction tuning can be used in [llms](../../llms/text-generation/README.md) microservice by assign its path to the environment variable `your_hf_llm_model`.

## 🚀4. Descriptions for Finetuning parameters

-We utilize [OpenAI finetuning parameters](https://platform.openai.com/docs/api-reference/fine-tuning) and extend it with more customizable parameters, see the definitions at [finetune_config](https://github.com/opea-project/GenAIComps/blob/main/comps/finetuning/finetune_config.py).
+We utilize [OpenAI finetuning parameters](https://platform.openai.com/docs/api-reference/fine-tuning) and extend it with more customizable parameters, see the definitions at [finetune_config](https://github.com/opea-project/GenAIComps/blob/main/comps/finetuning/src/integrations/finetune_config.py).
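
Reviewer note on section 3.4 of the README above: the paragraph names one environment variable per downstream microservice. A minimal sketch of that mapping follows, assuming the three variable names quoted in the README (`RERANK_MODEL_ID`, `model`, `your_hf_llm_model`). The task labels and the `export_line` helper are hypothetical and are not part of this commit or of GenAIComps.

```python
import shlex

# Hypothetical task labels mapped to the environment variable names
# quoted in README section 3.4. The labels are illustrative only.
ENV_VAR_BY_TASK = {
    "rerank": "RERANK_MODEL_ID",
    "embedding": "model",
    "instruction_tuning": "your_hf_llm_model",
}


def export_line(task: str, checkpoint_path: str) -> str:
    """Build a shell `export` line that points a microservice at a checkpoint."""
    try:
        var = ENV_VAR_BY_TASK[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None
    # shlex.quote protects paths that contain spaces or shell metacharacters.
    return f"export {var}={shlex.quote(checkpoint_path)}"


if __name__ == "__main__":
    print(export_line("rerank", "/tmp/ckpt"))
```

The checkpoint path would come from the `list_checkpoints` response; how that response is shaped is not shown in this diff, so the sketch takes the path as a plain string.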
2 changes: 2 additions & 0 deletions comps/finetuning/src/integrations/__init__.py
@@ -0,0 +1,2 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
File renamed without changes.
@@ -6,7 +6,7 @@
from pydantic_yaml import parse_yaml_raw_as
from transformers import TrainerCallback, TrainerControl, TrainerState, TrainingArguments

-from comps.finetuning.finetune_config import FinetuneConfig
+from comps.finetuning.src.integrations.finetune_config import FinetuneConfig


class FineTuneCallback(TrainerCallback):
@@ -29,7 +29,7 @@ def main():
callback = FineTuneCallback()
finetune_config["Training"]["callbacks"] = [callback]

-from comps.finetuning.llm_on_ray.finetune.finetune import main as llm_on_ray_finetune_main
+from comps.finetuning.src.integrations.llm_on_ray.finetune.finetune import main as llm_on_ray_finetune_main

llm_on_ray_finetune_main(finetune_config)

@@ -23,9 +23,9 @@
from transformers import Trainer, TrainingArguments

from comps import CustomLogger
-from comps.finetuning.finetune_config import FinetuneConfig
-from comps.finetuning.llm_on_ray import common
-from comps.finetuning.llm_on_ray.finetune.data_process import (
+from comps.finetuning.src.integrations.finetune_config import FinetuneConfig
+from comps.finetuning.src.integrations.llm_on_ray import common
+from comps.finetuning.src.integrations.llm_on_ray.finetune.data_process import (
DPOCollator,
DPODataProcessor,
EmbedCollator,
@@ -35,7 +35,7 @@
TrainDatasetForCE,
TrainDatasetForEmbedding,
)
-from comps.finetuning.llm_on_ray.finetune.modeling import BiEncoderModel, CrossEncoder
+from comps.finetuning.src.integrations.llm_on_ray.finetune.modeling import BiEncoderModel, CrossEncoder

logger = CustomLogger("llm_on_ray/finetune")

@@ -394,7 +394,7 @@ def get_trainer(config: Dict, model, ref_model, tokenizer, tokenized_dataset, da
if task == "dpo":
lora_config = config["General"].get("lora_config", None)
peft_config = LoraConfig(**lora_config)
-from comps.finetuning.llm_on_ray.finetune.dpo_trainer import DPOTrainer
+from comps.finetuning.src.integrations.llm_on_ray.finetune.dpo_trainer import DPOTrainer

trainer = DPOTrainer(
model,
@@ -431,7 +431,7 @@ def get_trainer(config: Dict, model, ref_model, tokenizer, tokenized_dataset, da
if task == "dpo":
lora_config = config["General"].get("lora_config", None)
peft_config = LoraConfig(**lora_config)
-from comps.finetuning.llm_on_ray.finetune.dpo_trainer import GaudiDPOTrainer
+from comps.finetuning.src.integrations.llm_on_ray.finetune.dpo_trainer import GaudiDPOTrainer

trainer = GaudiDPOTrainer(
model,
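
Reviewer note: every import change visible in this commit follows one of two prefix renames, `comps.finetuning.finetune_config` to `comps.finetuning.src.integrations.finetune_config` and `comps.finetuning.llm_on_ray` to `comps.finetuning.src.integrations.llm_on_ray`. For downstream code that still uses the old paths, a minimal sketch of a rewrite helper might look like the following. `RENAMES` lists only the prefixes shown in the hunks above, and `migrate_imports` is a hypothetical name, not something this commit provides. Other moved modules (the page lists 23 changed files, not all shown here) would need their own entries.

```python
import re

# Old -> new module prefixes, taken from the import changes shown in this diff.
# This is not a complete list of everything the refactor moved.
RENAMES = {
    "comps.finetuning.finetune_config": "comps.finetuning.src.integrations.finetune_config",
    "comps.finetuning.llm_on_ray": "comps.finetuning.src.integrations.llm_on_ray",
}

# Match any old prefix as a whole dotted name, so that a longer identifier
# such as `llm_on_ray_extra` is left untouched.
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(old) for old in RENAMES) + r")\b")


def migrate_imports(source: str) -> str:
    """Rewrite old finetuning module paths in `source` to their new locations."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


if __name__ == "__main__":
    old = "from comps.finetuning.llm_on_ray.finetune.modeling import CrossEncoder"
    print(migrate_imports(old))
```

Because neither old prefix occurs inside its replacement, running the helper twice gives the same result as running it once.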
