Empowering AI with Determined Precision and Speed
Built with the following tools and technologies: Python, Hugging Face Transformers, Determined AI, and DeepSpeed.
- Overview
- Features
- Project Structure
- Getting Started
- Project Roadmap
- Contributing
- License
- Acknowledgments
The GRAG-HessianAI-Training-Pipeline project streamlines deep learning model training by orchestrating dataset processing, hyperparameter tuning, and distributed computing on Determined AI. It bundles CPT, SFT, DPO, ORPO, and LoRA fine-tuning workflows built on Hugging Face Transformers and DeepSpeed, integrating external libraries with custom utilities to keep training efficient. Targeting AI researchers and developers, it simplifies the workflow from model training through evaluation.
| Feature | Summary |
|---|---|
| ⚙️ Architecture | Modular Determined AI training pipeline: each training stage (CPT, SFT, DPO, ORPO, LoRA) has its own experiment configuration and fine-tuning script, backed by shared DeepSpeed configurations. |
| 🔩 Code Quality | Self-contained Python training scripts with shared utilities (`utils.py`, `lora_utils.py`) for model loading, tokenization, and checkpoint handling. |
| 📄 Documentation | This README, including a project index that describes every configuration file and script. |
| 🔌 Integrations | Hugging Face Transformers and Datasets, Determined AI, and DeepSpeed. |
| 🧩 Modularity | Training stages are configured through separate YAML files and launched through matching `*_finetune.py` entrypoints. |
| 🧪 Testing | Test suite runnable with pytest (see Getting Started). |
| ⚡️ Performance | Distributed training with DeepSpeed, mixed precision, gradient accumulation, and gradient checkpointing. |
| 🛡️ Security | |
| 📦 Dependencies | Managed through requirements.txt (transformers, datasets, scikit-learn, and related packages). |
└── GRAG-HessianAI-Training-Pipeline/
    └── GRAG_Hessian_AI_Determined_Training_Pipeline/
        ├── Orpo_attendee.yaml
        ├── README.md
        ├── chat_format.py
        ├── config.yaml
        ├── cpt.yaml
        ├── cpt_finetune.py
        ├── cptold.txt
        ├── dpo.yaml
        ├── dpo_finetune.py
        ├── ds_configs/
        ├── inference.py
        ├── lora.yaml
        ├── lora_finetune.py
        ├── lora_utils.py
        ├── metadata.json
        ├── old_startup-hook.sh
        ├── orpo.yaml
        ├── orpo_finetune.py
        ├── requirements.txt
        ├── sft.yaml
        ├── sft_finetune.py
        ├── startup-hook.sh
        ├── untitled.txt
        ├── utils.py
        └── utils_lora_old.py
GRAG-HESSIANAI-TRAINING-PIPELINE/

GRAG_Hessian_AI_Determined_Training_Pipeline

- Orpo_attendee.yaml: Defines the training pipeline configuration for LLAMA_8B_ORPO_attendee, specifying resources, hyperparameters, and environment settings. Sets up training on specific dataset subsets with arguments such as batch size, learning rate, and evaluation strategy, and configures DeepSpeed optimization and gradient checkpointing.
- cpt_finetune.py: Orchestrates the training pipeline by loading datasets, setting up special tokens, and initializing training. Uses distributed computing and fine-tuning driven by the specified hyperparameters, and ties together external libraries and the project's custom utilities (a hedged sketch of this shared flow appears right after this index).
- requirements.txt: Manages project dependencies by specifying required packages and versions, including transformers, datasets, and scikit-learn.
- lora_finetune.py: Loads and processes datasets for training a conversational AI model, ensures they are in the correct format, applies the necessary transformations, sets up special tokens, and starts training with the specified training arguments and callbacks.
- sft_finetune.py: Runs the supervised fine-tuning pipeline on a self-feeding chat dataset: loads the data, sets up special tokens, formats prompts, and launches training with the given configuration.
- chat_format.py: Generates ChatML templates for user, system, and assistant messages based on predefined roles, and provides helpers to retrieve assistant prompts and template IDs for responses (an illustrative template sketch also follows the index).
- metadata.json: Tracks the progress and identity of a specific trial, recording the number of completed steps and the unique trial ID used to monitor and manage training.
- config.yaml: Defines container bind mounts, environment configuration, and resource allocation for the training pipeline.
- cptold.txt: Legacy training pipeline configuration for a Qwen1.5 model, specifying dataset subsets, model details, hyperparameters, resources, and training settings.
- sft.yaml: Determined AI experiment configuration for supervised fine-tuning, covering the dataset, hyperparameters, resource allocation, environment setup, and training parameters.
- untitled.txt: Patches the HF callback script to handle additional metric types.
- lora.yaml: Experiment configuration for the Nemo_12B_Lora_ORPO_attendee project, covering the dataset, hyperparameters, resource allocation, environment setup, and training configuration.
- utils.py: Handles model retrieval and tokenization for the training pipeline: loads models according to the inference mode, customizes tokenization parameters, downloads model checkpoints, and integrates with Determined for distributed training.
- dpo.yaml: Determined AI experiment configuration for DPO fine-tuning, specifying resources, hyperparameters, environment settings, data subsets, the loss function, and training strategy.
- dpo_finetune.py: Runs DPO fine-tuning on Determined: loads datasets, processes conversation formats, tokenizes the data, and trains the model with distributed support.
- cpt.yaml: Configuration for training with DeepSpeed on the GRAG-CPT-Hessian-AI dataset, fine-tuning the Mistral-Nemo-Base-2407 model; covers data subsets, batch sizes, mixed precision, and gradient accumulation steps.
- utils_lora_old.py: Legacy helpers for model retrieval, tokenizer setup, and checkpoint downloading, including LoRA integration and tokenization handling.
- inference.py: Runs model inference with Determined AI, using a pre-trained model to process conversations and produce outputs; handles input processing, generation, and result storage for real-time inference tasks.
- orpo_finetune.py: Fine-tunes a conversational model with the ORPO technique: handles dataset processing, model setup, and training execution, integrating Determined AI for distributed training and Hugging Face Transformers for model management; the main function starts training from the specified parameters and hyperparameters.
- lora_utils.py: Retrieves pre-trained language models and tokenizers with support for custom model and tokenization configurations, downloads model checkpoints, and defines tokenization functions.
- startup-hook.sh: Startup hook that upgrades dependencies and patches a bug in the Hugging Face callback module.
- orpo.yaml: Configuration for training a mini ORPO model on attendee-specific data subsets, with a specific model and dataset, customized training arguments and hyperparameters, and a DeepSpeed configuration.
- old_startup-hook.sh: Previous startup hook that upgrades dependencies and modifies a specific condition in training-metrics handling.

ds_configs

- ds_config_stage_1.json: Stage 1 DeepSpeed configuration with automatic settings for mixed precision, optimizer, scheduler, zero optimization, gradient accumulation, and gradient clipping; includes options for batch sizes and FLOPs profiling.
- ds_config_stage_2_cpu_offload.json: Stage 2 configuration with CPU offload; sets FP16, the AdamW optimizer, the WarmupLR scheduler, zero-optimization settings for gradient accumulation and clipping, training and per-GPU micro-batch sizes, and an optional FLOPs profiler.
- ds_config_stage_2.json: Stage 2 configuration specifying optimization parameters, gradient accumulation, and zero-optimization strategies.
- ds_config_stage_3.json: Stage 3 configuration for mixed precision, optimizer settings, and zero-optimization parameters, tuning batch sizes, gradient accumulation, and clipping (an illustrative DeepSpeed configuration sketch follows the index).
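The fine-tuning scripts described above (cpt_finetune.py, sft_finetune.py, dpo_finetune.py, orpo_finetune.py, lora_finetune.py) share a common shape: load a dataset, register special tokens, build training arguments, and hand off to a trainer, optionally under a DeepSpeed config from ds_configs/. The sketch below is a minimal, hypothetical illustration of that pattern using Hugging Face Transformers; the model and dataset names are placeholders, and the repository's actual scripts, hyperparameters, and Determined integration will differ.

```python
# Minimal, hypothetical sketch of the shared fine-tuning flow (not the repo's actual code).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "gpt2"  # placeholder; the repo's configs reference models such as Mistral-Nemo-Base-2407


def main() -> None:
    # Tokenizer and model setup; add a pad token when the base model lacks one.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    if tokenizer.pad_token is None:
        tokenizer.add_special_tokens({"pad_token": "<|pad|>"})
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    model.resize_token_embeddings(len(tokenizer))

    # Load a small public dataset as a stand-in for the GRAG/Hessian-AI subsets.
    dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512)

    tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    # Training arguments; uncomment the deepspeed line to reuse a config from ds_configs/
    # when launching under DeepSpeed (e.g. through a Determined experiment).
    args = TrainingArguments(
        output_dir="outputs",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-5,
        num_train_epochs=1,
        gradient_checkpointing=True,
        logging_steps=10,
        # deepspeed="ds_configs/ds_config_stage_2.json",
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=tokenized,
        data_collator=collator,
    )
    trainer.train()


if __name__ == "__main__":
    main()
```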
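The ds_configs/ files are DeepSpeed configurations (stages 1 to 3, plus a stage-2 CPU-offload variant). As a rough, hypothetical illustration of what such a stage-2 configuration contains and how a Hugging Face TrainingArguments object can consume it, consider the sketch below; the actual JSON files in the repository will contain different or additional settings.

```python
# Hypothetical ZeRO stage-2 style DeepSpeed config expressed as a Python dict.
# "auto" lets Hugging Face fill values in from TrainingArguments at launch time.
# Running this for real requires the accelerate and deepspeed packages.
from transformers import TrainingArguments

ds_config = {
    "fp16": {"enabled": "auto"},
    "optimizer": {"type": "AdamW", "params": {"lr": "auto", "weight_decay": "auto"}},
    "scheduler": {"type": "WarmupLR", "params": {"warmup_num_steps": "auto"}},
    "zero_optimization": {"stage": 2},
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

# TrainingArguments accepts either a path to a JSON file (as in ds_configs/) or a dict.
args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    deepspeed=ds_config,
)
```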
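chat_format.py is described as producing ChatML-style templates for system, user, and assistant roles. The snippet below is an illustrative stand-in, not the repository's implementation, showing what such role-based formatting typically looks like; the exact marker tokens and helper names in the repo may differ.

```python
# Illustrative ChatML-style formatting; not the repository's chat_format.py.
CHATML_TURN = "<|im_start|>{role}\n{content}<|im_end|>\n"


def build_prompt(messages: list[dict]) -> str:
    """Render {'role', 'content'} messages and open an assistant turn for generation."""
    rendered = "".join(
        CHATML_TURN.format(role=m["role"], content=m["content"]) for m in messages
    )
    return rendered + "<|im_start|>assistant\n"


if __name__ == "__main__":
    demo = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the training pipeline."},
    ]
    print(build_prompt(demo))
```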
Before getting started with GRAG-HessianAI-Training-Pipeline, ensure your runtime environment meets the following requirements:
- Programming Language: Python
- Package Manager: Pip
Install GRAG-HessianAI-Training-Pipeline using one of the following methods:
Build from source:
- Clone the GRAG-HessianAI-Training-Pipeline repository:
❯ git clone ../GRAG-HessianAI-Training-Pipeline
- Navigate to the project directory:
❯ cd GRAG-HessianAI-Training-Pipeline
- Install the project dependencies:
❯ pip install -r GRAG_Hessian_AI_Determined_Training_Pipeline/requirements.txt
Run GRAG-HessianAI-Training-Pipeline using the following command:
❯ python {entrypoint}
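The entrypoint to pass to python depends on the stage you want to run (for example, one of the fine-tuning scripts listed in the project index). Because the YAML files in GRAG_Hessian_AI_Determined_Training_Pipeline are Determined AI experiment configurations, jobs are more typically submitted to a Determined cluster. Assuming you have access to such a cluster and that the GRAG_Hessian_AI_Determined_Training_Pipeline directory serves as the experiment context, a launch would look roughly like:

❯ det experiment create GRAG_Hessian_AI_Determined_Training_Pipeline/sft.yaml GRAG_Hessian_AI_Determined_Training_Pipeline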
Run the test suite using the following command:
❯ pytest
- Task 1: Implement feature one.
- Task 2: Implement feature two.
- Task 3: Implement feature three.
- 💬 Join the Discussions: Share your insights, provide feedback, or ask questions.
- 🐛 Report Issues: Submit bugs found or log feature requests for the GRAG-HessianAI-Training-Pipeline project.
- 💡 Submit Pull Requests: Review open PRs, and submit your own PRs.
Contributing Guidelines
- Fork the Repository: Start by forking the project repository to your own account.
- Clone Locally: Clone the forked repository to your local machine using a git client.
git clone <url-of-your-fork>
- Create a New Branch: Always work on a new branch, giving it a descriptive name.
git checkout -b new-feature-x
- Make Your Changes: Develop and test your changes locally.
- Commit Your Changes: Commit with a clear message describing your updates.
git commit -m 'Implemented new feature x.'
- Push to Your Fork: Push the changes to your forked repository.
git push origin new-feature-x
- Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.
- Review: Once your PR is reviewed and approved, it will be merged into the main branch. Congratulations on your contribution!
This project is licensed under the Apache License, Version 2.0.
--- Contributors: Marcel Rosiak, Soumya Paul, Siavash Mollaebrahim, Zain Ul Haq