In this repository, we explore the use of Retrieval-Augmented Generation (RAG) on a dental dataset. Although the dataset is not publicly available, we observed significant improvements in factual correctness after fine-tuning on a small subset of it.
The implementation of the entire pipeline is located in the src/experiment_pipeline.ipynb file, while various experiments are housed in the src/experiments directory. The src/core directory contains the core functional pipelines, including fine-tuning the embeddings and fine-tuning the LLaMA2 model on a closed-source dataset using QLoRA.
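To make the retrieval-then-generate idea concrete, here is a minimal, dependency-free sketch of the retrieval half of RAG. It is not the code in src/core: the bag-of-words cosine similarity stands in for the fine-tuned embeddings, the function names (tokenize, cosine, retrieve, build_prompt) are hypothetical, and the passages are invented for illustration because the dental dataset is not public.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Prepend the retrieved passages to the question as context."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical passages, invented for illustration only.
PASSAGES = [
    "Fluoride toothpaste helps prevent tooth decay.",
    "A root canal removes infected pulp from inside the tooth.",
    "Flossing daily removes plaque between teeth.",
]

if __name__ == "__main__":
    print(build_prompt("What does a root canal remove?", PASSAGES, k=1))
```

In a real pipeline the resulting prompt would be passed to the language model, and the term-count vectors would be replaced by dense embeddings. The overall flow (embed the question, rank passages, build a grounded prompt) stays the same.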
Important: This repository requires a Hugging Face access token (API key). Generate a token from your Hugging Face account and make sure that account has been granted access to the LLaMA2 model, which is gated.
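Because a missing token typically surfaces only as an authorization error when the model is downloaded, it can help to fail early. The sketch below assumes the token is exported as HF_TOKEN, as in the setup steps. The helper name get_hf_token is hypothetical and is not part of this repository or of any Hugging Face library.

```python
import os


def get_hf_token(env=None):
    """Return the Hugging Face token from HF_TOKEN, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    # Treat the README placeholder as "not set" as well.
    if not token or token == "YOUR_API_KEY":
        raise RuntimeError(
            "HF_TOKEN is not set. Generate an access token on Hugging Face "
            "and export it before running the pipeline."
        )
    return token
```

This only checks that a token is present. It does not verify that the account has been granted access to the gated LLaMA2 model, which can only be confirmed by the Hugging Face Hub itself.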
To use the dental chat LLM pipeline, follow these steps:
- Clone the repository:
  git clone https://github.com/sifat-ahmed/dental-chat-llm.git
- Install the required dependencies:
  pip install -r requirements.txt
- Obtain a Hugging Face API key:
  - Create an account on Hugging Face.
  - Generate an API token.
- Set the Hugging Face API key as an environment variable:
  export HF_TOKEN=YOUR_API_KEY
- Explore the implementation:
  - Access the src/experiment_pipeline.ipynb file to view the entire pipeline implementation.
- Explore experiments:
  - Navigate to the src/experiments directory to view different experiments.
- Modify core pipelines:
  - In the src/core directory, modify pipelines to fine-tune the embedding and the LLaMA2 model on your own dataset.
    Note: The LLaMA2 model requires access granted by the Hugging Face account holder.
- Run the pipeline:
  - Execute the necessary scripts and notebooks to run the pipelines.
For additional guidance, refer to the documentation.
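After cloning, a quick sanity check can confirm that the files and directories named in the steps above are where they are expected to be. This is a minimal sketch, assuming it is run from the repository root. The helper missing_paths is hypothetical, and the path list is taken only from the locations mentioned in this README.

```python
from pathlib import Path

# Paths mentioned in this README, relative to the repository root.
EXPECTED_PATHS = [
    "requirements.txt",
    "src/experiment_pipeline.ipynb",
    "src/experiments",
    "src/core",
]


def missing_paths(repo_root="."):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Repository layout looks complete.")
```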
We welcome contributions to the Dental Chat LLM project. If you would like to contribute, follow these steps:
- Fork the repository and create a new branch for your changes.
- Make your changes and ensure that the code is properly formatted.
- Write tests to validate your changes.
- Submit a pull request with a clear description of your changes and the problem they solve.
- Project maintainers will review your pull request and provide feedback as necessary.
- Once approved, your pull request will be merged into the main branch.
Thank you for contributing!
This project is licensed under the Apache License, Version 2.0. For more details, see the LICENSE file.
For any questions or further assistance, feel free to reach out:
Faisal Ahmed Sifat
Email: [email protected]
We would like to acknowledge the following individuals and organizations for their contributions to the Dental Chat LLM project:
- Md Sahadul Hasan Arian
- Acme Corporation