This project allows developers to analyze coding projects locally while maintaining data privacy. It is ideal for those who need to gain insights into another team's codebase without exposing sensitive information.
- meta-llama/Llama-3.2-1B-Instruct
- meta-llama/Llama-3.2-3B-Instruct
- meta-llama/Llama-3.1-8B-Instruct
The project consists of two main scripts:
project_analyser.py
: Examines the structural and content aspects of coding projects.pdfs_similarity.py
: Compares the structure and content of multiple PDF files and converts them to markdown format.
Both scripts leverage a text-generation model (meta-llama
) to process data and generate insights. The entire process is conducted offline, ensuring the privacy and security of your data. System prompts are predefined in the code, and user prompts are generated for each file individually.
- Create a HuggingFace account and obtain an access token. Save this token for later use.
- Install CUDA 12.1, CUDNN 9.3, PyTorch '2.4.1+cu121', and transformers 4.47.0, and ensure you're using Python 3.10.
- Run
pip install -r requirements.txt
to install necessary dependencies. - Execute
pip install huggingface_hub
to enable logging in from the terminal or command prompt. - Log into your HuggingFace account via the terminal or command prompt by executing
huggingface-cli login
and providing your access token. - In
utils.py
, select yourDEFAULT_MODEL
from the available options. Note that other models have not been tested but may work. - Request access to the gated repository on HuggingFace based on your chosen
DEFAULT_MODEL
by submitting your details.
To analyze a coding project, execute the following command:
python project_analyser.py
usage: project_analyser.py [-h] --project_folder PROJECT_FOLDER [--suffix SUFFIX]
-
-h
,--help
Display help information and exit. -
--project_folder PROJECT_FOLDER
Specify the path to the folder containing the coding project. -
--suffix SUFFIX
Define a suffix for the output file that stores the analysis results. If omitted, a timestamp will be used. -
--interactive
Engage in an interactive loop for additional inputs after the initial analysis.
To analyze PDFs, use the following command:
python pdfs_similarity.py
In this case, the system instruction is to find structural similarities between multiple PDF files by converting them to markdown/HTML format. You may change the system instruction in this python file in case you want to do something else with the PDFs, such as summarizing them.
usage: pdfs_similarity.py [-h] [--pdf_folder PDF_FOLDER] [--suffix SUFFIX] [--output_format {markdown,html}] [--interactive]
-
-h
,--help
Display help information and exit. -
--pdf_folder PDF_FOLDER
Specify the path to the folder containing input PDF files. -
--suffix SUFFIX
Define a suffix for the output file that stores the analysis results. If omitted, a timestamp will be used. The output file does not store analysis from interactive mode. -
--output_format {markdown,html}
Choose the format for extracting PDF content to be used in user prompts. Options include:markdown
html
-
--interactive
Engage in an interactive loop for additional inputs after the initial analysis. Type "exit" to quit.
generate_readymade_prompt.py
and pdf_to_markdown.py
are utility scripts that can be used to generate prompts for analysis for both the above usecases and to convert PDFs to markdown format, respectively, in case your PC is not powerful enough and you have access to a proprietary Generative AI service.