This repository contains the code used in the UniPar paper.
Multi-agent system for translating code between parallel programming APIs (e.g., CUDA to OpenMP) using language models with a feedback loop for error correction.
The system consists of:
- A pipeline for evaluating LLaMA models (or any model served with vLLM)
- A similar pipeline for running GPT models through the API
- A multi-agent pipeline that can be run after the initial model pass
- A script for comparing compilation rates
- A script for measuring validation rates
The multi-agent pipeline consists of three main components:
- QuestionerAgent: Formulates translation requests for the model, including optional few-shot examples.
- ModelAgent: Interfaces with the language model API to generate translations and fix code errors.
- ExecutionAgent: Tests if the translated code compiles correctly, providing error feedback.
The agents interact in a feedback loop:
- The QuestionerAgent sends the source code to the ModelAgent for translation
- The ExecutionAgent attempts to compile and run the translated code
- If compilation fails, the ExecutionAgent sends the error back to the ModelAgent
- The ModelAgent tries to fix the code based on the error
- The cycle repeats until compilation succeeds or the maximum number of iterations is reached (see the sketch below)
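A minimal Python sketch of this loop; the method names (`build_prompt`, `translate`, `compile_and_run`, `fix`) are illustrative placeholders, not the repository's actual agent interfaces:

```python
# Conceptual sketch of the multi-agent feedback loop.
# The agent classes are those described above; the method names are illustrative only.
def translate_with_feedback(questioner, model, executor, source_code, max_iterations=5):
    prompt = questioner.build_prompt(source_code)        # may include few-shot examples
    translated = model.translate(prompt)                  # initial translation attempt
    for _ in range(max_iterations):
        ok, error = executor.compile_and_run(translated)  # try to compile and execute
        if ok:
            return translated                              # success: stop iterating
        translated = model.fix(translated, error)          # feed the error back for repair
    return translated                                       # best effort after max_iterations
```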
Clone the repository:

```bash
git clone https://github.com/Scientific-Computing-Lab/UniPar_AI.git
cd UniPar_AI
```

Create and activate the conda environment:

```bash
conda env create -f env.yaml
conda activate unipar
```

Ensure you have all the required dependencies listed in env.yaml before running the program.
Note: The HeCBench dataset is located in the multiagent pipeline folder.
Basic usage of the inference Python script
- When given the 'gpt' argument, the script runs GPT inference; otherwise it uses the standard inference.
- To use a different model with the Python LLaMA inference script, change the model parameter in the code (and the dataset name, so the resulting folder is labeled correctly); see the sketch below.
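For example, the edit might look like the following, though the exact variable names in the repository's script may differ and both values here are placeholders:

```python
# Hypothetical example of switching models in the LLaMA/vLLM inference script.
# The actual variable names in the repository's code may differ.
model = "meta-llama/Meta-Llama-3-70B-Instruct"  # model to load with vLLM
dataset_name = "hecbench_llama3_70b"            # labels the resulting results folder
```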
Usage:
./full_run.sh [model_name]
This script is an example of how to run all the necessary scripts in order to convert LLM output into code, then compile and run it.
- Note the parameters at the start of the script, which can be changed to evaluate different configurations.
- When using each script individually, you can change the `run_names` parameter of `result_analysis` and `eval_run` to list multiple run paths (see the sketch below).
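A hypothetical example of listing several runs at once; the directory names below are placeholders, not paths shipped with the repository:

```python
# Hypothetical example: point result_analysis / eval_run at several runs at once.
run_names = [
    "results/llama3_cuda_to_omp",
    "results/gpt4_cuda_to_omp",
]
```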
Usage:
./full_evaluation_script.sh
Executes the translation task with topic-specific configurations:
- Runs translation for all listed programming domains
- When given the 'gpt' argument, the script runs GPT inference; otherwise it uses the standard inference.
Usage:
./full_run_topic.sh [model_name]
Executes the initial inference from the questioner model and then calls the remaining pipeline steps.
- This includes evaluation for the agent, which is slightly different from the one used for the base model.
Usage:
./full_run_with_agent.sh [dataset_type] [model_name] [max_iterations]
If you encounter memory issues:
- Reduce `--max_tokens` to a lower value
- Increase `--num_workers` if you have more GPUs available
- Process smaller batches of the dataset at a time (see the sketch below)
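One way to process smaller batches is to split the kernel list into chunks and run the pipeline on each chunk in turn. A minimal sketch, where `DATASET_DIR` and `run_inference_on` are placeholders rather than names from the repository:

```python
from pathlib import Path

# Illustrative sketch: process the dataset in smaller chunks to limit memory use.
DATASET_DIR = Path("HeCBench/src")  # hypothetical location of the kernel directories

def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def run_inference_on(batch):
    """Placeholder: call your inference script / pipeline for one batch here."""
    for kernel_dir in batch:
        print(f"would translate {kernel_dir.name}")

kernels = sorted(p for p in DATASET_DIR.iterdir() if p.is_dir())
for batch in chunked(kernels, 50):
    run_inference_on(batch)  # keep each batch small to bound memory use
```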
If many kernels fail to compile:
- Check the error messages in the compilation result files
- Adjust the `--temperature` parameter (lower for more conservative translations)
- Increase `--max_iterations` to allow more attempts at fixing compilation errors
If translated code compiles but fails at runtime:
- Check the runtime error messages
- Ensure the execution environment has the necessary libraries installed
- Verify that the target hardware supports the target API (e.g., OpenMP)