To set up Graph-R1 inference, we still use Graph-R1 as the working directory.
First, merge the FSDP actor checkpoint into a HuggingFace-format model:

```bash
python3 verl/scripts/model_merger.py --backend fsdp \
    --hf_model_path Qwen/Qwen2.5-3B-Instruct \
    --local_dir checkpoints/Graph-R1/Qwen2.5-3B-Instruct_2WikiMultiHopQA_grpo/global_step_40/actor \
    --target_dir checkpoints/Graph-R1/Qwen2.5-3B-Instruct_2WikiMultiHopQA_grpo/model
```

Next, serve the merged model with vLLM on port 8002:

```bash
CUDA_VISIBLE_DEVICES=0 nohup vllm serve checkpoints/Graph-R1/Qwen2.5-3B-Instruct_2WikiMultiHopQA_grpo/model \
    --served-model-name agent --port 8002 \
    > result_modelapi_Qwen2.5-3B-Instruct_2WikiMultiHopQA_grpo.log 2>&1 &
```

Free port 8001 in case an old retrieval server is still running:

```bash
fuser -k 8001/tcp
```
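Once the vLLM server is up, you can sanity-check it directly over its OpenAI-compatible API (the default interface of `vllm serve`) before running the full agent pipeline. A minimal sketch; note that this bypasses the repo's `agent/vllm_infer/run.py` and queries the raw model, and that `ask` and `build_payload` are illustrative helpers, not part of Graph-R1:

```python
import json
import urllib.request

# Endpoint of the vLLM server started above; "agent" matches --served-model-name.
VLLM_URL = "http://localhost:8002/v1/chat/completions"

def build_payload(question: str) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": "agent",
        "messages": [{"role": "user", "content": question}],
        "temperature": 0,
    }

def ask(question: str) -> str:
    """Send a question to the served model and return its reply text."""
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

This is a connectivity check only; for actual Graph-R1 inference, use the repo's client below, which drives the full retrieve-and-reason loop.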
Start the retrieval server, then run inference on a question:

```bash
nohup python -u script_api.py --data_source 2WikiMultiHopQA > result_api_2WikiMultiHopQA.log 2>&1 &
python3 agent/vllm_infer/run.py --question "Which magazine came out first, Tit-Bits or Illustreret Nyhedsblad?"
```

5. When you finish the inference, stop the vLLM and retrieval servers by killing the processes on ports 8002 and 8001.
```bash
# Stop the vLLM server on port 8002
pkill -TERM -P $(lsof -t -i :8002); kill -9 $(lsof -t -i :8002)
# Stop the retrieval server on port 8001
fuser -k 8001/tcp
```
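The kill commands above assume something is still listening on those ports. A small sketch for verifying that both servers are actually down after shutdown (`port_in_use` is a generic helper written for illustration, not part of Graph-R1):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a server is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds.
        return s.connect_ex((host, port)) == 0

# Check both server ports after running the kill commands.
for port in (8002, 8001):
    print(f"port {port} in use: {port_in_use(port)}")
```

If either port still reports in use, re-run the corresponding kill command before starting a new serving session.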