To set up Graph-R1 inference, we still use Graph-R1 as the working directory.
First, merge the FSDP actor checkpoint into a HuggingFace-format model:

```bash
python3 verl/scripts/model_merger.py --backend fsdp \
    --hf_model_path Qwen/Qwen2.5-3B-Instruct \
    --local_dir checkpoints/Graph-R1/Qwen2.5-3B-Instruct_2WikiMultiHopQA_grpo/global_step_40/actor \
    --target_dir checkpoints/Graph-R1/Qwen2.5-3B-Instruct_2WikiMultiHopQA_grpo/model
```

Next, serve the merged model with vLLM on port 8002:

```bash
CUDA_VISIBLE_DEVICES=0 nohup vllm serve checkpoints/Graph-R1/Qwen2.5-3B-Instruct_2WikiMultiHopQA_grpo/model \
    --served-model-name agent --port 8002 \
    > result_modelapi_Qwen2.5-3B-Instruct_2WikiMultiHopQA_grpo.log 2>&1 &
```

Free port 8001 in case an old retrieval server is still running:

```bash
fuser -k 8001/tcp
```
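Once the vLLM server is up, you can sanity-check it directly over its OpenAI-compatible API (the default interface of `vllm serve`) before running the full agent pipeline. A minimal sketch; note that this bypasses the repo's `agent/vllm_infer/run.py` and queries the raw model, and that `ask` and `build_payload` are illustrative helpers, not part of Graph-R1:

```python
import json
import urllib.request

# Endpoint of the vLLM server started above; "agent" matches --served-model-name.
VLLM_URL = "http://localhost:8002/v1/chat/completions"

def build_payload(question: str) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": "agent",
        "messages": [{"role": "user", "content": question}],
        "temperature": 0,
    }

def ask(question: str) -> str:
    """Send a question to the served model and return its reply text."""
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

This is a connectivity check only; for actual Graph-R1 inference, use the repo's client below, which drives the full retrieve-and-reason loop.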
Start the retrieval server, then run inference on a question:

```bash
nohup python -u script_api.py --data_source 2WikiMultiHopQA > result_api_2WikiMultiHopQA.log 2>&1 &
python3 agent/vllm_infer/run.py --question "Which magazine came out first, Tit-Bits or Illustreret Nyhedsblad?"
```

5. When you finish the inference, stop the vLLM and retrieval servers by killing the processes on ports 8002 and 8001.
```bash
# Stop the vLLM server on port 8002
pkill -TERM -P $(lsof -t -i :8002); kill -9 $(lsof -t -i :8002)
# Stop the retrieval server on port 8001
fuser -k 8001/tcp
```
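The kill commands above assume something is still listening on those ports. A small sketch for verifying that both servers are actually down after shutdown (`port_in_use` is a generic helper written for illustration, not part of Graph-R1):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a server is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds.
        return s.connect_ex((host, port)) == 0

# Check both server ports after running the kill commands.
for port in (8002, 8001):
    print(f"port {port} in use: {port_in_use(port)}")
```

If either port still reports in use, re-run the corresponding kill command before starting a new serving session.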