I tried to reproduce the results on the GLUE benchmark. I got an F1 score of 66 on MRPC, which is much lower than the 78 reported in the paper. I also got an accuracy of 77 on SST-2, which also looks abnormal.
Command I ran:
python ./run_glue_discrete_LM.py \
--task_name=mrpc \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 16 \
--weight_decay=0.1 --seed=42 \
--k_shot 16 --prompt_learning_rate 1e-4 \
--sample_size 20 --prompt_length 10 \
--prompt_search_space 200 \
--api_limit 8000 --ce_loss True
May I ask how to fix it?