summary_metrics #518

Closed
LiuJinzhe-Keepgoing opened this issue Apr 12, 2025 · 5 comments
Labels
question Further information is requested

Comments

@LiuJinzhe-Keepgoing

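@JizhanFang Sorry to bother you again. My understanding of the knowledge editing task is still fairly shallow, so I would like to ask one more question: how does the computation of Rel in the WISE evaluation differ from that of Efficacy (efficiency success) in the original AlphaEdit paper?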

I ran 2000 edits with AlphaEdit in EasyEdit, and the rewrite_acc results were:

    "summary_metrics": {
        "pre": {
            "rewrite_acc": 0.27166825396825395,
            "rephrase_acc": 0.25886884920634917
        },
        "post": {
            "rewrite_acc": 0.23760357142857141,
            "rephrase_acc": 0.20395674603174602,
            "locality": {
                "neighborhood_acc": 0.060382942332522206
            }
        }
    }

Using the GitHub code released with the original AlphaEdit paper, I ran 200 edits, and the rewrite_acc results were:

{
  "time": [
    853.607296705246,
    0.0
  ],
  "post_rewrite_acc": [
    94.56,
    13.4
  ],
  "post_paraphrase_acc": [
    90.96,
    19.17
  ],
  "post_neighborhood_acc": [
    32.68,
    22.18
  ],
  "run_dir": "results/AlphaEdit/run_000",
  "num_cases": 2000
}

My understanding is that both compute Top-1 accuracy on the ZsRE dataset. What causes the gap between them?

Sorry for asking so many questions, and thank you for your reply!

Originally posted by @LiuJinzhe-Keepgoing in #502

@JizhanFang
Collaborator

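When you run 2000 sequential edits in EasyEdit, do you apply all 2000 edits first and only then evaluate all 2000 (--sequential_edit)? I just looked at the evaluation code in AlphaEdit. Their implementation sets a batch size (the paper says 100), and although the editing is sequential, they evaluate each batch right after editing it. These two approaches can produce very different results.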
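The difference between the two protocols can be illustrated with a toy sketch. Everything here is hypothetical: `ToyEditor` is not an EasyEdit or AlphaEdit class, and its "forgetting" (keeping only the most recent `capacity` edits) is an assumed stand-in for whatever degradation a real model shows under many sequential edits. The numbers only show the direction of the effect, not the gap reported in this thread.

```python
from typing import Dict, List, Tuple

Edit = Tuple[str, str]  # (prompt, target answer)


class ToyEditor:
    """Hypothetical stand-in for an edited model.

    Assumption: it retains only the most recent `capacity` edits,
    which crudely mimics forgetting under long sequential editing.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.memory: Dict[str, str] = {}

    def apply(self, batch: List[Edit]) -> None:
        for prompt, target in batch:
            self.memory[prompt] = target
        # Drop the oldest edits once capacity is exceeded
        # (dicts preserve insertion order).
        while len(self.memory) > self.capacity:
            del self.memory[next(iter(self.memory))]

    def predict(self, prompt: str) -> str:
        return self.memory.get(prompt, "")


def accuracy(model: ToyEditor, edits: List[Edit]) -> float:
    return sum(model.predict(p) == t for p, t in edits) / len(edits)


def edit_all_then_evaluate(edits: List[Edit], batch_size: int, capacity: int) -> float:
    """Apply every edit first, then evaluate all edits on the final model."""
    model = ToyEditor(capacity)
    for i in range(0, len(edits), batch_size):
        model.apply(edits[i:i + batch_size])
    return accuracy(model, edits)


def evaluate_after_each_batch(edits: List[Edit], batch_size: int, capacity: int) -> float:
    """Evaluate each batch right after it is edited, then average the scores."""
    model = ToyEditor(capacity)
    scores = []
    for i in range(0, len(edits), batch_size):
        batch = edits[i:i + batch_size]
        model.apply(batch)
        scores.append(accuracy(model, batch))
    return sum(scores) / len(scores)


edits = [(f"q{i}", f"a{i}") for i in range(2000)]
final_score = edit_all_then_evaluate(edits, batch_size=100, capacity=800)
per_batch_score = evaluate_after_each_batch(edits, batch_size=100, capacity=800)
print(final_score, per_batch_score)
```

In this toy setting the per-batch protocol never observes the forgetting of earlier edits, so it reports a higher score than the evaluation run after all edits.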
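Second, although both evaluations use teacher forcing, take the logits, and then take the argmax (the Top-1 accuracy you mention), the implementations still differ in small details. Comparing the evaluation code on both sides may help, since these differences could also account for part of the gap.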
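As a rough illustration of teacher-forced Top-1 accuracy, here is a minimal sketch in plain Python. The function name `teacher_forced_top1_acc` and the input layout are assumptions made for this example; it is not the evaluation code of EasyEdit or AlphaEdit, which differ in details such as tokenization and position alignment.

```python
from typing import List, Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the largest score in one row of logits."""
    return max(range(len(row)), key=row.__getitem__)


def teacher_forced_top1_acc(logits: List[Sequence[float]], target_ids: List[int]) -> float:
    """Fraction of target tokens whose argmax prediction is correct.

    Assumption: logits[t] holds the vocabulary scores for predicting
    target_ids[t] when the model is fed the prompt plus the gold tokens
    target_ids[:t] (teacher forcing), so an early mistake does not
    propagate to later positions.
    """
    preds = [argmax(row) for row in logits]
    return sum(p == t for p, t in zip(preds, target_ids)) / len(target_ids)


# Toy example with a vocabulary of 3 tokens and a 4-token target.
logits = [
    [0.1, 2.0, 0.3],  # predicts token 1
    [1.5, 0.2, 0.1],  # predicts token 0
    [0.0, 0.4, 0.9],  # predicts token 2
    [0.7, 0.6, 0.1],  # predicts token 0
]
target_ids = [1, 0, 2, 1]
acc = teacher_forced_top1_acc(logits, target_ids)
print(acc)
```

This sketch returns a fraction in [0, 1]. The two logs quoted in this thread appear to be on different scales (fractions in one, percentages in the other), which is worth checking before comparing numbers directly.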

@zxlzr
Contributor

zxlzr commented Apr 13, 2025

Do you have any other questions?

@zxlzr zxlzr added the question Further information is requested label Apr 13, 2025
@LiuJinzhe-Keepgoing
Author

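@JizhanFang Thank you for the reply. In EasyEdit I used sequential edit: I ran run_alphaedit.py, applied 2000 edits, and then ran eval. Would that cause such a large gap? And does that mean the EasyEdit evaluation is the more reasonable one, since evaluating after all edits are applied shows how well the edited model retains the knowledge overall?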

@WanliYoung
Collaborator

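@LiuJinzhe-Keepgoing On one hand, EasyEdit's approach of evaluating only after all edits are applied is more reasonable. On the other hand, you may want to consider the effect of batch size. As I understand it, if you use sequential edit in EasyEdit, the batch size should be 1, meaning each step edits a single piece of knowledge. AlphaEdit, according to their paper, uses batch size = 100, and a larger batch size generally does give somewhat better results. This is just my personal view, for reference only.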

@LiuJinzhe-Keepgoing
Author

OK, thank you 🙏


4 participants