summary_metrics #518

LiuJinzhe-Keepgoing · 2025-04-12T11:53:13Z

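@JizhanFang Sorry to bother you again. My understanding of knowledge editing is still shallow, so I'd like to ask one more question: how does the Rel metric in WISE's evaluation differ from the way Efficacy (efficacy success) is computed in the original AlphaEdit paper?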

I ran 2000 edits with AlphaEdit in EasyEdit, and the results were:
    "summary_metrics": {
        "pre": {
            "rewrite_acc": 0.27166825396825395,
            "rephrase_acc": 0.25886884920634917
        },
        "post": {
            "rewrite_acc": 0.23760357142857141,
            "rephrase_acc": 0.20395674603174602,
            "locality": {
                "neighborhood_acc": 0.060382942332522206
            }
        }
    }
Running 2000 edits with the GitHub code from the original AlphaEdit paper, the results were:
{
  "time": [
    853.607296705246,
    0.0
  ],
  "post_rewrite_acc": [
    94.56,
    13.4
  ],
  "post_paraphrase_acc": [
    90.96,
    19.17
  ],
  "post_neighborhood_acc": [
    32.68,
    22.18
  ],
  "run_dir": "results/AlphaEdit/run_000",
  "num_cases": 2000
}
My understanding is that both compute Top-1 accuracy on the ZsRE dataset. Could you tell me what causes this gap?
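Before comparing, it helps to put both runs on the same scale: the EasyEdit numbers are fractions in [0, 1], while the AlphaEdit numbers look like percentages. The sketch below assumes (not confirmed in this thread) that each AlphaEdit entry is a `[mean, std]` pair already multiplied by 100; the constant and function names are hypothetical.

```python
# Post-edit numbers copied from the two runs quoted above.
EASYEDIT_POST = {
    "rewrite_acc": 0.23760357142857141,
    "rephrase_acc": 0.20395674603174602,
    "neighborhood_acc": 0.060382942332522206,
}
# Assumption: each AlphaEdit value is [mean, std], already scaled to percent.
ALPHAEDIT_SUMMARY = {
    "post_rewrite_acc": [94.56, 13.4],
    "post_paraphrase_acc": [90.96, 19.17],
    "post_neighborhood_acc": [32.68, 22.18],
}
# Pair each EasyEdit metric with the AlphaEdit key that measures the same thing.
KEY_MAP = {
    "rewrite_acc": "post_rewrite_acc",
    "rephrase_acc": "post_paraphrase_acc",
    "neighborhood_acc": "post_neighborhood_acc",
}


def compare_on_percent_scale():
    """Return {metric: (easyedit_pct, alphaedit_mean_pct, gap_pct)}."""
    rows = {}
    for ee_key, ae_key in KEY_MAP.items():
        ee_pct = round(EASYEDIT_POST[ee_key] * 100, 2)  # fraction -> percent
        ae_mean = ALPHAEDIT_SUMMARY[ae_key][0]          # first element = mean (assumed)
        rows[ee_key] = (ee_pct, ae_mean, round(ae_mean - ee_pct, 2))
    return rows


if __name__ == "__main__":
    for name, (ee, ae, gap) in compare_on_percent_scale().items():
        print(f"{name:18s} EasyEdit {ee:6.2f}%  AlphaEdit {ae:6.2f}%  gap {gap:6.2f}")
```

On this reading the gap is roughly 70 points for rewrite and rephrase accuracy, so it is far too large to be a units artifact alone.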

Sorry for asking so many questions, and thanks for your reply!

Originally posted by @LiuJinzhe-Keepgoing in #502


JizhanFang · 2025-04-12T14:01:03Z

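When you ran 2000 sequential edits in EasyEdit, did you edit all 2000 and then evaluate all 2000 (--sequential_edit)? I just looked at AlphaEdit's evaluation code. Their implementation sets a batch size (the paper says 100), and although the editing is sequential, they evaluate each batch right after editing it. These two implementations can produce a very large gap.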
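One way to see why the two protocols can diverge so much is to count, for each record, how many edits are applied after the update that contains it but before it is evaluated; those later edits are what can overwrite it. The helper below is hypothetical (not EasyEdit or AlphaEdit code) and models the two protocols exactly as described in the comment above.

```python
from typing import List


def later_edits_before_eval(n_records: int, batch_size: int,
                            evaluate_per_batch: bool) -> List[int]:
    """For each record, count edits applied AFTER the update containing it
    and BEFORE it is evaluated (hypothetical helper for illustration)."""
    counts = []
    for i in range(n_records):
        # One past the last record in the batch that contains record i.
        batch_end = min((i // batch_size + 1) * batch_size, n_records)
        # Per-batch evaluation runs right after this batch's update;
        # end-of-run evaluation runs after every record has been edited.
        eval_point = batch_end if evaluate_per_batch else n_records
        counts.append(eval_point - batch_end)
    return counts


def mean(xs: List[int]) -> float:
    return sum(xs) / len(xs)


if __name__ == "__main__":
    n = 2000
    settings = [
        ("edit all, then evaluate (batch 1)", 1, False),
        ("edit all, then evaluate (batch 100)", 100, False),
        ("evaluate after each batch (batch 100)", 100, True),
    ]
    for label, bs, per_batch in settings:
        print(f"{label:40s} mean later edits = {mean(later_edits_before_eval(n, bs, per_batch))}")
```

With 2000 records this gives means of 999.5, 950.0 and 0.0 respectively: under per-batch evaluation no record is ever exposed to later edits before it is scored.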
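Second, although both evaluations use teacher forcing, taking the logits and then the argmax (i.e. the Top-1 accuracy you mentioned), the concrete implementations still differ in small ways. You can compare the two evaluation codebases; this may also account for part of the gap.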
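As a rough illustration of teacher-forced Top-1 accuracy: the prompt and target are fed together, the logits at position `t` predict token `t + 1`, and each target token counts as correct if the argmax at its predicting position matches it. The sketch uses plain lists instead of a real model, and the two aggregation rules shown (token-level mean vs. all-tokens-correct) are examples of the kind of detail that can differ between codebases; which one EasyEdit or AlphaEdit actually uses should be checked in their code.

```python
from typing import List, Sequence


def argmax(row: Sequence[float]) -> int:
    return max(range(len(row)), key=lambda j: row[j])


def teacher_forced_hits(logits: List[List[float]], prompt_len: int,
                        target_ids: List[int]) -> List[bool]:
    """logits[t] is the next-token score vector after reading position t of
    the concatenated prompt + target. Target token k is therefore predicted
    at position prompt_len - 1 + k (the usual one-step shift)."""
    return [argmax(logits[prompt_len - 1 + k]) == tok
            for k, tok in enumerate(target_ids)]


def token_accuracy(hits: List[bool]) -> float:
    """Fraction of target tokens predicted correctly."""
    return sum(hits) / len(hits)


def exact_match(hits: List[bool]) -> float:
    """1.0 only if every target token is predicted correctly."""
    return float(all(hits))


if __name__ == "__main__":
    # Toy example: vocab of 4, prompt of 2 tokens, target = [3, 1].
    logits = [
        [0.0, 0.0, 0.0, 0.0],  # position 0: unused
        [0.1, 0.2, 0.0, 0.9],  # position 1: predicts target[0] -> argmax 3 (hit)
        [0.8, 0.5, 0.1, 0.0],  # position 2: predicts target[1] -> argmax 0 (miss)
        [0.0, 0.0, 0.0, 0.0],  # position 3: unused
    ]
    hits = teacher_forced_hits(logits, prompt_len=2, target_ids=[3, 1])
    print(hits, token_accuracy(hits), exact_match(hits))
```

The same predictions score 0.5 under one rule and 0.0 under the other, which shows how aggregation choices alone can shift reported accuracy.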

zxlzr · 2025-04-13T05:08:24Z

Do you have any other questions?

LiuJinzhe-Keepgoing · 2025-04-13T05:17:49Z

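@JizhanFang Thanks for your reply. In EasyEdit I used sequential edit, running run_alphaedit.py: 2000 edits, then eval. Can that really cause such a large gap? If so, isn't EasyEdit's approach more reasonable? Evaluating after all edits are done shows how well the edited model retains the knowledge as a whole.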

WanliYoung · 2025-04-14T01:59:23Z

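@LiuJinzhe-Keepgoing On one hand, EasyEdit's approach of evaluating after all edits are done is more reasonable; on the other hand, you could consider the effect of batch size. As I understand it, if you set sequential edit in EasyEdit, the batch size should be 1, meaning only one piece of knowledge is edited at a time. According to the AlphaEdit paper, however, batch size = 100, and a larger batch size generally does give somewhat better results. Just my personal view, for reference only.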
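To make the batch-size point concrete: the same 2000 edits mean 2000 successive weight updates at batch size 1 but only 20 at batch size 100. The helpers below are hypothetical and only count updates; they do not reproduce EasyEdit's or AlphaEdit's batching code.

```python
import math
from typing import List, Tuple


def batch_ranges(n_edits: int, batch_size: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) index ranges, one per model update (hypothetical)."""
    return [(start, min(start + batch_size, n_edits))
            for start in range(0, n_edits, batch_size)]


def num_updates(n_edits: int, batch_size: int) -> int:
    """Number of sequential weight updates needed to apply all edits."""
    return math.ceil(n_edits / batch_size)


if __name__ == "__main__":
    for bs in (1, 100):
        print(f"batch_size={bs:3d}: {num_updates(2000, bs)} updates")
```

Fewer, larger updates give earlier edits fewer chances to be disturbed by later ones, which is one plausible reason larger batches tend to score better.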

LiuJinzhe-Keepgoing · 2025-04-14T02:39:40Z

OK. Thanks 🙏

zxlzr added the question Further information is requested label Apr 13, 2025

LiuJinzhe-Keepgoing closed this as completed Apr 14, 2025
