Training setup for the Student/Teacher scenarios in the Personal Agent GSM8K experiment #113

@ZhifuWei

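While reproducing the Personal Agent Optimization / GSM8K experiment, I would like to confirm how the paper defines the results for its two personalization scenarios, Student and Teacher:

Are the Student / Teacher scores reported in the paper obtained by evaluating a single policy, trained with mixed training, on each scenario separately, or by training two separate policies (student-only and teacher-only)?

I noticed that gsm8k_personal_agent.py in the repository appears to default to --scenario mixed, so my current understanding is the former. Could the authors please confirm? Thanks!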
