Questions about the evaluation and the details of using the SAC algorithm as the backbone for PEX #2

@whalewang410

Description

Thanks for releasing the code. I have three questions about the algorithm implementation.

  1. The PEX code builds on IQL (offline and online). During online fine-tuning, the algorithm uses "dist.sample" to choose w and action_2 for interaction with the environment. But at evaluation time, why do you use epsilon-greedy to choose w and action_2 instead of a purely greedy operation?
  2. I am going to use SAC as the backbone algorithm for PEX, as you did in the paper. Since a SAC version is not included in the release, I want to know how to transfer the offline-trained Q to online SAC. Because the Q function of SAC incorporates the entropy term, I don't know whether it is reasonable to directly use the offline-trained Q as SAC's Q and update it with the soft Bellman equation.
  3. Furthermore, I don't understand the adaptation of the SAC actor training shown in the picture below. Could you give me a more detailed description of this adaptation?
    [screenshot: the SAC actor-training adaptation from the paper]
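To make question 1 concrete, here is a minimal sketch of how I currently understand the w/action selection in PEX. All function and parameter names here are my own (the actual code may differ): the two candidate actions come from the frozen offline policy and the learning online policy, and w picks which one is executed.

```python
import numpy as np

rng = np.random.default_rng(0)

def pex_select_action(q_fn, a_offline, a_online, state, mode="train",
                      alpha=10.0, eps=0.1):
    """Choose between the offline policy's action (a_offline) and the
    online policy's action (a_online); w indexes the executed action.

    This is my reading of the released code, not an exact copy of it.
    """
    q_vals = np.array([q_fn(state, a_offline), q_fn(state, a_online)])
    if mode == "train":
        # Training-time interaction: sample w from a Boltzmann
        # distribution over the candidates' Q-values ("dist.sample").
        logits = alpha * q_vals
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        w = rng.choice(2, p=probs)
    else:
        # Evaluation: epsilon-greedy over w -- this is the part my
        # question is about (why not purely greedy, i.e. eps = 0?).
        if rng.random() < eps:
            w = int(rng.integers(2))
        else:
            w = int(np.argmax(q_vals))
    return (a_offline, a_online)[w]
```

With eps = 0 this reduces to the purely greedy selection I would have expected at evaluation time.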
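And for question 2, this is what I mean by "directly use the offline-trained Q and update it with the soft Bellman equation" (a sketch under my own assumptions, not the paper's implementation):

```python
def soft_bellman_target(r, q_next, log_prob_next, alpha=0.2, gamma=0.99,
                        done=False):
    """Standard SAC soft Bellman target:

        y = r + gamma * (1 - done) * (Q(s', a') - alpha * log pi(a'|s'))

    My concern: an offline-trained Q (e.g. from IQL) was fit WITHOUT the
    entropy bonus, so its scale may not match this target when it is
    plugged in at the start of online fine-tuning.
    """
    return r + gamma * (1.0 - float(done)) * (q_next - alpha * log_prob_next)
```

Is it reasonable to bootstrap the offline Q directly into this target, or does the entropy-term mismatch need special handling?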
