Thanks for releasing the code. I have two questions about the algorithm implementation.
- The PEX code is built on IQL (both offline and online). During online fine-tuning, the algorithm uses `dist.sample()` to choose `w` and `action_2` when interacting with the environment. What I want to know is why, at evaluation time, you choose `w` and `action_2` with epsilon-greedy instead of a purely greedy operation (see the first sketch after this list).
- I plan to use SAC as the backbone algorithm for PEX, as you did in the paper. Since no SAC version is released, I would like to know how to transfer the offline-trained Q to the online SAC. Because SAC's Q incorporates the entropy term, I am not sure whether it is reasonable to directly use the offline-trained Q as SAC's Q and update it with the soft Bellman equation (the second sketch below shows what I have in mind).
- Furthermore, I don't understand the adaptation of the SAC actor training shown in the picture below. Could you describe the adaptation in more detail? (My reading of the plain SAC actor update is in the third sketch.)
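
To make the first question concrete, here is a minimal sketch of how I read the action composition. `pi_offline`, `pi_online`, `q_net`, and `inv_temp` are my own placeholder names, not identifiers from your code:

```python
import torch

def select_action(obs, pi_offline, pi_online, q_net, inv_temp=10.0,
                  mode="sample", epsilon=0.1):
    """Sketch of PEX-style action composition (my reading, not your code).

    mode="sample": w ~ Categorical(softmax(Q * inv_temp))  (online interaction)
    mode="eps":    epsilon-greedy over the two candidates  (what evaluation does)
    mode="greedy": argmax over the two candidates          (what I expected)
    """
    a1 = pi_offline(obs).sample()  # candidate from the frozen offline policy
    a2 = pi_online(obs).sample()   # candidate from the online policy
    q = torch.stack([q_net(obs, a1), q_net(obs, a2)])  # Q-value of each candidate

    if mode == "sample":
        w = torch.distributions.Categorical(logits=q * inv_temp).sample()
    elif mode == "eps":
        w = torch.randint(0, 2, ()) if torch.rand(()) < epsilon else q.argmax()
    else:  # "greedy"
        w = q.argmax()
    return a1 if w == 0 else a2
```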
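For the second question, this is a minimal sketch of the transfer I have in mind, assuming a standard twin-critic SAC setup; `offline_q`, `actor`, and the hyperparameters are again placeholders of mine:

```python
import copy
import torch
import torch.nn.functional as F

def make_sac_critics(offline_q):
    # Initialize SAC's twin critics (and their targets) from the offline-trained Q.
    q1, q2 = copy.deepcopy(offline_q), copy.deepcopy(offline_q)
    return q1, q2, copy.deepcopy(q1), copy.deepcopy(q2)

def soft_bellman_loss(q1, q2, q1_t, q2_t, actor, batch, gamma=0.99, alpha=0.2):
    obs, act, rew, next_obs, done = batch
    with torch.no_grad():
        dist = actor(next_obs)
        next_act = dist.sample()
        log_prob = dist.log_prob(next_act).sum(-1, keepdim=True)
        target_q = torch.min(q1_t(next_obs, next_act), q2_t(next_obs, next_act))
        # Soft Bellman backup: the entropy term (-alpha * log_prob) enters here,
        # which the offline IQL Q was never trained with -- hence my concern.
        y = rew + gamma * (1.0 - done) * (target_q - alpha * log_prob)
    return F.mse_loss(q1(obs, act), y) + F.mse_loss(q2(obs, act), y)
```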
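And for the last point, my understanding of the plain SAC actor update is the sketch below (same placeholder names); I cannot tell from the picture which part of it the adaptation changes:

```python
import torch

def sac_actor_loss(q1, q2, actor, obs, alpha=0.2):
    dist = actor(obs)
    act = dist.rsample()  # reparameterized sample so gradients flow to the actor
    log_prob = dist.log_prob(act).sum(-1, keepdim=True)
    q = torch.min(q1(obs, act), q2(obs, act))
    # Plain SAC objective: maximize Q plus entropy, i.e. minimize this loss.
    return (alpha * log_prob - q).mean()
```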
