Cannot reproduce results following docs/notebooks/robustdg_getting_started.ipynb #20

Closed
Ardor-Wu opened this issue Apr 18, 2021 · 2 comments


@Ardor-Wu

I followed the instructions in docs/notebooks/robustdg_getting_started.ipynb, but I ran into a CUDA out-of-memory error, so I halved the batch size from 256 to 128. The resulting accuracy is slightly below the reported 96.1 (e.g., 95.575 or 96.02). That is acceptable, but my t-SNE plots (attached) look much worse than the figure in the notebook. My MIA accuracy is also much higher than the notebook's, typically 65-70, with MatchDG slightly higher. It also seems strange that I ran out of memory with 16 GB of GPU memory.
[attached: t-SNE plot]
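As a side note, gradient accumulation would have kept the effective batch size at 256 within the 16 GB budget instead of halving it outright. A minimal PyTorch sketch, where the dummy model, data, and optimizer are placeholders standing in for the notebook's objects rather than actual RobustDG code:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Sketch: emulate an effective batch size of 256 with micro-batches of 128
# via gradient accumulation. Dummy model/data stand in for the notebook's.
model = nn.Linear(784, 10).cuda()                        # placeholder network
loader = DataLoader(TensorDataset(torch.randn(1024, 784),
                                  torch.randint(0, 10, (1024,))),
                    batch_size=128)                      # micro-batches of 128
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
accum_steps = 2                                          # 2 x 128 = 256

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = criterion(model(x.cuda()), y.cuda())
    (loss / accum_steps).backward()   # scale so accumulated grads average
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```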

I also managed to modify the code with nn.DataParallel to run on two GPUs with batch size 256 (a sketch of the change is below), but the results are similar to the above. Another t-SNE plot is attached.
[attached: second t-SNE plot]
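For reference, the multi-GPU change was essentially the standard nn.DataParallel wrapper. This is illustrative rather than a patch; the dummy model below stands in for the network built in the notebook:

```python
import torch
import torch.nn as nn

# Sketch: split each 256-sample batch across two GPUs with nn.DataParallel.
model = nn.Linear(784, 10)  # placeholder for the notebook's network
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model, device_ids=[0, 1])
model = model.cuda()
# Each forward pass now gives every GPU a 128-sample shard of the batch.
```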

I really appreciate any help you can provide.

@divyat09
Collaborator

Thanks, Ardor! Typically, batch size 256 needs about 32 GB of GPU memory. For smaller batch sizes like 128, however, the frequency of validation checks in MatchDG Phase 1 needs to be adjusted, after which you should be able to achieve similar performance. I will work on making the results more easily reproducible with smaller batch sizes.
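To make the Phase 1 point concrete: halving the batch size doubles the number of optimizer steps per epoch, so a step-based validation interval has to be scaled up for checks to land at the same points in training. The `val_interval` below is a hypothetical parameter for illustration, not an actual RobustDG flag:

```python
# Hypothetical illustration: rescale a step-based validation interval when
# the batch size changes, so checks happen at the same data frequency.
base_batch_size, base_val_interval = 256, 100   # assumed reference settings
batch_size = 128
val_interval = base_val_interval * base_batch_size // batch_size
# val_interval == 200: every 200 steps of size 128 covers the same 25,600
# samples that 100 steps of size 256 did.
```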

Regarding MI attacks, the results need to be updated; I suggest following this paper for the current numbers, and I will update them in this repository soon. Your observations are in fact consistent with Figure 1 of our paper on privacy: ERM gets 65-70 percent MI attack accuracy (using the classifier attack), and MatchDG is slightly higher.
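For anyone reproducing this: the classifier attack referenced here is the standard confidence-based membership inference setup, where an attack classifier is trained to separate members (training examples) from non-members using the target model's softmax outputs. A generic sketch with synthetic placeholder confidences, not the repo's implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data: replace with the target model's softmax outputs on
# train-set (member) and held-out (non-member) examples.
rng = np.random.default_rng(0)
member_conf = rng.dirichlet(np.ones(10) * 0.3, size=1000)  # peakier outputs
nonmember_conf = rng.dirichlet(np.ones(10), size=1000)

# Attack classifier: members labeled 1, non-members labeled 0.
X = np.vstack([member_conf, nonmember_conf])
y = np.concatenate([np.ones(len(member_conf)), np.zeros(len(nonmember_conf))])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

attack = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("MI attack accuracy:", attack.score(X_te, y_te))  # ~0.5 means no leakage
```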

Regarding the t-SNE plots, I am not sure and will take a look. However, I think the match function metrics are a better way to evaluate the learnt representation. Did you observe any discrepancy in the match function metrics? You can find the exact commands in the reproducing-results notebook; I have also included one below for convenience:

python test.py --dataset rot_mnist --method_name matchdg_ctr --match_case 0.01 --match_flag 1 --pos_metric cos --test_metric match_score

@divyat09
Collaborator

divyat09 commented Oct 4, 2021

Hi Ardor, the code has been updated to follow the new results in our paper. The issues that you mentioned above should be resolved now.

divyat09 closed this as completed Oct 4, 2021