- This repository contains PyTorch implementations of adversarial training methods on CIFAR-10.
- The basic experiment setting follows the setting used by the Madry Lab [7].
- Dataset: CIFAR-10 (60,000 32x32 colour images in 10 classes, with 6,000 images per class; 50,000 training images and 10,000 test images).
- Attack method: PGD attack [1] (L2 PGD for basic training on the robust and non-robust datasets; L-infinity PGD for everything else). A sketch of the attack follows this list.
- The basic training method adopts the ResNet-18 architecture proposed by Kaiming He et al. in CVPR 2016 [4].
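For reference, a minimal sketch of an L-infinity PGD attack in the style of Madry et al. [1]. The hyperparameters (epsilon = 8/255, alpha = 2/255, 10 steps) are common CIFAR-10 defaults, not necessarily the values used by the scripts here, and images are assumed to be unnormalized in [0, 1]:

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, epsilon=8/255, alpha=2/255, steps=10):
    """Return adversarial examples within an L-infinity ball around x."""
    # Random start inside the epsilon ball.
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()                            # ascend the loss
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)  # project to the ball
        x_adv = x_adv.clamp(0, 1)                                      # keep a valid image
    return x_adv.detach()
```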
```
python3 basic_training.py
```
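A rough sketch of what a standard (non-adversarial) training script looks like; the torchvision ResNet-18 and the SGD hyperparameters below are assumptions, and the actual basic_training.py may differ:

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

device = 'cuda' if torch.cuda.is_available() else 'cpu'
train_set = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=T.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = torchvision.models.resnet18(num_classes=10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()

# Plain cross-entropy training on clean images.
for epoch in range(50):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
```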
| | This repository |
|---|---|
| Benign accuracy | 94.91% |
| Robust accuracy (L-infinity PGD) | 0.07% |
- PGD adversarial training was proposed by Aleksander Madry et al. in ICLR 2018 [1].
- Note: the model was trained for only 50 epochs; training it for more than 150 epochs should give results closer to the original paper.
```
python3 pgd_adversarial_training.py
```
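The training loop differs from basic training only in the inner maximization step: each minibatch is replaced by PGD adversarial examples before the usual cross-entropy update. A sketch, reusing the model, loader, optimizer, criterion, and pgd_linf names from the sketches above:

```python
# PGD adversarial training (Madry et al. [1]): train on adversarial
# examples generated on the fly for every minibatch.
for epoch in range(50):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_linf(model, x, y)          # inner maximization
        optimizer.zero_grad()
        criterion(model(x_adv), y).backward()  # outer minimization
        optimizer.step()
```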
| | This repository | Original paper |
|---|---|---|
| Benign accuracy | 76.86% | 87.30% |
| Robust accuracy (L-infinity PGD) | 48.43% | 50.00% |
- Interpolated adversarial training was proposed by Alex Lamb et al. in AISec 2019 [2].
- Note: the model was trained for only 50 epochs; training it for more than 150 epochs should give results closer to the original paper.
```
python3 interpolated_adversarial_training.py
```
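Interpolated adversarial training combines the loss on a Mixup-interpolated clean batch with the loss on a Mixup-interpolated adversarial batch. A sketch continuing the names from above (the Beta(1, 1) Mixup coefficient is an assumption; the paper also considers Manifold Mixup):

```python
import numpy as np

def mixup(x, y, alpha=1.0):
    """Convex combination of a batch with a shuffled copy of itself."""
    lam = float(np.random.beta(alpha, alpha))
    idx = torch.randperm(x.size(0), device=x.device)
    return lam * x + (1 - lam) * x[idx], y, y[idx], lam

for epoch in range(50):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_linf(model, x, y)
        optimizer.zero_grad()
        loss = 0.0
        for batch in (x, x_adv):        # one clean pass, one adversarial pass
            xm, ya, yb, lam = mixup(batch, y)
            out = model(xm)
            loss = loss + lam * criterion(out, ya) + (1 - lam) * criterion(out, yb)
        (loss / 2).backward()
        optimizer.step()
```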
| | This repository | Original paper |
|---|---|---|
| Benign accuracy | 83.72% | 89.88% |
| Robust accuracy (L-infinity PGD) | 42.17% | 44.57% |
- The method was proposed by Andrew Ilyas et al. in NeurIPS 2019 [3].
- They study the adversarial-examples problem by splitting the training data into a robust dataset and a non-robust dataset.
- The robust dataset is constructed using an L2 adversarially trained model (epsilon = 0.5); a sketch of the construction follows this list.
- Dataset download: Robust Dataset
- Dataset download: Non-robust Dataset
- Note: the models were trained for only 50 epochs; training them for more than 150 epochs should give results closer to the original paper.
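A sketch of the robust-dataset construction described in [3]: starting from a randomly chosen training image, the input is optimized until the penultimate-layer representation of the L2 adversarially trained model matches that of the target image. The robust_feats handle, the plain-SGD optimizer (the paper uses normalized gradient steps), and the step count below are all assumptions:

```python
def robustify(robust_feats, x_target, x_init, steps=1000, lr=0.1):
    """robust_feats: penultimate-layer feature extractor of the robust
    model (hypothetical handle). Returns an image whose robust features
    match those of x_target.
    """
    with torch.no_grad():
        target = robust_feats(x_target)
    x_r = x_init.clone().detach().requires_grad_(True)
    opt = torch.optim.SGD([x_r], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Match the robust model's representation of the target image.
        ((robust_feats(x_r) - target) ** 2).sum().backward()
        opt.step()
        with torch.no_grad():
            x_r.clamp_(0, 1)  # keep a valid image
    return x_r.detach()
```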
```
python3 basic_training_robust_dataset.py
python3 basic_training_nonrobust_dataset.py
```
| | Robust Dataset | Original paper (wide) |
|---|---|---|
| Benign accuracy | 78.34% | 84.10% |
| Robust accuracy (L2 PGD 0.25) | 34.87% | 48.27% |

| | Non-Robust Dataset | Original paper (wide) |
|---|---|---|
| Benign accuracy | 82.55% | 87.68% |
| Robust accuracy (L2 PGD 0.25) | 0.48% | 0.82% |
[1] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks, ICLR 2018, https://arxiv.org/abs/1706.06083
[2] Alex Lamb, Vikas Verma, Kenji Kawaguchi, Savya Khosla, Juho Kannala, Yoshua Bengio. Interpolated Adversarial Training: Achieving Robust Neural Networks without Sacrificing Too Much Accuracy, AISec 2019, https://arxiv.org/abs/1906.06784
[3] Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, Aleksander Madry. Adversarial Examples Are Not Bugs, They Are Features, NeurIPS 2019, https://arxiv.org/abs/1905.02175
[4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition, Computer Vision and Pattern Recognition (CVPR), 2016
[5] https://github.com/ndb796/Pytorch-Adversarial-Training-CIFAR
[6] https://github.com/BorealisAI/advertorch
[7] https://github.com/MadryLab/cifar10_challenge