
Train a baseline classifier #279

Open · 2 of 3 tasks · marco-c opened this issue Dec 7, 2018 · 12 comments
@marco-c (Owner) commented Dec 7, 2018

We need to find some good options for the classifier to reach a baseline acceptable accuracy:

  • Try vgg16, pretrained on imagenet
  • Try vgg16, training from scratch
  • Try vgg16, pretrained with pretrain.py

We can start with the RMSProp optimizer.
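The variants above can be sketched in Keras along these lines. This is a minimal sketch, not the repo's actual network definition: the head layout (`fc1`/`fc2`) and sizes are assumptions, and `weights="imagenet"` vs `weights=None` switches between the pretrained and from-scratch options.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# weights=None trains from scratch; weights="imagenet" would load the
# ImageNet-pretrained backbone instead (typically used with larger inputs,
# e.g. 224x224).
base = tf.keras.applications.VGG16(
    weights=None, include_top=False, input_shape=(48, 48, 3))

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu", name="fc1"),  # head sizes assumed
    layers.Dense(256, activation="relu", name="fc2"),
    layers.Dense(1, activation="sigmoid"),  # binary: Y vs N+D
])

# Start with RMSprop as suggested; SGD is the comparison point.
model.compile(optimizer=tf.keras.optimizers.RMSprop(),
              loss="binary_crossentropy", metrics=["accuracy"])
```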

@sdv4 (Collaborator) commented Dec 10, 2018

Class balance:
Y: 53%
D+N : 47%

Network - vgg16
Pretrained - none
Image size: (48,48)
Optimiser - sgd
Epochs - 50
Test Set Accuracy - 84.54%
N+D Prediction Precision: 83.33%
Confusion matrix:
[[215 50]
[43 300]]

122658933ee9_19_38_2018_12_17.txt

Network - vgg16
Pretrained - none
Image size: (48,48)
Optimiser - rmsprop
Epochs - 50
Test Set Accuracy - 82.57%
N+D Prediction Precision: 84.78%
Confusion matrix:
[[195 70]
[35 308]]

2678b9ed874b_05_31_2018_12_17.txt
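For reference, the reported N+D precision follows directly from the first confusion matrix above (the sgd run), assuming rows are the true class and columns the predicted class, in the order [N+D, Y]. The matrix-derived accuracy (~84.7%) is close to but not identical to the reported 84.54%, possibly due to rounding or evaluation differences; the precision matches exactly.

```python
# Deriving the reported metrics from the first confusion matrix above
# (row/column convention assumed: rows = true class, cols = predicted,
# order [N+D, Y]).
cm = [[215, 50],
      [43, 300]]

tp_nd = cm[0][0]  # N+D correctly predicted as N+D
fp_nd = cm[1][0]  # Y mispredicted as N+D
total = sum(sum(row) for row in cm)

accuracy = (cm[0][0] + cm[1][1]) / total
nd_precision = tp_nd / (tp_nd + fp_nd)

print(f"accuracy      = {accuracy:.2%}")      # ~84.7% from this matrix
print(f"N+D precision = {nd_precision:.2%}")  # 83.33%, matching the report
```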

@marco-c (Owner) commented Dec 16, 2018

What exactly did you do in the pretrain.py case?

@marco-c (Owner) commented Dec 16, 2018

Also, could you try using rmsprop instead of sgd?

@marco-c (Owner) commented Dec 16, 2018

> Also, could you try using rmsprop instead of sgd?

Actually, both, to compare exactly their results.

@sdv4 (Collaborator) commented Dec 16, 2018

> What exactly did you do in the pretrain.py case?

I set `target_size=(48,48)` in `utils.py`/`load_image`, then ran my notebook with `!python3 pretrain.py -n=vgg16 -o=sgd`.

> Actually, both, to compare exactly their results.

Will do.

@marco-c (Owner) commented Dec 16, 2018

> I set `target_size=(48,48)` in `utils.py`/`load_image`, then ran my notebook with `!python3 pretrain.py -n=vgg16 -o=sgd`.

OK. So, `pretrain.py` is only meant to pretrain the network. In theory, we should first pretrain with `pretrain.py`, then use the pretrained model when running `train.py`.

@sdv4 (Collaborator) commented Dec 16, 2018

Ahh, I suspected this might be the case; now it makes sense why there was no .txt file, etc.
Do you have info on what is going on in `pretrain.py`, in particular the "slightly different problem (for which we know the solution)"?

@marco-c (Owner) commented Dec 16, 2018

This explains it all:
https://github.com/marco-c/autowebcompat/blob/master/pretrain.py#L58

The goal of the classifier in the pretrain case is to detect when two screenshots belong to the same website.
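In other words, pretraining turns the screenshots into a binary pair-classification task. A hypothetical sketch of the labeling, where the filename convention (`same_site_label` and site-prefixed names) is an illustrative assumption, not what `pretrain.py` actually parses:

```python
# Hypothetical pair-labeling helper for the pretraining task: label 1 when
# the two screenshots come from the same website, 0 otherwise.
# Assumes filenames like "example.com_firefox.png" (an illustrative
# convention, not necessarily the repo's actual one).
def same_site_label(path_a: str, path_b: str) -> int:
    site = lambda p: p.split("_")[0]
    return int(site(path_a) == site(path_b))

print(same_site_label("example.com_firefox.png", "example.com_chrome.png"))  # 1
print(same_site_label("example.com_firefox.png", "other.org_firefox.png"))   # 0
```

A network pretrained this way has already learned features that distinguish page layouts, which the compatibility classifier can then fine-tune.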

@sdv4 (Collaborator) commented Dec 16, 2018

> This explains it all:
> https://github.com/marco-c/autowebcompat/blob/master/pretrain.py#L58

Indeed it does, thanks!

@sdv4 (Collaborator) commented Dec 20, 2018

Class balance:
Y: 53%
D+N : 47%

Network - vgg16
Pretrained - imagenet
Image size: (224,224)
Optimiser - sgd
Epochs - 50
Test Set Accuracy - 84.9%
N+D Prediction Precision: 85.5%
Confusion matrix:
[[254 47]
[43 264]]

tensorflow-1-vm_21_49_2018_12_20.txt

Network - vgg16
Pretrained - imagenet
Image size: (224,224)
Optimiser - rmsprop
Epochs - 50
Test Set Accuracy - 80.9%
N+D Prediction Precision: 81.6%
Confusion matrix:
[[222 73]
[ 50 263]]

da78b06ff04a_00_41_2019_01_27.txt

@sdv4 (Collaborator) commented Jan 27, 2019

@marco-c I also performed hyper-parameter optimization with random search, and Hyperband. The best configuration found was via random search:

Network - vgg16
Pretrained - no
Test Set Accuracy - 80.1%
N+D Prediction Precision: 88.1%
Image size: (48,48)
Optimizer: Adam
Learning rate: 1.7e-5
Decay: 1e-6
Momentum: 0.74
Epsilon: 8.4e-8
fc1 L2 regularization strength: 2.32e-2
fc2 L2 regularization strength: 5.19e-3
fc1 dropout: 3.37e-7
fc2 dropout: 1.71e-7
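For context, random search over configurations like the one above typically samples scale parameters (learning rate, epsilon, regularization strengths) log-uniformly. A hypothetical sketch of such a sampler — the ranges and option lists here are assumptions, not the actual search space used:

```python
import math
import random

def log_uniform(rng, lo, hi):
    """Sample log-uniformly between lo and hi (both > 0)."""
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))

def sample_config(rng):
    # Ranges are illustrative; they merely bracket the reported best values.
    return {
        "optimizer": rng.choice(["sgd", "rmsprop", "adam"]),
        "learning_rate": log_uniform(rng, 1e-6, 1e-2),
        "decay": log_uniform(rng, 1e-8, 1e-4),
        "momentum": rng.uniform(0.5, 0.99),
        "epsilon": log_uniform(rng, 1e-9, 1e-6),
        "fc1_l2": log_uniform(rng, 1e-4, 1e-1),
        "fc2_l2": log_uniform(rng, 1e-4, 1e-1),
        "fc1_dropout": log_uniform(rng, 1e-8, 0.5),
        "fc2_dropout": log_uniform(rng, 1e-8, 0.5),
    }

cfg = sample_config(random.Random(0))
print(cfg)
```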

@marco-c (Owner) commented Jan 28, 2019

Not bad! We should improve the labeling to get better and more precise results.
