
problems if classes are more than six #4

Closed
johnrobertus opened this issue May 10, 2019 · 16 comments

@johnrobertus

johnrobertus commented May 10, 2019

I am using Multiclass_classification.ipynb. I see that there is output_classes = 4, which means you have four classes.

Now I want to use this code for 10 classes, so I changed output_classes = 10 and got this error:
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.

However, the code works fine with six classes (output_classes = 6). What should I change for more than six classes?

@MuhammedBuyukkinaci
Owner

Thanks for using my repository.

I guess it is because your batch size (steps = 8) is lower than 10. You should change steps = 8 to 16 or 32 (maybe more). Also, can you give me more details on your class distribution, or provide a countplot of your target?
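For reference, per-class counts can be checked directly from the one-hot labels with a few lines of NumPy (a `seaborn.countplot` over `class_idx` would give the plot asked for above); the label array here is a small hypothetical stand-in:

```python
import numpy as np

# Hypothetical one-hot label array, shape (n_samples, n_classes)
train_labels = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [1, 0, 0],
                         [0, 0, 1]])

# Convert one-hot rows to class indices, then count samples per class
class_idx = train_labels.argmax(axis=1)
counts = np.bincount(class_idx, minlength=train_labels.shape[1])
print(dict(enumerate(counts.tolist())))  # {0: 2, 1: 1, 2: 1}
```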

@johnrobertus
Author

  1. Increasing the step size does not help, unfortunately.

  2. Class distribution:

import glob
import cv2
import numpy as np

#Train data: one glob pattern per class, in one-hot label order
patterns = [
    "/Users/johnrobertus/Desktop/100DAGM/class5_non_anomaly_100/*.PNG", # your image path
    "/Users/johnrobertus/Desktop/100DAGM/class5_anomaly_100/*.png",
    "/Users/johnrobertus/Desktop/100DAGM/class4_non_anomaly/*.png",
    "/Users/johnrobertus/Desktop/100DAGM/class4_anomaly_100/*.png",
    "/Users/johnrobertus/Desktop/100DAGM/class3_non_anomaly_100/*.png",
    "/Users/johnrobertus/Desktop/100DAGM/class3_anomaly_100/*.png",
    "/Users/johnrobertus/Desktop/100DAGM/class2_non_anomaly_100/*.png",
    "/Users/johnrobertus/Desktop/100DAGM/class2_anomaly_100/*.png",
    "/Users/johnrobertus/Desktop/100DAGM/class1_non_anomaly_100/*.PNG",
    "/Users/johnrobertus/Desktop/100DAGM/class1_anomaly_100/*.png",
]

train = []
train_labels = []
for class_idx, pattern in enumerate(patterns):
    for myFile in glob.glob(pattern):
        image = cv2.imread(myFile)
        image = cv2.resize(image, (IMG_SIZE_ALEXNET, IMG_SIZE_ALEXNET))
        image = np.resize(image, (IMG_SIZE_ALEXNET, IMG_SIZE_ALEXNET, 3))
        train.append(image)
        label = [0] * len(patterns)
        label[class_idx] = 1 # one-hot label for this class
        train_labels.append(label)

train = np.array(train, dtype='float32') #as mnist
train_labels = np.array(train_labels, dtype='int32') #as mnist

@MuhammedBuyukkinaci
Owner

MuhammedBuyukkinaci commented May 10, 2019

If your class distribution is balanced (which is what I understood from your code), you don't have to use ROC AUC for evaluation; accuracy and log loss are fine. I only included ROC AUC as an example.
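For context, the ValueError from the original report is scikit-learn's standard behavior whenever a y_true slice contains only a single class, which is easy to reproduce:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# A batch (or split) containing only one class triggers exactly this error
y_true = np.array([1, 1, 1, 1])
y_score = np.array([0.9, 0.8, 0.4, 0.6])

try:
    roc_auc_score(y_true, y_score)
except ValueError as err:
    print(err)  # Only one class present in y_true. ROC AUC score is not defined in that case.
```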

@johnrobertus
Author

Thank you, that worked for me.

@MuhammedBuyukkinaci
Owner

Great to hear it worked. Please star the repository if you find it useful.

@johnrobertus
Author

johnrobertus commented May 10, 2019

I have another, unrelated question:

I ran the same code on a dataset of 50k images per class, with a total of six classes, on a high-configuration cluster. After a couple of minutes I received a memory error:

Traceback (most recent call last):
File "/var/lib/condor/execute/dir_23348/condor_exec.exe", line 97, in
x, x_test, y, y_test = train_test_split(train,train_labels,test_size=0.2,train_size=0.8, random_state = 42)
File "/usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py", line 2212, in train_test_split
safe_indexing(a, test)) for a in arrays))
File "/usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_split.py", line 2212, in
safe_indexing(a, test)) for a in arrays))
File "/usr/local/lib/python3.6/dist-packages/sklearn/utils/__init__.py", line 216, in safe_indexing
return X.take(indices, axis=0)
MemoryError

As a consequence, I gradually reduced the number of images per class, and it only worked with fewer than 500 images per class. I would like to run this code with more data. Is there something I have to change in the code for larger datasets?

@MuhammedBuyukkinaci
Owner

MuhammedBuyukkinaci commented May 11, 2019

As I understand it, you ran out of memory: your RAM isn't large enough to hold the whole dataset. You can reduce your data, or you can add
import gc; gc.collect()
to your code to free unused objects periodically.
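Beyond gc.collect(), one option that avoids holding the whole array in RAM at once is a disk-backed NumPy memmap; a minimal sketch with hypothetical sizes:

```python
import os
import tempfile
import numpy as np

# Hypothetical small shape for illustration; real image arrays are far larger
n_images, h, w, c = 100, 8, 8, 3
path = os.path.join(tempfile.mkdtemp(), 'train.dat')

# Write images into a disk-backed array instead of holding them all in RAM
train = np.memmap(path, dtype='float32', mode='w+', shape=(n_images, h, w, c))
for i in range(n_images):
    train[i] = np.random.rand(h, w, c)  # stand-in for a loaded, resized image
train.flush()

# Later: reopen read-only and slice batches; only each slice is read into memory
train_ro = np.memmap(path, dtype='float32', mode='r', shape=(n_images, h, w, c))
batch = np.array(train_ro[0:16])
print(batch.shape)  # (16, 8, 8, 3)
```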

@HostGuest

Please, I want to ask: I am doing image classification using a CNN for 5 classes (cloudy, snowy, foggy, sunny, rainy), with about 1300 images in each folder. What is meant by "image label"? Does it mean the image file names (rainy1.jpg, rainy2.jpg), or something else, like the real category of the image (snowy, foggy, rainy, ...)? Sorry, I'm a beginner.
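To illustrate the distinction: the label is the class (usually the folder name), not the image file name. A hypothetical sketch of deriving a one-hot label from a folder name:

```python
import numpy as np

classes = ['cloudy', 'snowy', 'foggy', 'sunny', 'rainy']

def label_for(folder_name):
    """One-hot label derived from the folder (class) name, not the file name."""
    label = [0] * len(classes)
    label[classes.index(folder_name)] = 1
    return np.array(label)

print(label_for('rainy'))  # [0 0 0 0 1]
```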

@iamsuzank

I can't find the code to convert the training and testing images (you have provided the Kaggle link) into NumPy arrays. Please provide this code.

@MuhammedBuyukkinaci
Owner

Hi, I hope the code below is what you need.

import os
import cv2
import numpy as np
from random import shuffle
from tqdm import tqdm

#Labeling data: derive the one-hot label from the file name prefix.
def label_img(img):
    word_label = img.split('_')[0]
    if word_label == 'glass': return [1,0]
    elif word_label == 'table': return [0,1]

#Function for importing data (images) from the train directory.
def create_train_data():
    training_data = []
    for img in tqdm(os.listdir(TRAIN_DIR)):
        label = label_img(img)
        path = os.path.join(TRAIN_DIR, img)
        img = cv2.imread(path, 1)  # read as a color image
        img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
        training_data.append([np.array(img), np.array(label)])
    shuffle(training_data)
    np.save('train_data_bi.npy', training_data)
    return training_data
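Not part of the snippet above, but for completeness: the saved .npy file holds an object array (image/label pairs of different shapes), so loading it back requires allow_pickle=True. A small self-contained sketch with stand-in data:

```python
import numpy as np

# Tiny stand-in for training_data: [image_array, label_array] pairs
training_data = [[np.zeros((4, 4, 3)), np.array([1, 0])],
                 [np.ones((4, 4, 3)), np.array([0, 1])]]
np.save('train_data_bi.npy', np.array(training_data, dtype=object))

# Object arrays need allow_pickle=True on load
loaded = np.load('train_data_bi.npy', allow_pickle=True)
images = np.array([item[0] for item in loaded])
labels = np.array([item[1] for item in loaded])
print(images.shape, labels.shape)  # (2, 4, 4, 3) (2, 2)
```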

@naufalirfani

How do I make test_data_mc.npy?

@MuhammedBuyukkinaci
Owner

Download the data from Kaggle and put it in a directory. Use the function below (the training-data function adapted for the test directory) to create test_data_mc.npy:

#Function for importing data (images) from the test directory.
def create_test_data():
    testing_data = []
    for img in tqdm(os.listdir(TEST_DIR)):
        label = label_img(img)
        path = os.path.join(TEST_DIR, img)
        img = cv2.imread(path, 1)
        img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
        testing_data.append([np.array(img), np.array(label)])
    shuffle(testing_data)
    np.save('test_data_mc.npy', testing_data)
    return testing_data

@naufalirfani

naufalirfani commented Jun 23, 2020

I am trying to train with 42 classes, so I changed step_size to 42, but I get this error: "Tensorflow Serving | Input to reshape is a tensor with 10452 values, but the requested shape has 9216." What should I do?

@MuhammedBuyukkinaci
Owner

MuhammedBuyukkinaci commented Jun 28, 2020

If your target has 42 classes, the output_classes variable should be 42; changing step_size does not set the number of classes.
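For reference, 9216 is the flattened feature size of an AlexNet-style network (256 × 6 × 6), and the final dense layer's weight matrix must map that onto output_classes columns, matching the width of the one-hot labels. A plain-NumPy sketch of the shape arithmetic (all sizes hypothetical):

```python
import numpy as np

output_classes = 42
features = np.random.rand(1, 9216)        # flattened conv output (AlexNet-style)
W = np.random.rand(9216, output_classes)  # final dense layer weights
logits = features @ W
print(logits.shape)  # (1, 42) -- must match the width of the one-hot labels
```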

@naufalirfani

naufalirfani commented Jun 29, 2020

What can I do to increase the accuracy? Since I am training with 42 classes, my accuracy is very low.

@MuhammedBuyukkinaci
Owner

You can augment your training data using ImageDataGenerator, which is under tensorflow.keras. You can also use a pretrained neural network to initialize your training.
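As a minimal stand-in for what ImageDataGenerator does (e.g. horizontal_flip=True, plus rotations and zooms), here is a plain-NumPy sketch that doubles the training set with horizontally flipped copies; all shapes are hypothetical:

```python
import numpy as np

def augment_flips(images, labels):
    """Double the training set by appending horizontally flipped copies."""
    flipped = images[:, :, ::-1, :]  # flip along the width axis
    return (np.concatenate([images, flipped], axis=0),
            np.concatenate([labels, labels], axis=0))

images = np.random.rand(10, 32, 32, 3)
labels = np.eye(42)[np.random.randint(0, 42, size=10)]
aug_x, aug_y = augment_flips(images, labels)
print(aug_x.shape, aug_y.shape)  # (20, 32, 32, 3) (20, 42)
```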
