
CS 5542 BigData Lab Report #04

Amy Lin edited this page Mar 29, 2017 · 2 revisions

SPARK PROGRAMMING - Machine Learning Tasks : Image Classification ( Decision Tree Algorithm )


[ QUESTION ]

Create your own dataset for an image classification problem. Follow the workflow discussed in the Tutorial 4 session, using the Decision Tree algorithm. Report the accuracy and the confusion matrix obtained. Include a brief description of your dataset and the purpose behind the image classification problem.


[ IMPLEMENTATION ]

  • Import the required libraries & connect to Spark.
  • Create a vector for the clusters. Declare a list of categories for the images.
  • Extract feature information from the training data set.
  • Save the features extracted from the images.
  • Build the clusters by training K-Means.
  • Load & parse the feature data.
  • Cluster the data into 400 clusters using K-Means.
  • Calculate the Within Set Sum of Squared Errors (WSSSE) to evaluate the clustering.
  • Create histograms for all images & reduce features.
  • Build a random forest model using the histograms.
  • Split the data into 70% training and 30% testing.
  • Train the random forest model (4-10 trees).
  • Use the test data to select the most accurate model.
  • Print out the best error and its parameters.
  • Train the random forest model once again.
  • Save & load the model.
  • Test image classification: use each test image and the histogram size to determine the predicted category of that image.
  • Run the above process from the client side.
  • Generate Confusion Matrix & print out the accuracy of the model.
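
The clustering and histogram steps above can be illustrated without Spark. The following is a minimal sketch in plain Python, not the lab's actual Spark code: the helper names (`nearest_centroid`, `wssse`, `histogram`) and the toy 2-D data are hypothetical, and only 2 centroids are used instead of the 400 clusters mentioned above. It shows how WSSSE sums the squared distance from each feature vector to its nearest centroid, and how an image's feature vectors are reduced to a fixed-length histogram of cluster assignments.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def wssse(points, centroids):
    """Within Set Sum of Squared Errors: total squared distance from
    every point to its nearest centroid (lower means tighter clusters)."""
    return sum(squared_distance(p, centroids[nearest_centroid(p, centroids)])
               for p in points)


def histogram(descriptors, centroids):
    """Normalized count of descriptors per cluster. The result always has
    len(centroids) entries, whatever the number of descriptors."""
    if not descriptors:
        return [0.0] * len(centroids)
    counts = Counter(nearest_centroid(d, centroids) for d in descriptors)
    return [counts[i] / len(descriptors) for i in range(len(centroids))]


# Hypothetical toy data: 2 centroids and 4 feature vectors from one image.
centroids = [(0.0, 0.0), (10.0, 10.0)]
image_descriptors = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0), (0.0, 0.0)]

print("WSSSE:", wssse(image_descriptors, centroids))
print("histogram:", histogram(image_descriptors, centroids))
```

Because every image is mapped to a histogram of the same length, images with different numbers of extracted features can all be fed to the same classifier.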

< DATASET >

  • Training Data: BubbleTea x 40 | Chocolate x 30 | Coffee x 30 | MochaTea x 40 | ShavedIce x 30
  • Test Data: 5 images for each category
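
With five categories and five test images each, the evaluation covers 25 test images and produces a 5 x 5 confusion matrix. The sketch below, in plain Python, shows how such a matrix and the accuracy could be computed. The `predicted` list is made up purely for illustration and is not the result obtained in this lab; the function names are likewise hypothetical.

```python
CATEGORIES = ["BubbleTea", "Chocolate", "Coffee", "MochaTea", "ShavedIce"]


def confusion_matrix(actual, predicted, labels):
    """Rows are actual categories, columns are predicted categories."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of predictions on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# 5 test images per category, as in the dataset description.
actual = [c for c in CATEGORIES for _ in range(5)]

# Hypothetical predictions (NOT the lab's results): two misclassifications.
predicted = list(actual)
predicted[0] = "MochaTea"    # one BubbleTea image predicted as MochaTea
predicted[12] = "Chocolate"  # one Coffee image predicted as Chocolate

matrix = confusion_matrix(actual, predicted, CATEGORIES)
for label, row in zip(CATEGORIES, matrix):
    print(f"{label:10s} {row}")
print("accuracy:", accuracy(matrix))
```

Off-diagonal entries show which categories are confused with each other, which a single accuracy figure cannot reveal.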


[ The Purpose of Image Classification ]

  • Determine if the training data is sufficient.
  • Check whether the training error is low (after taking out noise in the data).
  • Check whether the classifier would be too complicated to build as a model.
  • Utilize a relatively small set of data for a large amount of processing & obtain more training along the way.
  • Support further prediction.