
CS 5542 BigData Lab Report #05

Amy Lin edited this page Mar 29, 2017 · 2 revisions

SPARK PROGRAMMING - Machine Learning Tasks: Image Classification (Excluding Decision Tree)


[ QUESTION ]

Define your own data sets for an image classification problem. Following the workflow discussed in the Tutorial 4 session, use any classification algorithm (e.g., Random Forest, Naive Bayes) other than Decision Tree. Report the accuracy and the confusion matrix obtained.


[ IMPLEMENTATION ]

  • Import the required libraries and connect to Spark.
  • Create a vector to hold the clusters and declare the list of image categories.
  • Extract features and keypoints from the training data sets.
  • Save the features extracted from the images.
  • Build the clusters by training a K-Means model.
  • Load and parse the feature data.
  • Set the number of trees to use for the data sets.
  • Cluster the data into 400 clusters using K-Means.
  • Calculate the Within Set Sum of Squared Errors (WSSSE) to evaluate the clustering.
  • Create histograms for all images and reduce the features.
  • Build a random forest model from the histograms.
  • Split the data into 70% training and 30% testing.
  • Train the random forest model (4 to 10 trees).
  • Use the test data to select the most accurate model.
  • Print the best error and the corresponding parameters.
  • Train the random forest model again with the best parameters.
  • Save and load the model.
  • Test the classifier on the test images, using each image's histogram and the histogram size to predict its category.
  • Run the above process from the client side.
  • Generate the confusion matrix and print the accuracy of the model.
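The clustering and histogram steps above follow what is commonly called a bag-of-visual-words pipeline: the K-Means centers act as a vocabulary, each image becomes a count of how many of its descriptors fall nearest to each center, and WSSSE sums every point's squared distance to its nearest center (the quantity Spark MLlib's `KMeansModel.computeCost` reports). The plain-Python sketch below is not the Spark API; it only illustrates those two computations, assuming the descriptors are numeric vectors and the centers are already trained. The lab uses 400 centers; the toy 2-D values here are made up.

```python
# Plain-Python sketch (not Spark MLlib) of the WSSSE and histogram steps.
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point: Vector, centers: List[Vector]) -> int:
    """Index of the cluster center closest to `point`."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def wssse(points: List[Vector], centers: List[Vector]) -> float:
    """Within Set Sum of Squared Errors: each point's squared distance to its nearest center, summed."""
    return sum(squared_distance(p, centers[nearest_center(p, centers)]) for p in points)


def bovw_histogram(descriptors: List[Vector], centers: List[Vector]) -> List[int]:
    """Count how many of one image's descriptors fall nearest to each center."""
    histogram = [0] * len(centers)
    for d in descriptors:
        histogram[nearest_center(d, centers)] += 1
    return histogram


if __name__ == "__main__":
    # Made-up 2-D toy values; real descriptors are much longer vectors.
    centers = [(0.0, 0.0), (10.0, 10.0)]
    descriptors = [(1.0, 0.0), (0.0, 2.0), (9.0, 10.0)]
    print(bovw_histogram(descriptors, centers))  # [2, 1]
    print(wssse(descriptors, centers))           # 6.0
```

Each histogram has one bin per center, so with 400 clusters every image becomes a 400-element feature vector for the random forest regardless of how many keypoints it has.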

< DATASET > Training data: BubbleTea x 40 | Chocolate x 30 | Coffee x 30 | MochaTea x 40 (140 images) || Test data: 4 images per category (16 images).
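Before the 70/30 split, each training image needs a numeric label. A minimal sketch, assuming the label is the category's position in the list below (that order is an assumption) and using hypothetical file names, since the report does not give the real paths. It performs an exact, seeded split; Spark's `randomSplit([0.7, 0.3])` samples each record independently, so its partition sizes are only approximately 70/30.

```python
import random
from typing import Dict, List, Tuple

# Assumed category order; an image's label is its category's index in this list.
CATEGORIES = ["BubbleTea", "Chocolate", "Coffee", "MochaTea"]
# Training-image counts from the dataset description.
TRAIN_COUNTS = {"BubbleTea": 40, "Chocolate": 30, "Coffee": 30, "MochaTea": 40}


def labeled_images(counts: Dict[str, int]) -> List[Tuple[str, int]]:
    """Pair each hypothetical image path with its numeric category label."""
    return [
        (f"{category}/{i}.jpg", label)
        for label, category in enumerate(CATEGORIES)
        for i in range(counts[category])
    ]


def train_test_split(data, train_fraction=0.7, seed=42):
    """Shuffle a copy with a fixed seed, then cut it at train_fraction of the records."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train, test = train_test_split(labeled_images(TRAIN_COUNTS))
    print(len(train), len(test))  # 98 42
```

For the 140 training images this yields 98 training and 42 held-out examples; the fixed seed makes the split reproducible between runs.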

< PURPOSE OF IMAGE CLASSIFICATION PROBLEM > Extract features and keypoints from the training data sets, build a model that gives the machine a learning mechanism, and label the data sets so they can be used for further predictions. Goal: achieve the best accuracy on the training and testing data sets while avoiding problems such as overfitting.
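The reported accuracy and confusion matrix both come from comparing actual and predicted labels on the test images. The plain-Python sketch below (not the Spark MLlib API) lays the matrix out with actual labels as rows and predicted labels as columns, the same layout Spark's `MulticlassMetrics.confusionMatrix` uses, and computes accuracy as the diagonal sum divided by the total. The predictions in the demo are made up for illustration and are not this lab's results.

```python
from typing import List, Sequence


def confusion_matrix(actual: Sequence[int], predicted: Sequence[int], num_classes: int) -> List[List[int]]:
    """Rows are actual labels, columns are predicted labels, both ordered 0..num_classes-1."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of predictions on the diagonal (predicted label == actual label)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Made-up predictions for 4 test images per category (labels 0-3); not lab results.
    actual = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
    predicted = [0, 0, 0, 3, 1, 1, 2, 1, 2, 2, 2, 2, 3, 3, 0, 3]
    m = confusion_matrix(actual, predicted, 4)
    for row in m:
        print(row)
    print(accuracy(m))  # 0.8125
```

Off-diagonal cells show which categories get confused with each other, which a single accuracy number hides.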


Client Application - Spark API

[ QUESTION ]

Write a client application that uses the Spark API to connect Spark with your client. The client can be either a web application or an Android application. Refer to the Spark API tutorial from Tutorial 5.


[ IMPLEMENTATION ]

  • Run the Spark program that identifies images in real time using the user-defined data sets (the models are already built).
  • Connect the server with the client by running superstatic from the command line.
  • Copy the link into Google Chrome and use the JetBrains extension to run and connect to the Spark program.
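The superstatic and Chrome-extension setup from Tutorial 5 is not reproduced here. As a stand-in illustration of the same client/server idea, the sketch below uses Python's standard-library `http.server` and `urllib.request` (not the Spark API): a hypothetical `/predict` endpoint accepts an image histogram as JSON and returns a category name, with the trained model passed in as a plain function. The endpoint path, the JSON field names, and the dummy model are all assumptions.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

CATEGORIES = ["BubbleTea", "Chocolate", "Coffee", "MochaTea"]  # assumed label order


def make_handler(model):
    """Build a request handler that serves `model` (histogram -> label index) at /predict."""

    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/predict":  # hypothetical endpoint path
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            body = json.loads(self.rfile.read(length))
            payload = json.dumps({"category": CATEGORIES[model(body["histogram"])]}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format, *args):
            pass  # keep the console quiet

    return PredictHandler


def classify(base_url, histogram):
    """Client side: POST a histogram as JSON and return the predicted category name."""
    data = json.dumps({"histogram": histogram}).encode()
    request = urllib.request.Request(
        base_url + "/predict", data=data, headers={"Content-Type": "application/json"}
    )
    # An empty ProxyHandler bypasses any proxy set in the environment for this local demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())["category"]


if __name__ == "__main__":
    # Dummy model: index of the largest histogram bin, wrapped to the 4 categories.
    server = HTTPServer(("127.0.0.1", 0), make_handler(lambda h: h.index(max(h)) % len(CATEGORIES)))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}"
    print(classify(url, [0, 5, 1, 0]))  # Chocolate
    server.shutdown()
    server.server_close()
```

Passing the model in as a function keeps the transport separate from the classifier, so the dummy lambda could be swapped for a call into a real trained model without changing the client.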

Google Conversation Actions API

[ QUESTION ]

Build a simple application that holds a conversation, through the Google Conversation Actions API, about the summary you generated for your video. Refer to the Conversation Actions API tutorial from Tutorial 5.


[ IMPLEMENTATION ]

  • Run the Spark program that identifies images in real time using the user-defined data sets (the models are already built).
  • Connect the server with the client by running superstatic from the command line.
  • Copy the link into Google Chrome and use the JetBrains extension to run and connect to the Spark program.
