CS 5542 BigData Lab Report #05
Amy Lin edited this page Mar 29, 2017 · 2 revisions
Define your own data sets for the image classification problem. Use the workflow discussed in the Tutorial 4 session with any classification algorithm (e.g. Random Forest, Naive Bayes) other than Decision Tree. Report the accuracy and the confusion matrix obtained.
- Import the needed libraries & set up access to Spark.
- Create a vector for the clusters. Declare a list of categories for the images.
- Start extracting information from training data sets.
- Save features from the images.
- Process clusters by using K-Means training.
- Load & Parse the feature data.
- Set the number of trees to suit the total size of your data sets.
- Cluster the data into 400 clusters using K-Means.
- Calculate the Within Set Sum of Squared Errors (WSSSE) to evaluate the clustering.
- Create histograms for all images & reduce features.
- Build random forest model using the histograms.
- Split data into 70% training and 30% testing.
- Train the random forest model (4-10 trees).
- Use the test data to find the most accurate model.
- Print out the best error and the corresponding parameters.
- Train the random forest model once again.
- Save & load the model.
- Test image classification: use each test image and the histogram size to predict the category of that image.
- Run the above process from the client side.
- Generate Confusion Matrix & print out the accuracy of the model.
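The clustering and histogram steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the Spark code used in the lab: the helper names (`nearest_center`, `wssse`, `histogram`), the two toy cluster centers, and the 2-D descriptors are all made up, whereas the report clusters real image features into 400 groups. Normalizing the histogram is also an assumption.

```python
from math import dist

def nearest_center(point, centers):
    """Return the index of the cluster center closest to `point`."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))

def wssse(points, centers):
    """Within Set Sum of Squared Errors: the sum, over all points, of the
    squared distance to the nearest cluster center (lower means tighter clusters)."""
    return sum(dist(p, centers[nearest_center(p, centers)]) ** 2 for p in points)

def histogram(descriptors, centers):
    """Reduce one image's descriptors to a fixed-size feature vector by
    counting how many descriptors fall into each cluster, then normalizing."""
    counts = [0] * len(centers)
    for d in descriptors:
        counts[nearest_center(d, centers)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Toy values for illustration only (not data from the lab).
centers = [(0.0, 0.0), (10.0, 10.0)]
image_descriptors = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0), (0.0, 0.0)]

print("WSSSE:", wssse(image_descriptors, centers))
print("Histogram:", histogram(image_descriptors, centers))
```

Each image ends up as one histogram whose length equals the number of clusters, which is what lets images with different numbers of keypoints be fed to the same random forest model.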
< DATASET > Training Data: BubbleTea x 40 | Chocolate x 30 | Coffee x 30 | MochaTea x 40 || Test Data: 4 for each category.
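With 4 test images in each of the 4 categories, the confusion matrix is a 4 x 4 table over 16 predictions. The sketch below shows how such a matrix and the accuracy could be computed in plain Python. The category names come from the data set above; the `predicted` labels are placeholders invented for illustration and are not the results of this lab, and the function names are hypothetical.

```python
CATEGORIES = ["BubbleTea", "Chocolate", "Coffee", "MochaTea"]

def confusion_matrix(actual, predicted, labels):
    """Rows are actual categories, columns are predicted categories."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of predictions on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0

# 4 test images per category, as in the data set description.
actual = [c for c in CATEGORIES for _ in range(4)]

# Placeholder predictions (NOT real results): pretend two images are misclassified.
predicted = list(actual)
predicted[0] = "MochaTea"   # one BubbleTea image predicted as MochaTea
predicted[9] = "Chocolate"  # one Coffee image predicted as Chocolate

matrix = confusion_matrix(actual, predicted, CATEGORIES)
for label, row in zip(CATEGORIES, matrix):
    print(f"{label:>10}: {row}")
print("Accuracy:", accuracy(matrix))
```

Off-diagonal cells show which categories get confused with each other, which the single accuracy number hides.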
< PURPOSE OF IMAGE CLASSIFICATION PROBLEM > Extract features and keypoints from the training data sets. Build the model & provide a learning mechanism for the machine. Label the data sets so we can use them for further predictions. Goal: the best accuracy on the training and testing data sets without running into problems such as overfitting.
Write a client application that uses the Spark API to connect Spark and your client. Your client can be either a web application or an Android application. Refer to the Tutorial 5 Spark API tutorial.
- Run the Spark program to identify images in real time using the user-defined data sets. (The models are all set.)
- Connect the server with the client using superstatic from the command line.
- Copy and paste the link into Google Chrome, using the JetBrains extension, to run & connect to the Spark program.
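The client in this lab is a web page, but the request it sends to the Spark program can be sketched in Python as well. Everything specific here is an assumption made for illustration: the endpoint URL, the base64-encoded request body, and the plain-text category returned by the server are hypothetical and are not taken from the report or from the tutorial.

```python
import base64
import urllib.request

# Hypothetical endpoint of the running Spark program (an assumption, not from the report).
SPARK_API_URL = "http://localhost:8080/predict"

def build_request(image_bytes, url=SPARK_API_URL):
    """Build a POST request that carries the image as base64 text."""
    body = base64.b64encode(image_bytes)
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/plain"},
        method="POST",
    )

def classify(image_bytes, url=SPARK_API_URL, timeout=10):
    """Send the image to the server and return its reply as text.
    Assumes the server answers with the predicted category name."""
    with urllib.request.urlopen(build_request(image_bytes, url), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

With the Spark program running, a call such as `classify(open("test.jpg", "rb").read())` would return whatever the server sends back; the file name is likewise only an example.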
Build a simple application that holds a conversation, using the Google Conversation Actions API, about the summary you generated for your video. Refer to the Tutorial 5 Conversation Actions API tutorial.
- Run the Spark program to identify images in real time using the user-defined data sets. (The models are all set.)
- Connect the server with the client using superstatic from the command line.
- Copy and paste the link into Google Chrome, using the JetBrains extension, to run & connect to the Spark program.