CS 5542 BigData Lab Report #03
Q1. Build a Linear Regression model for 2 selected parameters of the chimpanzees' daily movement, activities & interactions. Define your own data sets.
- Import the needed libraries & get access to Spark.
- Create an object called "LinearRegressionwithSGD" and define main with a String array argument.
- Initialize Spark -> setMaster and setAppName.
- Turn off the INFO logger for the console -> reduces Spark's runtime output.
- Read in data from the user-defined data set "Chimpanzee_data.data" -> this data set covers only 6 chimpanzees (Chimpanzee Label, Activity Level, Location X-Axis, Location Y-Axis).
- Parse the Chimpanzee data.
- Split data randomly into 95% Training & 5% Testing data.
- Start building the model by using the training data, number of iterations and step size (the full pipeline is sketched after this list). ->
val model = LinearRegressionWithSGD.train(training, numIterations, stepSize)
- Evaluate the model based on the training samples/testing samples and calculate the training mean squared error.
- Save the Linear Regression Model. ->
model.save(sc, "data\\LinearRegressionChimpanzees")
- Load the model. ->
val sameModel = LinearRegressionModel.load(sc, "data\\LinearRegressionChimpanzees")
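Putting the steps above together, a minimal end-to-end sketch of this pipeline could look like the following (the master URL, comma delimiter, iteration count, step size and the choice of activity level as the regression target are assumed example values, not taken from the report):

```scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionModel, LinearRegressionWithSGD}

object LinearRegressionwithSGD {
  def main(args: Array[String]): Unit = {
    // Initialize Spark and turn off the INFO logger to reduce runtime output
    val conf = new SparkConf().setMaster("local[*]").setAppName("LinearRegressionwithSGD")
    val sc = new SparkContext(conf)
    Logger.getLogger("org").setLevel(Level.OFF)

    // Read the 6-chimpanzee data set: chimpanzee label, activity level, location x, location y
    // (assumption: comma-separated, activity level regressed on the two location coordinates)
    val data = sc.textFile("data/Chimpanzee_data.data")
    val parsed = data.map { line =>
      val parts = line.split(',')
      LabeledPoint(parts(1).toDouble, Vectors.dense(parts(2).toDouble, parts(3).toDouble))
    }.cache()

    // Split the data randomly into 95% training and 5% testing data
    val Array(training, testing) = parsed.randomSplit(Array(0.95, 0.05))

    // Build the model from the training data, number of iterations and step size
    val numIterations = 100
    val stepSize = 0.0001
    val model = LinearRegressionWithSGD.train(training, numIterations, stepSize)

    // Evaluate the model and calculate the training mean squared error
    val valuesAndPreds = training.map(p => (p.label, model.predict(p.features)))
    val trainingMSE = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }.mean()
    println(s"Training Mean Squared Error = $trainingMSE")

    // Save the Linear Regression model and load it back
    model.save(sc, "data\\LinearRegressionChimpanzees")
    val sameModel = LinearRegressionModel.load(sc, "data\\LinearRegressionChimpanzees")
    sc.stop()
  }
}
```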
I also include another way of doing Linear Regression using a Scala case class: Person represents one chimpanzee, for a total of 6, and the data set is the same as in the approach above (a sketch follows the signature below).
Person("Chimpanzee Label", Activity Level, Location X-Axis, Location Y-Axis)
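A sketch of what that case-class variant could look like (field names follow the Person signature above; the comma delimiter and the choice of target/feature columns are assumptions, and the SparkContext and MLlib imports are reused from the sketch in Q1):

```scala
// One row of the same Chimpanzee_data.data file, modelled as a case class
case class Person(label: String, activityLevel: Double, locationX: Double, locationY: Double)

// Parse each of the 6 rows into a Person, then into a LabeledPoint for LinearRegressionWithSGD
// (activity level as the target, the two location coordinates as the features)
val people = sc.textFile("data/Chimpanzee_data.data").map { line =>
  val f = line.split(',')
  Person(f(0), f(1).toDouble, f(2).toDouble, f(3).toDouble)
}
val points = people.map(p => LabeledPoint(p.activityLevel, Vectors.dense(p.locationX, p.locationY)))
```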
Q2. Implement K-Means Clustering for the clusters of the chimpanzees' activities. Define your own data sets.
- Import the needed libraries & get access to Spark.
- Create an object called "kMeansClustering" and define main with a String array argument.
- Initialize Spark -> setMaster and setAppName.
- Turn off the INFO logger for the console -> reduces Spark's runtime output.
- Load the data set from "chimapanzee_KmreanData.txt" (Activity Level, Location X-Axis, Location Y-Axis) & parse the data.
- Set the number of clusters and iterations, then cluster the data into 2 classes using K-Means (the full pipeline is sketched after this list). ->
val clusters = KMeans.train(parsedData, numClusters, numIterations)
- Evaluate the clustering by computing the Within Set Sum of Squared Errors (WSSSE). ->
val WSSSE = clusters.computeCost(parsedData)
- Make predictions for the training data, printing each point with its assigned cluster. ->
clusters.predict(parsedData).zip(parsedData).foreach(f=>println(f._2,f._1))
- Save the model to "data/KMeansModelChimpanzee" and load it back. ->
clusters.save(sc, "data/KMeansModelChimpanzee")
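A corresponding end-to-end sketch of the K-Means pipeline (the master URL, space delimiter, cluster count and iteration count are assumed example values):

```scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

object kMeansClustering {
  def main(args: Array[String]): Unit = {
    // Initialize Spark and turn off the INFO logger to reduce runtime output
    val conf = new SparkConf().setMaster("local[*]").setAppName("kMeansClustering")
    val sc = new SparkContext(conf)
    Logger.getLogger("org").setLevel(Level.OFF)

    // Load and parse the (activity level, location x, location y) points
    val data = sc.textFile("chimapanzee_KmreanData.txt")
    val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

    // Cluster the data into 2 classes using K-Means
    val numClusters = 2
    val numIterations = 20
    val clusters = KMeans.train(parsedData, numClusters, numIterations)

    // Evaluate the clustering by computing the Within Set Sum of Squared Errors (WSSSE)
    val WSSSE = clusters.computeCost(parsedData)
    println(s"Within Set Sum of Squared Errors = $WSSSE")

    // Predict a cluster for every training point and print each point with its cluster
    clusters.predict(parsedData).zip(parsedData).foreach { case (cluster, point) =>
      println(s"$point -> cluster $cluster")
    }

    // Save the K-Means model and load it back
    clusters.save(sc, "data/KMeansModelChimpanzee")
    val sameModel = KMeansModel.load(sc, "data/KMeansModelChimpanzee")
    sc.stop()
  }
}
```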
Q3. Build a simple application that gives a summary of a video by using the Clarifai API. Use the OpenIMAJ library to extract the key-frame images that are sent to the Clarifai API.
## [ IMPLEMENTATION : KeyFrameDetection.java ]
- Import the needed libraries & get access to Spark, the Clarifai API and OpenIMAJ.
- Create the class that gets all the frames from the video. ->
public class KeyFrameDetection
- Frames Extraction: Iterate over the video frames & handle errors when outputting the images. ->
public static void Frames(String path)
- Main Frame Selection: Go through all frames and pick out the main frames in this video. ->
public static void MainFrames()
then perform:
- Shot Transition Detection: Compare SIFT features of adjacent frames; if the number of common features < threshold, mark a shot transition (see the sketch after this list).
- Find the number of key points.
- Collect all the main frames together and output the results to the mainframes file.
- Connect to the Clarifai API server by using your own API key and access code to get the token.
- Access the mainframes file.
- Do a detailed scan on each image and predict what possible information the image contains.
- Print out the possible contents of each image in the console.
- Output the possible info onto the image itself.
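To make the shot-transition rule concrete, here is a small Scala sketch of the comparison loop; extractSiftKeypoints and countCommonFeatures are hypothetical helpers standing in for the OpenIMAJ SIFT extraction and feature matching done in KeyFrameDetection.java, and the threshold is an assumed example value:

```scala
// Hypothetical stand-ins for the OpenIMAJ SIFT engine and feature matcher
case class SiftKeypoint(x: Float, y: Float, descriptor: Array[Float])
def extractSiftKeypoints(framePath: String): Seq[SiftKeypoint] = ???            // SIFT features of one frame
def countCommonFeatures(a: Seq[SiftKeypoint], b: Seq[SiftKeypoint]): Int = ???  // matched keypoints between two frames

/** Indices of frames that start a new shot, i.e. candidate main/key frames. */
def detectShotTransitions(framePaths: Seq[String], threshold: Int = 30): Seq[Int] = {
  val keypoints = framePaths.map(extractSiftKeypoints)
  // Compare each frame with its predecessor: too few common SIFT features => shot transition
  (1 until framePaths.length).filter(i => countCommonFeatures(keypoints(i - 1), keypoints(i)) < threshold)
}
```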
[ EXTRA EXAMPLE : Image Annotation ] -- A simple image annotation example. Use the Clarifai API to predict the contents of the image "animal.jpg". Output all possible information about that image and display it to the user.