CS 5542 BigData Lab Report #03
Q1. Build a Linear Regression model for 2 selected parameters of the chimpanzees' daily movement, activities & interactions. Define your own data sets.
- Import the needed libraries & get access to Spark.
- Create an object called "LinearRegressionwithSGD" and define main with a String array argument.
- Initialize Spark -> setMaster and setAppName.
- Turn off the INFO logger for the console -> reduces Spark's runtime output.
- Read in data from the user-defined data set "Chimpanzee_data.data" -> this data set covers only 6 chimpanzees (Chimpanzee Label, Activity Level, Location X-Axis, Location Y-Axis).
- Parse the Chimpanzee data.
- Split data randomly into 95% Training & 5% Testing data.
- Start building the model by using the training data, number of iterations and step size (the full pipeline is sketched after this list). ->
val model = LinearRegressionWithSGD.train(training, numIterations, stepSize)
- Evaluate the model based on the training samples/testing samples and calculate the training mean squared error.
- Save the Linear Regression Model. ->
model.save(sc, "data\\LinearRegressionChimpanzees")
- Load the model. ->
val sameModel = LinearRegressionModel.load(sc, "data\\LinearRegressionChimpanzees")
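Putting the steps above together, a minimal end-to-end sketch of this pipeline could look like the following (the master URL, comma delimiter, iteration count, step size and the choice of activity level as the regression target are assumed example values, not taken from the report):

```scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionModel, LinearRegressionWithSGD}

object LinearRegressionwithSGD {
  def main(args: Array[String]): Unit = {
    // Initialize Spark and turn off the INFO logger to reduce runtime output
    val conf = new SparkConf().setMaster("local[*]").setAppName("LinearRegressionwithSGD")
    val sc = new SparkContext(conf)
    Logger.getLogger("org").setLevel(Level.OFF)

    // Read the 6-chimpanzee data set: chimpanzee label, activity level, location x, location y
    // (assumption: comma-separated, activity level regressed on the two location coordinates)
    val data = sc.textFile("data/Chimpanzee_data.data")
    val parsed = data.map { line =>
      val parts = line.split(',')
      LabeledPoint(parts(1).toDouble, Vectors.dense(parts(2).toDouble, parts(3).toDouble))
    }.cache()

    // Split the data randomly into 95% training and 5% testing data
    val Array(training, testing) = parsed.randomSplit(Array(0.95, 0.05))

    // Build the model from the training data, number of iterations and step size
    val numIterations = 100
    val stepSize = 0.0001
    val model = LinearRegressionWithSGD.train(training, numIterations, stepSize)

    // Evaluate the model and calculate the training mean squared error
    val valuesAndPreds = training.map(p => (p.label, model.predict(p.features)))
    val trainingMSE = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }.mean()
    println(s"Training Mean Squared Error = $trainingMSE")

    // Save the Linear Regression model and load it back
    model.save(sc, "data\\LinearRegressionChimpanzees")
    val sameModel = LinearRegressionModel.load(sc, "data\\LinearRegressionChimpanzees")
    sc.stop()
  }
}
```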
I also include another way of doing Linear Regression using a Scala case class: Person represents one chimpanzee, for a total of 6, and the data set is the same as in the approach above (a sketch follows the signature below).
Person("Chimpanzee Label", Activity Level, Location X-Axis, Location Y-Axis)
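A sketch of what that case-class variant could look like (field names follow the Person signature above; the comma delimiter and the choice of target/feature columns are assumptions, and the SparkContext and MLlib imports are reused from the sketch in Q1):

```scala
// One row of the same Chimpanzee_data.data file, modelled as a case class
case class Person(label: String, activityLevel: Double, locationX: Double, locationY: Double)

// Parse each of the 6 rows into a Person, then into a LabeledPoint for LinearRegressionWithSGD
// (activity level as the target, the two location coordinates as the features)
val people = sc.textFile("data/Chimpanzee_data.data").map { line =>
  val f = line.split(',')
  Person(f(0), f(1).toDouble, f(2).toDouble, f(3).toDouble)
}
val points = people.map(p => LabeledPoint(p.activityLevel, Vectors.dense(p.locationX, p.locationY)))
```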
Q2. Implement K-Means Clustering for the clusters of the chimpanzees' activities. Define your own data sets.
- Import the needed libraries & get access to Spark.
- Create an object called "kMeansClustering" and define main with a String array argument.
- Initialize Spark -> setMaster and setAppName.
- Turn off the INFO logger for the console -> reduces Spark's runtime output.
- Load the data set from "chimapanzee_KmreanData.txt" (Activity Level, Location X-Axis, Location Y-Axis) & parse the data.
- Set the number of clusters and iterations, then cluster the data into 2 classes using K-Means (the full pipeline is sketched after this list). ->
val clusters = KMeans.train(parsedData, numClusters, numIterations)
- Evaluate the clustering by computing the Within Set Sum of Squared Errors (WSSSE). ->
val WSSSE = clusters.computeCost(parsedData)
- Make predictions for the training data, printing each point with its assigned cluster. ->
clusters.predict(parsedData).zip(parsedData).foreach(f=>println(f._2,f._1))
- Save the model to "data/KMeansModelChimpanzee" and load it back. ->
clusters.save(sc, "data/KMeansModelChimpanzee")
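A corresponding end-to-end sketch of the K-Means pipeline (the master URL, space delimiter, cluster count and iteration count are assumed example values):

```scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

object kMeansClustering {
  def main(args: Array[String]): Unit = {
    // Initialize Spark and turn off the INFO logger to reduce runtime output
    val conf = new SparkConf().setMaster("local[*]").setAppName("kMeansClustering")
    val sc = new SparkContext(conf)
    Logger.getLogger("org").setLevel(Level.OFF)

    // Load and parse the (activity level, location x, location y) points
    val data = sc.textFile("chimapanzee_KmreanData.txt")
    val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

    // Cluster the data into 2 classes using K-Means
    val numClusters = 2
    val numIterations = 20
    val clusters = KMeans.train(parsedData, numClusters, numIterations)

    // Evaluate the clustering by computing the Within Set Sum of Squared Errors (WSSSE)
    val WSSSE = clusters.computeCost(parsedData)
    println(s"Within Set Sum of Squared Errors = $WSSSE")

    // Predict a cluster for every training point and print each point with its cluster
    clusters.predict(parsedData).zip(parsedData).foreach { case (cluster, point) =>
      println(s"$point -> cluster $cluster")
    }

    // Save the K-Means model and load it back
    clusters.save(sc, "data/KMeansModelChimpanzee")
    val sameModel = KMeansModel.load(sc, "data/KMeansModelChimpanzee")
    sc.stop()
  }
}
```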
Q3. Build a simple application that gives a summary of a video by using the Clarifai API. Use the OpenIMAJ library to extract the key-frame images that are sent to the Clarifai API.
## [ IMPLEMENTATION : KeyFrameDetection.java ]
- Import the needed libraries & get access to Spark, the Clarifai API and OpenIMAJ.
- Create the class that gets all the frames from the video. ->
public class KeyFrameDetection
- Frames Extraction: Iterate over the video frames & handle errors when outputting the images. ->
public static void Frames(String path)
- Main Frame Selection: Go through all frames and pick out the main frames in this video. ->
public static void MainFrames()
then perform:
- Shot Transition Detection: Compare SIFT features of adjacent frames; if the number of common features < threshold, mark a shot transition (see the sketch after this list).
- Find the number of key points.
- Collect all the main frames together and output the results to the mainframes file.
- Connect to the Clarifai API server by using your own API key and access code to get the token.
- Access the mainframes file.
- Do a detailed scan on each image and predict what possible information the image contains.
- Print out the possible contents of each image in the console.
- Output the possible info onto the image itself.
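To make the shot-transition rule concrete, here is a small Scala sketch of the comparison loop; extractSiftKeypoints and countCommonFeatures are hypothetical helpers standing in for the OpenIMAJ SIFT extraction and feature matching done in KeyFrameDetection.java, and the threshold is an assumed example value:

```scala
// Hypothetical stand-ins for the OpenIMAJ SIFT engine and feature matcher
case class SiftKeypoint(x: Float, y: Float, descriptor: Array[Float])
def extractSiftKeypoints(framePath: String): Seq[SiftKeypoint] = ???            // SIFT features of one frame
def countCommonFeatures(a: Seq[SiftKeypoint], b: Seq[SiftKeypoint]): Int = ???  // matched keypoints between two frames

/** Indices of frames that start a new shot, i.e. candidate main/key frames. */
def detectShotTransitions(framePaths: Seq[String], threshold: Int = 30): Seq[Int] = {
  val keypoints = framePaths.map(extractSiftKeypoints)
  // Compare each frame with its predecessor: too few common SIFT features => shot transition
  (1 until framePaths.length).filter(i => countCommonFeatures(keypoints(i - 1), keypoints(i)) < threshold)
}
```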
[ EXTRA EXAMPLE : Image Annotation ] -- A simple image annotation example. Use the Clarifai API to predict the contents of the image "animal.jpg". Output all possible information about that image and display it to the user.