This repository was archived by the owner on Nov 24, 2018. It is now read-only.

K-means clustering of image/video data via MapReduce, using Apache Spark. SOFTENG751 High Performance Computing (A+)


will-molloy/MapReduce-K-means-image-processing


"Big Data" processing with the MapReduce framework

K-means (main) implementation

  • This processes images to determine their most common colour values.
  • The source code for the k-means implementation is found under the k-means directory.
  • That directory also includes instructions for running it.
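As a rough illustration of how k-means colour clustering maps onto MapReduce, the sketch below runs Lloyd's algorithm over RGB pixel tuples in plain Python. A map step assigns each pixel to its nearest centroid and emits `(cluster, (pixel, 1))`, and a reduce step sums the pixels and counts per cluster so that each new centroid is the mean of its pixels. This is a minimal sketch, assuming the pixels are already loaded as (R, G, B) tuples; the function names are hypothetical and this is not the repository's Spark code. In Spark, the two steps would roughly correspond to an RDD `map` followed by `reduceByKey`.

```python
import math
import random
from typing import Dict, Iterable, Iterator, List, Tuple

Pixel = Tuple[float, float, float]  # one (R, G, B) colour value


def nearest(pixel: Pixel, centroids: List[Pixel]) -> int:
    """Index of the centroid closest to `pixel` (Euclidean distance in RGB space)."""
    return min(range(len(centroids)), key=lambda i: math.dist(pixel, centroids[i]))


def map_phase(pixels: Iterable[Pixel], centroids: List[Pixel]) -> Iterator[Tuple[int, Tuple[Pixel, int]]]:
    """Map (hypothetical name): emit (cluster index, (pixel, 1)) for every pixel."""
    for p in pixels:
        yield nearest(p, centroids), (p, 1)


def reduce_phase(pairs: Iterable[Tuple[int, Tuple[Pixel, int]]]) -> Dict[int, Tuple[Pixel, int]]:
    """Reduce (hypothetical name): per cluster, sum the pixel components and the counts."""
    sums: Dict[int, Tuple[Pixel, int]] = {}
    for idx, (p, n) in pairs:
        total, count = sums.get(idx, ((0.0, 0.0, 0.0), 0))
        sums[idx] = (tuple(a + b for a, b in zip(total, p)), count + n)
    return sums


def kmeans(pixels: List[Pixel], k: int, iterations: int = 20, seed: int = 0) -> List[Pixel]:
    """Lloyd's k-means: repeat map/reduce until the centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(pixels, k)  # k randomly chosen pixels as initial centroids
    for _ in range(iterations):
        sums = reduce_phase(map_phase(pixels, centroids))
        new_centroids = []
        for i, old in enumerate(centroids):
            if i in sums:
                total, count = sums[i]
                new_centroids.append(tuple(x / count for x in total))  # mean of assigned pixels
            else:
                new_centroids.append(old)  # empty cluster: keep its previous centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids
```

The returned centroids approximate the dominant colours. To feed in a real image, one option is Pillow, e.g. `list(Image.open(path).convert("RGB").getdata())`, which yields a list of (R, G, B) tuples.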

reddit comment implementation

  • This calculates the average comment score per subreddit and was used to compare frameworks.
  • The source code for the reddit comment implementations is found under the reddit-comments directory.
  • It is grouped by framework (CouchDB, Hadoop, Spark, Cloud Haskell).
  • The sequential Java version is found within the Hadoop source code or here.
  • The data set is taken from here. We uncompressed it and took the first 20,000,000 lines (approximately 11 GB of JSON).
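The per-subreddit average fits the MapReduce pattern directly. The sketch below is a plain-Python illustration, not the repository's code: the map step parses each JSON line and emits `(subreddit, (score, 1))`, and the reduce step sums scores and counts per key. The mean is taken only at the end because averaging is not associative, whereas summing (score, count) pairs is. It assumes each line is one JSON object with `subreddit` and `score` fields; those field names and the function names are assumptions. In Spark, this would roughly be `map`, then `reduceByKey`, then `mapValues` to divide.

```python
import json
from itertools import islice
from typing import Dict, Iterable, Iterator, Optional, Tuple


def map_comments(lines: Iterable[str]) -> Iterator[Tuple[str, Tuple[int, int]]]:
    """Map (hypothetical name): parse one JSON comment per line, emit (subreddit, (score, 1))."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        comment = json.loads(line)
        # Assumed field names: "subreddit" and "score"
        yield comment["subreddit"], (comment["score"], 1)


def reduce_by_key(pairs: Iterable[Tuple[str, Tuple[int, int]]]) -> Dict[str, Tuple[int, int]]:
    """Reduce (hypothetical name): sum scores and comment counts per subreddit."""
    totals: Dict[str, Tuple[int, int]] = {}
    for subreddit, (score, count) in pairs:
        s, c = totals.get(subreddit, (0, 0))
        totals[subreddit] = (s + score, c + count)
    return totals


def average_score_per_subreddit(lines: Iterable[str], limit: Optional[int] = None) -> Dict[str, float]:
    """Average comment score per subreddit, optionally over only the first `limit` lines."""
    if limit is not None:
        lines = islice(lines, limit)  # e.g. the first 20,000,000 lines of the dump
    totals = reduce_by_key(map_comments(lines))
    return {sub: s / c for sub, (s, c) in totals.items()}
```

For example, passing an open file handle such as `open("comments.json")` (a hypothetical file name) together with `limit=20_000_000` would mirror the subset described above.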

Runnables

  • The latest binaries for all implementations are available, zipped, on the releases page.
  • These include input images/video (see the resources directory) and instructions for running them, so you can reproduce our results.


Image credit: http://www.well-typed.com/blog/73/