"Big Data" processing with the MapReduce framework

The k-means implementation processes images to determine their most common colour values.
Its source code is found under the k-means directory, which also includes instructions for running it.
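
As a rough illustration of the idea (not the code in the k-means directory), the sketch below clusters RGB pixel values with plain k-means and reports each cluster centre with the number of pixels assigned to it. The function name `kmeans_colours`, the choice of the first k distinct colours as starting centroids, and the fixed iteration count are all assumptions made for this example.

```python
def kmeans_colours(pixels, k, iterations=10):
    """Cluster RGB tuples into k groups.

    Returns a list of (centroid, pixel_count) pairs, largest cluster first.
    Assumption for this sketch: the first k distinct colours seed the centroids.
    """
    distinct = list(dict.fromkeys(pixels))
    if len(distinct) < k:
        raise ValueError("need at least k distinct colours")
    centroids = [tuple(float(c) for c in p) for p in distinct[:k]]

    for _ in range(iterations):
        # Assignment step: attach every pixel to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(ch) / len(members) for ch in zip(*members))

    counts = [len(members) for members in clusters]
    return sorted(zip(centroids, counts), key=lambda pair: -pair[1])


if __name__ == "__main__":
    # Three reddish pixels and one blue pixel, as a stand-in for real image data.
    sample = [(250, 0, 0), (255, 0, 0), (245, 0, 0), (0, 0, 255)]
    for centre, count in kmeans_colours(sample, k=2):
        print(centre, count)
```

In a MapReduce setting the assignment step plays the role of the map phase (pixel to nearest centroid) and the update step plays the role of the reduce phase (mean per centroid), repeated once per iteration.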

The Reddit comments implementation calculates the average comment score per subreddit and was used to compare frameworks.
Its source code is found under the reddit-comments directory, grouped by framework (CouchDB, Hadoop, Spark, Cloud Haskell).
The sequential Java version is found within the Hadoop source code or here.
The data set is taken from here. We uncompressed it and took the first 20,000,000 lines (approximately 11 GB of JSON).
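
The computation itself can be sketched in a few lines. The example below is a sequential Python illustration of the map and reduce steps, not one of the implementations in this repository; it assumes each input line is a JSON object with `subreddit` and `score` fields, and the function names are hypothetical.

```python
import json
from collections import defaultdict


def map_comment(line):
    """Map step: emit (subreddit, (score, 1)) for one JSON comment line."""
    comment = json.loads(line)
    return comment["subreddit"], (comment["score"], 1)


def reduce_scores(pairs):
    """Reduce step: sum scores and counts per subreddit, then divide."""
    totals = defaultdict(lambda: [0, 0])
    for subreddit, (score, count) in pairs:
        totals[subreddit][0] += score
        totals[subreddit][1] += count
    return {name: total / n for name, (total, n) in totals.items()}


def average_scores(lines):
    """Run map then reduce over an iterable of JSON lines, skipping blank lines."""
    return reduce_scores(map_comment(line) for line in lines if line.strip())


if __name__ == "__main__":
    sample = [
        '{"subreddit": "a", "score": 2}',
        '{"subreddit": "a", "score": 4}',
        '{"subreddit": "b", "score": -1}',
    ]
    print(average_scores(sample))
```

Emitting a (score, count) pair rather than a bare score keeps the reduce step associative, so partial sums from different workers can be combined before the final division.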

Runnables

The latest binaries for all implementations are available as zip files on the releases page.
Each release includes the input images/video (see the resources directory) and instructions to run, so you can reproduce our results.
