CS 5542 BigData Lab Report #02

Amy Lin edited this page Mar 29, 2017 · 2 revisions

SPARK PROGRAMMING - Text Data


[ QUESTION ]

Write a Spark program for an interesting use case that takes text data as input; the program should use at least 2 Spark Transformations and 2 Spark Actions.

Present your use case in the MapReduce paradigm, as shown below for word count.
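To illustrate the MapReduce paradigm itself, here is a minimal word-count sketch using plain Scala collections (no Spark required); the input line is an assumed sample:

```scala
// Word count in the map-reduce paradigm, using plain Scala collections.
val lines = Seq("to be or not to be")

val counts = lines
  .flatMap(_.split(" "))        // MAP: split each line into words
  .map(word => (word, 1))       // MAP: emit (word, 1) pairs
  .groupBy(_._1)                // SHUFFLE: group pairs by key
  .map { case (word, pairs) =>  // REDUCE: sum the counts per key
    (word, pairs.map(_._2).sum)
  }

println(counts)  // Map(to -> 2, be -> 2, or -> 1, not -> 1)
```

The Spark version below follows the same three phases, but each step runs distributed across a cluster instead of on a local collection.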


[ IMPLEMENTATION ]

  • Import the library needed to access Spark -> import org.apache.spark.{SparkConf, SparkContext}
  • Create an object called "TextDataProcess" and define main with a String-array parameter (Array[String]).
  • Initialize Spark by setting the master and the application name -> setMaster and setAppName
  • Read in text data from an external file -> sc.textFile("textdata.txt")
  • Split each line into words and map every word to a count of 1; cache() pulls the data set into a cluster-wide in-memory cache for reuse -> textdata.flatMap(line => line.split(" ")).map(word => (word, 1)).cache()
  • Shuffle and reduce -> wc.reduceByKey(_ + _)
  • Output the result in text file format, repartitioning to balance the data -> output.repartition(1).saveAsTextFile("output.txt")
  • Print the formatted result in the IntelliJ console -> var s: String = "----------------\n Words : Count \n---------------- \n"; result.foreach { case (word, count) => s += word + " : " + count + "\n" }; println(s)
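The steps above can be assembled into one complete program. A sketch, assuming the SparkContext-era API, a local master, and the file names used above; collect() is added before the final foreach so the counts are printed on the driver:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TextDataProcess {
  def main(args: Array[String]): Unit = {
    // Initialize Spark: local master and application name
    val conf = new SparkConf().setMaster("local[*]").setAppName("TextDataProcess")
    val sc = new SparkContext(conf)

    // Read text data from an external file
    val textdata = sc.textFile("textdata.txt")

    // Transformations: split lines into words, map to (word, 1), cache in memory
    val wc = textdata
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .cache()

    // Shuffle and reduce: sum the counts per word
    val output = wc.reduceByKey(_ + _)

    // Action: write the result, repartitioned into a single output file
    output.repartition(1).saveAsTextFile("output.txt")

    // Action: collect the result to the driver and print a formatted table
    val result = output.collect()
    var s: String = "----------------\n Words : Count \n----------------\n"
    result.foreach { case (word, count) => s += word + " : " + count + "\n" }
    println(s)

    sc.stop()
  }
}
```

Here flatMap, map, and reduceByKey are the transformations, while saveAsTextFile and collect are the actions that trigger execution; reduceByKey also incurs the shuffle.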

<< Words in BOLD mark the Action, Transformation, and Shuffle-operation commands. >>