What is SparkR
An R package distributed with Apache Spark:
- Provides an R frontend to Spark
- Convenient interoperability between R and Spark DataFrames
SparkR combines distributed/robust processing, data sources and off-memory data structures (from Spark) with a dynamic environment, interactivity, packages and visualization (from R)
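As a minimal sketch of what this looks like in practice, a Spark session can be started from R and a local data.frame promoted to a distributed SparkDataFrame (assuming a local Spark installation with the SparkR package available; the built-in faithful dataset is used only as an illustration):

```r
library(SparkR)

# Start (or connect to) a Spark session from R
sparkR.session(appName = "SparkR-intro")

# Promote a local R data.frame to a distributed SparkDataFrame
df <- createDataFrame(faithful)

# Inspect it with familiar R-style helpers
head(df)
printSchema(df)

sparkR.session.stop()
```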
Overview of SparkR API
- read.df
- write.df
- createDataFrame
- collect
- cache
- persist
- unpersist
- cacheTable
- uncacheTable
- dim / head / take
- rand / sample
- glm / predict
- select
- subset
- groupBy
- head
- showDF
- unionAll
- agg
- avg
- column
- sql
- table
- saveAsTable
- registerTempTable
- tables
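A hedged sketch of how several of the functions above fit together (read.df, groupBy/agg, registerTempTable, sql, write.df); the CSV file and its column names are hypothetical placeholders:

```r
library(SparkR)
sparkR.session()

# Read an external data source into a SparkDataFrame (hypothetical file and schema)
flights <- read.df("flights.csv", source = "csv", header = "true", inferSchema = "true")

# Grouping and aggregation
byCarrier <- agg(groupBy(flights, flights$carrier),
                 avg_delay = avg(flights$dep_delay))
head(byCarrier)

# Register the DataFrame as a table and query it with SQL
registerTempTable(flights, "flights")
longDelays <- sql("SELECT carrier, dep_delay FROM flights WHERE dep_delay > 60")
showDF(longDelays)

# Persist the aggregated result
write.df(byCarrier, path = "delays_by_carrier", source = "parquet", mode = "overwrite")
```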
- map
- flatMap
- filter
- distinct
- sample
- union
- intersection
- subtract
- cartesian
- collect()
- count()
- take()
- takeOrdered(num)(ordering)
- reduce(function)
- aggregate(zeroValue)(seqOp, combOp)
- foreach(function)
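The RDD operations above belong to Spark's core API and are no longer exported by recent SparkR releases. As a rough, documented stand-in for a distributed map from R, spark.lapply can be used; a minimal sketch (the squaring function is purely illustrative):

```r
library(SparkR)
sparkR.session()

# Apply a function to each element of a local list on the cluster,
# returning the results as a local R list (a map-style operation)
squares <- spark.lapply(1:10, function(x) x * x)

# A reduce-style step done locally on the collected results
total <- Reduce(`+`, squares)
print(total)  # 385
```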
- Deep Learning
- Naive Bayes
- Linear Regression
- Principal Components Analysis (PCA)
- K Means
- Stacked Ensembles
- Gradient Boosting Machine
- Generalised Linear Model
- Generalized Low Rank Model
- Distributed Random Forest
- Word2vec
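Several of these algorithms (generalised linear models, Naive Bayes, K-Means) are also exposed directly in SparkR as spark.* functions. A minimal sketch of fitting and scoring a Gaussian GLM on the built-in iris data (SparkR rewrites the '.' in the column names to '_'):

```r
library(SparkR)
sparkR.session()

irisDF <- createDataFrame(iris)  # Sepal.Length becomes Sepal_Length, etc.

# Fit a Gaussian GLM (linear regression) on the distributed data
model <- spark.glm(irisDF, Sepal_Length ~ Sepal_Width + Petal_Length,
                   family = "gaussian")
summary(model)

# Score the same data and pull a few predictions back to R
preds <- predict(model, irisDF)
head(select(preds, "Sepal_Length", "prediction"))
```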
- Generalised Linear Model
- Distributed Random Forest
- Gradient Boosting
- Deep Learning
- Naive Bayes
- Stacked Ensembles
- XGBoost
- K-Means
- Anomaly Detection
- Word2vec: Takes a text corpus as input and produces word vectors as output. The result is an H2O Word2vec model that can be exported as a binary model or as a MOJO.
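A minimal sketch of training such a model with the h2o R package (the toy corpus, tokenisation pattern and output path are illustrative assumptions, and function names follow recent h2o releases):

```r
library(h2o)
h2o.init()

# A tiny hypothetical corpus; in practice this would be a large text column
corpus <- as.h2o(data.frame(text = c("spark makes big data processing simple",
                                     "r users like working with data frames"),
                            stringsAsFactors = FALSE))
corpus$text <- as.character(corpus$text)  # ensure a string column, not an enum

# Tokenise the text column and train a word2vec model on the tokens
words <- h2o.tokenize(corpus$text, "\\\\W+")
w2v   <- h2o.word2vec(words, vec_size = 20, epochs = 5, min_word_freq = 1)

# Inspect the learned vectors and export the model as a MOJO
print(h2o.findSynonyms(w2v, "data", count = 3))
h2o.download_mojo(w2v, path = "w2v_mojo")
```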