-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathMachine_Learning_Spark.R
24 lines (19 loc) · 1.06 KB
/
Machine_Learning_Spark.R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
#Machine Learning
#You can orchestrate machine learning algorithms in a Spark cluster via the machine learning functions within sparklyr.
#These functions connect to a set of high-level APIs built on top of DataFrames that help you create and tune machine learning workflows.
#Here’s an example where we use ml_linear_regression to fit a linear regression model.
#We’ll use the built-in mtcars dataset, and see if we can predict a car’s fuel consumption (mpg) based on its weight (wt),
#and the number of cylinders the engine contains (cyl). We’ll assume in each case that the
#relationship between mpg and each of our features is linear.
# copy mtcars into spark
mtcars_tbl <- copy_to(sc, mtcars)
# transform our data set, and then partition into 'training', 'test'
partitions <- mtcars_tbl %>%
filter(hp >= 100) %>%
mutate(cyl8 = cyl == 8) %>%
sdf_partition(training = 0.5, test = 0.5, seed = 1099)
# fit a linear model to the training dataset
fit <- partitions$training %>%
ml_linear_regression(response = "mpg", features = c("wt", "cyl"))
fit
summary(fit)