forked from rdpeng/RepData_PeerAssessment1
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathPA1_template.Rmd
79 lines (67 loc) · 2.42 KB
/
PA1_template.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
Peer Assignment 1
=============================
```{r}
setwd("C:/Users/ece.o/OneDrive/Coursera/Johns Hopkins University Data Science/Reproducible Research/RepData_PeerAssessment1")
```
## Loading and preprocessing the data
```{r}
data <- read.csv("activity.csv")
```
## What is mean total number of steps taken per day?
1. Total number of steps taken per day
```{r}
dailySteps <- tapply(data[,1], data[,2], sum, na.rm = TRUE)
```
2. Histogram of the total number of steps taken each day
```{r}
hist(dailySteps)
```
3. Mean of the total number of steps taken per day is `r round(mean(dailySteps), 0)` and median is `r median(dailySteps)`
## What is the average daily activity pattern?
1. Time series plot of the 5-minute interval and the average number of steps taken, averaged across all days
```{r}
intSteps <- tapply(data[,1], data[,3], mean, na.rm = TRUE)
plot(names(intSteps), intSteps, type = "l", xlab = "5-Minute Interval", ylab = "# of Steps")
```
2. Interval `r names(which.max(intSteps))` contains the maximum number of steps.
## Imputing missing values
1. The total number of rows with NAs is `r sum(is.na(data))`.
2. Fill missing data with interval mean
```{r}
filled <- data
for (i in 1:nrow(filled)) {
if (is.na(data[i,1])) {
filled[i, 1] = intSteps[names(intSteps) == filled[i, 3]]
}
}
```
4.
```{r}
dailySteps2 <- tapply(filled[,1], filled[,2], sum)
hist(dailySteps2)
```
Mean of the total number of steps taken per day is `r as.integer(mean(dailySteps2))` and median is `r as.integer(median(dailySteps2))`, too.
The frequencies in the middle part of the histogram increased because the empty values are now replaced with average values. The mean and median are slightly higher, but not too high that can distort the analysis in a bad way.
## Are there differences in activity patterns between weekdays and weekends?
1. Convert to PosIX first, then apply `weekdays()`.
```{r}
filled$weekday <- as.factor(weekdays(as.POSIXct(filled[,2])))
filled$weekDE <- ifelse((filled[,4] == "Saturday") | (filled[,4] == "Sunday"), "weekend", "weekday")
filled$weekDE <- as.factor(filled$weekDE)
```
2. Group the data using `data.table`.
```{r}
library(data.table)
filled <- data.table(filled)
weekMean <- filled[, lapply(.SD, mean), by = "interval,weekDE"]
```
Create line chart.
```{r}
library(lattice)
xyplot(steps ~ interval|weekDE,
data = weekMean,
type = "l",
xlab = "Interval",
ylab = "Number of steps",
layout=c(1,2))
```