-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathp8105_hw5_zdz2101.Rmd
90 lines (78 loc) · 3.24 KB
/
p8105_hw5_zdz2101.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
---
title: "p8105_hw5_zdz2101"
author: "Zelos Zhu"
date: "11/3/2018"
output: github_document
---
#Load packages
```{r, message = FALSE}
library(tidyverse)
library(purrr)
library(knitr)
library(broom)
library(ggthemes)
```
#Problem 1
```{r, message = FALSE}
setwd("./data/")
patient_df <- tibble(filenames = list.files()) %>%
mutate(weekly_data = map(filenames, read_csv)) %>%
unnest() %>%
mutate(arm = substring(filenames, 1, 3),
subject_id = substring(filenames, 5, 6)) %>%
gather(week, measure, 2:9) %>%
mutate(week = as.numeric(str_replace(week, "week_","")))
patient_df
setwd("..") #move back to original repo, don't want to continue working in the /data folder
ggplot(patient_df, aes(x = week, y = measure, group = filenames, color = arm)) +
geom_line(alpha = 0.5) +
scale_x_discrete(name ="Week", limits=1:8) +
ylab("Measure") +
ggtitle("Patient Measurements by Week") +
geom_smooth(aes(group = arm))
```
The trend for control group patients' measurements seems stagnant and doesn't seem to change over time. I would suspect this is a placebo of some sort. On the other hand, patients in the experimental arm, their measurements generally increase over time. There is not enough information provided to interpret whether this increase is a good or bad thing. I would also suspect this difference in trends would be statistically significant just from a visual standpoint.
#Problem 2
```{r, message = FALSE}
homicide_df <- read_csv("https://raw.githubusercontent.com/washingtonpost/data-homicides/master/homicide-data.csv")
homicide_df <- homicide_df %>%
mutate(state = ifelse(city == "Tulsa" & state == "AL", "OK", state), #There is a typo in the data set based on lat/long
city_state = str_c(city, ", ", state),
homicide_status = ifelse(disposition == "Closed by arrest", "Solved", "Unsolved"))
#Homicide Case Counts
homicide_df %>%
group_by(city_state) %>%
count(homicide_status) %>%
spread(homicide_status, n) %>%
kable()
#Just Baltimore
baltimore_prop_df <- filter(homicide_df, city_state == "Baltimore, MD") %>%
mutate(homicide_status = factor(homicide_status, levels = c("Unsolved", "Solved")))
baltimore_proptest <- prop.test(table(baltimore_prop_df$homicide_status))
tidy(baltimore_proptest) %>%
select(estimate, conf.low, conf.high) %>%
mutate(city_state = "Baltimore, MD") %>%
kable()
#doing it for all cities
city_props <- homicide_df %>%
group_by(city_state) %>%
count(homicide_status) %>%
spread(homicide_status, n) %>% #stopping here would get us the case counts from earlier
mutate(total = Unsolved + Solved,
test = map(map2(.x = Unsolved, .y = total, ~prop.test(x = .x, n = .y)), tidy)) %>% #make a total variable to make map easier
unnest() %>%
select(city_state, estimate, conf.low, conf.high)
kable(city_props)
city_props %>%
ungroup()%>%
arrange(estimate) %>%
mutate(city_state = factor(city_state, levels = city_state)) %>%
ggplot(aes(x = city_state, y = estimate)) +
geom_point() +
geom_errorbar(aes(x = city_state, ymin = conf.low, ymax = conf.high), width=0.2, size=1, color="blue") +
coord_flip() +
ylab("Estimated Proportion of Unsolved Homicides") +
xlab("City, State") +
ggtitle("Estimated Proportion of Unsolved Homicides by City/State") +
theme_few()
```