-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path2022-06.qmd
153 lines (109 loc) · 3.48 KB
/
2022-06.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
# Week 6
## Requirements
- [Input the data](https://docs.google.com/spreadsheets/d/1kePTJ4RLkibqwb6cGBIlRT6mvR9T0E5q/edit?usp=sharing&ouid=113142875990517226017&rtpof=true&sd=true)
- Parse out the information in the Scrabble Scores Input so that there are 3 fields:
- Tile
- Frequency
- Points
- Calculate the % Chance of drawing a particular tile and round to 2 decimal places
- Frequency / Total number of tiles
- Split each of the 7 letter words into individual letters and count the number of occurrences of each letter
- Join each letter to its scrabble tile
- Update the % chance of drawing a tile based on the number of occurrences in that word
- If the word contains more occurrences of that letter than the frequency of the tile, set the probability to 0 - it is impossible to make this word in Scrabble
- Remember for independent events, you multiple together probabilities i.e. if a letter appears more than once in a word, you will need to multiple the % chance by itself that many times
- Calculate the total points each word would score
- Calculate the total % chance of drawing all the tiles necessary to create each word
- Filter out words with a 0% chance
- Rank the words by their % chance (dense rank)
- Rank the words by their total points (dense rank)
- [Output the data](https://drive.google.com/file/d/1e2vsddddubIHApNXTB2O-AxQxOF9c2-T/view?usp=sharing)
::: callout-note
## Scaffold
Not exactly sure why the scaffold table is needed in Tableau. We do not use it here.
:::
## Setup
```{r}
#| message: false
#| results: hide
library(tidyverse)
googledrive::drive_download("https://docs.google.com/spreadsheets/d/1kePTJ4RLkibqwb6cGBIlRT6mvR9T0E5q")
raw_1 <- readxl::read_xlsx("7 letter words.xlsx", sheet = "7 letter words")
raw_2 <- readxl::read_xlsx("7 letter words.xlsx", sheet = "Scrabble Scores")
raw_3 <- readxl::read_xlsx("7 letter words.xlsx", sheet = "Scaffold")
file.remove("7 letter words.xlsx")
```
## EDA
::: panel-tabset
## Table 1
```{r}
#| echo: false
raw_1
```
## Summary 1
```{r}
#| echo: false
skimr::skim(raw_1)
```
## Table 2
```{r}
#| echo: false
raw_2
```
## Summary 2
```{r}
#| echo: false
skimr::skim(raw_2)
```
## Table 3
```{r}
#| echo: false
raw_3
```
## Summary 3
```{r}
#| echo: false
skimr::skim(raw_3)
```
:::
## Preppin
```{r}
mid_2 <- raw_2 |>
extract(Scrabble, c("Points", "other"), "(^[0-9]+) points?:(.*)", convert = T) |>
mutate(other = str_squish(other)) |>
separate_rows(other, sep = ", ") |>
separate(other, into = c("Tiles", "Frequency"), sep = " ×", convert = T) |>
mutate(Chance = round(Frequency / sum(Frequency), 2))
output <- raw_1 |>
mutate(letters = str_split(`7 letter word`, pattern = "")) |>
unnest(letters) |> mutate(letters = str_to_upper(letters)) |>
count(`7 letter word`, letters) |>
left_join(mid_2, by = c("letters" = "Tiles")) |>
mutate(`Updated Chance` = if_else(n <= Frequency, Chance ^ n, 0)) |>
summarize(
`% Chance` = prod(`Updated Chance`),
`Total Points` = sum(Points * n),
.by = `7 letter word`
) |>
filter(`% Chance` > 0) |>
mutate(
`Points Rank` = dense_rank(desc(`Total Points`)),
`Likelihood Rank` = dense_rank(desc(`% Chance`))
) |>
select(`Points Rank`, `Likelihood Rank`, `7 letter word`,
`% Chance`, `Total Points`) |>
arrange(`Points Rank`)
```
## Output
::: panel-tabset
## Table
```{r}
#| echo: false
output
```
## Summary
```{r}
#| echo: false
skimr::skim(output)
```
:::