Commit 1b0aa8b

use of geom_col
1 parent b3978b1 commit 1b0aa8b

3 files changed: +7 -6 lines changed

3_05_univariate_categorical.Rmd

+3-2
@@ -51,12 +51,13 @@ bar_plt <- ggplot(storms, aes(x = type))
5151
```
5252
We've called the object `bar_plt`, for obvious reasons. Notice that we only need to define one aesthetic mapping: we mapped `type` to the x axis. This produces a bar plot with vertical bars.
5353

54-
From here we follow the usual **ggplot2** workflow, meaning the next step is to add a layer using one of the `geom_XX` functions. The sensibly named `geom_bar` is the one we need:
54+
From here we follow the usual **ggplot2** workflow, meaning the next step is to add a layer using one of the `geom_XX` functions. There are two functions we can use to create bar charts in **ggplot2**: `geom_bar` and `geom_col`. By default, `geom_bar` counts the number of observations in each category, whilst `geom_col` plots the actual values in the data frame. In this case, as we want the number of storms of each `type`, we will use `geom_bar`:
55+
5556
```{r}
5657
bar_plt <- bar_plt + geom_bar()
5758
summary(bar_plt)
5859
```
59-
Look at the layer information below `----`. The `geom_bar` function sets the stat to "bin". Binning a categorical variable is analogous to binning a numeric variable. The only difference is that there is no need to specify bin widths because `type` is categorical, i.e. `ggplot2` will sum up the number of observations associated with every category of `type`. Here's the resulting figure:
60+
Look at the layer information below `----`. The `geom_bar` function sets the stat to "count". Counting a categorical variable is analogous to binning a numeric variable. The only difference is that there is no need to specify bin widths because `type` is categorical, i.e. `ggplot2` will sum up the number of observations associated with every category of `type`. Here's the resulting figure:
6061
```{r}
6162
bar_plt
6263
```
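
As a rough check of the `geom_bar` versus `geom_col` distinction described in this file, the sketch below builds the same bar heights both ways. It uses a small made-up data frame (`toy_storms`, a hypothetical stand-in for `storms`) and assumes only that **ggplot2** is installed; `layer_data` is used to pull out the heights each plot would draw.

```r
library(ggplot2)

# Hypothetical toy data standing in for `storms`: one row per observation
toy_storms <- data.frame(
  type = c("Hurricane", "Hurricane", "Tropical Storm",
           "Tropical Depression", "Tropical Storm", "Hurricane")
)

# geom_bar() counts the rows in each category for us (stat = "count")
p_bar <- ggplot(toy_storms, aes(x = type)) + geom_bar()

# geom_col() plots values as given, so we have to count first
toy_counts <- as.data.frame(table(type = toy_storms$type), responseName = "n")
p_col <- ggplot(toy_counts, aes(x = type, y = n)) + geom_col()

# Bar heights that each plot would draw, in x-axis order
bar_heights <- layer_data(p_bar)$count
col_heights <- layer_data(p_col)$y
```

Both vectors should hold the same per-category counts, which is why `geom_bar` is the more convenient choice when the data frame has one row per observation.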

3_06_bivariate.Rmd

+1-1
@@ -144,7 +144,7 @@ storms_alter <-
144144
This may look a little confusing at first glance, but all we did here was create a vector of ordered category names called `storm_names`, and then use mutate to change `type` to a factor using the ordering implied by `storm_names`. Just be careful with the spelling---the values in `storm_names` must match those in `type`. We did this with **dplyr**'s `mutate` function, again calling the modified data set `storms_alter`. Once we've applied the factor trick we can remake the bar chart:
145145

146146
```{r, fig.width=4.5}
147-
# 3. make the histogram
147+
# 3. make the bar plot
148148
ggplot(storms_alter, aes(x = year, fill = type)) +
149149
geom_bar()
150150
```

3_07_eda_complexity.Rmd

+3-3
@@ -48,17 +48,17 @@ storms_sum <-
4848
storms_sum
4949
```
5050

51-
We used `group_by` and `summarise` to calculate the set of means, which we called `mean_wind`. The second step uses the new data frame (called `storms_sum`) as the default data in a new graphical object, sets x and y aesthetic mappings from `type` and `mean_wind`, and adds a layer with `geom_bar`:
51+
We used `group_by` and `summarise` to calculate the set of means, which we called `mean_wind`. The second step uses the new data frame (called `storms_sum`) as the default data in a new graphical object, sets x and y aesthetic mappings from `type` and `mean_wind`, and adds a layer with `geom_col`:
5252

5353
```{r, fig.height=2.5}
5454
mean.plt <-
5555
ggplot(storms_sum, aes(x = type, y = mean_wind)) +
56-
geom_bar(stat = "identity") +
56+
geom_col() +
5757
coord_flip() + xlab("Storm Category") + ylab("Mean Wind Speed (mph)")
5858
mean.plt
5959
```
6060

61-
The result is the same as the last plot. Notice that we had to set `stat` to `"identity"`. This is important. The default behaviour of `geom_bar` is to count the observations in each category. The `stat = "identity"` argument tells it that the information in `mean_wind` must be plotted 'as is'.
61+
The result is the same as the last plot. Note that we have used `geom_col` instead of `geom_bar` here. Remember that the default behaviour of `geom_bar` is to count the observations in each category. Using `geom_col` instead tells **ggplot2** that the information in `mean_wind` must be plotted 'as is'.
6262

6363
Which approach is better? The first approach is more compact. However, we recommend the second, more long-winded approach for new users because it separates the summary calculations from the plotting. This way, as long as we're comfortable with **dplyr**, we can get away with remembering less about how **ggplot2** works. It also makes it a bit easier to fix mistakes, as we can first check whether the right information is in the summary data frame, before we worry about plotting it.
6464
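
The swap from `geom_bar(stat = "identity")` to `geom_col()` in this file should not change the figure. A minimal sketch of how that could be checked is given below; `toy_sum` is a hypothetical stand-in for `storms_sum`, and its wind speeds are made-up illustrative values, not results from the real data.

```r
library(ggplot2)

# Hypothetical summary table standing in for `storms_sum`; values are made up
toy_sum <- data.frame(
  type = c("Extratropical", "Hurricane", "Tropical Depression", "Tropical Storm"),
  mean_wind = c(40, 85, 28, 48)
)

base_plt <- ggplot(toy_sum, aes(x = type, y = mean_wind))

# Old form: override the default "count" stat so values are plotted as is
plt_identity <- base_plt + geom_bar(stat = "identity")

# New form: geom_col() plots the y values as is by default
plt_col <- base_plt + geom_col()

# Bar heights that each version would draw
heights_identity <- layer_data(plt_identity)$y
heights_col <- layer_data(plt_col)$y
```

If the two height vectors match, the change is purely a readability improvement.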
