Overview #
A bar plot is a data visualization that shows the relationship between a categorical variable and a numerical variable.
Think of a categorical variable as a type of thing, and a numerical variable as anything that can be counted or measured.
Data #
A bar plot requires at least one categorical variable and one numerical variable.
Additional variables can be reflected in the bar plot using such additional features as colors, textures, or outlines.
Depending on how the data is collected, it may need some manipulation before it is in a suitable form for a bar plot.
R #
Let’s make a bar plot in R with the ggplot2 package. We’ll also use tools from the tidyverse to manipulate the data.
library(tidyverse)
library(ggplot2) # a bit redundant, since this is included in tidyverse, but showing it explicitly here
For this example, we’ll use the built-in diamonds dataset, which contains information on 53,940 round-cut diamonds.
diamonds
## # A tibble: 53,940 × 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
## 10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39
## # ℹ 53,930 more rows
This particular dataset includes some categorical fields, such as cut, color, and clarity.
There are also numerical fields, such as carat, depth, and price.
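One quick way to confirm which fields are categorical and which are numerical is to check each column's type. The sketch below uses base R helpers on the diamonds tibble; the names categorical_cols and numerical_cols are illustrative only and are not used elsewhere in this walkthrough.

```r
library(ggplot2)  # provides the diamonds dataset

# Factor columns (including ordered factors) are the categorical fields
categorical_cols <- names(diamonds)[vapply(diamonds, is.factor, logical(1))]

# Double and integer columns are the numerical fields
numerical_cols <- names(diamonds)[vapply(diamonds, is.numeric, logical(1))]

categorical_cols
numerical_cols
```

Any column in the first vector can serve as the categorical axis of a bar plot, and any column in the second can be summarized into bar heights.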
Let’s say we want to see the average price for each cut in our data. We’ll first have to calculate the average price by cut.
avg_price_cut <- diamonds %>%
  group_by(cut) %>%
  summarize(avg_price = mean(price))
avg_price_cut
## # A tibble: 5 × 2
## cut avg_price
## <ord> <dbl>
## 1 Fair 4359.
## 2 Good 3929.
## 3 Very Good 3982.
## 4 Premium 4584.
## 5 Ideal 3458.
Now that the data is prepared, we can drop it into a bar plot.
plot <- avg_price_cut %>%
  ggplot(
    aes(
      x = cut,
      y = avg_price
    )
  ) +
  # stat = "identity" uses the precomputed avg_price values as the bar heights
  geom_bar(stat = "identity")
plot
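As an aside, ggplot2 can also compute the per-group mean while drawing, which skips the separate summary step. This is a sketch of that alternative, not the approach used in the rest of this walkthrough. It assumes a ggplot2 version (3.3.0 or later) in which the summary stat accepts the fun argument, and plot_direct is an illustrative name.

```r
library(ggplot2)

# Let the summary stat compute mean(price) for each cut at draw time
plot_direct <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_bar(stat = "summary", fun = "mean")
plot_direct
```

Precomputing the summary, as done above with avg_price_cut, keeps the numbers available for inspection; the one-step form is handy for quick exploration.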
The default plot is pretty bland. Let’s dress it up a tiny bit.
First, let’s add some labels.
plot <- plot +
  labs(
    title = "Average Diamond Prices by Cut",
    x = "Cut",
    y = "Average Price"
  )
plot
Let’s also replace the theme with something else.
library(ggthemes)
plot +
  ggthemes::theme_solarized()
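The Data section noted that additional variables can be reflected through features such as color. As a rough sketch of that idea, the example below adds clarity as a second categorical variable and maps it to the bar fill. The choice of clarity, the side-by-side (dodged) layout, and the name avg_price_cut_clarity are illustrative assumptions rather than part of the walkthrough above.

```r
library(tidyverse)

# Average price for every combination of cut and clarity
avg_price_cut_clarity <- diamonds %>%
  group_by(cut, clarity) %>%
  summarize(avg_price = mean(price), .groups = "drop")

# Map the second categorical variable to the fill color,
# and place the bars for each cut side by side
avg_price_cut_clarity %>%
  ggplot(aes(x = cut, y = avg_price, fill = clarity)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "Average Diamond Prices by Cut and Clarity",
    x = "Cut",
    y = "Average Price",
    fill = "Clarity"
  )
```

Leaving out position = "dodge" would stack the bars instead, which is misleading for averages because the stacked height would be a sum of means.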