Proportional Stacked Area Plot

Overview

A proportional stacked area plot shows how much of a whole each group represents at each point along an ordered dimension, such as time.

At every point, the whole adds up to 100% (or numerically, 1.0).

When to use

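A proportional stacked area plot is ideal for showing how each group's share of the whole changes across an ordered dimension.

By itself, it’s not a good choice for communicating actual counts or values, because every position is rescaled to the same total.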

Data

Similar to a stacked area plot, a proportional stacked area plot starts with at least three fields of data:

  • A numerical or categorical field that is consistently increasing or otherwise ordered. This serves as the axis along which the other numerical field changes. It is often a time measure.
  • A numerical field that maps to the other axis and represents what is being measured in the visual.
  • A categorical field that identifies the groups.

The measured values then need to be converted into proportions within each value of the ordered numerical or categorical field (e.g., within each year).
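Written as a formula, the transformation looks like the following. The notation is ours and purely illustrative: v_{g,t} stands for the measured value of group g at ordered position t, and p_{g,t} for the resulting proportion.

```latex
p_{g,t} = \frac{v_{g,t}}{\sum_{g'} v_{g',t}},
\qquad
\sum_{g} p_{g,t} = 1 \quad \text{for every } t
```

The second identity is what makes the stacked areas fill the plot to exactly 1.0 at every position.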

R

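The simplest way to generate a proportional stacked area plot in R is with the ggplot2 package. The examples below also use dplyr to transform the data and knitr to print tables.

# install.packages(c("ggplot2", "dplyr", "knitr"))
library(ggplot2)
library(dplyr)  # %>%, group_by(), mutate(), and the re-exported tribble()
library(knitr)  # kable()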

Let’s load up some example data. In this case, let’s look at the net income of the different divisions within Company X over the years.

example <- tribble(
  ~year, ~net_income, ~division,
  2020,   2050, "A",
  2020,  13000, "B",
  2021,   4000, "A",
  2021,  12300, "B",
  2022,  13000, "A",
  2022,  14000, "B"
)

kable(example)
year  net_income  division
2020        2050  A
2020       13000  B
2021        4000  A
2021       12300  B
2022       13000  A
2022       14000  B

We have the net_income represented in absolute values. For a proportional stacked area plot, we need to transform that into proportions.

example_prop <- example %>%
  group_by(year) %>%
  mutate(
    year_tot = sum(net_income),           # total net income within each year
    proportion = net_income / year_tot    # each division's share of that total
  ) %>%
  ungroup()

kable(example_prop)
year  net_income  division  year_tot  proportion
2020        2050  A            15050   0.1362126
2020       13000  B            15050   0.8637874
2021        4000  A            16300   0.2453988
2021       12300  B            16300   0.7546012
2022       13000  A            27000   0.4814815
2022       14000  B            27000   0.5185185
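Before plotting, it can be worth confirming that the proportions within each year really do add up to 1. The sketch below is one way to do that. It rebuilds the same example data so it runs on its own, and the name year_check is ours, chosen only for illustration.

```r
library(dplyr)

# Same illustrative figures as the example table above
example <- tribble(
  ~year, ~net_income, ~division,
  2020,   2050, "A",
  2020,  13000, "B",
  2021,   4000, "A",
  2021,  12300, "B",
  2022,  13000, "A",
  2022,  14000, "B"
)

example_prop <- example %>%
  group_by(year) %>%
  mutate(proportion = net_income / sum(net_income)) %>%
  ungroup()

# Sum the proportions within each year; each total should be 1 (i.e., 100%)
year_check <- example_prop %>%
  group_by(year) %>%
  summarise(total = sum(proportion), .groups = "drop")

# all.equal() tolerates floating-point rounding; stopifnot() halts on a mismatch
stopifnot(isTRUE(all.equal(year_check$total, rep(1, nrow(year_check)))))
```

If a year's total is not 1, the usual culprits are missing values in the measured field or grouping by the wrong column.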

Now that the data has been properly transformed, let’s generate a proportional stacked area plot.

example_prop %>%
  ggplot() +
  geom_area(
    aes(
      x = year,
      y = proportion,
      fill = division
    )
  )

Let’s clean up a few things, including:

  • Add better labels
  • Show the years as whole numbers
  • Move Division A to the bottom
  • Use a different theme
  • Draw a line dividing the areas

With those changes applied, the plot code becomes:

example_prop %>%
  # Areas stack in factor-level order from the top down,
  # so listing "A" last places it at the bottom
  mutate(division = factor(division, levels = c("B", "A"))) %>%
  ggplot() +
  geom_area(
    aes(
      x = year,
      y = proportion,
      fill = division
    ),
    color = "black"  # outline that divides the areas
  ) +
  scale_x_continuous(breaks = seq(2020, 2022, 1)) +  # whole-year axis breaks
  theme_minimal() +
  labs(
    title = "Company X by Division",
    x = NULL,
    y = "Proportion of Net Income"
  )

Success! A simple proportional stacked area plot.
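As an alternative to transforming the data by hand, ggplot2 can rescale each stack for you through position = "fill". The sketch below is a variation on the approach above, not a replacement for it. It plots the raw net_income values and lets ggplot2 normalize each year, and it also formats the y axis as percentages with scales::label_percent() (the scales package is installed alongside ggplot2). The name p_fill and the "Share of Net Income" label are ours.

```r
library(ggplot2)
library(dplyr)

# Same illustrative figures as the example data above
example <- tribble(
  ~year, ~net_income, ~division,
  2020,   2050, "A",
  2020,  13000, "B",
  2021,   4000, "A",
  2021,  12300, "B",
  2022,  13000, "A",
  2022,  14000, "B"
)

p_fill <- example %>%
  mutate(division = factor(division, levels = c("B", "A"))) %>%
  ggplot() +
  geom_area(
    aes(x = year, y = net_income, fill = division),
    position = "fill",  # rescale each year's stack so it totals 1
    color = "black"
  ) +
  scale_x_continuous(breaks = seq(2020, 2022, 1)) +
  scale_y_continuous(labels = scales::label_percent()) +  # 0.5 is shown as 50%
  theme_minimal() +
  labs(
    title = "Company X by Division",
    x = NULL,
    y = "Share of Net Income"
  )

p_fill
```

Computing the proportions yourself, as in the main example, keeps them available as a column for tables or labels. Using position = "fill" is shorter when only the picture is needed.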