Overview #
A circle pack shows proportions using circles whose areas represent the values being compared.
The relative positions of the circles don’t mean anything.
Circle packs can be hierarchical, in the sense that categories can be nested within other categories.
Functionally, circle packs are very similar to treemaps. In fact, circle packs are sometimes referred to as “circular treemaps”.
When to use #
Use circle packs when you want to convey how different categories compare in scale.
Circle packs aren’t great for precise comparisons, since humans generally aren’t great at visually comparing areas. As an alternative, consider something like a bar plot for precise comparisons.
Circle packs are good for conveying how different categories are nested within other categories (then again, something like this could still be communicated with a stacked bar plot).
Aesthetically, it is a nice change of pace, so if you’re getting bored of treemaps or bar plots, feel free to give circle packs a chance.
Pointers #
Depending on how many circles there are, it’s sometimes visually cleaner to leave details off the smaller circles. Generally, it’s more effective to concentrate on the big circles and use this sort of plot to draw attention to the most prominent categories being considered.
If there are multiple layers of categories, and the top layer category doesn’t convey too much detail, it might help to just leave the top layer off entirely.
With circle packs, less is often more.
Data #
At a minimum, a circle pack requires a categorical field and a numerical field. When there are only these two fields, the data is flat: there is no hierarchy, so no categories are nested within other categories.
For instance, a flat dataset might look something like the following made-up example, which gives the number of square feet on each floor of a house.
floor | footage |
---|---|
first | 900 |
second | 600 |
basement | 300 |
The data can also be hierarchical, with sub-categories nested within the major categories. Here, each floor is broken down into rooms, and the rooms on each floor add up to that floor’s footage.
floor | room | footage |
---|---|---|
first | bedroom1 | 450 |
first | bedroom2 | 300 |
first | bathroom | 150 |
second | kitchen | 200 |
second | dining | 400 |
basement | utility | 100 |
basement | storage | 200 |
R #
Circle packs can be generated in R with a few packages that work alongside ggplot2.
Which packages to use depends on whether the underlying data is flat or hierarchical.
Flat data #
The simplest package for generating non-hierarchical circle packs in R is packcircles. The packcircles package calculates the positions of the circles, and ggplot2 then draws the actual plot.
# install.packages(c("packcircles", "ggplot2")) # run if the packages aren't already installed
library(packcircles) # circle layout calculations
library(ggplot2)     # plotting
Let’s take a look at a flat example dataset.
flat_dat
## # A tibble: 3 × 2
## floor footage
## <chr> <dbl>
## 1 first 900
## 2 second 600
## 3 basement 300
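The printout above doesn’t show where flat_dat comes from. A minimal sketch that rebuilds it from the table in the Data section, assuming the tibble package is available (ggplot2 already depends on it):

```r
library(tibble)

# Flat data: one row per floor, footage in square feet
flat_dat <- tibble(
  floor   = c("first", "second", "basement"),
  footage = c(900, 600, 300)
)
flat_dat
```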
Now, use the circleProgressiveLayout() function from packcircles to define the positions of the circles.
position <- circleProgressiveLayout(flat_dat$footage, sizetype='area')
position
## x y radius
## 1 -16.9256875 0.00000 16.92569
## 2 13.8197660 0.00000 13.81977
## 3 0.9871775 -19.79643 9.77205
The output gives the x coordinate, y coordinate, and radius of each circle. Because we passed sizetype='area', it is each circle’s area, not its radius, that matches the original value (in our example, the footage); the radius works out to sqrt(footage / pi).
The circles are positioned so that each one just touches its neighbors without any overlap.
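Both claims can be checked by hand. The base-R sketch below copies the printed layout values (so it inherits their rounding) and confirms that each radius equals sqrt(footage / pi) and that the distance between any two centers equals the sum of their radii, which is what "just touching" means.

```r
# Values copied from the printed circleProgressiveLayout() output above
areas <- c(900, 600, 300)
layout_vals <- data.frame(
  x      = c(-16.9256875, 13.8197660, 0.9871775),
  y      = c(0, 0, -19.79643),
  radius = c(16.92569, 13.81977, 9.77205)
)

# With sizetype = 'area', each circle's area equals its value,
# so the radius is sqrt(value / pi) rather than the value itself
expected_radius <- sqrt(areas / pi)
round(expected_radius, 5)

# Touching circles: the distance between centers equals the sum of the radii
center_gap <- function(i, j) {
  sqrt((layout_vals$x[i] - layout_vals$x[j])^2 + (layout_vals$y[i] - layout_vals$y[j])^2)
}
center_gap(1, 2) - sum(layout_vals$radius[c(1, 2)])  # approximately 0
center_gap(1, 3) - sum(layout_vals$radius[c(1, 3)])  # approximately 0
```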
Now it’s time to bind the generated position data with the original flat data.
flat_dat_prep <- cbind(flat_dat, position)
flat_dat_prep
## floor footage x y radius
## 1 first 900 -16.9256875 0.00000 16.92569
## 2 second 600 13.8197660 0.00000 13.81977
## 3 basement 300 0.9871775 -19.79643 9.77205
Next, we’ll use the circleLayoutVertices() function to generate another dataset, which turns each circle’s center and radius into a series of points along its outline.
flat_dat_vertices <- circleLayoutVertices(position, npoints = 50)
head(flat_dat_vertices)
## x y id
## 1 0.0000000 0.000000 1
## 2 -0.1334641 2.121351 1
## 3 -0.5317516 4.209247 1
## 4 -1.1885813 6.230761 1
## 5 -2.0935945 8.154012 1
## 6 -3.2325187 9.948670 1
The npoints parameter sets the number of points generated per circle. Lines are drawn between consecutive points to form each circle, so the more points there are, the smoother the circle.
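Conceptually, the outline is just a set of points spaced evenly around the center. The sketch below uses circle_outline(), a hypothetical hand-rolled helper rather than packcircles’ own code; for the first circle, its first few points line up with the head(flat_dat_vertices) output above to within rounding.

```r
# Hypothetical helper, not part of packcircles: evenly spaced points on the
# outline of one circle, starting at angle 0 and going counterclockwise
circle_outline <- function(x0, y0, r, npoints = 50) {
  # npoints + 1 angles: the last point repeats the first, closing the shape
  theta <- seq(0, 2 * pi, length.out = npoints + 1)
  data.frame(x = x0 + r * cos(theta), y = y0 + r * sin(theta))
}

# First circle from the layout above (center and radius copied from the printout)
outline <- circle_outline(-16.9256875, 0, 16.92569, npoints = 50)
head(outline, 3)
```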
Note that the id field identifies which circle each point belongs to. This field will be used for mappings later when we generate the plot.
unique(flat_dat_vertices$id)
## [1] 1 2 3
Now it’s time to generate the plot.
ggplot() +
  geom_polygon(
    data = flat_dat_vertices,
    aes(
      x = x,
      y = y,
      group = id,
      fill = as.factor(id) # as.factor makes the id categorical rather than numerical
    )
  )
And there we have it: a circle pack. But without labels, it’s kind of useless. While we’re at it, let’s make the label sizes correspond to the square footage.
ggplot() +
  geom_polygon(
    data = flat_dat_vertices,
    aes(
      x = x,
      y = y,
      group = id,
      fill = as.factor(id)
    )
  ) +
  geom_text(
    data = flat_dat_prep,
    aes(
      x = x,
      y = y,
      size = footage,
      label = floor
    )
  )
Let’s dress this up: add a title, outline the circles, drop extraneous features like the axes and legend, and make the coordinate scales equal so the circles look round.
ggplot() +
  geom_polygon(
    data = flat_dat_vertices,
    aes(
      x = x,
      y = y,
      group = id,
      fill = as.factor(id)
    ),
    color = "black"
  ) +
  geom_text(
    data = flat_dat_prep,
    aes(
      x = x,
      y = y,
      size = footage,
      label = floor
    )
  ) +
  theme_void() + # switch to a bare theme
  theme(
    legend.position = "none" # get rid of the legend
  ) +
  coord_equal() +
  labs(
    title = "Square footage of a house's various floors"
  )
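The flat-data steps can be bundled into one function. A sketch, where plot_circle_pack() is a hypothetical helper name rather than anything exported by packcircles or ggplot2; it also looks up each circle’s label through the id column, so the fill is keyed to the floor name instead of the numeric id.

```r
library(packcircles)
library(ggplot2)

# Hypothetical helper (not exported by packcircles or ggplot2): draws a flat
# circle pack from a data frame, given the names of its label and value columns
plot_circle_pack <- function(df, label_col, value_col, npoints = 50) {
  # Circle centers and radii, with each circle's area equal to its value
  position <- circleProgressiveLayout(df[[value_col]], sizetype = "area")
  centers  <- cbind(df, position)

  # Outline points; `id` is the row number of the circle each point belongs to
  outlines <- circleLayoutVertices(position, npoints = npoints)
  outlines$label <- df[[label_col]][outlines$id]

  ggplot() +
    geom_polygon(
      data = outlines,
      aes(x = x, y = y, group = id, fill = label),
      color = "black"
    ) +
    geom_text(
      data = centers,
      aes(x = x, y = y, label = .data[[label_col]], size = .data[[value_col]])
    ) +
    theme_void() +
    theme(legend.position = "none") +
    coord_equal()
}

demo <- data.frame(floor = c("first", "second", "basement"), footage = c(900, 600, 300))
p <- plot_circle_pack(demo, "floor", "footage")
p
```

Passing column names as strings and reading them with the .data pronoun keeps the helper usable for any flat dataset, not just this one.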
Hierarchical data #
Hierarchical data calls for an entirely different approach.
Hierarchical data can be thought of as graph data. The underlying data should be treated primarily as an edge list, with a “from” column and a “to” column. The “from” column represents the broader category, and the “to” column represents a sub-category.
Let’s look at our previous hierarchical data example.
hierarchical_dat
## # A tibble: 7 × 3
## floor room footage
## <chr> <chr> <dbl>
## 1 first bedroom1 450
## 2 first bedroom2 300
## 3 first bathroom 150
## 4 second kitchen 200
## 5 second dining 400
## 6 basement utility 100
## 7 basement storage 200
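As with flat_dat, the construction of hierarchical_dat isn’t shown. A sketch using tibble, plus a quick base-R check that each floor’s rooms add up to the footage recorded in flat_dat:

```r
library(tibble)

# Hierarchical data: one row per room, tagged with the floor it belongs to
hierarchical_dat <- tibble(
  floor   = c("first", "first", "first", "second", "second", "basement", "basement"),
  room    = c("bedroom1", "bedroom2", "bathroom", "kitchen", "dining", "utility", "storage"),
  footage = c(450, 300, 150, 200, 400, 100, 200)
)

# Sum room footage within each floor; tapply() sorts the groups alphabetically,
# so index by name to put them back in the flat_dat order
floor_totals <- tapply(hierarchical_dat$footage, hierarchical_dat$floor, sum)
floor_totals[c("first", "second", "basement")]
```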
In this case, the main category is the floor, and the subcategory is the room. Footage is a detail about each room.
We can break this down into an edge list that looks like this:
library(dplyr) # provides select(), rename(), and the %>% pipe

edges <- select(hierarchical_dat, floor, room) %>%
  rename("from" = "floor", "to" = "room") # rename the columns to edge list conventions
edges
## # A tibble: 7 × 2
## from to
## <chr> <chr>
## 1 first bedroom1
## 2 first bedroom2
## 3 first bathroom
## 4 second kitchen
## 5 second dining
## 6 basement utility
## 7 basement storage
In this edge list, each floor is a “from”, and each room it contains is a “to”.
We’re only showing one level of nesting, but there could be more: a single “from” category can point to multiple “to” categories, and a “to” can itself be the “from” for whatever is nested inside it.
For instance, “bedroom1” might be a “from”, and its corresponding “to” might be something contained within it, like “closet1”.
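To make deeper nesting concrete, here is a sketch with igraph that appends the hypothetical bedroom1 → closet1 edge from the example above and computes each vertex’s depth as its distance, in edges, from the nearest root (a vertex with no incoming edge). The variable names are illustrative.

```r
library(igraph)

# The article's edge list plus one hypothetical deeper level: bedroom1 contains closet1
deeper_edges <- data.frame(
  from = c("first", "first", "first", "second", "second", "basement", "basement", "bedroom1"),
  to   = c("bedroom1", "bedroom2", "bathroom", "kitchen", "dining", "utility", "storage", "closet1")
)
g <- graph_from_data_frame(deeper_edges, directed = TRUE)

# Roots are the vertices nothing points to (here, the floors)
roots <- V(g)[degree(g, mode = "in") == 0]

# Shortest path length from every root to every vertex (Inf when unreachable);
# a vertex's depth is its distance from the nearest root
hops <- distances(g, v = roots, to = V(g), mode = "out")
vertex_depth <- apply(hops, 2, min)
vertex_depth[c("first", "bedroom1", "closet1")]
```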
The vertices (or nodes) are the distinct elements mentioned anywhere in the edge list. Oftentimes, vertices will come with additional details.
In our example, each vertex is either a floor or a room, and each has a square footage.
We’ll have to massage the data into a vertex table with one row per vertex. graph_from_data_frame() treats the first column of that table as the vertex name and any remaining columns as vertex attributes, and every name in the edge list must appear in it.
rooms <- hierarchical_dat %>%
select(-floor) %>% # this leaves us with just the room and its square footage
rename("name" = "room")
floors <- flat_dat %>% # recall that this is just a floor and its corresponding square footage
rename("name" = "floor")
vertices <- rbind(rooms, floors)
vertices
## # A tibble: 10 × 2
## name footage
## <chr> <dbl>
## 1 bedroom1 450
## 2 bedroom2 300
## 3 bathroom 150
## 4 kitchen 200
## 5 dining 400
## 6 utility 100
## 7 storage 200
## 8 first 900
## 9 second 600
## 10 basement 300
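Because graph_from_data_frame() stops with an error when the edge list mentions a name that the vertex table lacks, a quick check before building the graph can save some debugging. missing_vertices() is a hypothetical helper, shown here on a deliberately incomplete toy example so it doesn’t overwrite the real edges and vertices:

```r
# Hypothetical helper: names used in an edge list that are absent from the vertex table
missing_vertices <- function(edge_df, vertex_df) {
  setdiff(unique(c(edge_df$from, edge_df$to)), vertex_df[[1]])
}

toy_edges    <- data.frame(from = c("first", "first"), to = c("bedroom1", "bathroom"))
toy_vertices <- data.frame(name = c("bedroom1", "first"), footage = c(450, 900))

missing_vertices(toy_edges, toy_vertices)  # "bathroom" has no vertex row
```

Run against the real tables, missing_vertices(edges, vertices) should return an empty character vector.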
At this point, we have an edge list and a set of vertices. It’s time to assemble them into a hierarchical circle pack.
To do so, we’ll use the igraph and ggraph packages, which are useful for handling graph data.
# install.packages(c("igraph", "ggraph")) # run this if the packages haven't already been installed.
library(igraph)
library(ggraph)
First, we construct a graph object using the graph_from_data_frame() function from igraph.
graph <- graph_from_data_frame(d = edges, vertices = vertices)
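To confirm the vertex attributes came through, you can inspect a graph with igraph’s V(), vcount(), and ecount(). A sketch on a cut-down toy copy of the data (the basement only), using separate names so the real graph isn’t overwritten:

```r
library(igraph)

# Basement floor only: two edges, three vertices
toy_edges    <- data.frame(from = c("basement", "basement"), to = c("utility", "storage"))
toy_vertices <- data.frame(name = c("utility", "storage", "basement"), footage = c(100, 200, 300))

toy_graph <- graph_from_data_frame(d = toy_edges, vertices = toy_vertices)

# The first vertex column became the vertex name; footage became a vertex attribute
V(toy_graph)$name
V(toy_graph)$footage
c(vertices = vcount(toy_graph), edges = ecount(toy_graph))
```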
Then, we’ll render the circle pack.
ggraph(graph, layout = 'circlepack') +
geom_node_circle()
## Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
That’s kind of wonky. Let’s switch to a bare theme and make the coordinates equal so the shapes are actually circular.
ggraph(graph, layout = 'circlepack') +
geom_node_circle() +
theme_void() +
coord_equal()
So far, this still isn’t very useful. It needs some labels and maybe some color.
ggraph(graph, layout = 'circlepack') +
geom_node_circle(aes(fill = depth)) +
geom_node_text(aes(label = name), color = "white") +
theme_void() +
coord_equal()
Note that we mapped the fill color to depth. Depth is how many levels down the hierarchy a node sits: here the floors have depth 0 and the rooms have depth 1. The circlepack layout in ggraph computes depth (along with leaf, used below) for every node, so it can be mapped like any other column.
It’s progress, but let’s make this cleaner and more informative. First, let’s add a title. Then, let’s scale each label by the footage of its vertex (room or floor).
Let’s also label the lowest-level elements, the rooms, in white. In graph terminology, the lowest-level elements are called leaves (singular: leaf), which is what filter = leaf selects.
We’ll keep the labels for the floors (filter = depth == 0), but give them a dramatically different design: a semi-transparent label box.
ggraph(graph, layout = 'circlepack') +
geom_node_circle(aes(fill = depth)) +
geom_node_text(aes(label = name, size = footage, filter = leaf), color = "white") +
geom_node_label(aes(label = name, size = footage, filter = depth == 0), alpha = .5) +
theme_void() +
coord_equal() +
theme(legend.position = "none") + # remove the legend. We don't need it.
labs(
title = "Square Footage of Rooms in Different Floors of a House"
)
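One caveat: according to ggraph’s documentation, the circlepack layout’s weight argument is optional, and without it every leaf gets the same size, so the room circles above reflect the structure but not the footage (only the labels do). A sketch passing footage as the layout weight, assuming a recent ggraph (2.x, where weight takes an unquoted node variable); ggraph documents that weight only affects leaves, with parent circles derived from their children. The room_* names are illustrative, chosen so the earlier objects aren’t overwritten.

```r
library(igraph)
library(ggraph)
library(ggplot2)

# Same rooms, floors, and footage as hierarchical_dat and flat_dat
room_dat <- data.frame(
  floor   = c("first", "first", "first", "second", "second", "basement", "basement"),
  room    = c("bedroom1", "bedroom2", "bathroom", "kitchen", "dining", "utility", "storage"),
  footage = c(450, 300, 150, 200, 400, 100, 200)
)
room_edges <- data.frame(from = room_dat$floor, to = room_dat$room)
room_vertices <- rbind(
  data.frame(name = room_dat$room, footage = room_dat$footage),
  data.frame(name = c("first", "second", "basement"), footage = c(900, 600, 300))
)
room_graph <- graph_from_data_frame(d = room_edges, vertices = room_vertices)

# weight = footage sizes each leaf (room) by its footage;
# the floor circles are then sized to enclose their rooms
p_weighted <- ggraph(room_graph, layout = "circlepack", weight = footage) +
  geom_node_circle(aes(fill = depth)) +
  geom_node_text(aes(label = name, size = footage, filter = leaf), color = "white") +
  geom_node_label(aes(label = name, size = footage, filter = depth == 0), alpha = .5) +
  theme_void() +
  coord_equal() +
  theme(legend.position = "none") +
  labs(title = "Square Footage of Rooms in Different Floors of a House")
p_weighted
```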
Resources #
Here are some excellent resources on circle packs.