FBI Internet Crime Compaint Center Report

FBI Internet Crime Compaint Center Report

March 28, 2022
analysis
code, R, ggplot2, cybersecurity

Overview #

The FBI’s Internet Crime Complaint Center (IC3) puts out an annual report that summarizes the recent year’s experiences with Internet crime.

It includes analysis that hints at the availability of quantitative data (versus purely qualitative anecdotes).

Here’s a copy of the IC3’s latest report.

Data #

The available data is embedded in the reports themselves, which are in .pdf – less than ideal.

Fortunately, my colleague hrbrmstr has been generous enough the manually collect the data on crime types, monetary losses, and victim counts from several years worth of reports, and consolidate it all into an easily-workable .csv file.

What the raw data looks like:

Most of the crime type labels are I think pretty self-explanatory. BEC/EAC stands for “Business Email Compromise / Email Account Compromise”.

Analysis #

The multi-year panel data lends itself well to a bump plot visualization, which is handy for presenting ordinal ranking over multiple periods.

Here’s how to put together some bump plots.

First, let’s load up the packages we’ll be using.

library(tidyverse)
library(hrbrthemes)
library(ggbump)

To assemble a bump plot, we have to derive rankings.

ranked <- dat %>%
  group_by(year) %>%
  arrange(desc(loss)) %>%
  mutate(rank_loss = row_number()) %>%
  arrange(desc(victim_count)) %>%
  mutate(rank_victims = row_number()) %>%
  ungroup()

head(ranked)
## # A tibble: 6 × 6
##   crime_type                     year   loss victim_count rank_loss rank_victims
##   <chr>                         <dbl>  <dbl>        <dbl>     <int>        <int>
## 1 Phishing/Vishing/Smishing/Ph…  2021 4.42e7       323972        18            1
## 2 Phishing/Vishing/Smishing/Ph…  2020 5.42e7       241342        17            1
## 3 Phishing/Vishing/Smishing/Ph…  2019 5.78e7       114702        13            1
## 4 Non-Payment/Non-Delivery       2020 2.65e8       108869         4            2
## 5 Non-Payment/Non-Delivery       2021 3.37e8        82478         7            2
## 6 Extortion                      2020 7.09e7        76741        14            3

It also helps to assign the number of distinct categories to a fixed variable that can easily be referenced later.

We could easily do this manually, but I like adhering the the DRY principle – “Don’t Repeat Yourself”. By encoding the value, we won’t have to change things across multiple pieces of code if the underlying data ever changes; we’ll only have to change the code in one spot.

crime_type_count <- length(unique(ranked$crime_type))

crime_type_count
## [1] 28

Bump plot ordered by monetary losses:

ranked %>%
  ggplot(aes(x = year, y = rank_loss, color = crime_type)) +
  geom_bump() +
  geom_point(size = 3) +
  geom_text(data = ranked %>% filter(year == min(year)), aes(x = year - .1, label = crime_type), hjust = 1) +
  geom_text(data = ranked %>% filter(year == max(year)), aes(x = year + .1, label = crime_type), hjust = 0) +
  scale_x_continuous(limits = c(2017.5, 2022.5), expand = c(.3,.3), breaks = c(2018,2019,2020,2021)) +
  scale_y_reverse(breaks = seq(1,crime_type_count,1)) + 
  theme_ipsum_rc(grid = "") +
  scale_color_viridis_d() +
  theme(
    legend.position = "none",
    axis.title.y = element_text(angle = 0)
    ) +
  labs(
    title = "FBI IC3 4 Year Historical Company Losses Ranked",
    x = NULL,
    y = NULL
  )

Bump plot ordered by number of victims:

ranked %>%
  ggplot(aes(x = year, y = rank_victims, color = crime_type)) +
  geom_bump() +
  geom_point(size = 3) +
  geom_text(data = ranked %>% filter(year == min(year)), aes(x = year - .1, label = crime_type), hjust = 1) +
  geom_text(data = ranked %>% filter(year == max(year)), aes(x = year + .1, label = crime_type), hjust = 0) +
  scale_x_continuous(limits = c(2017.5, 2022.5), expand = c(.3,.3), breaks = c(2018,2019,2020,2021)) +
  scale_y_reverse(breaks = seq(1,crime_type_count,1)) + 
  theme_ipsum_rc(grid = "") +
  scale_color_viridis_d() +
  theme(
    legend.position = "none",
    axis.title.y = element_text(angle = 0)
    ) +
  labs(
    title = "FBI IC3 4 Year Historical Ranked by Victim Count",
    x = NULL,
    y = NULL
  )

I’ve chosen to use the viridis color palette because it’s friendly to printers and the color blind.

Parting Thoughts #

There are more ways to analyze this particular dataset.

If you have ideas, feel free to drop me a message.