General Issues

1. Find a Better Visualization

The original:

Some issues:

A simple bar chart with a zero baseline:

library(dplyr)
library(ggplot2)
d <- data.frame(pres = c("Obama", "Carter", "Clinton",
                         "G.W. Bush", "Reagan", "G.H.W Bush", "Trump"),
                appr = c(79, 78, 68, 65, 58, 56, 40),
                party = c("D", "D", "D", "R", "R", "R", "R"),
                year = c(2009, 1977, 1993, 2001, 1981, 1989, 2017))
d <- mutate(d, pres = reorder(pres, appr))

p <- ggplot(d, aes(x = pres, y = appr, fill = party)) +
    geom_col() + coord_flip()
p
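
The reorder() call in the code above is what sorts the bars by approval rating: it rearranges the factor levels of pres according to appr, and ggplot draws the bars in level order. A minimal base R sketch of that one step, using the same values as d:

```r
# Same presidents and approval ratings as in the data frame d above.
pres <- c("Obama", "Carter", "Clinton",
          "G.W. Bush", "Reagan", "G.H.W Bush", "Trump")
appr <- c(79, 78, 68, 65, 58, 56, 40)

# Without reorder(), the factor levels are in alphabetical order.
levels(factor(pres))

# reorder() sorts the levels by appr in increasing order; after
# coord_flip() the first level is drawn at the bottom of the chart.
pres_ord <- reorder(pres, appr)
levels(pres_ord)
```

To sort in the opposite direction, one option is reorder(pres, -appr).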

The default fill colors can be changed using scale_fill_manual, for example to the conventional party colors:

p + scale_fill_manual(values = c(R = "red", D = "blue"))

We can reduce the saturation and the value in the HSV color representation to obtain less intense colors; this is commonly used in red state/blue state maps:

myred <- hsv(0, 0.6, 0.8)
myblue <- hsv(2 / 3, 0.6, 0.8)
p + scale_fill_manual(values = c(R = myred, D = myblue))
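
To see what lowering the saturation and value does to the colors themselves, the following base R sketch compares the fully saturated red and blue with the muted versions used above. The variable names pure and muted are only for illustration.

```r
# Fully saturated, full-value red and blue (hues 0 and 2/3).
pure <- c(red = hsv(0, 1, 1), blue = hsv(2 / 3, 1, 1))

# The muted versions used above: same hues, saturation 0.6, value 0.8.
muted <- c(red = hsv(0, 0.6, 0.8), blue = hsv(2 / 3, 0.6, 0.8))

# RGB components, one column per color. The muted colors have a lower
# maximum channel (lower value) and a higher minimum channel (lower
# saturation) than the pure ones.
col2rgb(pure)
col2rgb(muted)

# Round trip: rgb2hsv() recovers approximately the h, s, v inputs.
round(rgb2hsv(col2rgb(muted)), 2)
```

Because only s and v change, the hue of each party color is preserved, which keeps the red/blue coding recognizable.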

Some enhancements:

p + scale_fill_manual(values = c(R = myred, D = myblue)) + theme_void() +
    geom_text(aes(y = 3, label = pres),
              size = 8, hjust = "left", color = "white") +
    geom_text(aes(y = appr - 3, label = appr),
              size = 8, hjust = "right", color = "white")

Some notes:

2. EPA Fuel Economy Data

library(lubridate)
library(readr)
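# Download the data if there is no local copy or the copy is over six months old.
# %m+% avoids the NA that + months(6) gives when the target date does not exist.
if (! file.exists("vehicles.csv.zip") ||
    file.mtime("vehicles.csv.zip") %m+% months(6) < now())
    download.file("http://www.stat.uiowa.edu/~luke/data/vehicles.csv.zip",
                  "vehicles.csv.zip")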
newmpg <- read_csv("vehicles.csv.zip", guess_max = 100000)
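
The download step above refreshes the local copy only when it is missing or more than six months old. The same check can be sketched in base R without lubridate. The helper needs_refresh() is hypothetical, not part of the original code, and treating six months as 182 days is an assumption made for simplicity.

```r
# Hypothetical helper: TRUE when `path` is missing or its modification
# time is more than `max_age_days` days in the past.
needs_refresh <- function(path, max_age_days = 182) {
    if (! file.exists(path)) return(TRUE)
    age <- difftime(Sys.time(), file.mtime(path), units = "days")
    as.numeric(age) > max_age_days
}

# Demonstration on a temporary file rather than the real data file.
tmp <- tempfile(fileext = ".zip")
needs_refresh(tmp)        # the file does not exist yet
file.create(tmp)
needs_refresh(tmp)        # just created, so it counts as fresh
Sys.setFileTime(tmp, Sys.time() - 200 * 24 * 60 * 60)
needs_refresh(tmp)        # modification time is now about 200 days old
```

Wrapping the test in a function makes the age threshold explicit and easy to change.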

From the documentation for the data the appropriate variables seem to be:

The primary fuel type counts are:

library(dplyr)
tbl <- count(newmpg, fuelType1)
kbl <- knitr::kable(tbl, format = "html")
kableExtra::kable_styling(kbl, full_width = FALSE)
fuelType1                n
Diesel                1231
Electricity            353
Midgrade Gasoline      142
Natural Gas             60
Premium Gasoline     13517
Regular Gasoline     29384

A bar chart of these numbers:

thm <- theme_minimal() + theme(text = element_text(size = 16))
ggplot(tbl, aes(x = n, y = reorder(fuelType1, n))) +
    geom_col() +
    scale_x_continuous(expand = expansion(mult = c(0, .1))) +
    thm +
    ylab(NULL)

Regular gas is the dominant fuel type over all years, accounting for about 66% of models, with premium second at about 30%. All other fuel types, including electricity, together make up only about 4%.
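
The shares behind this summary can be computed directly from the counts in the table. A short base R sketch, with the counts typed in by hand from the table above:

```r
# Counts copied from the table above.
n <- c(Diesel = 1231, Electricity = 353, "Midgrade Gasoline" = 142,
       "Natural Gas" = 60, "Premium Gasoline" = 13517,
       "Regular Gasoline" = 29384)

# Percentage share of each fuel type, largest first.
share <- sort(100 * n / sum(n), decreasing = TRUE)
round(share, 1)

# Combined share of everything except regular and premium gasoline.
others <- setdiff(names(n), c("Regular Gasoline", "Premium Gasoline"))
round(sum(share[others]), 1)
```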

3. Fuel Type Over the Years

A filled bar chart shows changes in the primary fuel type used over the years:

newmpg2 <- filter(newmpg, year <= 2021) %>%
    mutate(year = factor(year))
ggplot(newmpg2, aes(y = year, fill = fuelType1)) +
    geom_bar(position = "fill") +
    scale_x_continuous(expand = c(0, 0)) +
    labs(x = "Proportion", y = NULL)

Regular gas was the predominant fuel type in the mid-1980s, but premium’s share has gradually increased to the point where almost as many models use premium as regular. Diesel’s share declined early and has had a small resurgence in recent years. The share of electric models is still quite small but is growing.
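
With position = "fill", each bar is rescaled so that its segments sum to 1, which is why the chart shows shares within a year rather than counts. The calculation can be sketched in base R; the toy data frame below is made up for illustration and is not the EPA data.

```r
# Toy data (made up for illustration): one row per vehicle model.
toy <- data.frame(
    year = c(1985, 1985, 1985, 1985, 2020, 2020, 2020, 2020),
    fuel = c("Regular", "Regular", "Regular", "Premium",
             "Regular", "Regular", "Premium", "Premium"))

# Counts per year and fuel type, the quantity geom_bar() counts.
counts <- table(toy$year, toy$fuel)

# position = "fill" corresponds to dividing each row by its total,
# so that the proportions within each year sum to 1.
props <- prop.table(counts, margin = 1)
props
```

One consequence is that a filled bar chart hides how many models there are in each year; only the composition is shown.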

4. Highway Fuel Economy Over the Years

newmpg3 <- filter(newmpg, year <= 2021, year >= 2000) %>%
    mutate(year = factor(year))
alpha <- 0.2
size <- 0.3

A strip chart is a useful way to look at the full data for a numeric variable at several different levels of a discrete variable, but some tuning is needed for larger data sets. For examining 22 years of highway gas mileage data from the EPA data set, using alpha = 0.2 and size = 0.3 along with jittering seems to work reasonably well:

ggplot(newmpg3, aes(x = highway08, y = year)) +
    geom_point(position = "jitter", size = size, alpha = alpha) +
    ylab(NULL) +
    thm

Over time the highway gas mileage distributions are shifting slightly toward higher values, with the upper tails becoming gradually longer and an increasing number of very high efficiency models (mostly electric).
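
Impressions read off a strip chart can be checked against per-year summaries such as the median and an upper quantile. The sketch below uses a hypothetical helper, hwy_summary(), and made-up toy values rather than the EPA data; it assumes a data frame with columns year and highway08, as in newmpg3.

```r
# Hypothetical helper: median and 95th percentile of highway mpg by year.
# Expects columns `year` and `highway08`, as in newmpg3.
hwy_summary <- function(df) {
    by_year <- split(df$highway08, df$year)
    out <- t(sapply(by_year, quantile, probs = c(0.5, 0.95)))
    colnames(out) <- c("median", "p95")
    out
}

# Toy values (made up, not EPA data) just to show the shape of the result.
toy <- data.frame(year = factor(rep(c(2000, 2021), each = 5)),
                  highway08 = c(20, 22, 24, 26, 28, 22, 25, 27, 30, 110))
hwy_summary(toy)
```

A median that rises slowly while the 95th percentile rises quickly would match the pattern of a small overall shift with a lengthening upper tail.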

