## Data Types

Many different kinds of data are shown on maps:

• Points, location data.

• Lines, routes, connections.

• Area data, aggregates, rates.

• Sampled surface data.

## Map Types

There are many different kinds of maps:

• Dot maps

• Symbol maps

• Line maps

• Choropleth maps

• Cartograms and tiled maps

### Dot Maps

• A dot map of violent crime locations in Houston. ### Symbol Maps

• Increases and decreases in jobs. ### Line Maps ### Isopleth, Isarithmic, or Contour Maps

• Density contours for Houston violent crime data: • Bird flu deaths in California: ### Choropleth Maps

• Unemployment rates by county and by state:  ### Cartograms and Tiled Maps   • Poverty and education. * A D3 approach.

## Drawing Maps

At the simplest level drawing a map involves;

• drawing or filling polygons that represent region boundaries;

• adding points, lines, and other annotations.

But there are a number of complications:

• you need to obtain the polygons representing the regions;

• regions with several pieces;

• holes in regions; lakes in islands in lakes;

• projections for mapping spherical longitude/lattitude coordinates to a plane;

• aspect ratio;

• making sure map and annotation layers are consistent in their use of coordinates and aspect ratio;

• connecting your data to the geographic features.

High level tools are available for creating some standard map types.

These tools usually have some limitations.

Creating maps directly, building on the things you have learned so far, gives more flexibility.

## Basic Map Data

Basic map data consisting of latitude and longitude of points along borders is available in the maps package.

The ggplot2 function map_data accesses this data and reorganizes it into a data frame.

library(ggplot2)
gusa <- map_data("state")
##        long      lat group order  region subregion
## 1 -87.46201 30.38968     1     1 alabama      <NA>
## 2 -87.48493 30.37249     1     2 alabama      <NA>
## 3 -87.52503 30.37249     1     3 alabama      <NA>
## 4 -87.53076 30.33239     1     4 alabama      <NA>

The borders function in ggplot2 uses this data to create a simple outline map.

Borders data for Johnson and Linn Counties:

jl_bdr <- map_data("county", "iowa") %>%
filter(subregion %in% c("johnson", "linn"))
jl_bdr
##         long      lat group order region subregion
## 1  -91.34666 41.60819    52   531   iowa   johnson
## 2  -91.35239 41.42485    52   532   iowa   johnson
## 3  -91.47272 41.43058    52   533   iowa   johnson
## 4  -91.48417 41.44777    52   534   iowa   johnson
## 5  -91.48990 41.46495    52   535   iowa   johnson
## 6  -91.50136 41.47642    52   536   iowa   johnson
## 7  -91.49563 41.49934    52   537   iowa   johnson
## 8  -91.50136 41.51079    52   538   iowa   johnson
## 9  -91.51855 41.52225    52   539   iowa   johnson
## 10 -91.82221 41.51652    52   540   iowa   johnson
## 11 -91.81648 41.60819    52   541   iowa   johnson
## 12 -91.82794 41.60819    52   542   iowa   johnson
## 13 -91.82221 41.87175    52   543   iowa   johnson
## 14 -91.35239 41.87175    52   544   iowa   johnson
## 15 -91.34666 41.60819    52   545   iowa   johnson
## 16 -91.82221 42.30148    57   609   iowa      linn
## 17 -91.58730 42.30148    57   610   iowa      linn
## 18 -91.35239 42.30148    57   611   iowa      linn
## 19 -91.35239 41.95197    57   612   iowa      linn
## 20 -91.35239 41.87175    57   613   iowa      linn
## 21 -91.82221 41.87175    57   614   iowa      linn
## 22 -91.82221 42.30148    57   615   iowa      linn

Vertex locations:

ggplot(jl_bdr, aes(x = long,
y = lat)) +
geom_point() +
coord_map() ggplot(jl_bdr, aes(x = long,
y = lat,
group = subregion)) +
geom_point() +
geom_polygon(color = "black", fill = NA) +
coord_map() Fill polygons:

ggplot(jl_bdr, aes(x = long,
y = lat,
fill = subregion)) +
geom_point() +
geom_polygon(color = "black") +
guides(fill = "none") +
coord_map() Simple polygon data along with geom_polygon() works reasonably well for many simple applications.

For more complex applications it is best to use special spatial data structures.

An approach becoming widely adopted in many open-source frameworks for working with spatial data is called simple features.

There are 63 regions in the state boundaries data because several states consist of disjoint regions.

A selection:

filter(gusa, region %in% c("massachusetts", "michigan", "virginia")) %>%
select(region, subregion) %>%
unique()
##             region         subregion
## 1    massachusetts martha's vineyard
## 37   massachusetts              main
## 257  massachusetts         nantucket
## 287       michigan             north
## 747       michigan             south
## 1117      virginia        chesapeake
## 1213      virginia      chincoteague
## 1228      virginia              main

The map can be drawn using geom_polygon: p_usa <- ggplot(gusa) +
geom_polygon(aes(x = long, y = lat,
group = group),
fill = NA,
color = "darkgrey")
p_usa

This map looks odd since it is not using a sensible projection for converting the spherical longitude/latitude coordinates to a flat surface.

## Projections

Longitude/latitude position points on a sphere; maps are drawn on a flat surface.

One degree of latitude corresponds to about 69 miles everywhere.

But one degree of longitude is shorter farther away from the equator.

In Iowa City, at latitude 41.6 degrees north, one degree of longitude corresponds to about

$69 \times \cos(41.6 / 90 \times \pi / 2) \approx 52$

miles.

A commonly used projection is the Mercator projection.

The Mercator projection is a cylindrical projection that preserves angles but distorts areas far away from the equator.

The Mercator projection is the default projection used by coord_map:

p_usa + coord_map() Other projections can be specified instead.

One alternative is The Bonne projection.

p_usa +
coord_map("bonne", parameters = 41.6) The Bonne projection is an equal area projection that preserves shapes around a standard latitude but distorts farther away.

This preserves shapes at the latitude of Iowa City.

The projections applied with coord_map will be used in all layers.

## Wind Turbines in Iowa

The wind_turbines data frame in package dviz.supp contains information about wind turbines set up through 2018.

More recent data is available here.

The locations for Iowa can be shown on a map of Iowa county borders.

First, get the wind turbine data:

data(wind_turbines, package = "dviz.supp")
wt_IA <- filter(wind_turbines,
t_state == "IA") %>%
mutate(p_year =
replace(p_year, p_year < 0, NA))

Next, get the map data:

giowa <- map_data("county", "iowa")

A map of iowa county borders:

p <- ggplot(giowa) +
geom_polygon(
aes(x = long, y = lat,
group = group),
fill = NA, color = "grey") +
coord_map()
p p +
geom_point(aes(x = xlong, y = ylat),
data = wt_IA) Use color to show the year of construction:

p +
geom_point(aes(x = xlong, y = ylat,
color = p_year),
data = wt_IA) +
scale_color_viridis_c() A background map showing features like terrain or city locations and major roads can often help.

Maps are available from various sources, with various restrictions and limitations.

Stamen maps are often a good option.

The ggmap package provides an interface for using Stamen maps.

A Stamen toner map background:

library(ggmap)
m <-
get_stamenmap(
c(-97.2, 40.4, -89.9, 43.6),
zoom = 8,
maptype = "toner")
ggmap(m) County borders with a Stamen map background:

ggmap(m) +
geom_polygon(aes(x = long, y = lat,
group = group),
data = giowa,
fill = NA,
color = "grey") ggmap(m) +
geom_polygon(aes(x = long, y = lat,
group = group),
data = giowa,
fill = NA,
color = "grey") +
geom_point(aes(x = xlong, y = ylat,
color = p_year),
data = wt_IA) +
scale_color_viridis_c() With other backgrounds it might be necessary to choose a different color palette.

## County Population Data

The census bureau provides estimates of populations of US counties.

• Estimates are available in several formats, including CSV.

• The CSV file for 2020-2021 is available at

• The zip-compressed CSV file for 2010-2019 is available at

State and county level population counts can be visualized several ways, e.g.:

• proportional symbol maps;

• choropleth maps (a common choice, but not a good one!).

For a proportional symbol map we need to pick locations for the symbols.

readPEP <- function(fname) {
filter(COUNTY != 0) %>% ## drop state totals
mutate(FIPS = 1000 * STATE + COUNTY) %>%
select(FIPS, county = CTYNAME, state = STNAME,
starts_with("POPESTIMATE")) %>%
pivot_longer(starts_with("POPESTIMATE"),
names_to = "year",
names_prefix = "POPESTIMATE",
values_to = "pop")
}

if (! file.exists("co-est2021-alldata.csv"))
"co-est2021-alldata.csv")

pep2021 <- readPEP("co-est2021-alldata.csv")

For the 2021 data:

head(pep2021)
## # A tibble: 6 × 5
##    FIPS county         state   year     pop
##   <dbl> <chr>          <chr>   <chr>  <int>
## 1  1001 Autauga County Alabama 2020   58877
## 2  1001 Autauga County Alabama 2021   59095
## 3  1003 Baldwin County Alabama 2020  233140
## 4  1003 Baldwin County Alabama 2021  239294
## 5  1005 Barbour County Alabama 2020   25180
## 6  1005 Barbour County Alabama 2021   24964

Counties and states are identified by name and also by FIPS code.

The final three digits identify the county within a state.

The leading one or two digits identify the state.

The FIPS code for Johnson County, IA is 19103.

## Create a Proportional Symbol Map of State Populations

We will need:

• State population counts.

• Locations for symbols to represent these counts.

Aggregate the county populations to the state level:

state_pops <- mutate(pep2021, state = tolower(state)) %>%
filter(year == 2021) %>%
group_by(state) %>%
summarize(pop = sum(pop, na.rm = TRUE))

Using tolower matches the case in the polygon data:

filter(gusa, region == "iowa") %>% head(1)
##        long      lat group order region subregion
## 1 -91.22634 43.49895    14  3539   iowa      <NA>

An alternative would be to use the state FIPS code and the state.fips table.

## Computing Centroids

Marks representing data values for a region are often placed at the region’s centroid, or centers of gravity.

A quick approximation to the centroids of the polygons is to compute the centers of their bounding rectangles.

state_centroids_quick <-
rename(gusa, state = region) %>%
group_by(state) %>%
summarize(x = mean(range(long)),
y = mean(range(lat)))
p_usa +
geom_point(aes(x, y),
data = state_centroids_quick,
color = "blue") +
coord_map() More sophisticated approaches to computing centroids are also available.

Using simple features and the sf package:

sf::sf_use_s2(FALSE)
state_centroids <-
sf::st_as_sf(gusa,
coords = c("long", "lat"),
crs = 4326) %>%
group_by(region) %>%
summarize(do_union = FALSE) %>%
sf::st_cast("POLYGON") %>%
ungroup() %>%
sf::st_centroid() %>%
sf::as_Spatial() %>%
as.data.frame() %>%
rename(x = coords.x1, y = coords.x2, state = region)
sf::sf_use_s2(TRUE)
pp <- p_usa +
geom_point(aes(x, y),
data = state_centroids, color = "red")
pp + coord_map() Comparing the two approaches:

p_usa +
geom_point(aes(x, y),
data = state_centroids_quick,
color = "blue") +
geom_point(aes(x, y),
data = state_centroids,
color = "red") +
coord_map() A projection specified with coord_map will be applied to all layers:

pp + coord_map("bonne", parameters = 41.6) ## Symbol Plots of State Population

Merge in the centroid locations.

state_pops <- inner_join(state_pops, state_centroids, "state")
## # A tibble: 6 × 4
##   state            pop      x     y
##   <chr>          <int>  <dbl> <dbl>
## 1 alabama      5039877  -86.8  32.8
## 2 arizona      7276316 -112.   34.3
## 3 arkansas     3025891  -92.4  34.9
## 4 california  39237836 -120.   37.3
## 5 colorado     5812069 -106.   39.0
## 6 connecticut  3605597  -72.7  41.6

Using inner_join drops cases not included in the lower-48 map data.

A dot plot: ggplot(state_pops) +
geom_point(aes(x = pop,
y = reorder(state, pop))) +
labs(x = "Population", y = "") +
theme(axis.text.y = element_text(size = 10)) +
scale_x_continuous(labels = scales::comma)

The population distribution is heavily skewed; a color coding would need to account for this.

A proportional symbol map:

ggplot(gusa) +
geom_polygon(aes(long, lat,
group = group),
fill = NA, color = "grey") +
geom_point(aes(x, y, size = pop),
data = state_pops) +
scale_size_area() +
coord_map("bonne", parameters = 41.6) +
ggthemes::theme_map() The thermometer symbol approach suggested by Cleveland and McGill (1984) can be emulated using geom_rect:

ggplot(gusa) +
geom_polygon(aes(long, lat,
group = group),
fill = NA, color = "grey") +
coord_map() +
ggthemes::theme_map() +
geom_rect(aes(xmin = x - .4,
xmax = x + .4,
ymin = y - 1,
ymax = y + 2 * (pop / max(pop)) - 1),
data = state_pops) +
geom_rect(aes(xmin = x - .4,
xmax = x + .4,
ymin = y - 1,
ymax = y + 1),
data = state_pops,
fill = NA,
color = "black") To work well along the northeast this would need a strategy similar to the one used by ggrepel for preventing label overlap.

## Choropleth Maps of State Population

A choropleth map needs to have the information for coloring all the pieces of a region.

This can be done by merging using left_join:

sp <- select(state_pops, region = state, pop)
gusa_pop <- left_join(gusa, sp, "region")
##        long      lat group order  region subregion     pop
## 1 -87.46201 30.38968     1     1 alabama      <NA> 5039877
## 2 -87.48493 30.37249     1     2 alabama      <NA> 5039877
## 3 -87.52503 30.37249     1     3 alabama      <NA> 5039877
## 4 -87.53076 30.33239     1     4 alabama      <NA> 5039877
## 5 -87.57087 30.32665     1     5 alabama      <NA> 5039877
## 6 -87.58806 30.32665     1     6 alabama      <NA> 5039877

A first attempt:

ggplot(gusa_pop) +
geom_polygon(aes(long, lat,
group = group,
fill = pop)) +
coord_map("bonne", parameters = 41.6) +
ggthemes::theme_map() This image is dominated by the fact that most state populations are small.

Showing population ranks, or percentile values, can help see the variation a bit better.

spr <- mutate(sp, rpop = rank(pop))
gusa_rpop <- left_join(gusa, spr, "region")
ggplot(gusa_rpop) +
geom_polygon(aes(long, lat,
group = group,
fill = rpop)) +
coord_map("bonne", parameters = 41.6) +
ggthemes::theme_map() Using quintile bins instead of a continuous scale:

spr <-
mutate(spr,
pcls = cut_width(100 * percent_rank(pop),
width = 20,
center = 10))
gusa_rpop <- left_join(gusa, spr, "region")
ggplot(gusa_rpop) +
geom_polygon(aes(long, lat,
group = group,
fill = pcls),
color = "grey") +
coord_map("bonne", parameters = 41.6) +
ggthemes::theme_map() +
scale_fill_brewer(palette = "Reds",
name = "Percentile") ## Choropleth Maps of County Population

For a county-level ggplot map, first get the polygon data frame:

gcounty <- map_data("county")
ggplot(gcounty) +
geom_polygon(aes(long, lat,
group = group),
fill = NA,
color = "black",
size = 0.05) +
coord_map("bonne", parameters = 41.6) We will again need to merge population data with the polygon data.

Using county name is challenging because of mismatches like this:

filter(gcounty, grepl("brien", subregion)) %>% head(1)
##        long      lat group order region subregion
## 1 -95.86156 43.26404   825 26075   iowa    obrien
filter(pep2021, FIPS == 19141, year == 2021) %>% as.data.frame()
##    FIPS         county state year   pop
## 1 19141 O'Brien County  Iowa 2021 14015

A better option is to match on FIPS county code.

The county.fips data frame in the maps package links the FIPS code to region names used by the map data in the maps package.

head(maps::county.fips, 4)
##   fips        polyname
## 1 1001 alabama,autauga
## 2 1003 alabama,baldwin
## 3 1005 alabama,barbour
## 4 1007    alabama,bibb

To attach the FIPS code to the polygons we first need to clean up the county.fips table a bit:

head(filter(maps::county.fips, grepl(":", polyname)), 3)
##    fips                  polyname
## 1 12091     florida,okaloosa:main
## 2 12091     florida,okaloosa:spit
## 3 22099 louisiana,st martin:north

Remove the sub-county regions, remove duplicate rows, and split the polyname variable into region and subregion.

fipstab <-
transmute(maps::county.fips, fips, county = sub(":.*", "", polyname)) %>%
unique() %>%
separate(county, c("region", "subregion"), sep = ",")
##   fips  region subregion
## 1 1001 alabama   autauga
## 2 1003 alabama   baldwin
## 3 1005 alabama   barbour

Then use left_join to merge the FIPS code into the polygon data:

gcounty <- left_join(gcounty, fipstab, c("region", "subregion"))
##        long      lat group order  region subregion fips
## 1 -86.50517 32.34920     1     1 alabama   autauga 1001
## 2 -86.53382 32.35493     1     2 alabama   autauga 1001
## 3 -86.54527 32.36639     1     3 alabama   autauga 1001
## 4 -86.55673 32.37785     1     4 alabama   autauga 1001
## 5 -86.57966 32.38357     1     5 alabama   autauga 1001
## 6 -86.59111 32.37785     1     6 alabama   autauga 1001

Next, we need to pull together the data for the map, adding ranks and quintile bins.

cpop <- filter(pep2021, year == 2021) %>%
select(fips = FIPS, pop) %>%
mutate(rpop = rank(pop),
pcls = cut_width(100 * percent_rank(pop),
width = 20, center = 10))
## # A tibble: 6 × 4
##    fips    pop  rpop pcls
##   <dbl>  <int> <dbl> <fct>
## 1  1001  59095  2256 (60,80]
## 2  1003 239294  2855 (80,100]
## 3  1005  24964  1529 (40,60]
## 4  1007  22477  1446 (40,60]
## 5  1009  59041  2255 (60,80]
## 6  1011  10320   755 (20,40]

Now left join with the county map data:

gcounty_pop <- left_join(gcounty, cpop, "fips")

County level population map using the default continuous color scale:

ggplot(gcounty_pop) +
geom_polygon(aes(long, lat,
group = group,
fill = rpop),
color = "grey",
size = 0.1) +
coord_map("bonne", parameters = 41.6) +
ggthemes::theme_map() ggplot(gcounty_pop) +
geom_polygon(aes(long, lat,
group = group,
fill = rpop),
color = "grey",
size = 0.1) +
geom_polygon(aes(long, lat,
group = group),
fill = NA,
data = gusa,
color = "lightgrey") +
coord_map("bonne", parameters = 41.6) +
ggthemes::theme_map() A discrete scale with a very different color to highlight the counties with missing information:

ggplot(gcounty_pop) +
geom_polygon(aes(long, lat,
group = group,
fill = pcls),
color = "grey",
size = 0.1) +
geom_polygon(aes(long, lat,
group = group),
fill = NA,
data = gusa,
color = "lightgrey") +
coord_map("bonne", parameters = 41.6) +
ggthemes::theme_map() +
scale_fill_brewer(palette = "Reds",
na.value = "blue",
name = "Percentile") Why is there a missing value in South Dakota?

Check whether fipstab provides a FIPS code for every county in the map data:

gcounty_regions <-
select(gcounty, region, subregion) %>%
unique()
anti_join(gcounty_regions, fipstab, c("region", "subregion"))
##         region     subregion
## 1 south dakota oglala dakota

There is also a fipstab entry without map data:

anti_join(fipstab, gcounty_regions, c("region", "subregion"))
##    fips       region subregion
## 1 46113 south dakota   shannon

Shannon County SD was renamed Oglala Lakota County.

It was also given a new FIPS code.

The map data tries to reflect this but gets the spelling wrong.

The pep2021 entry shows the correct FIPS code:

filter(pep2021,
year == 2021,
state == "South Dakota",
grepl("Oglala", county))
## # A tibble: 1 × 5
##    FIPS county               state        year    pop
##   <dbl> <chr>                <chr>        <chr> <int>
## 1 46102 Oglala Lakota County South Dakota 2021  13586

Add an entry with the FIPS code and subregion name matching the map data:

fipstab <-
rbind(fipstab,
data.frame(fips = 46102,
region = "south dakota",
subregion = "oglala dakota"))

Fix up the data:

gcounty <- map_data("county")
gcounty <- left_join(gcounty,
fipstab,
c("region", "subregion"))
gcounty_pop <- left_join(gcounty,
cpop,
"fips")

Redraw the plot: ## A County Population Symbol Map

A symbol map can also be used to show the county populations.

We again need centroids; approximate county centroids should do.

county_centroids <-
group_by(gcounty, fips) %>%
summarize(x = mean(range(long)), y = mean(range(lat)))

Merge the population values into the centroids data:

county_pops <- select(cpop, pop, fips)
county_pops <- inner_join(county_pops, county_centroids, "fips")

A proportional symbol map of county populations:

ggplot(gcounty) +
geom_polygon(aes(long, lat,
group = group),
fill = NA,
col = "grey",
data = gusa) +
geom_point(aes(x, y, size = pop),
data = county_pops) +
scale_size_area() +
coord_map("bonne", parameters = 41.6) +
ggthemes::theme_map() Jittering might be helpful to remove the very regular grid patters in the center.

## Comparing Populations in 2010 and 2021

One possible way to compare spatial data for several time periods is to use a faceted display with one map for each period.

With many periods using animation is also an option.

This can work well when there is a substantial amount of change; an example is available here.

It may not work as well when the changes are more subtle.

To bin the data by quintiles we can use the 2021 values.

ncls <- 6
cls <- quantile(filter(pep2021, year == 2021)$pop, seq(0, 1, len = ncls)) To merge the binned population data into the polygon data we can create a data frame with binned population variables for the two time frames. This will allow a cleaner merge in terms of handling counties with missing data. After selecting the years and the variables we need, we add the binned population and then pivot to a wider form with one row per county: cpop <- bind_rows(filter(pep2019, year == 2010), filter(pep2021, year == 2021)) %>% transmute(fips = FIPS, year, pcls = cut(pop, cls, include.lowest = TRUE)) %>% pivot_wider(values_from = c("pcls"), names_from = "year", names_prefix = "pcls") head(cpop) ## # A tibble: 6 × 3 ## fips pcls2010 pcls2021 ## <dbl> <fct> <fct> ## 1 1001 (3.68e+04,9.71e+04] (3.68e+04,9.71e+04] ## 2 1003 (9.71e+04,9.83e+06] (9.71e+04,9.83e+06] ## 3 1005 (1.87e+04,3.68e+04] (1.87e+04,3.68e+04] ## 4 1007 (1.87e+04,3.68e+04] (1.87e+04,3.68e+04] ## 5 1009 (3.68e+04,9.71e+04] (3.68e+04,9.71e+04] ## 6 1011 (8.7e+03,1.87e+04] (8.7e+03,1.87e+04] This data can now be merged with the polygon data. gcounty_pop <- left_join(gcounty, cpop, "fips") To create a faceted display with ggplot we need the data in tidy form. gcounty_pop_long <- pivot_longer(gcounty_pop, starts_with("pcls"), names_to = "year", names_prefix = "pcls", values_to = "pcls") %>% mutate(year = as.integer(year)) An alternative would be to create two layers and show them in separate facets. ggplot(gcounty_pop_long) + geom_polygon(aes(long, lat, group = group, fill = pcls), color = "grey", size = 0.1) + geom_polygon(aes(long, lat, group = group), fill = NA, data = gusa, color = "lightgrey") + coord_map("bonne", parameters = 41.6) + ggthemes::theme_map() + scale_fill_brewer(palette = "Reds", na.value = "blue") + facet_wrap(~ year, ncol = 2) + theme(legend.position = "right") Since the changes are quite subtle a side-by-side display does not work very well. A more effective approach in this case is to plot the relative changes. This is also a more appropriate use of a choropleth map. Start by adding percent changes to the data. cpop <- bind_rows(filter(pep2019, year == 2010), filter(pep2021, year == 2021)) %>% select(fips = FIPS, pop, year) %>% pivot_wider(values_from = "pop", names_from = "year", names_prefix = "pop") %>% mutate(pchange = 100 * (pop2021 - pop2010) / pop2010) gcounty_pop <- left_join(gcounty, cpop, "fips") A binned version of the percent changes will also be useful. bins <- c(-Inf, -20, -10, -5, 5, 10, 20, Inf) cpop <- mutate(cpop, cpchange = cut(pchange, bins, ordered_result = TRUE)) gcounty_pop <- left_join(gcounty, cpop, "fips") Using a continuous scale with the default palette: pcounty <- ggplot(gcounty_pop) + coord_map("bonne", parameters = 41.6) + ggthemes::theme_map() state_layer <- geom_polygon(aes(long, lat, group = group), fill = NA, data = gusa, color = "lightgrey") county_cont <- geom_polygon(aes(long, lat, group = group, fill = pchange), color = "grey", size = 0.1) pcounty + county_cont + state_layer + theme(legend.position = "right") Using a simple diverging palette: pcounty + county_cont + state_layer + scale_fill_gradient2() + theme(legend.position = "right") For the binned version the default ordered discrete color palette produces: county_disc <- geom_polygon(aes(long, lat, group = group, fill = cpchange), color = "grey", size = 0.1) pcounty + county_disc + state_layer + theme(legend.position = "right") A diverging palette allows increases and decreases to be seen. Using the PRGn palette from ColorBrewer: (p_dvrg_48 <- pcounty + county_disc + state_layer + scale_fill_brewer(palette = "PRGn", na.value = "red", guide = guide_legend(reverse = TRUE))) + theme(legend.position = "right") ## Focusing on Iowa The default continuous palette produces piowa <- ggplot(filter(gcounty_pop, region == "iowa")) + coord_map() + ggthemes::theme_map() pc_cont_iowa <- geom_polygon(aes(long, lat, group = group, fill = pchange), color = "grey", size = 0.2) piowa + pc_cont_iowa + theme(legend.position = "right") Aside: Iowa has 99 counties, which seems a peculiar number. It used to have 100: With the default ordered discrete palette: pc_disc_iowa <- geom_polygon(aes(long, lat, group = group, fill = cpchange), color = "grey", size = 0.2) piowa + pc_disc_iowa + theme(legend.position = "right") Using the PRGn palette, for example, will by default adjust the palette to the levels present in the data, and Iowa does not have counties in all the bins: piowa + pc_disc_iowa + scale_fill_brewer(palette = "PRGn", guide = guide_legend(reverse = TRUE)) + theme(legend.position = "right") Two issues: • The color coding does not match the coding for the national map. • The neutral color is not positioned at the zero level. Adding drop = FALSE to the palette specification preserves the encoding from the full map: (p_dvrg_iowa <- piowa + pc_disc_iowa + scale_fill_brewer(palette = "PRGn", drop = FALSE, guide = guide_legend(reverse = TRUE))) + theme(legend.position = "right") Matching the national map would be particularly important for showing the two together: library(patchwork) (p_dvrg_48 + guides(fill = "none")) + p_dvrg_iowa + plot_layout(guides = "collect", widths = c(3, 1)) ## Some Notes on Choropleth Maps Choropleth maps are very popular, in part because of their visual appeal. But they do have to be used with care. Choropleth maps are good for showing rates or counts per unit of area, such as • amount of rainfall per square meter; • percentage of a region covered by forest. They also work well for measurements that may vary little across a region, as well as for some demographic summaries, such as • average winter temperature in a state or county; • average level of summer humidity in a state or county. • average age of population in a state or county. Choropleth maps should generally not be used for counts, such as number of cases of a disease, since the result is usually just a map of population density. Proportional symbol maps are usually a better choice for showing counts, such as population sizes, in a geographic contexts Choropleth maps can be used to show proportional counts, or counts per area population, but need to be used with care. • Changes in counts by one or two can cause large changes in proportions when population sizes are small. • County level choropleth maps of cancer rates, for example, can change a lot with one additional case in a small county. • When proportions are shown but total counts are what really matters, there is a tendency for viewers to let the area supply the denominator. An example is provided by a choropleth map that has been used to show county level results for the 2016 presidential election. A map showing which of the two major parties received a plurality in each county: if (! file.exists("US_County_Level_Presidential_Results_08-16.csv")) { download.file("http://www.stat.uiowa.edu/~luke/data/US_County_Level_Presidential_Results_08-16.csv", ## nolint "US_County_Level_Presidential_Results_08-16.csv") } pres <- read.csv("US_County_Level_Presidential_Results_08-16.csv") %>% mutate(fips = fips_code, p = dem_2016 / (dem_2016 + gop_2016), win = ifelse(p > 0.5, "DEM", "GOP")) gcounty_pres <- left_join(gcounty, pres, "fips") ggplot(gcounty_pres, aes(long, lat, group = group, fill = win)) + geom_polygon(col = "black", size = 0.05) + geom_polygon(aes(long, lat, group = group), fill = NA, color = "grey30", size = 0.5, data = gusa) + coord_map("bonne", parameters = 41.6) + ggthemes::theme_map() + scale_fill_manual(values = c(DEM = "blue", GOP = "red")) There are many internet posts about a map like this, for example here and here. To a casual viewer this gives the impression of an overwhelming GOP win. But many of the red counties have very small populations. A proportional symbol map more accurately reflects the results, which were very close with a majority of the popular vote going to the Democrats candidate: county_pres <- left_join(county_pops, pres, "fips") ggplot(gcounty) + geom_polygon(aes(long, lat, group = group), fill = NA, col = "grey", data = gusa) + geom_point(aes(x, y, size = pop, color = win), data = county_pres) + scale_size_area(labels = scales::label_number(scale = 1e-6, suffix = " M", trim = FALSE)) + ## scale_color_manual(values = c(DEM = "blue", GOP = "red")) + colorspace::scale_color_discrete_diverging(palette = "Blue-Red 2") + coord_map("bonne", parameters = 41.6) + ggthemes::theme_map() ## Warning: Removed 1 rows containing missing values (geom_point). ## Isopleth Map of Population Density While populations do tend to cluster in cities, they are not entirely clustered at county centroids. Other quantities, such as temperatures, may be measured at discrete points but vary smoothly across a region. Isopleth maps show the contours of a smooth surface. For populations, it is sometimes useful to estimate and visualize a population density surface. To estimate the population density we can use the county centroids weighted by the county population values. The function kde2d from the MASS package can be used to compute kernel density estimates, but does not support the use of weights. A simple modification that does support weights is available here. By default, kde2d estimates the density on a grid over the range of the two variables. source(here::here("kde2d.R")) ds <- with(county_pops, kde2d(x, y, weights = pop)) str(ds) ## List of 3 ##$ x: num [1:25] -124 -122 -119 -117 -115 ...
##  $y: num [1:25] 25.5 26.4 27.4 28.4 29.4 ... ##$ z: num [1:25, 1:25] 1.25e-12 2.46e-11 4.12e-10 5.76e-09 3.76e-08 ...

This result can be converted to a data frame using broom::tidy.

dsdf <- broom::tidy(ds) %>%
rename(Lon = x, Lat = y, dens = z) %>%
mutate(Lat = as.numeric(Lat)) ## ????

A plot of the contours:

ggplot(gusa) +
geom_polygon(aes(long, lat, group = group),
fill = NA,
color = "grey") +
geom_contour(aes(Lon, Lat, z = dens),
data = dsdf) +
coord_map() To produce contours that work better when filled, it is useful to increase the number of grid points and enlarge the range.

ds <- with(county_pops,
kde2d(x, y, weights = pop,
n = 50,
lims = c(-130, -60, 20, 50)))
dsdf <- broom::tidy(ds) %>%
rename(Lon = x, Lat = y, dens = z) %>%
mutate(Lat = as.numeric(Lat)) ## ????
pusa <- ggplot(gusa) +
geom_polygon(aes(long, lat, group = group),
fill = NA,
color = "grey") +
coord_map()
pusa +
geom_contour(aes(Lon, Lat, z = dens),
data = dsdf) A filled contour version can be created using stat_contour and fill = after_stat(level).

pusa +
stat_contour(aes(Lon, Lat, z = dens,
fill = after_stat(level)),
data = dsdf,
geom = "polygon",
alpha = 0.2) To focus on Iowa we need subsets of the populations data and the map data:

iowa_pops <- filter(county_pops,
fips %/% 1000 == 19)
giowa <- filter(gcounty,
fips %/% 1000 == 19)

A proportional symbols plot for the Iowa county populations:

iowa_base <- ggplot(giowa) +
geom_polygon(aes(long, lat,
group = group),
fill = NA,
col = "grey") +
coord_map()

iowa_base +
geom_point(aes(x, y,
size = pop),
data = iowa_pops) +
scale_size_area() +
ggthemes::theme_map() For a density plot it is useful to expand the populations used to include nearby large cities, such as Omaha.

iowa_pops <- filter(county_pops, x > -97 & x < -90 & y > 40 & y < 44)

Density estimates can then be computed for the expanded Iowa data.

dsi <- with(iowa_pops, kde2d(x, y, weights = pop,
n = 50, lims = c(-98, -89, 40, 44.5)))
dsidf <- broom::tidy(dsi) %>%
rename(Lon = x, Lat = y, dens = z) %>%
mutate(Lat = as.numeric(Lat)) ## ???

A simple contour map:

ggplot(giowa) +
geom_polygon(aes(long, lat,
group = group),
fill = NA,
color = "grey") +
geom_contour(aes(Lon, Lat, z = dens),
color = "blue",
na.rm = TRUE,
data = dsidf) +
coord_map() A filled contour map

pdiowa <- ggplot(giowa) +
geom_polygon(aes(long, lat, group = group),
fill = NA,
color = "grey") +
stat_contour(aes(Lon, Lat, z = dens,
fill = after_stat(level)),
color = "black",
alpha = 0.2,
na.rm = TRUE,
data = dsidf,
geom = "polygon")
pdiowa + coord_map() Using xlim and ylim arguments to coord_map to trim the range:

pdiowa +
coord_map(xlim = c(-96.7, -90),
ylim = c(40.3, 43.6)) ## Simple Features

Using geom_polygon and geom_point works well for many simple applications.

For more complex applications it is best to use special spatial data structures.

An approach becoming widely adopted in many open-source frameworks for working with spatial data is called simple features.

The package sf provides a range of tools for operating with simple feature objects, such as:

• computing centroids;

• combining and intersecting objects;

• moving and expanding or shrinking objects.

Some useful vignettes in the sf package:

Other references:

A simple utility function for obtaining map data as sf objects:

map_data_sf <- function(map, region = ".", exact = FALSE, crs = 4326, ...) {
maps::map(map, region, exact = exact, plot = FALSE, fill = TRUE, ...) %>%
sf::st_as_sf(crs = crs) %>%
separate(ID, c("region", "subregion"), sep = ",")
}

Iowa county data:

map_data_sf("county", "iowa") %>% head(4)
## Simple feature collection with 4 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -94.93338 ymin: 40.60552 xmax: -91.06018 ymax: 43.51041
## Geodetic CRS:  WGS 84
##   region subregion                           geom
## 1   iowa     adair MULTIPOLYGON (((-94.24583 4...
## 2   iowa     adams MULTIPOLYGON (((-94.70992 4...
## 3   iowa allamakee MULTIPOLYGON (((-91.22634 4...
## 4   iowa appanoose MULTIPOLYGON (((-92.63009 4...

A basic Iowa county map using sf:

giowa_sf <- map_data_sf("county", "iowa")
giowa_sf <- left_join(giowa_sf,
fipstab,
c("region", "subregion"))
ggplot(giowa_sf) + geom_sf() + coord_sf() wt_IA_sf <-
sf::st_as_sf(wt_IA,
coords = c("xlong", "ylat"),
crs = 4326)
ggplot(giowa_sf) +
geom_sf(fill = "grey95") +
geom_sf(aes(color = p_year),
data = wt_IA_sf) +
scale_color_viridis_c() +
coord_sf() With a Stamen map background:

library(ggmap)
m <-
get_stamenmap(c(-97.2, 40.4, -89.9, 43.6),
zoom = 8,
maptype = "toner")
ggmap(m) +
geom_sf(data = giowa_sf,
fill = NA,
inherit.aes = FALSE) +
geom_sf(aes(color = p_year),
data = wt_IA_sf,
inherit.aes = FALSE) +
scale_color_viridis_c() +
coord_sf() +
labs(x = NULL, y = NULL) A choropleth map of population changes:

giowa_pop_sf <-
left_join(giowa_sf, cpop, "fips")
ggplot(giowa_pop_sf) +
geom_sf(aes(fill = cpchange)) +
ggthemes::theme_map() +
scale_fill_brewer(
palette = "PRGn",
drop = FALSE,
guide = guide_legend(reverse = TRUE)) Proportional symbol map for 2021 Iowa county population estimates:

sf::sf_use_s2(FALSE)
iowa_pop_ctrds <-
mutate(giowa_pop_sf,
geom = sf::st_centroid(geom))
sf::sf_use_s2(TRUE)
ggplot(giowa_sf) +
geom_sf() +
geom_sf(aes(size = pop2021),
data = iowa_pop_ctrds,
show.legend = "point") +
scale_size_area() A USA county map with a Bonne projection centered on Iowa City:

gusa_sf <- map_data_sf("county")
gusa_sf <-
left_join(gusa_sf,
fipstab,
c("region", "subregion"))
gusa_pop_sf <-
left_join(gusa_sf, cpop, "fips")
ggplot(gusa_pop_sf) +
geom_sf(aes(fill = cpchange), size = 0.1) +
scale_fill_brewer(
palette = "PRGn",
na.value = "red",
guide = guide_legend(reverse = TRUE)) +
coord_sf(crs = "+proj=bonne +lat_1=41.6 +lon_0=-95") As an example of computation with sf objects we can revisit the population contour plot.

First create an sf representation of the contours (there may be better ways to do this):

ggpoly2sf <- function(poly, coords = c("long", "lat"),
id = "group", region = "region", crs = 4326) {
sf::st_as_sf(poly, coords = coords, crs = crs) %>%
group_by(!! as.name(id), !! as.name(region)) %>%
summarize(do_union = FALSE) %>%
sf::st_cast("POLYGON") %>%
ungroup() %>%
group_by(!! as.name(region)) %>%
summarize(do_union = FALSE) %>%
ungroup()
}

v <- contourLines(dsi)
dsi_sf <- lapply(seq_along(v),
function(i) { x <- v[[i]]; x\$group <- i; as.data.frame(x) }) %>%
bind_rows() %>%
ggpoly2sf(coords = c("x", "y"), region = "level")

This reproduces our previous plot:

ggplot(giowa_sf) +
geom_sf() +
geom_sf(aes(fill = level),
data = dsi_sf,
alpha = 0.2) With an outline of Iowa from the maps package we can use sf::st_intersection function to show only the part of the contours that is within Iowa.

iowa_sf <- map_data_sf("state", "iowa")
dsi_i_sf <-
sf::st_intersection(iowa_sf, dsi_sf)
ggplot(giowa_sf) +
geom_sf() +
geom_sf(aes(fill = level),
data = dsi_i_sf,
alpha = 0.2) ## Shape Files

The ESRI shapefile format is a widely used vector data format for geographic information systems.

• Many GIS software systems support reading shapefiles.

• The maptools and rgdal packages can be used to read shapefiles in R.

• sf uses these utilities under the hood.

Shapefiles are available from many sources; for example:

Other formats and data bases are also available. Some references:

• Paul Murrell’s R Graphics, 2nd Ed., Chapter 14.

## Shape Files: Census Tracts

Census Tracts are small, relatively permanent subdivisions of a county.

Census tracts generally have population size between 1,200 and 8,000 people.

Tracts for Iowa (state FIPS 19) for 2010:

tract_sf <- tigris::tracts(19, year = 2010)
(p <- ggplot(tract_sf) + geom_sf()) Johnson County:

p %+% filter(tract_sf, COUNTYFP == "103") Polk County:

p %+% filter(tract_sf, COUNTYFP == "153") ## Proportion of Population Without Health Insurance

A lot of data is published at the census tract level.

One example is the proportion of the population that is uninsured available from the American Community Survey.

Data from 2019 is available locally.

dname <- "ACS_2019_5YR_S2701"
if (! file.exists(dname)) {
data_url <- "http://homepage.divms.uiowa.edu/~luke/data"
aname <- paste0(dname, ".tgz")
untar(aname)
file.remove(aname)
}

acs_file <-
file.path(dname,
"ACSST5Y2019.S2701_data_with_overlays_2021-05-05T153143.csv")
acs_data_sf <- read.csv(acs_file, na.strings = "-")[-1, ] %>%
select(GEO_ID,
name = NAME,
population = S2701_C01_001E,
uninsured = S2701_C05_001E) %>%
mutate(across(3:4, as.numeric)) %>%
mutate(GEO_ID = sub("1400000US", "", GEO_ID))

Merge and plot.

plot_data_sf <-
left_join(tract_sf,
acs_data_sf,
c("GEOID10" = "GEO_ID"))

p <- ggplot(plot_data_sf) +
geom_sf(aes(fill = uninsured)) +
scale_fill_distiller(palette = "Reds", direction = 1)
p Johnson county:

p %+%
filter(plot_data_sf, COUNTYFP == "103") Polk county:

p %+%
filter(plot_data_sf, COUNTYFP == "153") For the state-wide map it may help to drop the census tract borders and add county borders:

giowa_sf <- sf::st_as_sf(maps::map("county", "iowa", plot = FALSE, fill = TRUE),
crs = 4326)
ggplot(plot_data_sf) +
geom_sf(aes(fill = uninsured), color = NA) +
geom_sf(fill = NA, color = "darkgrey", data = giowa_sf, size = 0.2) +
scale_fill_distiller(palette = "Reds", direction = 1) +
coord_sf() ## Leaflet Maps

The leaflet package can be used to make interactive versions of these maps using the Leaflet a JavaScript library for interactive maps:

The leaflet package can work with sf objects or the data objects returned by the map function.

The base leaflet plot object:

library(leaflet)
lf_sf <- leaflet(sf::st_transform(plot_data_sf, 4326))

Evaluating these expressions in the R interpreter opens map windows in the browser:

addPolygons(lf_sf, fillColor = ~ colorQuantile("Reds", percent)(percent))

addPolygons(lf_sf, fillColor = ~ colorNumeric("Reds", percent)(percent))

addPolygons(lf_sf, fillColor = ~ colorNumeric("Reds", percent)(percent),
weight = 1, fillOpacity = 0.9)

This code produces an embedded leaflet widget:

addPolygons(lf_sf, fillColor = ~ colorNumeric("Reds", uninsured)(uninsured),
weight = 0.5, color = "black", fillOpacity = 0.9, opacity = 1.0,
highlightOptions = highlightOptions(color = "white",
weight = 2,
bringToFront = TRUE),
label = ~ lapply(paste0(name, "<br/>",
"pct: ", uninsured, "<br/>",
"pop: ", population),
htmltools::HTML))