>
```
## A County Population Symbol Map
A symbol map can also be used to show the county populations.
We again need centroids; approximate county centroids should do.
```{r}
county_centroids <-
group_by(gcounty, fips) %>%
summarize(x = mean(range(long)), y = mean(range(lat)))
```
Merge the population values into the centroids data:
```{r}
county_pops <- select(cpop, pop, fips)
county_pops <- inner_join(county_pops, county_centroids, "fips")
```
A proportional symbol map of county populations:
```{r cpop-syms, eval = FALSE}
ggplot(gcounty) +
geom_polygon(aes(long, lat,
group = group),
fill = NA,
col = "grey",
data = gusa) +
geom_point(aes(x, y, size = pop),
data = county_pops) +
scale_size_area() +
coord_map("bonne", parameters = 41.6) +
ggthemes::theme_map()
```
```{r cpop-syms, echo = FALSE, fig.width = 8}
```
Jittering might be helpful to remove the very regular grid patters in
the center.
## Comparing Populations in 2010 and 2021
One possible way to compare spatial data for several time periods is
to use a faceted display with one map for each period.
With many periods using animation is also an option.
This can work well when there is a substantial amount of change; an
example is available
[here](https://projects.fivethirtyeight.com/mortality-rates-united-states/cardiovascular2/#1980).
It may not work as well when the changes are more subtle.
To bin the data by quintiles we can use the 2021 values.
```{r}
ncls <- 6
cls <- quantile(filter(pep2021, year == 2021)$pop,
seq(0, 1, len = ncls))
```
To merge the binned population data into the polygon data we can
create a data frame with binned population variables for the two time
frames.
This will allow a cleaner merge in terms of handling counties with
missing data.
After selecting the years and the variables we need, we add the binned
population and then pivot to a wider form with one row per county:
```{r}
cpop <-
bind_rows(filter(pep2019, year == 2010),
filter(pep2021, year == 2021)) %>%
transmute(fips = FIPS,
year,
pcls = cut(pop, cls, include.lowest = TRUE)) %>%
pivot_wider(values_from = c("pcls"),
names_from = "year",
names_prefix = "pcls")
head(cpop)
```
This data can now be merged with the polygon data.
```{r}
gcounty_pop <- left_join(gcounty, cpop, "fips")
```
To create a faceted display with `ggplot` we need the data in _tidy_
form.
```{r}
gcounty_pop_long <-
pivot_longer(gcounty_pop, starts_with("pcls"),
names_to = "year",
names_prefix = "pcls",
values_to = "pcls") %>%
mutate(year = as.integer(year))
```
An alternative would be to create two layers and show them in separate
facets.
```{r, fig.width = 14, fig.height = 5, class.source = "fold-hide"}
ggplot(gcounty_pop_long) +
geom_polygon(aes(long, lat, group = group, fill = pcls),
color = "grey", linewidth = 0.1) +
geom_polygon(aes(long, lat, group = group),
fill = NA, data = gusa, color = "lightgrey") +
coord_map("bonne", parameters = 41.6) +
ggthemes::theme_map() +
scale_fill_brewer(palette = "Reds", na.value = "blue") +
facet_wrap(~ year, ncol = 2) +
theme(legend.position = "right")
```
Since the changes are quite subtle a side-by-side display does not
work very well.
A more effective approach in this case is to plot the relative changes.
This is also a (somewhat) more appropriate use of a choropleth map.
Start by adding percent changes to the data.
```{r}
cpop <-
bind_rows(filter(pep2019, year == 2010),
filter(pep2021, year == 2021)) %>%
select(fips = FIPS, pop, year) %>%
pivot_wider(values_from = "pop",
names_from = "year", names_prefix = "pop") %>%
mutate(pchange = 100 * (pop2021 - pop2010) / pop2010)
gcounty_pop <- left_join(gcounty, cpop, "fips")
```
A binned version of the percent changes will also be useful.
```{r}
bins <- c(-Inf, -20, -10, -5, 5, 10, 20, Inf)
cpop <- mutate(cpop, cpchange = cut(pchange, bins, ordered_result = TRUE))
gcounty_pop <- left_join(gcounty, cpop, "fips")
```
Using a continuous scale with the default palette:
```{r, fig.width = 10, class.source = "fold-hide"}
pcounty <- ggplot(gcounty_pop) +
coord_map("bonne", parameters = 41.6) + ggthemes::theme_map()
state_layer <- geom_polygon(aes(long, lat, group = group),
fill = NA, data = gusa, color = "lightgrey")
county_cont <- geom_polygon(aes(long, lat, group = group, fill = pchange),
color = "grey", linewidth = 0.1)
pcounty + county_cont + state_layer + theme(legend.position = "right")
```
Using a simple diverging palette:
```{r, fig.width = 10, class.source = "fold-hide"}
pcounty +
county_cont +
state_layer +
scale_fill_gradient2() +
theme(legend.position = "right")
```
For the binned version the default ordered discrete color palette produces:
```{r, fig.width = 10, class.source = "fold-hide"}
county_disc <- geom_polygon(aes(long, lat, group = group, fill = cpchange),
color = "grey", linewidth = 0.1)
pcounty + county_disc + state_layer + theme(legend.position = "right")
```
A diverging palette allows increases and decreases to be seen.
Using the `PRGn` palette from `ColorBrewer`:
```{r, fig.width = 10, class.source = "fold-hide"}
(p_dvrg_48 <-
pcounty +
county_disc +
state_layer +
scale_fill_brewer(palette = "PRGn",
na.value = "red",
guide = guide_legend(reverse = TRUE))) +
theme(legend.position = "right")
```
## Focusing on Iowa
The default continuous palette produces
```{r cpop-iowa-cont, eval = FALSE, class.source = "fold-hide"}
piowa <- ggplot(filter(gcounty_pop, region == "iowa")) +
coord_map() + ggthemes::theme_map()
pc_cont_iowa <- geom_polygon(aes(long, lat, group = group, fill = pchange),
color = "grey", linewidth = 0.2)
piowa + pc_cont_iowa + theme(legend.position = "right")
```
```{r cpop-iowa-cont, echo = FALSE}
```
Aside: Iowa has 99 counties, which seems a peculiar number.
It used to have 100:
* [Kossuth county](https://en.wikipedia.org/wiki/Kossuth_County,_Iowa)
* [Bancroft county](https://en.wikipedia.org/wiki/Bancroft_County,_Iowa)
With the default ordered discrete palette:
```{r, fig.width = 10, class.source = "fold-hide"}
pc_disc_iowa <- geom_polygon(aes(long, lat, group = group, fill = cpchange),
color = "grey", linewidth = 0.2)
piowa + pc_disc_iowa + theme(legend.position = "right")
```
Using the `PRGn` palette, for example, will by default adjust the
palette to the levels present in the data, and Iowa does not have
counties in all the bins:
```{r, fig.width = 10, class.source = "fold-hide"}
piowa +
pc_disc_iowa +
scale_fill_brewer(palette = "PRGn",
guide = guide_legend(reverse = TRUE)) +
theme(legend.position = "right")
```
Two issues:
* The color coding does not match the coding for the national map.
* The neutral color is not positioned at the zero level.
Adding `drop = FALSE` to the palette specification preserves the
encoding from the full map:
```{r, fig.width = 10, class.source = "fold-hide"}
(p_dvrg_iowa <-
piowa +
pc_disc_iowa +
scale_fill_brewer(palette = "PRGn",
drop = FALSE,
guide = guide_legend(reverse = TRUE))) +
theme(legend.position = "right")
```
Matching the national map would be particularly important for showing
the two together:
```{r, fig.width = 14, class.source = "fold-hide"}
library(patchwork)
(p_dvrg_48 + guides(fill = "none")) + p_dvrg_iowa +
plot_layout(guides = "collect", widths = c(3, 1))
```
## Some Notes on Choropleth Maps
Choropleth maps are very popular, in part because of their visual appeal.
But they do have to be used with care.
Choropleth maps are good for showing rates or counts per unit of
area, such as
* amount of rainfall per square meter;
* percentage of a region covered by forest.
They also work well for measurements that may vary little across a region,
as well as for some demographic summaries, such as
* average winter temperature in a state or county;
* average level of summer humidity in a state or county.
* average age of population in a state or county.
Choropleth maps should generally _not_ be used for counts, such as
number of cases of a disease, since the result is usually just a map
of population density.
Proportional symbol maps are usually a better choice for showing
counts, such as population sizes, in a geographic contexts
Choropleth maps can be used to show proportional counts, or counts per
area, but need to be used with care.
* Changes in counts by one or two can cause large changes in
proportions when population sizes are small.
* County level choropleth maps of cancer rates, for example, can
change a lot with one additional case in a county with a small
population.
* When proportions are shown but total counts are what really matters,
there is a tendency for viewers to let the area supply the
denominator.
An example is provided by a choropleth map that has been used to show
county level results for the 2016 presidential election.
A map showing which of the two major parties received a plurality in
each county:
```{r trump-map, eval = FALSE, class.source = "fold-hide"}
if (! file.exists("US_County_Level_Presidential_Results_08-16.csv")) {
download.file("https://stat.uiowa.edu/~luke/data/US_County_Level_Presidential_Results_08-16.csv", ## nolint
"US_County_Level_Presidential_Results_08-16.csv")
}
pres <- read.csv("US_County_Level_Presidential_Results_08-16.csv") %>%
mutate(fips = fips_code,
p = dem_2016 / (dem_2016 + gop_2016),
win = ifelse(p > 0.5, "DEM", "GOP"))
gcounty_pres <- left_join(gcounty, pres, "fips")
ggplot(gcounty_pres, aes(long, lat, group = group, fill = win)) +
geom_polygon(col = "black", linewidth = 0.05) +
geom_polygon(aes(long, lat, group = group),
fill = NA, color = "grey30", linewidth = 0.5, data = gusa) +
coord_map("bonne", parameters = 41.6) +
ggthemes::theme_map() +
scale_fill_manual(values = c(DEM = "blue", GOP = "red"))
```
```{r trump-map, echo = FALSE}
```
There are many internet posts about a map like this, for example
[here](https://www.core77.com/posts/90771/) and
[here](https://stemlounge.com/muddy-america-color-balancing-trumps-election-map-infographic/).
To a casual viewer this gives the impression of an overwhelming GOP win.
But many of the red counties have very small populations.
A proportional symbol map more accurately reflects the results, which
were very close with a majority of the popular vote going to the
Democrat's candidate:
```{r, fig.width = 10, class.source = "fold-hide"}
county_pres <- left_join(county_pops, pres, "fips")
ggplot(gcounty) +
geom_polygon(aes(long, lat, group = group), fill = NA, col = "grey",
data = gusa) +
geom_point(aes(x, y, size = pop, color = win), data = county_pres) +
scale_size_area(labels = scales::label_number(scale = 1e-6,
suffix = " M",
trim = FALSE)) +
## scale_color_manual(values = c(DEM = "blue", GOP = "red")) +
colorspace::scale_color_discrete_diverging(palette = "Blue-Red 2") +
coord_map("bonne", parameters = 41.6) +
ggthemes::theme_map()
```
```{r, eval = FALSE, include = FALSE}
## Continuous variants
ggplot(gcounty_pres, aes(long, lat, group = group, fill = p)) +
geom_polygon() +
coord_map("bonne", parameters = 41.6) +
ggthemes::theme_map() +
## scale_fill_gradient2(low = "red", high = "blue", midpoint = 0.5)
colorspace::scale_fill_continuous_diverging(palette = "Blue-Red 2",
mid = 0.5, rev = TRUE)
ggplot(gcounty) +
geom_polygon(aes(long, lat, group = group), fill = NA, col = "grey",
data = gusa) +
geom_point(aes(x, y, size = pop, color = p), data = county_pres) +
scale_size_area() +
colorspace::scale_color_continuous_diverging(palette = "Blue-Red 2",
mid = 0.5, rev = TRUE) +
## scale_color_gradient2(low = "red", high = "blue", midpoint = 0.5) +
coord_map("bonne", parameters = 41.6) +
ggthemes::theme_map()
```
```{r, eval = FALSE, include = FALSE}
## approximation to a bivariate palette using alpha
## along the lines of the 'muddy' link
mutate(gcounty_pres, rvotes = rank(dem_2016 + gop_2016)) %>%
ggplot(aes(long, lat, group = group, fill = p, alpha = rvotes)) +
geom_polygon() +
scale_fill_gradient2(midpoint = 0.5, mid = "grey") +
coord_map() +
scale_alpha(range = c(0.1, 1))
```
## Isopleth Map of Population Density
While populations do tend to cluster in cities, they are not entirely
clustered at county centroids.
Other quantities, such as temperatures, may be measured at discrete
points but vary smoothly across a region.
Isopleth maps show the contours of a smooth surface.
For populations, it is sometimes useful to estimate and visualize a
population density surface.
To estimate the population density we can use the county centroids
weighted by the county population values.
The function `kde2d` from the `MASS` package can be used to compute
kernel density estimates, but does not support the use of weights.
A simple modification that does support weights is available
[here](https://stat.uiowa.edu/~luke/classes/STAT4580/kde2d.R).
By default, `kde2d` estimates the density on a grid over the range of
the two variables.
```{r}
source(here::here("kde2d.R"))
ds <- with(county_pops, kde2d(x, y, weights = pop))
str(ds)
```
This result can be converted to a data frame using `broom::tidy`.
```{r, warning = FALSE}
dsdf <- broom::tidy(ds) %>%
rename(Lon = x, Lat = y, dens = z)
```
A plot of the contours:
```{r, fig.width = 10, class.source = "fold-hide"}
ggplot(gusa) +
geom_polygon(aes(long, lat, group = group),
fill = NA,
color = "grey") +
geom_contour(aes(Lon, Lat, z = dens),
data = dsdf) +
coord_map()
```
To produce contours that work better when filled, it is useful to
increase the number of grid points and enlarge the range.
```{r, fig.width = 10, class.source = "fold-hide"}
ds <- with(county_pops,
kde2d(x, y, weights = pop,
n = 50,
lims = c(-130, -60, 20, 50)))
dsdf <- broom::tidy(ds) %>%
rename(Lon = x, Lat = y, dens = z)
pusa <- ggplot(gusa) +
geom_polygon(aes(long, lat, group = group),
fill = NA,
color = "grey") +
coord_map()
pusa +
geom_contour(aes(Lon, Lat, z = dens),
data = dsdf)
```
A filled contour version can be created using `stat_contour` and
`fill = after_stat(level)`.
```{r, fig.width = 10, class.source = "fold-hide"}
pusa +
stat_contour(aes(Lon, Lat, z = dens,
fill = after_stat(level)),
data = dsdf,
geom = "polygon",
alpha = 0.2)
```
To focus on Iowa we need subsets of the populations data and the map data:
```{r}
iowa_pops <- filter(county_pops,
fips %/% 1000 == 19)
giowa <- filter(gcounty,
fips %/% 1000 == 19)
```
A proportional symbols plot for the Iowa county populations:
```{r, class.source = "fold-hide"}
iowa_base <- ggplot(giowa) +
geom_polygon(aes(long, lat,
group = group),
fill = NA,
col = "grey") +
coord_map()
iowa_base +
geom_point(aes(x, y,
size = pop),
data = iowa_pops) +
scale_size_area() +
ggthemes::theme_map()
```
For a density plot it is useful to expand the populations used to
include nearby large cities, such as Omaha.
```{r}
iowa_pops <- filter(county_pops, x > -97 & x < -90 & y > 40 & y < 44)
```
Density estimates can then be computed for the expanded Iowa data.
```{r, warning = FALSE}
dsi <- with(iowa_pops, kde2d(x, y, weights = pop,
n = 50, lims = c(-98, -89, 40, 44.5)))
dsidf <- broom::tidy(dsi) %>%
rename(Lon = x, Lat = y, dens = z)
```
A simple contour map:
```{r, fig.width = 10, class.source = "fold-hide"}
ggplot(giowa) +
geom_polygon(aes(long, lat,
group = group),
fill = NA,
color = "grey") +
geom_contour(aes(Lon, Lat, z = dens),
color = "blue",
na.rm = TRUE,
data = dsidf) +
coord_map()
```
A filled contour map
```{r, fig.width = 10, class.source = "fold-hide"}
pdiowa <- ggplot(giowa) +
geom_polygon(aes(long, lat, group = group),
fill = NA,
color = "grey") +
stat_contour(aes(Lon, Lat, z = dens,
fill = after_stat(level)),
color = "black",
alpha = 0.2,
na.rm = TRUE,
data = dsidf,
geom = "polygon")
pdiowa + coord_map()
```
Using `xlim` and `ylim` arguments to `coord_map` to trim the range:
```{r, fig.width = 10, class.source = "fold-hide"}
pdiowa +
coord_map(xlim = c(-96.7, -90),
ylim = c(40.3, 43.6))
```
## Simple Features
Using `geom_polygon` and `geom_point` works well for many simple
applications.
For more complex applications it is best to use special spatial data
structures.
An approach becoming widely adopted in many open-source frameworks for
working with spatial data is called _simple features_.
The package `sf` provides a range of tools for operating with simple
feature objects, such as:
* computing centroids;
* combining and intersecting objects;
* moving and expanding or shrinking objects.
Some useful vignettes in the `sf` package:
* [Simple Features for R](https://cran.r-project.org/package=sf/vignettes/sf1.html)
* [Reading, Writing and Converting Simple Features](https://cran.r-project.org/package=sf/vignettes/sf2.html)
* [Manipulating Simple Feature Geometries](https://cran.r-project.org/package=sf/vignettes/sf3.html)
Other references:
* Robin Lovelace, Jakub Nowosad, Jannes Muenchow (2019).
_Geocomputation with R_, Chapman and Hall/CRC. Also available [on
line](https://geocompr.robinlovelace.net/).
* Mel Moreno and Mathieu Basille (2018). Drawing beautiful maps
programmatically with R, sf and ggplot2, [Part 1:
Basics](https://r-spatial.org/r/2018/10/25/ggplot2-sf.html),
[Part 2:
Layers](https://r-spatial.org/r/2018/10/25/ggplot2-sf-2.html),
[Part 3:
Layouts](https://r-spatial.org/r/2018/10/25/ggplot2-sf-3.html).
* Claudia A Engel (2019). [Using Spatial Data with
R](https://cengel.github.io/R-spatial/).
* Andrew Heiss (2023). [Making Middle Earth maps with
R](https://www.andrewheiss.com/blog/2023/04/26/middle-earth-mapping-sf-r-gis/).
* Taro Mieno (2023). [R as GIS for
Economists](https://tmieno2.github.io/R-as-GIS-for-Economists/index.html).
A simple utility function for obtaining map data as sf objects:
```{r}
map_data_sf <- function(map, region = ".", exact = FALSE, crs = 4326, ...) {
maps::map(map, region, exact = exact, plot = FALSE, fill = TRUE, ...) %>%
sf::st_as_sf(crs = crs) %>%
separate(ID, c("region", "subregion"), sep = ",") %>%
sf::st_transform(crs)
}
```
Iowa county data:
```{r}
map_data_sf("county", "iowa") %>% head(4)
```
A basic Iowa county map using `sf`:
```{r giowa-sf, eval = FALSE}
giowa_sf <- map_data_sf("county", "iowa")
giowa_sf <- left_join(giowa_sf,
fipstab,
c("region", "subregion"))
ggplot(giowa_sf) + geom_sf() + coord_sf()
```
```{r giowa-sf, echo = FALSE, fig.width = 7}
```
Adding wind turbine locations:
```{r wind-sf, eval = FALSE}
wt_IA_sf <-
sf::st_as_sf(wt_IA,
coords = c("xlong", "ylat"),
crs = 4326)
ggplot(giowa_sf) +
geom_sf(fill = "grey95") +
geom_sf(aes(color = p_year),
data = wt_IA_sf) +
scale_color_viridis_c() +
coord_sf()
```
```{r wind-sf, echo = FALSE, fig.width = 8}
```
With a Stamen map background:
```{r wind-sf-stamen, eval = FALSE}
library(ggmap)
m <-
get_stamenmap(c(-97.2, 40.4, -89.9, 43.6),
zoom = 8,
maptype = "toner")
ggmap(m) +
geom_sf(data = giowa_sf,
fill = NA,
inherit.aes = FALSE) +
geom_sf(aes(color = p_year),
data = wt_IA_sf,
inherit.aes = FALSE) +
scale_color_viridis_c() +
coord_sf() +
labs(x = NULL, y = NULL)
```
```{r wind-sf-stamen, echo = FALSE, fig.width = 8, cache = TRUE, message = FALSE}
```
A choropleth map of population changes:
```{r pchanges-sf, eval = FALSE}
giowa_pop_sf <-
left_join(giowa_sf, cpop, "fips")
ggplot(giowa_pop_sf) +
geom_sf(aes(fill = cpchange)) +
ggthemes::theme_map() +
scale_fill_brewer(
palette = "PRGn",
drop = FALSE,
guide = guide_legend(reverse = TRUE))
```
```{r pchanges-sf, echo = FALSE, fig.width = 8}
```
Proportional symbol map for 2021 Iowa county population estimates:
```{r pop-syms-sf, eval = FALSE}
sf::sf_use_s2(FALSE)
iowa_pop_ctrds <-
mutate(giowa_pop_sf,
geom = sf::st_centroid(geom))
sf::sf_use_s2(TRUE)
ggplot(giowa_sf) +
geom_sf() +
geom_sf(aes(size = pop2021),
data = iowa_pop_ctrds,
show.legend = "point") +
scale_size_area()
```
```{r pop-syms-sf, echo = FALSE, warning = FALSE, message = FALSE, fig.width = 8}
```
A USA county map with a Bonne projection centered on Iowa City:
```{r pchanges-usa-sf, eval = FALSE}
gusa_sf <- map_data_sf("county")
gusa_sf <-
left_join(gusa_sf,
fipstab,
c("region", "subregion"))
gusa_pop_sf <-
left_join(gusa_sf, cpop, "fips")
ggplot(gusa_pop_sf) +
geom_sf(aes(fill = cpchange), size = 0.1) +
scale_fill_brewer(
palette = "PRGn",
na.value = "red",
guide = guide_legend(reverse = TRUE)) +
coord_sf(crs = "+proj=bonne +lat_1=41.6 +lon_0=-95")
```
```{r pchanges-usa-sf, echo = FALSE, fig.width = 8}
```
As an example of computation with `sf` objects we can revisit the
population contour plot.
First create an `sf` representation of the contours (there may be
better ways to do this):
```{r}
ggpoly2sf <- function(poly,
coords = c("long", "lat"),
id = "group",
region = "region",
crs = 4326) {
sf::st_as_sf(poly, coords = coords, crs = crs) %>%
group_by(!! as.name(id), !! as.name(region)) %>%
summarize(do_union = FALSE) %>%
sf::st_cast("POLYGON") %>%
ungroup() %>%
group_by(!! as.name(region)) %>%
summarize(do_union = FALSE) %>%
ungroup() %>%
sf::st_transform(crs)
}
```
```{r}
sf_contourLines <- function(xyz, crs = 4326) {
v <- contourLines(xyz)
line_df <- function(i) {
x <- v[[i]]
x$group <- i
as.data.frame(x)
}
lapply(seq_along(v), line_df) %>%
bind_rows() %>%
ggpoly2sf(coords = c("x", "y"),
region = "level",
crs = crs)
}
```
```{r, message = FALSE}
dsi_sf <- sf_contourLines(dsi)
```
This reproduces our previous plot:
```{r contour-sf-1, eval = FALSE, fig.width = 7}
ggplot(giowa_sf) +
geom_sf() +
geom_sf(aes(fill = level),
data = dsi_sf,
alpha = 0.2)
```
```{r contour-sf-1, echo = FALSE, fig.width = 7}
```
With an outline of Iowa from the `maps` package we can use
`sf::st_intersection` function to show only the part of the contours
that is within Iowa.
```{r, message = FALSE, warning = FALSE}
iowa_sf <- map_data_sf("state", "iowa")
```
```{r, message = FALSE, warning = FALSE}
dsi_i_sf <-
sf::st_intersection(iowa_sf, dsi_sf)
```
```{r conrour-sf-2, eval = FALSE}
ggplot(giowa_sf) +
geom_sf() +
geom_sf(aes(fill = level),
data = dsi_i_sf,
alpha = 0.2)
```
```{r conrour-sf-2, echo = FALSE, fig.width = 8, message = FALSE, warning = FALSE}
```
For the lower 48 states:
```{r, warning = FALSE, message = FALSE, fig.height = 4, class.source = "fold-hide"}
states_sf <- map_data_sf("state")
ds_sf <- sf_contourLines(ds)
ggplot(states_sf) +
geom_sf() +
geom_sf(aes(fill = level),
data = ds_sf,
alpha = 0.2)
```
```{r, warning = FALSE, message = FALSE, fig.height = 4, class.source = "fold-hide"}
sf::sf_use_s2(FALSE)
usa_sf <- sf::st_make_valid(map_data_sf("usa"))
ds_i_sf <- sf::st_intersection(usa_sf, ds_sf)
sf::sf_use_s2(TRUE)
ggplot(states_sf) +
geom_sf() +
geom_sf(aes(fill = level),
data = ds_i_sf,
alpha = 0.2)
```
The cleanup steps using `sf_use_s2()` and `st_make_valid()` can be
avoided by using better map data.
## Shape Files
The [ESRI](https://www.esri.com/en-us/home)
[_shapefile_ format](https://en.wikipedia.org/wiki/Shapefile) is a
widely used vector data format for geographic information systems.
* Many GIS software systems support reading shapefiles.
* The `maptools` and `rgdal` packages can be used to read shapefiles
in R.
* `sf` uses these utilities under the hood.
Shapefiles are available from many sources; for example:
* The [TIGER](https://www.census.gov/programs-surveys/geography.html)
site at the Census Bureau. The
[`tigris`](https://cran.r-project.org/package=tigris) package
provides an R interface.
* The [Natural Earth Project](https://www.naturalearthdata.com/). The
[rnaturalearth](https://cran.r-project.org/package=rnaturalearth)
package provides an R interface.
Other formats and data bases are also available. Some references:
* Paul Murrell's _R Graphics, 2nd Ed._, Chapter 14.
* The [CRAN Task View](https://cran.r-project.org/web/views/) for
[Analysis of Spatial Data](https://cran.r-project.org/view=Spatial)
* [Introduction to visualising spatial data in
R](https://cran.r-project.org/doc/contrib/intro-spatial-rl.pdf).
* [Geocomputation with R](https://geocompr.robinlovelace.net/).
State population isopleth map using shape files from the `tigris` package:
```{r, message = FALSE, results = FALSE}
options(tigris_use_cache = TRUE)
ssf <- tigris::states(resolution = "20m",
year = 2022,
cb = TRUE) %>%
filter(! NAME %in% c("Alaska", "Hawaii"))
```
```{r, message = FALSE}
ds_sf <- sf_contourLines(ds,
crs = sf::st_crs(ssf))
```
```{r state-isopleth-tigris, eval = FALSE}
ggplot(ssf) +
geom_sf() +
geom_sf(aes(fill = level),
alpha = 0.2,
data = sf::st_intersection(ssf,
ds_sf))
```
```{r state-isopleth-tigris, echo = FALSE, warning = FALSE}
```
## Shape Files: Census Tracts
Census Tracts are small, relatively permanent subdivisions of a county.
Census tracts generally have population size between 1,200 and 8,000 people.
Tracts for Iowa (state FIPS 19) for 2010:
```{r load-iowa-2010-tracts, include = FALSE}
tract_sf <- tigris::tracts(19, year = 2010)
```
```{r, eval = FALSE}
<>
```
```{r plot-iowa-cencus-tracts, eval = FALSE}
(p <- ggplot(tract_sf) + geom_sf())
```
```{r plot-iowa-cencus-tracts, echo = FALSE}
```
Johnson County:
```{r}
p %+% filter(tract_sf, COUNTYFP == "103")
```
Polk County:
```{r}
p %+% filter(tract_sf, COUNTYFP == "153")
```
## Proportion of Population Without Health Insurance
A lot of data is published at the census tract level.
One example is the proportion of the population that is uninsured
available from the [American Community
Survey](https://www.census.gov/programs-surveys/acs).
Data from 2019 is available locally.
Download the data.
```{r, class.source = "fold-hide"}
dname <- "ACS_2019_5YR_S2701"
if (! file.exists(dname)) {
data_url <- "https://stat.uiowa.edu/~luke/data"
aname <- paste0(dname, ".tgz")
download.file(paste(data_url, aname, sep = "/"), aname)
untar(aname)
file.remove(aname)
}
```
Read the data.
```{r, class.source = "fold-hide"}
acs_file <-
file.path(dname,
"ACSST5Y2019.S2701_data_with_overlays_2021-05-05T153143.csv")
acs_data_sf <- read.csv(acs_file, na.strings = "-")[-1, ] %>%
select(GEO_ID,
name = NAME,
population = S2701_C01_001E,
uninsured = S2701_C05_001E) %>%
mutate(across(3:4, as.numeric)) %>%
mutate(GEO_ID = sub("1400000US", "", GEO_ID))
```
Merge and plot.
```{r plot-iowa-uninsured, eval = FALSE, class.source = "fold-hide"}
plot_data_sf <-
left_join(tract_sf,
acs_data_sf,
c("GEOID10" = "GEO_ID"))
p <- ggplot(plot_data_sf) +
geom_sf(aes(fill = uninsured)) +
scale_fill_distiller(palette = "Reds",
direction = 1)
p
```
```{r plot-iowa-uninsured, echo = FALSE, fig.width = 9}
```
Johnson county:
```{r, class.source = "fold-hide"}
p %+%
filter(plot_data_sf, COUNTYFP == "103")
```
Polk county:
```{r, class.source = "fold-hide"}
p %+%
filter(plot_data_sf, COUNTYFP == "153")
```
For the state-wide map it may help to drop the census tract borders
and add county borders:
```{r plot-uninsured-2, eval = FALSE, class.source = "fold-hide"}
giowa_sf <- sf::st_as_sf(maps::map("county", "iowa", plot = FALSE, fill = TRUE),
crs = 4326)
ggplot(plot_data_sf) +
geom_sf(aes(fill = uninsured), color = NA) +
geom_sf(fill = NA, color = "darkgrey", data = giowa_sf, size = 0.2) +
scale_fill_distiller(palette = "Reds", direction = 1) +
coord_sf()
```
```{r plot-uninsured-2, echo = FALSE, fig.width = 9}
```
## Interactive Maps: Leaflet
The [`leaflet` package](https://rstudio.github.io/leaflet/) can be
used to make interactive versions of these maps using the
[Leaflet](https://leafletjs.com/) JavaScript library for
interactive maps:
The `leaflet` package can work with `sf` objects or the data objects
returned by the `map` function.
The base leaflet plot object:
```{r}
library(leaflet)
lf_sf <- leaflet(sf::st_transform(plot_data_sf, 4326))
```
Evaluating these expressions in the R interpreter opens map windows in
the browser:
```{r, eval = FALSE}
addPolygons(lf_sf, fillColor = ~ colorQuantile("Reds", percent)(percent))
addPolygons(lf_sf, fillColor = ~ colorNumeric("Reds", percent)(percent))
addPolygons(lf_sf, fillColor = ~ colorNumeric("Reds", percent)(percent),
weight = 1, fillOpacity = 0.9)
```
This code produces an embedded leaflet widget:
```{r}
addPolygons(lf_sf, fillColor = ~ colorNumeric("Reds", uninsured)(uninsured),
weight = 0.5, color = "black", fillOpacity = 0.9, opacity = 1.0,
highlightOptions = highlightOptions(color = "white",
weight = 2,
bringToFront = TRUE),
label = ~ lapply(paste0(name, "
",
"pct: ", uninsured, "
",
"pop: ", population),
htmltools::HTML))
```
Like `ggmap`, leaflet maps can include a base map as background:
```{r, eval = FALSE}
lfplot_tile <- addProviderTiles(lf_sf, providers$Stamen.Toner)
addPolygons(lfplot_tile, fillColor = ~ colorQuantile("Reds", percent)(percent),
weight = 0.5)
lfplot_tile <- addProviderTiles(lf_sf, providers$Stamen.Terrain)
addPolygons(lfplot_tile, fillColor = ~ colorQuantile("Reds", percent)(percent),
weight = 0.5)
```
## Interactive Maps: Plotly
```{r, warning = FALSE, fig.height = 8, fig.width = 10, class.source = "fold-hide"}
p <- ggplot(plot_data_sf) +
geom_sf(aes(fill = uninsured,
text = paste0(name, "\n",
"pct: ", uninsured, "\n",
"pop: ", population)),
color = NA) +
geom_sf(fill = NA, color = "darkgrey", data = giowa_sf, size = 0.2) +
scale_fill_distiller(palette = "Reds", direction = 1) +
coord_sf() +
ggthemes::theme_map()
plotly::ggplotly(p, tooltip = "text") %>%
plotly::style(hoverlabel = list(bgcolor = "white"))
```
## Interactive Maps: Ggiraph
```{r, fig.width = 10, class.source = "fold-hide"}
p <- ggplot(plot_data_sf) +
ggiraph::geom_sf_interactive(
aes(fill = uninsured,
tooltip = paste0(name, "\n",
"pct: ", uninsured, "\n",
"pop: ", population)),
color = NA) +
geom_sf(fill = NA, color = "darkgrey", data = giowa_sf, size = 0.2) +
scale_fill_distiller(palette = "Reds", direction = 1) +
coord_sf() +
ggthemes::theme_map()
ggiraph::girafe(ggobj = p)
```
## Shape Files: Zip Code Boundaries
Data is often aggregated at a zip code level.
Shape files for zip code boundaries are available from the Census
Bureau TIGER project and can be accessed with the `zctas` function in
the `tigris` package.
```{r zip_code_file_load, eval = FALSE, class.source = "fold-hide"}
library(sf)
library(tigris)
library(ggplot2)
z_sf <-
zctas(cb = TRUE,
starts_with = c("50", "51", "52"),
class = "sf",
year = 2015)
```
```{r, cache = TRUE, include = FALSE, ref.label = "zip_code_file_load"}
## separate chunk to avoid output
```
A map of the Iowa zip code regions:
```{r iowa-zip-code-map, eval = FALSE, class.source = "fold-hide"}
ggplot(z_sf) +
geom_sf(fill = NA,
color = "black",
size = 0.1) +
coord_sf()
```
```{r iowa-zip-code-map, echo = FALSE, fig.width = 9}
```
Zip codes within Johnson County:
```{r, warning = FALSE, class.source = "fold-hide"}
jc_sf <- map_data_sf("county", "iowa") %>%
filter(subregion == "johnson") %>%
sf::st_transform(sf::st_crs(z_sf))
ggplot(sf::st_intersection(z_sf, jc_sf)) +
geom_sf() +
geom_sf_label(aes(label = GEOID10))
```
Zip codes that intersect Johnson County:
```{r, warning = FALSE, class.source = "fold-hide"}
filter(z_sf,
sf::st_intersects(z_sf,
jc_sf,
sparse = FALSE)) %>%
ggplot() +
geom_sf(color = "black", data = jc_sf) +
geom_sf(fill = NA) +
geom_sf_label(aes(label = GEOID10))
```
## A Tile Map of State Populations
One package available for constructing state-level tile maps is
`statebins`.
The package is based on `ggplot` and provides functions for handling
continuous and discrete data.
```{r, fig.width = 10}
spops <- group_by(pep2021, state, year) %>%
summarize(pop = sum(pop)) %>%
ungroup() %>%
filter(year == 2021)
library(statebins)
ggplot(spops) +
geom_statebins(aes(fill = pop, state = state)) +
coord_equal() +
theme_statebins() +
scale_fill_distiller(palette = "Reds", direction = 1)
```
## A Hex Map of State Populations
An [R Graph Gallery
contribution](https://r-graph-gallery.com/328-hexbin-map-of-the-usa.html)
shows how to create a hexagonal map using layout data available
[here](https://team.carto.com/u/andrew/tables/andrew.us_states_hexgrid/public/map). Several
formats are available, including a shape file.
```{r, fig.width = 10, class.source = "fold-hide"}
## download the shape file
hexlayer <- "us_states_hexgrid"
hexdir <- hexlayer
if (! dir.exists(hexdir)) {
hexzip <- paste0(hexlayer, ".zip")
data_url <- "https://stat.uiowa.edu/~luke/data"
download.file(paste(data_url, hexzip, sep = "/"), hexzip)
dir.create(hexdir)
unzip(hexzip, exdir = hexdir)
file.remove(hexzip)
}
## read shape file and merge with data
hex_pops <- sf::st_read(dsn = hexdir, layer = hexlayer) %>%
mutate(state = sub(" \\(.*", "", google_nam)) %>%
left_join(spops, "state")
ggplot(hex_pops, aes(fill = pop)) +
geom_sf() +
geom_sf_text(aes(label = iso3166_2,
color = ifelse(rank(pop) < 48, "dark", "light"))) +
scale_color_manual(values = c(dark = "black", light = "white"),
guide = "none") +
scale_fill_distiller(palette = "Reds", direction = 1) +
ggthemes::theme_map() +
theme(legend.position = "bottom")
```
A
[gist](https://gist.github.com/hrbrmstr/51f961198f65509ad863#file-us_states_hexgrid-geojson)
and a [blog
post](https://rud.is/b/2015/05/15/u-s-drought-monitoring-with-hexbin-state-maps-in-r/)
describes another approach to creating a hexagonal tile map.
## A Cartogram of State Populations
Cartograms are another approach.
Cartograms deform regions so their areas match values of a variable.
A [blog
post](https://www.r-bloggers.com/2016/10/cartograms-with-r/) describes how to
create them in R and another [R Graph Gallery
article](https://r-graph-gallery.com/332-hexbin-chloropleth-cartogram.html)
also describes how to create them.
A cartogram version of the hex map reshaped to reflect population size:
```{r, message = FALSE, fig.width = 10, class.source = "fold-hide"}
library(cartogram)
sf::st_transform(hex_pops, 3857) %>%
cartogram_cont("pop") %>%
ggplot() +
geom_sf() +
geom_sf_text(aes(label = iso3166_2), size = 3) +
ggthemes::theme_map()
```
A cartogram version of a standard polygon map reshaped to reflect
population size:
```{r, message = FALSE, fig.width = 10, class.source = "fold-hide"}
states_sf <- sf::st_as_sf(maps::map("state", plot = FALSE, fill = TRUE),
crs = 4326)
left_join(mutate(states_sf, state = as.character(ID)),
mutate(spops,
state.abb = state.abb[match(state, state.name)],
state = tolower(state)),
"state") %>%
sf::st_transform(3857) %>%
cartogram::cartogram_cont("pop") %>%
ggplot() +
geom_sf() +
geom_sf_text(aes(label = state.abb), size = 3) +
ggthemes::theme_map()
```
Cartograms are sometimes used to show electoral college results.
## A Cartogram of 2016 Presidential Election Votes
Scaled by total number of votes in each state:
```{r, results = FALSE, class.source = "fold-hide"}
states_sf <- tigris::states(resolution = "20m", cb = TRUE)
spres2016 <- pres %>%
group_by(fp = fips_code %/% 1000) %>%
summarize(margin = sum(dem_2016 - gop_2016, na.rm = TRUE),
total = sum(dem_2016 + gop_2016 + oth_2016, na.rm = TRUE)) %>%
ungroup() %>%
mutate(win = ifelse(margin > 0, "DEM", "GOP"))
```
```{r, message = FALSE, fig.width = 8, class.source = "fold-hide"}
mutate(states_sf, fp = as.numeric(STATEFP)) %>%
left_join(spres2016, "fp") %>%
left_join(data.frame(NAME = state.name, ABB = state.abb), "NAME") %>%
filter(! (NAME %in% c("Hawaii", "Alaska", "Puerto Rico"))) %>%
sf::st_transform(3857) %>%
cartogram::cartogram_cont("total") %>%
ggplot() +
geom_sf(aes(fill = win)) +
geom_sf_text(aes(label = ABB)) +
scale_fill_manual(values = c(DEM = "blue", GOP = "red")) +
ggthemes::theme_map()
```
## Linked Micromaps
The `micromap` package provides several functions for creating linked
micromaps.
```{r jc-micromap, eval = FALSE, class.source = "fold-hide"}
library(micromap)
maptab <- filter(tract_sf, COUNTYFP == 103) %>%
sf::as_Spatial() %>%
create_map_table(IDcolumn = "GEOID10")
acs_data_jc <- filter(acs_data_sf, grepl("Johnson", name))
lmplot(stat.data = acs_data_jc, map.data = maptab,
grouping = 5, ord.by = "uninsured",
panel.types = c("labels", "dot", "dot", "map"),
panel.data = c("GEO_ID", "uninsured", "population", NA),
map.link = c("GEO_ID", "ID"),
panel.att = list(list(1, text.size = 0.6),
list(2, header = "Pct Uninsured"),
list(3, header = "Population")))
```
```{r jc-micromap, echo = FALSE, message = FALSE, fig.asp = 1.3, fig.width = 6.5}
```
## Odds and Ends
### Including Alaska and Hawaii
Distance and size are challenging:
Mercator:
```{r}
ggplot(states_sf) + geom_sf()
```
Bonne at Iowa City latitude:
```{r}
ggplot(states_sf) +
geom_sf() +
coord_sf(crs = "+proj=bonne +lat_1=41.6 +lon_0=-95")
```
One of several options for obtaining maps that include Alaska and
Hawaii is the `urbnmapr` package available on GitHub.
```{r, message = FALSE, class.source = "fold-hide"}
## remotes::install_github("UrbanInstitute/urbnmapr")
library(urbnmapr)
states_sf <- get_urbn_map(map = "states",
sf = TRUE)
ggplot(states_sf) + geom_sf()
```
```{r, message = FALSE, class.source = "fold-hide"}
counties_sf <- get_urbn_map(map = "counties",
sf = TRUE)
ggplot(counties_sf) + geom_sf()
```
2016 presidential election winners by county:
```{r, fig.width = 10, class.source = "fold-hide"}
## fix Oglala Lakota County SD FIPS
pres <- mutate(pres, fips = ifelse(fips_code == 46113, 46102, fips_code))
counties_sf <- mutate(counties_sf, fips = as.numeric(county_fips))
counties_sf <- left_join(counties_sf, pres, "fips")
NA_col <- "grey80"
ggplot(counties_sf) +
geom_sf(aes(fill = win), size = 0.1) +
scale_fill_manual(values = c(DEM = "blue", GOP = "red", "NA" = NA_col),
na.value = NA_col) +
ggthemes::theme_map()
```
Another option uses is to use shape data and `shift_geometry()` from
the `tigris` package:
```{r, results = FALSE, class.source = "fold-hide"}
tcounties_sf <- tigris::counties(resolution = "20m", cb = TRUE) %>%
## filter(STATE_NAME != "Puerto Rico") %>%
mutate(fips_code = as.numeric(paste0(STATEFP, COUNTYFP))) %>%
left_join(pres, "fips_code") %>%
tigris::shift_geometry()
```
```{r, fig.width = 10, class.source = "fold-hide"}
ggplot(tcounties_sf) +
geom_sf(aes(fill = win)) +
scale_fill_manual(values = c(DEM = "blue", GOP = "red")) +
ggthemes::theme_map()
```
### Dot Density Maps
Dot density map of 2016 election results for Iowa counties (using
`tidycensus::as_dot_density()`):
```{r, class.source = "fold-hide"}
iowa_sf <- filter(counties_sf, state_abbv == "IA")
iowa_sf_long <- pivot_longer(iowa_sf,
contains("20"),
values_to = "count",
names_to = "vname") %>%
separate(vname, c("party", "year"), convert = TRUE) %>%
filter(party %in% c("dem", "gop")) %>%
pivot_wider(names_from = "party", values_from = "count") %>%
mutate(margin = abs(dem - gop), win = ifelse(dem > gop, "DEM", "GOP")) %>%
sf::st_as_sf()
## the GitHub version of tidycensus may be needed for as_dot_density
## remotes::install_github("walkerke/tidycensus")
library(tidycensus)
dd <- pivot_longer(iowa_sf_long, dem | gop,
values_to = "votes",
names_to = "party") %>%
as_dot_density("votes", values_per_dot = 1000, group = "party")
p <- ggplot(filter(iowa_sf_long, year >= 2012)) +
geom_sf() +
geom_sf(aes(color = party),
data = filter(dd, year >= 2012),
size = 1) +
facet_wrap(~ year) +
scale_color_manual(name = "Party", values = c(dem = "blue", gop = "red")) +
coord_sf(crs = "+proj=bonne +lat_1=41.6 +lon_0=-95") +
guides(color = guide_legend(override.aes = list(size = 5))) +
ggthemes::theme_map() +
theme(legend.position = "right")
```
```{r, echo = FALSE, fig.width = 12, fig.height = 6}
## needed to evaluate p if the previous chunk is cached
p
```
An alternative is to show the margin (DEM votes minus GOP Votes) as
proportional symbols:
```{r, message = FALSE, fig.width = 14, fig.height = 7, class.source = "fold-hide"}
ctr <- sf::st_centroid(iowa_sf_long)
ggplot(filter(iowa_sf_long, year >= 2012)) +
geom_sf() +
geom_sf(aes(size = margin, color = win),
data = filter(ctr, year >= 2012)) +
scale_color_manual(name = "Winner", values = c(DEM = "blue", GOP = "red")) +
coord_sf(crs = "+proj=bonne +lat_1=41.6 +lon_0=-95") +
facet_wrap(~year) +
guides(color = guide_legend(override.aes = list(size = 5))) +
ggthemes::theme_map() +
theme(legend.position = "right")
```
Another option is choropleth maps:
```{r, fig.width = 14, fig.height = 7, class.source = "fold-hide"}
## margin/win
filter(iowa_sf_long, year >= 2012) %>%
ggplot(aes(fill = win)) +
geom_sf() +
scale_fill_manual(name = "Winner", values = c(DEM = "blue", GOP = "red")) +
coord_sf(crs = "+proj=bonne +lat_1=41.6 +lon_0=-95") +
ggthemes::theme_map() +
theme(legend.position = "right") +
facet_wrap(~year)
```
### Package [`mapview`](https://r-spatial.github.io/mapview/)
```{r}
filter(ctr, year == 2012) %>%
mapview::mapview(zcol = "win", cex = "margin")
```
```{r}
filter(ctr, year == 2016) %>%
mapview::mapview(zcol = "win", cex = "margin")
```
### Accessing US Governmet Data
A lot of data is made available via [DATA.GOV](https://data.gov/).
Data files are available in various formats.
Web APIs are also available for accessing many data sets.
R packages are available for using many of these APIs.
Some examples:
* [`blsAPI`](https://www.bls.gov/developers/home.htm) for the Bureau
of Labor Statistics (BLS) API.
* [`censusapi`](https://github.com/hrecht/censusapi) for the Census
Bureau API.
* [`rnoaa`](https://github.com/ropensci/rnoaa) for NOAA data.
A book on working with cencus data is
[available](https://walker-data.com/census-r/index.html).
## Other Notes
- Rearranging maps
- [Examples](https://ikashnitsky.github.io/2017/map-hacking/)
using `ggplot`
- The [`usmap`
package](https://cran.r-project.org/package=usmap/vignettes/introduction.html)
for showing Alaska and Hawaii along with the lower 48 states.
- An [animation
example](https://projects.fivethirtyeight.com/mortality-rates-united-states/mental-substance/)
from [_FiveThirtyEight_](https://projects.fivethirtyeight.com).
- An [article](https://www.washingtonpost.com/graphics/2018/national/segregation-us-cities/) using dot-density maps.
- There are many other frameworks and packages to explore.
## Reading
Chapter [_Draw maps_](https://socviz.co/maps.html#maps)
in [_Data Visualization_](https://socviz.co/).
Chapter [_Visualizing geospatial
data_](https://clauswilke.com/dataviz/geospatial-data.html) in
[_Fundamentals of Data
Visualization_](https://clauswilke.com/dataviz/).
## Exercises
1. This code produces a choropleth map of Europe showing deviations in
life expectancy from the continent median in 2007 based on the data
in the `gapminer` package:
```{r, fig.cap = ""}
library(dplyr)
library(ggplot2)
library(gapminder)
eu <- filter(gapminder, year == 2007, continent == "Europe")
m <- map_data("world")
meu <- inner_join(m, eu, c("region" = "country"))
p <- ggplot(meu, aes(x = long, y = lat,
group = group,
fill = lifeExp - median(eu$lifeExp))) +
geom_polygon(color = "black") +
coord_map() +
labs(title = "Life Expectancy Variation in Europe",
subtitle = "Year: 2007",
fill = "Years above\nthe median") +
ggthemes::theme_map()
p
```
A better color scale would use a diverging palette with the
neutral color aligned with zero, corresponding to countries at the
median. Which of the following produces a map with such a color
scale?
a. `p + scale_fill_continuous()`
b. `p + scale_fill_viridis_d()`
c. `p + scale_fill_gradient2()`
d. `p + scale_fill_distiller(type = "div")`