class: center, middle, title-slide # Maps and Geographical Data ### Luke Tierney ### University of Iowa ### 2022-05-04 --- <link rel="stylesheet" href="stat4580.css" type="text/css" /> <style type="text/css"> .remark-code { font-size: 85%; } </style> ## Data Types Many different kinds of data are shown on maps: -- * Points, location data. -- * Lines, routes, connections. -- * Area data, aggregates, rates. -- * Sampled surface data. -- * ... --- layout: true ## Map Types --- There are many different kinds of maps: -- * Dot maps -- * Symbol maps -- * Line maps -- * Choropleth maps -- * Cartograms and tiled maps -- * Linked micromaps -- * ... --- ### Dot Maps .pull-left[ * A dot map of violent crime locations in Houston.  <!-- Data frame `crime` from `ggmap`. Original data from Houston PD. --> ] -- .pull-right[ <!-- * A [racial dot map](https://demographics.virginia.edu/DotMap/) of the US. --> * A [racial dot map](https://all-of-us.benschmidt.org/) of the US. {{content}} ] -- * Gun violence in America [article](https://www.theguardian.com/us-news/ng-interactive/2017/jan/09/special-report-fixing-gun-violence-in-america) in the Guardian. {{content}} -- * [Dot density maps](https://www.axismaps.com/guide/dot-density). {{content}} -- * [Bird migration patterns](https://web.archive.org/web/20200228110412/https://www.canadiangeographic.ca/article/mesmerizing-map-shows-bird-migrations-throughout-year). <!-- * [Comments on the Junk Charts blog]( https://junkcharts.typepad.com/junk_charts/2018/04/hog-wild-about-dot-maps.html) on dot maps. --> --- ### Symbol Maps .pull-left[ <!-- http://ashmalone.blogspot.com/2019/02/proportional-bivariate-choropleth.html --> * Increases and decreases in jobs. <img src="../img/Jobs.png" width="500" /> ] -- .pull-right[ * [How the US generated electricity](https://www.washingtonpost.com/graphics/national/power-plants/?utm_term=.506ef6f03d7c). {{content}} ] -- * [Graduated and proportional symbol maps](https://gisgeography.com/dot-distribution-graduated-symbols-proportional-symbol-maps/). <!-- img/gradsyms.jpeg --> --- ### Line Maps .pull-left[ * [Mapping connections with great circles](https://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/).  ] -- .pull-right[ * [Airline route maps](https://www.airlineroutemaps.com/). {{content}} ] -- * [Wind maps](http://hint.fm/wind/). --- ### Isopleth, Isarithmic, or Contour Maps .pull-left[ * Density contours for Houston violent crime data: <img src="https://github.com/dkahle/ggmap/raw/master/tools/README-styling-1.png" width="400" /> ] -- .pull-right[ * Bird flu deaths in California: <img src="https://i.ytimg.com/vi/UYmJCjPZP5A/maxresdefault.jpg" width="600" /> ] --- ### Choropleth Maps <img src="https://www.bls.gov/web/metro/twmcort.gif" width="500" /><img src="https://www.bls.gov/web/laus/mstrtcr1.gif" width="500" /> --- * [Opinions on climate change](https://www.nytimes.com/interactive/2017/03/21/climate/how-americans-think-about-climate-change-in-six-maps.html?_r=0) -- * [Rise in health insurance coverage under the ACA](http://www.npr.org/sections/health-shots/2017/04/14/522956939/maps-show-a-dramatic-rise-health-in-insurance-coverage-under-aca) from NPR. --- ### Cartograms and Tiled Maps * An [introduction](http://blog.apps.npr.org/2015/05/11/hex-tile-maps.html) from the NPR blog. <img src="http://blog.apps.npr.org/img/posts/2015-05-11-hex-tile-maps/square-tiles.png" width="350" /><img src="http://blog.apps.npr.org/img/posts/2015-05-11-hex-tile-maps/hex-tiles.png" width="350" /><img src="http://blog.apps.npr.org/img/posts/2015-05-11-hex-tile-maps/cartogram.jpg" width="350" /> -- * One approach to [cartograms in R](http://staff.math.su.se/hoehle/blog/2016/10/10/cartograms.html). --- ### Linked Micromaps .pull-left[ * Poverty and education. <!-- https://www.researchgate.net/figure/A-linked-micromap-redisplay-of-poverty-and-college-education-data-from-Fig-1-with-maps_fig2_228101144 --> <img src="../img/linked-micromap-poverty-education.png" width="650" /> ] -- .pull-right[ * A [D3 approach](http://bl.ocks.org/jazzido/raw/5962277/). ] --- layout: true ## Drawing Maps --- At the simplest level drawing a map involves; -- * possibly loading a background image, such as terrain; -- * drawing or filling polygons that represent region boundaries; -- * adding points, lines, and other annotations. --- But there are a number of complications: -- * you need to obtain the polygons representing the regions; -- * regions with several pieces; -- * holes in regions; lakes in islands in lakes; -- * projections for mapping spherical longitude/lattitude coordinates to a plane; -- * aspect ratio; -- * making sure map and annotation layers are consistent in their use of coordinates and aspect ratio; -- * connecting your data to the geographic features. -- High level tools are available for creating some standard map types. -- These tools usually have some limitations. -- Creating maps directly, building on the things you have learned so far, gives more flexibility. --- layout: true ## Basic Map Data --- Basic map data consisting of latitude and longitude of points along borders is available in the `maps` package. -- The `ggplot2` function `map_data` accesses this data and reorganizes it into a data frame. ```r library(ggplot2) gusa <- map_data("state") head(gusa, 4) ## long lat group order region subregion ## 1 -87.46201 30.38968 1 1 alabama <NA> ## 2 -87.48493 30.37249 1 2 alabama <NA> ## 3 -87.52503 30.37249 1 3 alabama <NA> ## 4 -87.53076 30.33239 1 4 alabama <NA> ``` -- The `borders` function in `ggplot2` uses this data to create a simple outline map. --- .pull-left.tiny-code[ Borders data for Johnson and Linn Counties: ```r jl_bdr <- map_data("county", "iowa") %>% filter(subregion %in% c("johnson", "linn")) jl_bdr ## long lat group order region subregion ## 1 -91.34666 41.60819 52 531 iowa johnson ## 2 -91.35239 41.42485 52 532 iowa johnson ## 3 -91.47272 41.43058 52 533 iowa johnson ## 4 -91.48417 41.44777 52 534 iowa johnson ## 5 -91.48990 41.46495 52 535 iowa johnson ## 6 -91.50136 41.47642 52 536 iowa johnson ## 7 -91.49563 41.49934 52 537 iowa johnson ## 8 -91.50136 41.51079 52 538 iowa johnson ## 9 -91.51855 41.52225 52 539 iowa johnson ## 10 -91.82221 41.51652 52 540 iowa johnson ## 11 -91.81648 41.60819 52 541 iowa johnson ## 12 -91.82794 41.60819 52 542 iowa johnson ## 13 -91.82221 41.87175 52 543 iowa johnson ## 14 -91.35239 41.87175 52 544 iowa johnson ## 15 -91.34666 41.60819 52 545 iowa johnson ## 16 -91.82221 42.30148 57 609 iowa linn ## 17 -91.58730 42.30148 57 610 iowa linn ## 18 -91.35239 42.30148 57 611 iowa linn ## 19 -91.35239 41.95197 57 612 iowa linn ## 20 -91.35239 41.87175 57 613 iowa linn ## 21 -91.82221 41.87175 57 614 iowa linn ## 22 -91.82221 42.30148 57 615 iowa linn ``` ] --- .pull-left[ Vertex locations: ```r ggplot(jl_bdr, aes(x = long, y = lat)) + geom_point() + coord_map() ``` ] .pull-right[ <!-- --> ] --- .pull-left[ Add polygons: ```r ggplot(jl_bdr, aes(x = long, y = lat, group = subregion)) + geom_point() + geom_polygon(color = "black", fill = NA) + coord_map() ``` ] .pull-right[ <!-- --> ] --- .pull-left[ Fill polygons: ```r ggplot(jl_bdr, aes(x = long, y = lat, fill = subregion)) + geom_point() + geom_polygon(color = "black") + guides(fill = "none") + coord_map() ``` ] .pull-right[ <!-- --> ] --- Simple polygon data along with `geom_polygon()` works reasonably well for many simple applications. -- For more complex applications it is best to use special spatial data structures. -- An approach becoming widely adopted in many open-source frameworks for working with spatial data is called [simple features](file:///home/luke/writing/classes/uiowa/STAT4580/webpage/maps.html#simple-features). --- There are 63 regions in the state boundaries data because several states consist of disjoint regions. -- A selection: ```r filter(gusa, region %in% c("massachusetts", "michigan", "virginia")) %>% select(region, subregion) %>% unique() ## region subregion ## 1 massachusetts martha's vineyard ## 37 massachusetts main ## 257 massachusetts nantucket ## 287 michigan north ## 747 michigan south ## 1117 virginia chesapeake ## 1213 virginia chincoteague ## 1228 virginia main ``` --- The map can be drawn using `geom_polygon`: .pull-left[ <!-- --> ] -- .pull-right[ ```r p_usa <- ggplot(gusa) + geom_polygon(aes(x = long, y = lat, group = group), fill = NA, color = "darkgrey") p_usa ``` {{content}} ] -- This map looks odd since it is not using a sensible _projection_ for converting the spherical longitude/latitude coordinates to a flat surface. --- layout: true ## Projections --- Longitude/latitude position points on a sphere; maps are drawn on a flat surface. -- One degree of latitude corresponds to about 69 miles everywhere. -- But one degree of longitude is shorter farther away from the equator. -- In Iowa City, at latitude 41.6 degrees north, one degree of longitude corresponds to about $$ 69 \times \cos(41.6 / 90 \times \pi / 2) \approx 52 $$ miles. -- A commonly used projection is the [_Mercator projection_](https://en.wikipedia.org/wiki/Mercator_projection). -- The Mercator projection is a cylindrical projection that preserves angles but distorts areas far away from the equator. --- The Mercator projection is the default projection used by `coord_map`: .pull-left[ ```r p_usa + coord_map() ``` <!-- --> ] -- .pull-right[ Other projections can be specified instead. ] --- One alternative is The [_Bonne projection_](https://en.wikipedia.org/wiki/Bonne_projection). .pull-left[ ```r p_usa + * coord_map("bonne", parameters = 41.6) ``` <!-- --> ] -- .pull-right[ The Bonne projection is an equal area projection that preserves shapes around a standard latitude but distorts farther away. {{content}} ] -- This preserves shapes at the latitude of Iowa City. {{content}} -- The projections applied with `coord_map` will be used in all layers. --- layout: true ## Wind Turbines in Iowa --- The `wind_turbines` data frame in package `dviz.supp` contains information about wind turbines set up through 2018. -- More recent data is available [here](https://eerscmap.usgs.gov/uswtdb/data/). -- The locations for Iowa can be shown on a map of Iowa county borders. -- .pull-left[ First, get the wind turbine data: .smaller-code[ ```r data(wind_turbines, package = "dviz.supp") wt_IA <- filter(wind_turbines, t_state == "IA") %>% mutate(p_year = replace(p_year, p_year < 0, NA)) ``` ] ] --- .pull-left[ Next, get the map data: ```r giowa <- map_data("county", "iowa") ``` {{content}} ] -- A map of iowa county borders: ```r p <- ggplot(giowa) + geom_polygon( aes(x = long, y = lat, group = group), fill = NA, color = "grey") + coord_map() p ``` -- .pull-right[ <!-- --> ] --- .pull-left[ Add the wind turbine locations: ```r p + geom_point(aes(x = xlong, y = ylat), data = wt_IA) ``` ] .pull-right[ <!-- --> ] --- .pull-left[ Use color to show the year of construction: ```r p + geom_point(aes(x = xlong, y = ylat, color = p_year), data = wt_IA) + scale_color_viridis_c() ``` ] .pull-right[ <!-- --> ] --- .pull-left[ A background map showing features like terrain or city locations and major roads can often help. {{content}} ] -- Maps are available from various sources, with various restrictions and limitations. {{content}} -- [Stamen maps](https://stamen.com/open-source/) are often a good option. {{content}} -- The `ggmap` package provides an interface for using Stamen maps. {{content}} -- A Stamen _toner_ map background: ```r library(ggmap) m <- get_stamenmap( c(-97.2, 40.4, -89.9, 43.6), zoom = 8, maptype = "toner") ggmap(m) ``` -- .pull-right[ <!-- --> ] --- .pull-left[ County borders with a [Stamen map](https://stamen.com/open-source/) background: ```r ggmap(m) + geom_polygon(aes(x = long, y = lat, group = group), data = giowa, fill = NA, color = "grey") ``` ] .pull-right[ <!-- --> ] --- .pull-left[ Add the wind turbine locations: ```r ggmap(m) + geom_polygon(aes(x = long, y = lat, group = group), data = giowa, fill = NA, color = "grey") + geom_point(aes(x = xlong, y = ylat, color = p_year), data = wt_IA) + scale_color_viridis_c() ``` ] .pull-right[ <!-- --> {{content}} ] -- With other backgrounds it might be necessary to choose a different color palette. --- layout: true ## County Population Data --- <!-- Old csv data may be here: https://www.census.gov/programs-surveys/popest.html https://www.census.gov/programs-surveys/popest/data/tables.html They seem to want you to use API for new stuff. A couple of R packages do that. The 2020-2021 data seems to be here: https://www2.census.gov/programs-surveys/popest/datasets/2020-2021/counties/totals/co-est2021-alldata.csv --> The census bureau provides [estimates](https://data.census.gov/) of populations of US counties. -- * Estimates are available in several formats, including CSV. -- * The CSV file for 2020-2021 is available at <http://www.stat.uiowa.edu/~luke/data/co-est2021-alldata.csv>. -- * The zip-compressed CSV file for 2010-2019 is available at <http://www.stat.uiowa.edu/~luke/data/co-est2019-alldata.zip>. -- State and county level population counts can be visualized several ways, e.g.: -- * proportional symbol maps; -- * choropleth maps (a common choice, but not a good one!). -- For a proportional symbol map we need to pick locations for the symbols. --- First step: read the data. .hide-code[ ```r readPEP <- function(fname) { read.csv(fname) %>% filter(COUNTY != 0) %>% ## drop state totals mutate(FIPS = 1000 * STATE + COUNTY) %>% select(FIPS, county = CTYNAME, state = STNAME, starts_with("POPESTIMATE")) %>% pivot_longer(starts_with("POPESTIMATE"), names_to = "year", names_prefix = "POPESTIMATE", values_to = "pop") } if (! file.exists("co-est2021-alldata.csv")) download.file("http://www.stat.uiowa.edu/~luke/data/co-est2021-alldata.csv", "co-est2021-alldata.csv") pep2021 <- readPEP("co-est2021-alldata.csv") ``` ] For the 2021 data: .smaller-code[ ```r head(pep2021) ## # A tibble: 6 × 5 ## FIPS county state year pop ## <dbl> <chr> <chr> <chr> <int> ## 1 1001 Autauga County Alabama 2020 58877 ## 2 1001 Autauga County Alabama 2021 59095 ## 3 1003 Baldwin County Alabama 2020 233140 ## 4 1003 Baldwin County Alabama 2021 239294 ## 5 1005 Barbour County Alabama 2020 25180 ## 6 1005 Barbour County Alabama 2021 24964 ``` ] -- Counties and states are identified by name and also by [FIPS code](https://en.wikipedia.org/wiki/FIPS_county_code). -- The final three digits identify the county within a state. -- The leading one or two digits identify the state. -- The FIPS code for Johnson County, IA is 19103. --- layout: false ## Create a Proportional Symbol Map of State Populations We will need: -- * State population counts. -- * Locations for symbols to represent these counts. -- Aggregate the county populations to the state level: -- ```r state_pops <- mutate(pep2021, state = tolower(state)) %>% filter(year == 2021) %>% group_by(state) %>% summarize(pop = sum(pop, na.rm = TRUE)) ``` -- Using `tolower` matches the case in the polygon data: ```r filter(gusa, region == "iowa") %>% head(1) ## long lat group order region subregion ## 1 -91.22634 43.49895 14 3539 iowa <NA> ``` -- An alternative would be to use the state FIPS code and the `state.fips` table. --- layout: true ## Computing Centroids --- Marks representing data values for a region are often placed at the region's _centroid_, or centers of gravity. -- A quick approximation to the centroids of the polygons is to compute the centers of their bounding rectangles. .pull-left[ ```r state_centroids_quick <- rename(gusa, state = region) %>% group_by(state) %>% summarize(x = mean(range(long)), y = mean(range(lat))) p_usa + geom_point(aes(x, y), data = state_centroids_quick, color = "blue") + coord_map() ``` ] -- .pull-right[ <!-- --> ] --- More sophisticated approaches to computing centroids are also available. Using [_simple features_](#simple-features) and the `sf` package: .pull-left.smaller-code[ ```r sf::sf_use_s2(FALSE) state_centroids <- sf::st_as_sf(gusa, coords = c("long", "lat"), crs = 4326) %>% group_by(region) %>% summarize(do_union = FALSE) %>% sf::st_cast("POLYGON") %>% ungroup() %>% sf::st_centroid() %>% sf::as_Spatial() %>% as.data.frame() %>% rename(x = coords.x1, y = coords.x2, state = region) sf::sf_use_s2(TRUE) pp <- p_usa + geom_point(aes(x, y), data = state_centroids, color = "red") pp + coord_map() ``` ] .pull-right[ <!-- --> ] --- Comparing the two approaches: .pull-left[ ```r p_usa + geom_point(aes(x, y), data = state_centroids_quick, color = "blue") + geom_point(aes(x, y), data = state_centroids, color = "red") + coord_map() ``` ] .pull-right[ <!-- --> ] --- A projection specified with `coord_map` will be applied to all layers: .pull-left[ ```r pp + coord_map("bonne", parameters = 41.6) ``` ] .pull-right[ <!-- --> ] --- layout: true ## Symbol Plots of State Population --- Merge in the centroid locations. -- ```r state_pops <- inner_join(state_pops, state_centroids, "state") head(state_pops) ## # A tibble: 6 × 4 ## state pop x y ## <chr> <int> <dbl> <dbl> ## 1 alabama 5039877 -86.8 32.8 ## 2 arizona 7276316 -112. 34.3 ## 3 arkansas 3025891 -92.4 34.9 ## 4 california 39237836 -120. 37.3 ## 5 colorado 5812069 -106. 39.0 ## 6 connecticut 3605597 -72.7 41.6 ``` -- Using `inner_join` drops cases not included in the lower-48 map data. --- A dot plot: .pull-left[ <!-- --> ] .pull-right.smaller-code[ ```r ggplot(state_pops) + geom_point(aes(x = pop, y = reorder(state, pop))) + labs(x = "Population", y = "") + theme(axis.text.y = element_text(size = 10)) + scale_x_continuous(labels = scales::comma) ``` {{content}} ] -- The population distribution is heavily skewed; a color coding would need to account for this. --- A proportional symbol map: .pull-left[ ```r ggplot(gusa) + geom_polygon(aes(long, lat, group = group), fill = NA, color = "grey") + geom_point(aes(x, y, size = pop), data = state_pops) + scale_size_area() + coord_map("bonne", parameters = 41.6) + ggthemes::theme_map() ``` ] -- .pull-right[ <!-- --> ] --- .pull-left.smaller-code[ The thermometer symbol approach suggested by Cleveland and McGill (1984) can be emulated using `geom_rect`: ```r ggplot(gusa) + geom_polygon(aes(long, lat, group = group), fill = NA, color = "grey") + coord_map() + ggthemes::theme_map() + geom_rect(aes(xmin = x - .4, xmax = x + .4, ymin = y - 1, ymax = y + 2 * (pop / max(pop)) - 1), data = state_pops) + geom_rect(aes(xmin = x - .4, xmax = x + .4, ymin = y - 1, ymax = y + 1), data = state_pops, fill = NA, color = "black") ``` ] -- .pull-right[ <!-- --> {{content}} ] -- To work well along the northeast this would need a strategy similar to the one used by `ggrepel` for preventing label overlap. --- layout: true ## Choropleth Maps of State Population --- A choropleth map needs to have the information for coloring all the pieces of a region. -- This can be done by merging using `left_join`: ```r sp <- select(state_pops, region = state, pop) gusa_pop <- left_join(gusa, sp, "region") head(gusa_pop) ## long lat group order region subregion pop ## 1 -87.46201 30.38968 1 1 alabama <NA> 5039877 ## 2 -87.48493 30.37249 1 2 alabama <NA> 5039877 ## 3 -87.52503 30.37249 1 3 alabama <NA> 5039877 ## 4 -87.53076 30.33239 1 4 alabama <NA> 5039877 ## 5 -87.57087 30.32665 1 5 alabama <NA> 5039877 ## 6 -87.58806 30.32665 1 6 alabama <NA> 5039877 ``` --- A first attempt: .pull-left[ ```r ggplot(gusa_pop) + geom_polygon(aes(long, lat, group = group, fill = pop)) + coord_map("bonne", parameters = 41.6) + ggthemes::theme_map() ``` ] .pull-right[ <!-- --> {{content}} ] -- This image is dominated by the fact that most state populations are small. --- Showing population ranks, or percentile values, can help see the variation a bit better. -- .pull-left[ ```r spr <- mutate(sp, rpop = rank(pop)) ``` {{content}} ] -- ```r gusa_rpop <- left_join(gusa, spr, "region") ``` {{content}} -- ```r ggplot(gusa_rpop) + geom_polygon(aes(long, lat, group = group, fill = rpop)) + coord_map("bonne", parameters = 41.6) + ggthemes::theme_map() ``` -- .pull-right[ <!-- --> ] --- Using quintile bins instead of a continuous scale: -- .pull-left.smaller-code[ ```r spr <- mutate(spr, pcls = cut_width(100 * percent_rank(pop), width = 20, center = 10)) ``` {{content}} ] -- ```r gusa_rpop <- left_join(gusa, spr, "region") ``` {{content}} -- ```r ggplot(gusa_rpop) + geom_polygon(aes(long, lat, group = group, fill = pcls), color = "grey") + coord_map("bonne", parameters = 41.6) + ggthemes::theme_map() + scale_fill_brewer(palette = "Reds", name = "Percentile") ``` -- .pull-right[ <!-- --> ] --- layout: true ## Choropleth Maps of County Population --- For a county-level `ggplot` map, first get the polygon data frame: -- .pull-left[ ```r gcounty <- map_data("county") ``` {{content}} ] -- ```r ggplot(gcounty) + geom_polygon(aes(long, lat, group = group), fill = NA, color = "black", size = 0.05) + coord_map("bonne", parameters = 41.6) ``` -- .pull-right[ <!-- --> ] --- We will again need to merge population data with the polygon data. -- Using county name is challenging because of mismatches like this: -- ```r filter(gcounty, grepl("brien", subregion)) %>% head(1) ## long lat group order region subregion ## 1 -95.86156 43.26404 825 26075 iowa obrien filter(pep2021, FIPS == 19141, year == 2021) %>% as.data.frame() ## FIPS county state year pop ## 1 19141 O'Brien County Iowa 2021 14015 ``` -- A better option is to match on [FIPS](https://en.wikipedia.org/wiki/FIPS_county_code) county code. -- The `county.fips` data frame in the `maps` package links the FIPS code to region names used by the map data in the `maps` package. ```r head(maps::county.fips, 4) ## fips polyname ## 1 1001 alabama,autauga ## 2 1003 alabama,baldwin ## 3 1005 alabama,barbour ## 4 1007 alabama,bibb ``` --- To attach the FIPS code to the polygons we first need to clean up the `county.fips` table a bit: ```r head(filter(maps::county.fips, grepl(":", polyname)), 3) ## fips polyname ## 1 12091 florida,okaloosa:main ## 2 12091 florida,okaloosa:spit ## 3 22099 louisiana,st martin:north ``` -- Remove the sub-county regions, remove duplicate rows, and split the `polyname` variable into `region` and `subregion`. ```r fipstab <- transmute(maps::county.fips, fips, county = sub(":.*", "", polyname)) %>% unique() %>% separate(county, c("region", "subregion"), sep = ",") head(fipstab, 3) ## fips region subregion ## 1 1001 alabama autauga ## 2 1003 alabama baldwin ## 3 1005 alabama barbour ``` --- Then use `left_join` to merge the FIPS code into the polygon data: ```r gcounty <- left_join(gcounty, fipstab, c("region", "subregion")) head(gcounty) ## long lat group order region subregion fips ## 1 -86.50517 32.34920 1 1 alabama autauga 1001 ## 2 -86.53382 32.35493 1 2 alabama autauga 1001 ## 3 -86.54527 32.36639 1 3 alabama autauga 1001 ## 4 -86.55673 32.37785 1 4 alabama autauga 1001 ## 5 -86.57966 32.38357 1 5 alabama autauga 1001 ## 6 -86.59111 32.37785 1 6 alabama autauga 1001 ``` --- Next, we need to pull together the data for the map, adding ranks and quintile bins. ```r cpop <- filter(pep2021, year == 2021) %>% select(fips = FIPS, pop) %>% mutate(rpop = rank(pop), pcls = cut_width(100 * percent_rank(pop), width = 20, center = 10)) head(cpop) ## # A tibble: 6 × 4 ## fips pop rpop pcls ## <dbl> <int> <dbl> <fct> ## 1 1001 59095 2256 (60,80] ## 2 1003 239294 2855 (80,100] ## 3 1005 24964 1529 (40,60] ## 4 1007 22477 1446 (40,60] ## 5 1009 59041 2255 (60,80] ## 6 1011 10320 755 (20,40] ``` -- Now left join with the county map data: ```r gcounty_pop <- left_join(gcounty, cpop, "fips") ``` --- County level population map using the default continuous color scale: -- .pull-left[ ```r ggplot(gcounty_pop) + geom_polygon(aes(long, lat, group = group, fill = rpop), color = "grey", size = 0.1) + coord_map("bonne", parameters = 41.6) + ggthemes::theme_map() ``` ] -- .pull-right[ <!-- --> ] --- Adding state boundaries might help: .pull-left[ ```r ggplot(gcounty_pop) + geom_polygon(aes(long, lat, group = group, fill = rpop), color = "grey", size = 0.1) + geom_polygon(aes(long, lat, group = group), fill = NA, data = gusa, color = "lightgrey") + coord_map("bonne", parameters = 41.6) + ggthemes::theme_map() ``` ] .pull-right[ <!-- --> ] --- A discrete scale with a very different color to highlight the counties with missing information: .pull-left[ ```r ggplot(gcounty_pop) + geom_polygon(aes(long, lat, group = group, * fill = pcls), color = "grey", size = 0.1) + geom_polygon(aes(long, lat, group = group), fill = NA, data = gusa, color = "lightgrey") + coord_map("bonne", parameters = 41.6) + ggthemes::theme_map() + * scale_fill_brewer(palette = "Reds", * na.value = "blue", * name = "Percentile") ``` ] .pull-right[ <!-- --> ] --- Why is there a missing value in South Dakota? -- Check whether `fipstab` provides a FIPS code for every county in the map data: ```r gcounty_regions <- select(gcounty, region, subregion) %>% unique() anti_join(gcounty_regions, fipstab, c("region", "subregion")) ## region subregion ## 1 south dakota oglala dakota ``` -- There is also a `fipstab` entry without map data: ```r anti_join(fipstab, gcounty_regions, c("region", "subregion")) ## fips region subregion ## 1 46113 south dakota shannon ``` -- Shannon County SD was renamed [Oglala Lakota County](https://en.wikipedia.org/wiki/Oglala_Lakota_County,_South_Dakota). -- It was also given a [new FIPS code](https://www.census.gov/programs-surveys/geography/technical-documentation/county-changes.html). -- The map data tries to reflect this but gets the spelling wrong. --- The `pep2021` entry shows the correct FIPS code: ```r filter(pep2021, year == 2021, state == "South Dakota", grepl("Oglala", county)) ## # A tibble: 1 × 5 ## FIPS county state year pop ## <dbl> <chr> <chr> <chr> <int> ## 1 46102 Oglala Lakota County South Dakota 2021 13586 ``` -- Add an entry with the FIPS code and subregion name matching the map data: ```r fipstab <- rbind(fipstab, data.frame(fips = 46102, region = "south dakota", subregion = "oglala dakota")) ``` --- .pull-left[ Fix up the data: ```r gcounty <- map_data("county") gcounty <- left_join(gcounty, fipstab, c("region", "subregion")) gcounty_pop <- left_join(gcounty, cpop, "fips") ``` ] -- .pull-right[ Redraw the plot: <!-- --> ] --- layout: true ## A County Population Symbol Map --- A symbol map can also be used to show the county populations. -- We again need centroids; approximate county centroids should do. -- ```r county_centroids <- group_by(gcounty, fips) %>% summarize(x = mean(range(long)), y = mean(range(lat))) ``` -- Merge the population values into the centroids data: -- ```r county_pops <- select(cpop, pop, fips) county_pops <- inner_join(county_pops, county_centroids, "fips") ``` --- A proportional symbol map of county populations: -- .pull-left[ ```r ggplot(gcounty) + geom_polygon(aes(long, lat, group = group), fill = NA, col = "grey", data = gusa) + geom_point(aes(x, y, size = pop), data = county_pops) + scale_size_area() + coord_map("bonne", parameters = 41.6) + ggthemes::theme_map() ``` ] -- .pull-right[ <!-- --> Jittering might be helpful to remove the very regular grid patters in the center. ] --- layout: true ## Comparing Populations in 2010 and 2021 --- One possible way to compare spatial data for several time periods is to use a faceted display with one map for each period. -- With many periods using animation is also an option. -- This can work well when there is a substantial amount of change; an example is available [here](https://projects.fivethirtyeight.com/mortality-rates-united-states/cardiovascular2/#1980). -- It may not work as well when the changes are more subtle. --- To bin the data by quintiles we can use the 2021 values. ```r ncls <- 6 cls <- quantile(filter(pep2021, year == 2021)$pop, seq(0, 1, len = ncls)) ``` -- To merge the binned population data into the polygon data we can create a data frame with binned population variables for the two time frames. -- This will allow a cleaner merge in terms of handling counties with missing data. --- After selecting the years and the variables we need, we add the binned population and then pivot to a wider form with one row per county: ```r cpop <- bind_rows(filter(pep2019, year == 2010), filter(pep2021, year == 2021)) %>% transmute(fips = FIPS, year, pcls = cut(pop, cls, include.lowest = TRUE)) %>% pivot_wider(values_from = c("pcls"), names_from = "year", names_prefix = "pcls") head(cpop) ## # A tibble: 6 × 3 ## fips pcls2010 pcls2021 ## <dbl> <fct> <fct> ## 1 1001 (3.68e+04,9.71e+04] (3.68e+04,9.71e+04] ## 2 1003 (9.71e+04,9.83e+06] (9.71e+04,9.83e+06] ## 3 1005 (1.87e+04,3.68e+04] (1.87e+04,3.68e+04] ## 4 1007 (1.87e+04,3.68e+04] (1.87e+04,3.68e+04] ## 5 1009 (3.68e+04,9.71e+04] (3.68e+04,9.71e+04] ## 6 1011 (8.7e+03,1.87e+04] (8.7e+03,1.87e+04] ``` --- This data can now be merged with the polygon data. ```r gcounty_pop <- left_join(gcounty, cpop, "fips") ``` -- To create a faceted display with `ggplot` we need the data in _tidy_ form. -- ```r gcounty_pop_long <- pivot_longer(gcounty_pop, starts_with("pcls"), names_to = "year", names_prefix = "pcls", values_to = "pcls") %>% mutate(year = as.integer(year)) ``` -- An alternative would be to create two layers and show them in separate facets. --- .hide-code[ ```r ggplot(gcounty_pop_long) + geom_polygon(aes(long, lat, group = group, fill = pcls), color = "grey", size = 0.1) + geom_polygon(aes(long, lat, group = group), fill = NA, data = gusa, color = "lightgrey") + coord_map("bonne", parameters = 41.6) + ggthemes::theme_map() + scale_fill_brewer(palette = "Reds", na.value = "blue") + facet_wrap(~ year, ncol = 2) + theme(legend.position = "right") ``` <!-- --> ] -- Since the changes are quite subtle a side-by-side display does not work very well. --- A more effective approach in this case is to plot the relative changes. -- This is also a (somewhat) more appropriate use of a choropleth map. -- Start by adding percent changes to the data. ```r cpop <- bind_rows(filter(pep2019, year == 2010), filter(pep2021, year == 2021)) %>% select(fips = FIPS, pop, year) %>% pivot_wider(values_from = "pop", names_from = "year", names_prefix = "pop") %>% mutate(pchange = 100 * (pop2021 - pop2010) / pop2010) gcounty_pop <- left_join(gcounty, cpop, "fips") ``` -- A binned version of the percent changes will also be useful. ```r bins <- c(-Inf, -20, -10, -5, 5, 10, 20, Inf) cpop <- mutate(cpop, cpchange = cut(pchange, bins, ordered_result = TRUE)) gcounty_pop <- left_join(gcounty, cpop, "fips") ``` --- Using a continuous scale with the default palette: .hide-code[ ```r pcounty <- ggplot(gcounty_pop) + coord_map("bonne", parameters = 41.6) + ggthemes::theme_map() state_layer <- geom_polygon(aes(long, lat, group = group), fill = NA, data = gusa, color = "lightgrey") county_cont <- geom_polygon(aes(long, lat, group = group, fill = pchange), color = "grey", size = 0.1) pcounty + county_cont + state_layer + theme(legend.position = "right") ``` <!-- --> ] --- Using a simple diverging palette: .hide-code[ ```r pcounty + county_cont + state_layer + scale_fill_gradient2() + theme(legend.position = "right") ``` <!-- --> ] --- For the binned version the default ordered discrete color palette produces: .hide-code[ ```r county_disc <- geom_polygon(aes(long, lat, group = group, fill = cpchange), color = "grey", size = 0.1) pcounty + county_disc + state_layer + theme(legend.position = "right") ``` <!-- --> ] --- A diverging palette allows increases and decreases to be seen. -- Using the `PRGn` palette from `ColorBrewer`: .hide-code[ ```r (p_dvrg_48 <- pcounty + county_disc + state_layer + scale_fill_brewer(palette = "PRGn", na.value = "red", guide = guide_legend(reverse = TRUE))) + theme(legend.position = "right") ``` <!-- --> ] --- layout: true ## Focusing on Iowa --- The default continuous palette produces .hide-code[ ```r piowa <- ggplot(filter(gcounty_pop, region == "iowa")) + coord_map() + ggthemes::theme_map() pc_cont_iowa <- geom_polygon(aes(long, lat, group = group, fill = pchange), color = "grey", size = 0.2) piowa + pc_cont_iowa + theme(legend.position = "right") ``` ] .pull-left[ <!-- --> ] -- .pull-right[ Aside: Iowa has 99 counties, which seems a peculiar number. {{content}} ] -- It used to have 100: {{content}} -- * [Kossuth county](https://en.wikipedia.org/wiki/Kossuth_County,_Iowa) {{content}} -- * [Bancroft county](https://en.wikipedia.org/wiki/Bancroft_County,_Iowa) --- With the default ordered discrete palette: .hide-code[ ```r pc_disc_iowa <- geom_polygon(aes(long, lat, group = group, fill = cpchange), color = "grey", size = 0.2) piowa + pc_disc_iowa + theme(legend.position = "right") ``` <!-- --> ] --- Using the `PRGn` palette, for example, will by default adjust the palette to the levels present in the data, and Iowa does not have counties in all the bins: -- .hide-code[ ```r piowa + pc_disc_iowa + scale_fill_brewer(palette = "PRGn", guide = guide_legend(reverse = TRUE)) + theme(legend.position = "right") ``` <!-- --> ] --- Two issues: -- * The color coding does not match the coding for the national map. -- * The neutral color is not positioned at the zero level. --- Adding `drop = FALSE` to the palette specification preserves the encoding from the full map: .hide-code[ ```r (p_dvrg_iowa <- piowa + pc_disc_iowa + scale_fill_brewer(palette = "PRGn", drop = FALSE, guide = guide_legend(reverse = TRUE))) + theme(legend.position = "right") ``` <!-- --> ] --- Matching the national map would be particularly important for showing the two together: .hide-code[ ```r library(patchwork) (p_dvrg_48 + guides(fill = "none")) + p_dvrg_iowa + plot_layout(guides = "collect", widths = c(3, 1)) ``` <!-- --> ] --- layout: true ## Some Notes on Choropleth Maps --- Choropleth maps are very popular, in part because of their visual appeal. -- But they do have to be used with care. -- Choropleth maps are good for showing rates or counts per unit of area, such as -- * amount of rainfall per square meter; -- * percentage of a region covered by forest. -- They also work well for measurements that may vary little across a region, as well as for some demographic summaries, such as -- * average winter temperature in a state or county; -- * average level of summer humidity in a state or county. -- * average age of population in a state or county. --- Choropleth maps should generally _not_ be used for counts, such as number of cases of a disease, since the result is usually just a map of population density. -- Proportional symbol maps are usually a better choice for showing counts, such as population sizes, in a geographic contexts -- Choropleth maps can be used to show proportional counts, or counts per area population, but need to be used with care. -- * Changes in counts by one or two can cause large changes in proportions when population sizes are small. -- * County level choropleth maps of cancer rates, for example, can change a lot with one additional case in a small county. -- * When proportions are shown but total counts are what really matters, there is a tendency for viewers to let the area supply the denominator. --- An example is provided by a choropleth map that has been used to show county level results for the 2016 presidential election. <!-- https://www.core77.com/posts/90771/A-Great-Example-of-Better-Data-Visualization-This-Voting-Map-GIF https://stemlounge.com/muddy-america-color-balancing-trumps-election-map-infographic/ --> -- A map showing which of the two major parties received a plurality in each county: <!-- original data source: https://raw.githubusercontent.com/tonmcg/US_County_Level_Election_Results_08-16/master/US_County_Level_Presidential_Results_08-16.csv --> .hide-code[ ```r if (! file.exists("US_County_Level_Presidential_Results_08-16.csv")) { download.file("http://www.stat.uiowa.edu/~luke/data/US_County_Level_Presidential_Results_08-16.csv", ## nolint "US_County_Level_Presidential_Results_08-16.csv") } pres <- read.csv("US_County_Level_Presidential_Results_08-16.csv") %>% mutate(fips = fips_code, p = dem_2016 / (dem_2016 + gop_2016), win = ifelse(p > 0.5, "DEM", "GOP")) gcounty_pres <- left_join(gcounty, pres, "fips") ggplot(gcounty_pres, aes(long, lat, group = group, fill = win)) + geom_polygon(col = "black", size = 0.05) + geom_polygon(aes(long, lat, group = group), fill = NA, color = "grey30", size = 0.5, data = gusa) + coord_map("bonne", parameters = 41.6) + ggthemes::theme_map() + scale_fill_manual(values = c(DEM = "blue", GOP = "red")) ``` ] .pull-left[ <!-- --> ] -- .pull-right[ There are many internet posts about a map like this, for example [here](https://www.core77.com/posts/90771/) and [here](https://stemlounge.com/muddy-america-color-balancing-trumps-election-map-infographic/). {{content}} ] -- To a casual viewer this gives the impression of an overwhelming GOP win. {{content}} -- But many of the red counties have very small populations. --- A proportional symbol map more accurately reflects the results, which were very close with a majority of the popular vote going to the Democrat's candidate: .hide-code[ ```r county_pres <- left_join(county_pops, pres, "fips") ggplot(gcounty) + geom_polygon(aes(long, lat, group = group), fill = NA, col = "grey", data = gusa) + geom_point(aes(x, y, size = pop, color = win), data = county_pres) + scale_size_area(labels = scales::label_number(scale = 1e-6, suffix = " M", trim = FALSE)) + ## scale_color_manual(values = c(DEM = "blue", GOP = "red")) + colorspace::scale_color_discrete_diverging(palette = "Blue-Red 2") + coord_map("bonne", parameters = 41.6) + ggthemes::theme_map() ## Warning: Removed 1 rows containing missing values (geom_point). ``` <!-- --> ] --- layout: true ## Isopleth Map of Population Density --- While populations do tend to cluster in cities, they are not entirely clustered at county centroids. -- Other quantities, such as temperatures, may be measured at discrete points but vary smoothly across a region. -- Isopleth maps show the contours of a smooth surface. -- For populations, it is sometimes useful to estimate and visualize a population density surface. -- To estimate the population density we can use the county centroids weighted by the county population values. -- The function `kde2d` from the `MASS` package can be used to compute kernel density estimates, but does not support the use of weights. -- A simple modification that does support weights is available [here](http://homepage.divms.uiowa.edu/~luke/classes/STAT4580/kde2d.R). --- By default, `kde2d` estimates the density on a grid over the range of the two variables. ```r source(here::here("kde2d.R")) ds <- with(county_pops, kde2d(x, y, weights = pop)) str(ds) ## List of 3 ## $ x: num [1:25] -124 -122 -119 -117 -115 ... ## $ y: num [1:25] 25.5 26.4 27.4 28.4 29.4 ... ## $ z: num [1:25, 1:25] 1.25e-12 2.46e-11 4.12e-10 5.76e-09 3.76e-08 ... ``` -- This result can be converted to a data frame using `broom::tidy`. ```r dsdf <- broom::tidy(ds) %>% rename(Lon = x, Lat = y, dens = z) %>% mutate(Lat = as.numeric(Lat)) ## ???? ``` --- .pull-left.width-35[ A plot of the contours: ] .pull-right.width-60[ .hide-code[ ```r ggplot(gusa) + geom_polygon(aes(long, lat, group = group), fill = NA, color = "grey") + geom_contour(aes(Lon, Lat, z = dens), data = dsdf) + coord_map() ``` <!-- --> ] ] --- .pull-left.width-35[ To produce contours that work better when filled, it is useful to increase the number of grid points and enlarge the range. ] .pull-right.width-60[ .hide-code[ ```r ds <- with(county_pops, kde2d(x, y, weights = pop, n = 50, lims = c(-130, -60, 20, 50))) dsdf <- broom::tidy(ds) %>% rename(Lon = x, Lat = y, dens = z) %>% mutate(Lat = as.numeric(Lat)) ## ???? pusa <- ggplot(gusa) + geom_polygon(aes(long, lat, group = group), fill = NA, color = "grey") + coord_map() pusa + geom_contour(aes(Lon, Lat, z = dens), data = dsdf) ``` <!-- --> ] ] --- .pull-left.width-35[ A filled contour version can be created using `stat_contour` and `fill = after_stat(level)`. ] .pull-right.width-60[ .hide-code[ ```r pusa + stat_contour(aes(Lon, Lat, z = dens, fill = after_stat(level)), data = dsdf, geom = "polygon", alpha = 0.2) ``` <!-- --> ] ] --- .pull-left[ To focus on Iowa we need subsets of the populations data and the map data: ```r iowa_pops <- filter(county_pops, fips %/% 1000 == 19) giowa <- filter(gcounty, fips %/% 1000 == 19) ``` {{content}} ] -- A proportional symbols plot for the Iowa county populations: -- .pull-right[ .hide-code[ ```r iowa_base <- ggplot(giowa) + geom_polygon(aes(long, lat, group = group), fill = NA, col = "grey") + coord_map() iowa_base + geom_point(aes(x, y, size = pop), data = iowa_pops) + scale_size_area() + ggthemes::theme_map() ``` <!-- --> ] ] --- For a density plot it is useful to expand the populations used to include nearby large cities, such as Omaha. ```r iowa_pops <- filter(county_pops, x > -97 & x < -90 & y > 40 & y < 44) ``` -- Density estimates can then be computed for the expanded Iowa data. ```r dsi <- with(iowa_pops, kde2d(x, y, weights = pop, n = 50, lims = c(-98, -89, 40, 44.5))) dsidf <- broom::tidy(dsi) %>% rename(Lon = x, Lat = y, dens = z) %>% mutate(Lat = as.numeric(Lat)) ## ??? ``` --- .pull-left.width-35[ A simple contour map: ] .pull-right.width-60[ .hide-code[ ```r ggplot(giowa) + geom_polygon(aes(long, lat, group = group), fill = NA, color = "grey") + geom_contour(aes(Lon, Lat, z = dens), color = "blue", na.rm = TRUE, data = dsidf) + coord_map() ``` <!-- --> ] ] --- .pull-left.width-35[ A filled contour map ] .pull-right.width-60[ .hide-code[ ```r pdiowa <- ggplot(giowa) + geom_polygon(aes(long, lat, group = group), fill = NA, color = "grey") + stat_contour(aes(Lon, Lat, z = dens, fill = after_stat(level)), color = "black", alpha = 0.2, na.rm = TRUE, data = dsidf, geom = "polygon") pdiowa + coord_map() ``` <!-- --> ] ] --- .pull-left.width-35[ Using `xlim` and `ylim` arguments to `coord_map` to trim the range: ] .pull-right.width-60[ .hide-code[ ```r pdiowa + coord_map(xlim = c(-96.7, -90), ylim = c(40.3, 43.6)) ``` <!-- --> ] ] --- layout: true ## Simple Features --- Using `geom_polygon` and `geom_point` works well for many simple applications. -- For more complex applications it is best to use special spatial data structures. -- An approach becoming widely adopted in many open-source frameworks for working with spatial data is called _simple features_. -- The package `sf` provides a range of tools for operating with simple feature objects, such as: -- * computing centroids; -- * combining and intersecting objects; -- * moving and expanding or shrinking objects. --- Some useful vignettes in the `sf` package: * [Simple Features for R](https://cran.r-project.org/package=sf/vignettes/sf1.html) * [Reading, Writing and Converting Simple Features](https://cran.r-project.org/package=sf/vignettes/sf2.html) * [Manipulating Simple Feature Geometries](https://cran.r-project.org/package=sf/vignettes/sf3.html) -- Other references: * Robin Lovelace, Jakub Nowosad, Jannes Muenchow (2019). _Geocomputation with R_, Chapman and Hall/CRC. Also available [on line](https://geocompr.robinlovelace.net/). * Mel Moreno and Mathieu Basille (2018). Drawing beautiful maps programmatically with R, sf and ggplot2, [Part 1: Basics](https://r-spatial.org/r/2018/10/25/ggplot2-sf.html), [Part 2: Layers](https://r-spatial.org/r/2018/10/25/ggplot2-sf-2.html), [Part 3: Layouts](https://r-spatial.org/r/2018/10/25/ggplot2-sf-3.html). * Claudia A Engel (2019). [Using Spatial Data with R](https://cengel.github.io/R-spatial/). --- A simple utility function for obtaining map data as sf objects: ```r map_data_sf <- function(map, region = ".", exact = FALSE, crs = 4326, ...) { maps::map(map, region, exact = exact, plot = FALSE, fill = TRUE, ...) %>% sf::st_as_sf(crs = crs) %>% separate(ID, c("region", "subregion"), sep = ",") } ``` -- Iowa county data: ```r map_data_sf("county", "iowa") %>% head(4) ## Simple feature collection with 4 features and 2 fields ## Geometry type: MULTIPOLYGON ## Dimension: XY ## Bounding box: xmin: -94.93338 ymin: 40.60552 xmax: -91.06018 ymax: 43.51041 ## Geodetic CRS: WGS 84 ## region subregion geom ## 1 iowa adair MULTIPOLYGON (((-94.24583 4... ## 2 iowa adams MULTIPOLYGON (((-94.70992 4... ## 3 iowa allamakee MULTIPOLYGON (((-91.22634 4... ## 4 iowa appanoose MULTIPOLYGON (((-92.63009 4... ``` --- A basic Iowa county map using `sf`: .pull-left[ ```r giowa_sf <- map_data_sf("county", "iowa") giowa_sf <- left_join(giowa_sf, fipstab, c("region", "subregion")) ggplot(giowa_sf) + geom_sf() + coord_sf() ``` ] .pull-right[ <!-- --> ] --- Adding wind turbine locations: .pull-left[ ```r wt_IA_sf <- sf::st_as_sf(wt_IA, coords = c("xlong", "ylat"), crs = 4326) ggplot(giowa_sf) + geom_sf(fill = "grey95") + geom_sf(aes(color = p_year), data = wt_IA_sf) + scale_color_viridis_c() + coord_sf() ``` ] .pull-right[ <!-- --> ] --- With a Stamen map background: .pull-left[ ```r library(ggmap) m <- get_stamenmap(c(-97.2, 40.4, -89.9, 43.6), zoom = 8, maptype = "toner") ggmap(m) + geom_sf(data = giowa_sf, fill = NA, inherit.aes = FALSE) + geom_sf(aes(color = p_year), data = wt_IA_sf, inherit.aes = FALSE) + scale_color_viridis_c() + coord_sf() + labs(x = NULL, y = NULL) ``` ] .pull-right[ <!-- --> ] --- A choropleth map of population changes: .pull-left[ ```r giowa_pop_sf <- left_join(giowa_sf, cpop, "fips") ggplot(giowa_pop_sf) + geom_sf(aes(fill = cpchange)) + ggthemes::theme_map() + scale_fill_brewer( palette = "PRGn", drop = FALSE, guide = guide_legend(reverse = TRUE)) ``` ] .pull-right[ <!-- --> ] --- Proportional symbol map for 2021 Iowa county population estimates: .pull-left[ ```r sf::sf_use_s2(FALSE) iowa_pop_ctrds <- mutate(giowa_pop_sf, geom = sf::st_centroid(geom)) sf::sf_use_s2(TRUE) ggplot(giowa_sf) + geom_sf() + geom_sf(aes(size = pop2021), data = iowa_pop_ctrds, show.legend = "point") + scale_size_area() ``` ] .pull-right[ <!-- --> ] --- A USA county map with a Bonne projection centered on Iowa City: .pull-left.smaller-code[ ```r gusa_sf <- map_data_sf("county") gusa_sf <- left_join(gusa_sf, fipstab, c("region", "subregion")) gusa_pop_sf <- left_join(gusa_sf, cpop, "fips") ggplot(gusa_pop_sf) + geom_sf(aes(fill = cpchange), size = 0.1) + scale_fill_brewer( palette = "PRGn", na.value = "red", guide = guide_legend(reverse = TRUE)) + coord_sf(crs = "+proj=bonne +lat_1=41.6 +lon_0=-95") ``` ] .pull-right[ <!-- --> ] --- As an example of computation with `sf` objects we can revisit the population contour plot. -- First create an `sf` representation of the contours (there may be better ways to do this): <!-- based on https://stackoverflow.com/questions/49526198/convert-a-fortified-data-frame-back-to-sf-object --> ```r ggpoly2sf <- function(poly, coords = c("long", "lat"), id = "group", region = "region", crs = 4326) { sf::st_as_sf(poly, coords = coords, crs = crs) %>% group_by(!! as.name(id), !! as.name(region)) %>% summarize(do_union = FALSE) %>% sf::st_cast("POLYGON") %>% ungroup() %>% group_by(!! as.name(region)) %>% summarize(do_union = FALSE) %>% ungroup() } v <- contourLines(dsi) dsi_sf <- lapply(seq_along(v), function(i) { x <- v[[i]]; x$group <- i; as.data.frame(x) }) %>% bind_rows() %>% ggpoly2sf(coords = c("x", "y"), region = "level") ``` --- This reproduces our previous plot: .pull-left[ ```r ggplot(giowa_sf) + geom_sf() + geom_sf(aes(fill = level), data = dsi_sf, alpha = 0.2) ``` ] .pull-right[ <!-- --> ] --- With an outline of Iowa from the `maps` package we can use `sf::st_intersection` function to show only the part of the contours that is within Iowa. -- .pull-left[ ```r iowa_sf <- map_data_sf("state", "iowa") ``` {{content}} ] -- ```r dsi_i_sf <- sf::st_intersection(iowa_sf, dsi_sf) ``` {{content}} -- ```r ggplot(giowa_sf) + geom_sf() + geom_sf(aes(fill = level), data = dsi_i_sf, alpha = 0.2) ``` -- .pull-right[ <!-- --> ] --- layout: false ## Shape Files The [ESRI](https://www.esri.com/) [_shapefile_ format](https://en.wikipedia.org/wiki/Shapefile) is a widely used vector data format for geographic information systems. -- * Many GIS software systems support reading shapefiles. -- * The `maptools` and `rgdal` packages can be used to read shapefiles in R. -- * `sf` uses these utilities under the hood. -- Shapefiles are available from many sources; for example: -- * The [TIGER](https://www.census.gov/programs-surveys/geography.html) site at the Census Bureau. The [`tigris`](https://cran.r-project.org/package=tigris) package provides an R interface. -- * The [Natural Earth Project](https://www.naturalearthdata.com/). The [rnaturalearth](https://cran.r-project.org/package=rnaturalearth) package provides an R interface. -- Other formats and data bases are also available. Some references: -- * Paul Murrell's _R Graphics, 2nd Ed._, Chapter 14. -- * The [CRAN Task View](https://cran.r-project.org/web/views/) for [Analysis of Spatial Data](https://cran.r-project.org/view=Spatial) -- * [Introduction to visualising spatial data in R](https://cran.r-project.org/doc/contrib/intro-spatial-rl.pdf). -- * [Geocomputation with R](https://geocompr.robinlovelace.net/). --- layout: true ## Shape Files: Census Tracts --- Census Tracts are small, relatively permanent subdivisions of a county. -- Census tracts generally have population size between 1,200 and 8,000 people. -- Tracts for Iowa (state FIPS 19) for 2010: .pull-left[ <!-- include = FALSE to hide printing --> ```r tract_sf <- tigris::tracts(19, year = 2010) ``` {{content}} ] -- ```r (p <- ggplot(tract_sf) + geom_sf()) ``` -- .pull-right[ <!-- --> ] --- .pull-left[ Johnson County: ```r p %+% filter(tract_sf, COUNTYFP == "103") ``` <!-- --> ] -- .pull-right[ Polk County: ```r p %+% filter(tract_sf, COUNTYFP == "153") ``` <!-- --> ] --- layout: true ## Proportion of Population Without Health Insurance --- A lot of data is published at the census tract level. -- One example is the proportion of the population that is uninsured available from the [American Community Survey](https://www.census.gov/programs-surveys/acs). <!-- This example is adapted from two blog posts by Kevin Johnson (unfortunately no longer available). ([part 1](http://www.kevjohnson.org/making-maps-in-r/), [part 2](http://www.kevjohnson.org/making-maps-in-r-part-2/)).--> -- Data from 2019 is available locally. -- .pull-left[ Download the data. .hide-code[ ```r dname <- "ACS_2019_5YR_S2701" if (! file.exists(dname)) { data_url <- "http://homepage.divms.uiowa.edu/~luke/data" aname <- paste0(dname, ".tgz") download.file(paste(data_url, aname, sep = "/"), aname) untar(aname) file.remove(aname) } ``` ] Read the data. .hide-code[ ```r acs_file <- file.path(dname, "ACSST5Y2019.S2701_data_with_overlays_2021-05-05T153143.csv") acs_data_sf <- read.csv(acs_file, na.strings = "-")[-1, ] %>% select(GEO_ID, name = NAME, population = S2701_C01_001E, uninsured = S2701_C05_001E) %>% mutate(across(3:4, as.numeric)) %>% mutate(GEO_ID = sub("1400000US", "", GEO_ID)) ``` ] Merge and plot. ] -- .pull-right[ .hide-code[ ```r plot_data_sf <- left_join(tract_sf, acs_data_sf, c("GEOID10" = "GEO_ID")) p <- ggplot(plot_data_sf) + geom_sf(aes(fill = uninsured)) + scale_fill_distiller(palette = "Reds", direction = 1) p ``` ] <!-- --> ] --- .pull-left[ Johnson county: .hide-code[ ```r p %+% filter(plot_data_sf, COUNTYFP == "103") ``` <!-- --> ] ] -- .pull-right[ Polk county: .hide-code[ ```r p %+% filter(plot_data_sf, COUNTYFP == "153") ``` <!-- --> ] ] --- For the state-wide map it may help to drop the census tract borders and add county borders: .pull-left[ .hide-code[ ```r giowa_sf <- sf::st_as_sf(maps::map("county", "iowa", plot = FALSE, fill = TRUE), crs = 4326) ggplot(plot_data_sf) + geom_sf(aes(fill = uninsured), color = NA) + geom_sf(fill = NA, color = "darkgrey", data = giowa_sf, size = 0.2) + scale_fill_distiller(palette = "Reds", direction = 1) + coord_sf() ``` ] ] .pull-right[ <!-- --> ] --- layout: false ## Leaflet Maps The [`leaflet` package](https://rstudio.github.io/leaflet/) can be used to make interactive versions of these maps using the [Leaflet](https://leafletjs.com/) a JavaScript library for interactive maps: .hide-code[ ```r library(leaflet) lf_sf <- leaflet(sf::st_transform(plot_data_sf, 4326), width = "70%") addPolygons(lf_sf, fillColor = ~ colorNumeric("Reds", uninsured)(uninsured), weight = 0.5, color = "black", fillOpacity = 0.9, opacity = 1.0, highlightOptions = highlightOptions(color = "white", weight = 2, bringToFront = TRUE), label = ~ lapply(paste0(name, "<br/>", "pct: ", uninsured, "<br/>", "pop: ", population), htmltools::HTML)) ```