Data Types

Map Types

Drawing Maps

At the simplest level drawing a map involves;

But there are a number of complications:

County Population Data

The census bureau provides estimates of populations of US counties.

if (! file.exists("PEP_2018_PEPANNRES.zip")) {
    download.file("http://www.stat.uiowa.edu/~luke/data/PEP_2018_PEPANNRES.zip",
                  "PEP_2018_PEPANNRES.zip")
    unzip("PEP_2018_PEPANNRES.zip")
}

pep2018 <- read.csv("PEP_2018_PEPANNRES_with_ann.csv")
pepvars <- names(pep2018)
pep2018 <- read.csv("PEP_2018_PEPANNRES_with_ann.csv", stringsAsFactors = FALSE,
                    head = FALSE, skip = 2)
names(pep2018) <- pepvars

head(pep2018)
##           GEO.id GEO.id2       GEO.display.label rescen42010 resbase42010
## 1 0500000US01001    1001 Autauga County, Alabama       54571        54574
## 2 0500000US01003    1003 Baldwin County, Alabama      182265       182264
## 3 0500000US01005    1005 Barbour County, Alabama       27457        27457
## 4 0500000US01007    1007    Bibb County, Alabama       22915        22920
## 5 0500000US01009    1009  Blount County, Alabama       57322        57321
## 6 0500000US01011    1011 Bullock County, Alabama       10914        10911
##   respop72010 respop72011 respop72012 respop72013 respop72014 respop72015
## 1       54754       55208       54936       54713       54876       54838
## 2      183111      186540      190143      194886      199189      202995
## 3       27330       27350       27174       26944       26758       26294
## 4       22872       22747       22664       22516       22541       22562
## 5       57373       57554       57570       57611       57521       57522
## 6       10878       10677       10607       10551       10665       10400
##   respop72016 respop72017 respop72018
## 1       55242       55443       55601
## 2      207712      212619      218022
## 3       25819       25158       24881
## 4       22576       22555       22400
## 5       57517       57827       57840
## 6       10381       10176       10138
tail(pep2018)
##              GEO.id GEO.id2                GEO.display.label rescen42010
## 3215 0500000US72143   72143 Vega Alta Municipio, Puerto Rico       39951
## 3216 0500000US72145   72145 Vega Baja Municipio, Puerto Rico       59662
## 3217 0500000US72147   72147   Vieques Municipio, Puerto Rico        9301
## 3218 0500000US72149   72149  Villalba Municipio, Puerto Rico       26073
## 3219 0500000US72151   72151   Yabucoa Municipio, Puerto Rico       37941
## 3220 0500000US72153   72153     Yauco Municipio, Puerto Rico       42043
##      resbase42010 respop72010 respop72011 respop72012 respop72013
## 3215        39951       39946       39811       39625       39500
## 3216        59662       59548       58668       57839       57039
## 3217         9301        9305        9236        9215        9166
## 3218        26073       26003       25547       25137       24747
## 3219        37941       37881       37367       36843       36374
## 3220        41947       41838       41047       40260       39508
##      respop72014 respop72015 respop72016 respop72017 respop72018
## 3215       39073       38661       38011       37068       35806
## 3216       55922       54787       53659       52303       50185
## 3217        9069        8952        8819        8650        8364
## 3218       24263       23667       23099       22458       21476
## 3219       35728       35062       34328       33468       32158
## 3220       38554       37607       36676       35497       33860

Working with the county names can be tricky:

head(filter(pep2018, grepl(", Iowa", GEO.display.label)))
## Warning in grepl(", Iowa", GEO.display.label): input string 1803 is invalid
## in this locale
## Warning in grepl(", Iowa", GEO.display.label): input string 3148 is invalid
## in this locale
## Warning in grepl(", Iowa", GEO.display.label): input string 3153 is invalid
## in this locale
## Warning in grepl(", Iowa", GEO.display.label): input string 3157 is invalid
## in this locale
## Warning in grepl(", Iowa", GEO.display.label): input string 3159 is invalid
## in this locale
##           GEO.id GEO.id2      GEO.display.label rescen42010 resbase42010
## 1 0500000US19001   19001     Adair County, Iowa        7682         7682
## 2 0500000US19003   19003     Adams County, Iowa        4029         4029
## 3 0500000US19005   19005 Allamakee County, Iowa       14330        14328
## 4 0500000US19007   19007 Appanoose County, Iowa       12887        12887
## 5 0500000US19009   19009   Audubon County, Iowa        6119         6119
## 6 0500000US19011   19011    Benton County, Iowa       26076        26069
##   respop72010 respop72011 respop72012 respop72013 respop72014 respop72015
## 1        7679        7546        7468        7387        7367        7141
## 2        4023        3994        3909        3892        3877        3753
## 3       14379       14219       14147       14064       14058       13858
## 4       12856       12847       12707       12653       12671       12572
## 5        6098        6003        5865        5863        5771        5713
## 6       26046       26075       25825       25696       25622       25601
##   respop72016 respop72017 respop72018
## 1        6998        7053        7063
## 2        3692        3663        3645
## 3       13850       13804       13832
## 4       12505       12363       12437
## 5        5629        5566        5506
## 6       25633       25633       25642
pep2018[1803,]
##              GEO.id GEO.id2              GEO.display.label rescen42010
## 1803 0500000US35013   35013 Do\xf1a Ana County, New Mexico      209233
##      resbase42010 respop72010 respop72011 respop72012 respop72013
## 1803       209202      210097      213127      214507      214394
##      respop72014 respop72015 respop72016 respop72017 respop72018
## 1803      214084      214151      214748      216186      217522
filter(pep2018, GEO.id2 == 19141)
##           GEO.id GEO.id2    GEO.display.label rescen42010 resbase42010
## 1 0500000US19141   19141 O'Brien County, Iowa       14398        14398
##   respop72010 respop72011 respop72012 respop72013 respop72014 respop72015
## 1       14408       14218       14155       14046       14043       13926
##   respop72016 respop72017 respop72018
## 1       13933       13812       13840
filter(pep2018, GEO.id2 == 22001)
##           GEO.id GEO.id2        GEO.display.label rescen42010 resbase42010
## 1 0500000US22001   22001 Acadia Parish, Louisiana       61773        61787
##   respop72010 respop72011 respop72012 respop72013 respop72014 respop72015
## 1       61875       61853       61991       62294       62664       62681
##   respop72016 respop72017 respop72018
## 1       62793       62514       62190

For US counties it is safer to work with the FIPS county code, which is the GEO.id2 variable.

The county.fips data frame in the maps package links the FIPS code to region names used by the map data in the maps package.

library(maps)
head(county.fips)
##   fips        polyname
## 1 1001 alabama,autauga
## 2 1003 alabama,baldwin
## 3 1005 alabama,barbour
## 4 1007    alabama,bibb
## 5 1009  alabama,blount
## 6 1011 alabama,bullock

Basic Map Data

Map data from the map function in package maps consists of the x and y coordinates of polygons and names for the regions.

usa <- map("state", fill = TRUE, plot = FALSE)
plot(usa$x, usa$y, type = "n")
polygon(usa$x, usa$y)

sum(is.na(usa$x))
## [1] 62
length(usa$names)
## [1] 63
usa$names
##  [1] "alabama"                         "arizona"                        
##  [3] "arkansas"                        "california"                     
##  [5] "colorado"                        "connecticut"                    
##  [7] "delaware"                        "district of columbia"           
##  [9] "florida"                         "georgia"                        
## [11] "idaho"                           "illinois"                       
## [13] "indiana"                         "iowa"                           
## [15] "kansas"                          "kentucky"                       
## [17] "louisiana"                       "maine"                          
## [19] "maryland"                        "massachusetts:martha's vineyard"
## [21] "massachusetts:main"              "massachusetts:nantucket"        
## [23] "michigan:north"                  "michigan:south"                 
## [25] "minnesota"                       "mississippi"                    
## [27] "missouri"                        "montana"                        
## [29] "nebraska"                        "nevada"                         
## [31] "new hampshire"                   "new jersey"                     
## [33] "new mexico"                      "new york:manhattan"             
## [35] "new york:main"                   "new york:staten island"         
## [37] "new york:long island"            "north carolina:knotts"          
## [39] "north carolina:main"             "north carolina:spit"            
## [41] "north dakota"                    "ohio"                           
## [43] "oklahoma"                        "oregon"                         
## [45] "pennsylvania"                    "rhode island"                   
## [47] "south carolina"                  "south dakota"                   
## [49] "tennessee"                       "texas"                          
## [51] "utah"                            "vermont"                        
## [53] "virginia:chesapeake"             "virginia:chincoteague"          
## [55] "virginia:main"                   "washington:san juan island"     
## [57] "washington:lopez island"         "washington:orcas island"        
## [59] "washington:whidbey island"       "washington:main"                
## [61] "west virginia"                   "wisconsin"                      
## [63] "wyoming"

The ggplot2 function map_data reorganizes this data into a data frame.

library(ggplot2)
gusa <- map_data("state")
head(gusa)
##        long      lat group order  region subregion
## 1 -87.46201 30.38968     1     1 alabama      <NA>
## 2 -87.48493 30.37249     1     2 alabama      <NA>
## 3 -87.52503 30.37249     1     3 alabama      <NA>
## 4 -87.53076 30.33239     1     4 alabama      <NA>
## 5 -87.57087 30.32665     1     5 alabama      <NA>
## 6 -87.58806 30.32665     1     6 alabama      <NA>
head(filter(gusa, region == "virginia"))
##        long      lat group order   region  subregion
## 1 -75.64188 37.96418    53 13482 virginia chesapeake
## 2 -75.61897 37.99856    53 13483 virginia chesapeake
## 3 -75.36114 38.02721    53 13484 virginia chesapeake
## 4 -75.39552 37.99283    53 13485 virginia chesapeake
## 5 -75.41843 37.96991    53 13486 virginia chesapeake
## 6 -75.42989 37.94127    53 13487 virginia chesapeake
p <- ggplot(gusa) + geom_polygon(aes(long, lat, group = group), color = "white")
p

Approximate Centroids

A quick approximation to the centroids (centers of gravity) of the polygons is to compute the center of the bounding rectangle.

This is easiest to do with the data from map_data.

state_centroids <- summarize(group_by(gusa, region),
                             x = mean(range(long)), y = mean(range(lat)))
names(state_centroids)[1] <- "state"
head(state_centroids)
## # A tibble: 6 x 3
##   state            x     y
##   <chr>        <dbl> <dbl>
## 1 alabama      -86.7  32.6
## 2 arizona     -112.   34.2
## 3 arkansas     -92.1  34.8
## 4 california  -119.   37.3
## 5 colorado    -106.   39.0
## 6 connecticut  -72.8  41.5

More sophisticated approaches to computing a centroid are available.

Projections

Longitude/latitude position points on a sphere; maps are drawn on a flat surface.

A simple approach, the default in map, uses a rectangular projection with the aspect ratio chosen so that longitude and latitude scales are equivalent at the center of the map.

map("state")

Other projections try to preserve angles or areas.

map("state", project = "mercator")

map("state", project = "bonne", param = 45)

Adding points or lines to a map drawn by the map function with the default projection can be done using points or lines.

map("state")
with(state_centroids, points(x, y))

When using other projections with map, any points or lines added to the plot need to have their coordinates projected as well.

This can be done using mapproject from the mapproj package.

library(mapproj)
map("state", project = "mercator")
msc <- mapproject(state_centroids$x, state_centroids$y)
points(msc$x, msc$y)


map("state", project = "bonne", param = 45)
bsc <- mapproject(state_centroids$x, state_centroids$y)
points(bsc$x, bsc$y)

Using coord_quickmap in ggplot seems to be comparable to the map default.

pp <- p + geom_point(aes(x, y), data = state_centroids, color = "red")
pp + coord_quickmap()

pp + coord_map()

pp + coord_map("bonne", parameters = 45)

Symbol Plots of State Population

Aggregate the county populations to the state level:

state_pops <- mutate(pep2018,
                     state = tolower(sub(".*, ", "", GEO.display.label)),
                     pop = respop72018)
unique(state_pops$state)
##  [1] "alabama"              "alaska"               "arizona"             
##  [4] "arkansas"             "california"           "colorado"            
##  [7] "connecticut"          "delaware"             "district of columbia"
## [10] "florida"              "georgia"              "hawaii"              
## [13] "idaho"                "illinois"             "indiana"             
## [16] "iowa"                 "kansas"               "kentucky"            
## [19] "louisiana"            "maine"                "maryland"            
## [22] "massachusetts"        "michigan"             "minnesota"           
## [25] "mississippi"          "missouri"             "montana"             
## [28] "nebraska"             "nevada"               "new hampshire"       
## [31] "new jersey"           "new mexico"           "new york"            
## [34] "north carolina"       "north dakota"         "ohio"                
## [37] "oklahoma"             "oregon"               "pennsylvania"        
## [40] "rhode island"         "south carolina"       "south dakota"        
## [43] "tennessee"            "texas"                "utah"                
## [46] "vermont"              "virginia"             "washington"          
## [49] "west virginia"        "wisconsin"            "wyoming"             
## [52] "puerto rico"
state_pops <- summarize(group_by(state_pops, state),
                        pop = sum(pop, na.rm = TRUE))

Merge in the centroid locations. Using inner_join drops cases not included in the lower-48 map data.

state_pops <- inner_join(state_pops, state_centroids, "state")

head(state_pops)
## # A tibble: 6 x 4
##   state            pop      x     y
##   <chr>          <int>  <dbl> <dbl>
## 1 alabama      4887871  -86.7  32.6
## 2 arizona      7171646 -112.   34.2
## 3 arkansas     3013825  -92.1  34.8
## 4 california  39557045 -119.   37.3
## 5 colorado     5695564 -106.   39.0
## 6 connecticut  3572665  -72.8  41.5

Some dot plots:

ggplot(state_pops) + geom_point(aes(x = pop, y = state))

ggplot(state_pops) + geom_point(aes(x = pop, y = reorder(state, pop)))

The population distribution is heavily skewed; a color coding would need to account for this.

A proportional symbol map using circles and map:

map("state")

with(state_pops,
     symbols(x, y,
             circles = sqrt(pop), add = TRUE,
             inches = 0.1, bg = "black"))

The thermometer symbol corresponds to the approach suggested by Cleveland and McGiill (1984).

map("state")
with(state_pops, symbols(x, y,
                         thermometers = cbind(1, 3, pop / max(pop)),
                         inches = .3, add = TRUE))

A proportional symbol map using ggplot:

ggplot(gusa) +
    geom_polygon(aes(long, lat, group = group),
                 fill = NA, color = "grey") +
    geom_point(aes(x, y, size = pop), data = state_pops) +
    scale_size_area() +
    coord_map("bonne", parameters=45) +
    ggthemes::theme_map()

Building a ggplot thermometer map using geom_rect:

ggplot(gusa) +
    geom_polygon(aes(long, lat, group = group),
                 fill = NA, color = "grey") +
    coord_map() +
    ggthemes::theme_map() +
    geom_rect(aes(xmin = x - .4, xmax = x + .4,
                  ymin = y - 1, ymax = y + 2 * (pop / max(pop)) - 1),
              data = state_pops) +
    geom_rect(aes(xmin = x - .4, xmax = x + .4,
                  ymin = y - 1, ymax = y + 1),
              data = state_pops, fill = NA, color = "black")

Choropleth Maps of State Population

A choropleth map needs to have the information for coloring all the pieces of a region.

For ggplot this can be done by merging:

sp <- select(state_pops, region = state, pop)
gusa_pop <- left_join(gusa, sp, "region")
head(gusa_pop)
##        long      lat group order  region subregion     pop
## 1 -87.46201 30.38968     1     1 alabama      <NA> 4887871
## 2 -87.48493 30.37249     1     2 alabama      <NA> 4887871
## 3 -87.52503 30.37249     1     3 alabama      <NA> 4887871
## 4 -87.53076 30.33239     1     4 alabama      <NA> 4887871
## 5 -87.57087 30.32665     1     5 alabama      <NA> 4887871
## 6 -87.58806 30.32665     1     6 alabama      <NA> 4887871

A first attempt:

ggplot(gusa_pop) +
    geom_polygon(aes(long, lat, group = group, fill = pop)) +
    coord_map("bonne", parameters=45) +
    ggthemes::theme_map()

This image is dominated by the fact that most state populations are small.

Showing population ranks, or percentile values, can help see the variation a bit better.

spr <- mutate(sp, rpop = rank(pop))
gusa_rpop <- left_join(gusa, spr, "region")
ggplot(gusa_rpop) +
    geom_polygon(aes(long, lat, group = group, fill = rpop)) +
    coord_map("bonne", parameters=45) +
    ggthemes::theme_map()

Using quintile bins instead of a continuous scale:

ncls <- 6
spr <- mutate(spr,
              pcls = cut(pop, quantile(pop, seq(0, 1, len = ncls)),
                         include.lowest = TRUE))
gusa_rpop <- left_join(gusa, spr, "region")
ggplot(gusa_rpop) +
    geom_polygon(aes(long, lat, group = group, fill = pcls),
                 color = "grey") +
    coord_map("bonne", parameters=45) +
    ggthemes::theme_map() +
    scale_fill_brewer(palette = "Reds")

A percentile bin map using the map function requires a vector of colors for the regions:

usa_states <- sub(":.*", "", usa$names)
usa_pcls <- spr$pcls[match(usa_states, spr$region)]
pal <- RColorBrewer::brewer.pal(nlevels(usa_pcls), "Reds")
map("state", fill = TRUE, col = pal[usa_pcls], border = "grey")

This uses the match function to find indices for each polygon’s state in the regions vector.

Another way to compute the classes for the pieces:

usa_pieces <- data.frame(names = usa$names)
usa_pieces <- separate(usa_pieces, "names", c("region", "subregion"),
                       sep = ":", fill = "right")
usa_pieces <- left_join(usa_pieces, spr, "region")
map("state", fill = TRUE, col = pal[usa_pieces$pcls], border = "grey")

Choropleth Maps of County Population

For a county-level ggplot map, first get the polygon data frame:

gcounty <- map_data("county")
head(gcounty)
##        long      lat group order  region subregion
## 1 -86.50517 32.34920     1     1 alabama   autauga
## 2 -86.53382 32.35493     1     2 alabama   autauga
## 3 -86.54527 32.36639     1     3 alabama   autauga
## 4 -86.55673 32.37785     1     4 alabama   autauga
## 5 -86.57966 32.38357     1     5 alabama   autauga
## 6 -86.59111 32.37785     1     6 alabama   autauga

To attach the FIPS code we first need to clean up the county.fips table a bit:

head(filter(county.fips, grepl(":", polyname)))
##    fips                        polyname
## 1 12091           florida,okaloosa:main
## 2 12091           florida,okaloosa:spit
## 3 22099       louisiana,st martin:north
## 4 22099       louisiana,st martin:south
## 5 37053 north carolina,currituck:knotts
## 6 37053   north carolina,currituck:main

Remove the sub-county regions, remove duplicate rows, and split the polyname variable into region and subregion:

fipstab <-
    transmute(county.fips, fips, county = sub(":.*", "", polyname))
fipstab <- unique(fipstab)
fipstab <-
    separate(fipstab, county, c("region", "subregion"), sep = ",")
head(fipstab)
##   fips  region subregion
## 1 1001 alabama   autauga
## 2 1003 alabama   baldwin
## 3 1005 alabama   barbour
## 4 1007 alabama      bibb
## 5 1009 alabama    blount
## 6 1011 alabama   bullock

Now use left_join to merge the FIPS code into gcounty:

gcounty <- left_join(gcounty, fipstab, c("region", "subregion"))
head(gcounty)
##        long      lat group order  region subregion fips
## 1 -86.50517 32.34920     1     1 alabama   autauga 1001
## 2 -86.53382 32.35493     1     2 alabama   autauga 1001
## 3 -86.54527 32.36639     1     3 alabama   autauga 1001
## 4 -86.55673 32.37785     1     4 alabama   autauga 1001
## 5 -86.57966 32.38357     1     5 alabama   autauga 1001
## 6 -86.59111 32.37785     1     6 alabama   autauga 1001

Pull together the data for the map:

cpop <- select(pep2018,
               fips = GEO.id2,
               pop10 = rescen42010,
               pop18 = respop72018)
cpop <- mutate(cpop, rpop18 = rank(pop18))
cpop <- mutate(cpop,
               pcls18 = cut(pop18, quantile(pop18, seq(0, 1, len = ncls)),
                            include.lowest = TRUE))
head(cpop)
##   fips  pop10  pop18 rpop18              pcls18
## 1 1001  54571  55601   2289 (3.66e+04,9.22e+04]
## 2 1003 182265 218022   2910 (9.22e+04,1.01e+07]
## 3 1005  27457  24881   1563  (1.9e+04,3.66e+04]
## 4 1007  22915  22400   1461  (1.9e+04,3.66e+04]
## 5 1009  57322  57840   2318 (3.66e+04,9.22e+04]
## 6 1011  10914  10138    729  (8.99e+03,1.9e+04]

Some of the counties in the polygon data base may not appear in the population data:

unique(select(filter(gcounty, ! fips %in% cpop$fips), region, subregion))
##         region subregion
## 1 south dakota   shannon
unique(select(anti_join(gcounty, cpop, "fips"), region, subregion))
##         region subregion
## 1 south dakota   shannon

A left_join with cpop will give these NA values.

gcounty_pop <- left_join(gcounty, cpop, "fips")
unique(select(filter(gcounty_pop, is.na(rpop18)), region, subregion))
##         region subregion
## 1 south dakota   shannon

County level population plots using the default continuous scale:

ggplot(gcounty_pop) +
    geom_polygon(aes(long, lat, group = group, fill = rpop18),
                 color = "grey", size = 0.1) +
    geom_polygon(aes(long, lat, group = group),
                 fill = NA, data = gusa, color = "lightgrey") +
    coord_map("bonne", parameters=45) + ggthemes::theme_map()

A discrete scale with a very different color to highlight the counties with missing information:

ggplot(gcounty_pop) +
    geom_polygon(aes(long, lat, group = group, fill = pcls18),
                 color = "grey", size = 0.1) +
    geom_polygon(aes(long, lat, group = group),
                 fill = NA, data = gusa, color = "lightgrey") +
    coord_map("bonne", parameters=45) + ggthemes::theme_map() +
    scale_fill_brewer(palette = "Reds", na.value = "blue") +
    theme(legend.background = element_rect(fill = NA))

A County Population Symbol Map

Approximate County centroids:

county_centroids <- summarize(group_by(gcounty, fips),
                              x = mean(range(long)), y = mean(range(lat)))

county_pops <- select(cpop, pop = pop18, fips)
county_pops <- inner_join(county_pops, county_centroids, "fips")

A symbol map can also be used to show the county populations:

ggplot(gcounty) +
    geom_polygon(aes(long, lat, group = group), fill = NA, col = "grey",
                 data = gusa) +
    geom_point(aes(x, y, size = pop), data = county_pops) + scale_size_area() +
    coord_map("bonne", parameters=45) + ggthemes::theme_map()

Jittering might be helpful to remove the very regular grid patters in the center.

Comparing Populations in 2010 and 2018

One possible way to compare spatial data for several time periods is to use a faceted display with one map for each period.

To create a faceted display with ggplot we need the data in tidy form:

cpop <- select(pep2018,
               fips = GEO.id2, pop18 = respop72018, pop10 = rescen42010)
cls <- quantile(cpop$pop18, seq(0, 1, len = ncls))
cpop <- mutate(cpop, pcls18 = cut(pop18, cls, include.lowest = TRUE))
cpop <- mutate(cpop, pcls10 = cut(pop10, cls, include.lowest = TRUE))
gcounty_pop <- left_join(gcounty, cpop)

lgcounty <- gather(gcounty_pop, period, pcls, pcls10, pcls18)

A faceted display:

ggplot(lgcounty) +
    geom_polygon(aes(long, lat, group = group, fill = pcls),
                 color = "grey", size = 0.1) +
    geom_polygon(aes(long, lat, group = group),
                 fill = NA, data = gusa, color = "lightgrey") +
    coord_map("bonne", parameters=45) + ggthemes::theme_map() +
    scale_fill_brewer(palette = "Reds", na.value = "blue") +
    facet_wrap(~period, ncol = 1) +
    theme(legend.justification = c(1, 0), legend.position = c(1, 0))

A more effective approach in this case is to plot the relative changes.

Adding these to the cpop data:

cpop <- mutate(cpop, pchange = 100 * (pop18 - pop10) / pop10)
gcounty_pop <- left_join(gcounty, cpop, "fips")

A binned version will also be useful:

bins <- c(-Inf, -20, -10, -5, 5, 10, 20, Inf)
cpop <- mutate(cpop, cpchange = cut(pchange, bins))
gcounty_pop <- left_join(gcounty, cpop, "fips")

Using a continuous scale with the default palette:

pcounty <- ggplot(gcounty_pop) +
    coord_map("bonne", parameters=45) + ggthemes::theme_map()
state_layer <- geom_polygon(aes(long, lat, group = group),
                            fill = NA, data = gusa, color = "lightgrey")
county_cont <- geom_polygon(aes(long, lat, group = group, fill = pchange),
                            color = "grey", size = 0.1)

pcounty + county_cont + state_layer

Using a simple diverging palette:

pcounty + county_cont + state_layer + scale_fill_gradient2()

For the binned version the default discrete color palette produces:

county_disc <- geom_polygon(aes(long, lat, group = group, fill = cpchange),
                            color = "grey", size = 0.1)

pcounty + county_disc + state_layer

Using the Spectral palette from ColorBrewer:

pcounty + county_disc + state_layer + scale_fill_brewer(palette = "Spectral")

Focusing on Iowa

piowa <- ggplot(filter(gcounty_pop, region == "iowa")) +
         coord_map() + ggthemes::theme_map()

The default continuous palette produces

pc_cont_iowa <- geom_polygon(aes(long, lat, group = group, fill = pchange),
                             color = "grey", size = 0.2)
piowa + pc_cont_iowa

Iowa has 99 counties, which seems a peculiar number.

With the default discrete palette:

pc_disc_iowa <- geom_polygon(aes(long, lat, group = group, fill = cpchange),
                             color = "grey", size = 0.2)
piowa + pc_disc_iowa

Using the Spectral palette, for example, will by defualt adjust the palette to the levels present in the data, and Iowa does not have counties in all the bins:

piowa + pc_disc_iowa + scale_fill_brewer(palette = "Spectral")

Two issues:

Adding drop = FALSE to the palette specification preserves the encoding from the full map:

piowa + pc_disc_iowa + scale_fill_brewer(palette = "Spectral", drop = FALSE)

Isopleth Map of Population Density

To estimate the population density we can use the county centroids weighted by the county population values.

The function kde2d from the MASS package can be used to compute kernel density estimates, but does not support the use of weights.

A simple modification that does support weights is available.

By default, kde2d estimates the density on a grid over the range of the two variables.

source("kde2d.R")
ds <- with(county_pops, kde2d(x, y, weights = pop))
str(ds)
## List of 3
##  $ x: num [1:25] -124 -122 -119 -117 -115 ...
##  $ y: num [1:25] 25.5 26.4 27.4 28.4 29.4 ...
##  $ z: num [1:25, 1:25] 1.27e-12 2.49e-11 4.10e-10 5.70e-09 3.72e-08 ...
contour(ds)

To produce contours that work better when filled, it is useful to increase the number of grid points and enlarge the range:

ds <- with(county_pops, kde2d(x, y, weights = pop,
                              n = 50, lims = c(-130,-60, 20, 50)))

A contour map using the map function:

map("state")
contour(ds, add = TRUE, col = "blue")

The ggplot approach requires the density in tidy form:

dsrfc <- expand.grid(Lon = ds$x, Lat = ds$y)
dsrfc$dens <- as.vector(ds$z)

The analogous contour map using ggplot:

pusa <- ggplot(gusa) +
    geom_polygon(aes(long, lat, group = group),
                 fill = NA, color = "grey") +
        coord_quickmap()
pusa + stat_contour(aes(Lon, Lat, z = dens), data = dsrfc)

A filled contour version can be created using stat_contour and fill = ..level..:

pusa + stat_contour(aes(Lon, Lat, z = dens, fill = ..level..),
                    data = dsrfc, geom = "polygon", alpha = 0.2)

To focus on Iowa we need subsets of the populations data and the map data:

iowa_pops <- filter(county_pops, fips %/% 1000 == 19)
giowa <- filter(gcounty, fips %/% 1000 == 19)

A proportional symbols plot for the Iowa county populations:

iowa_base <- ggplot(giowa) +
    geom_polygon(aes(long, lat, group = group), fill = NA, col = "grey") +
    coord_quickmap()

iowa_base +
    geom_point(aes(x, y, size = pop), data = iowa_pops) +
    scale_size_area() +
    ggthemes::theme_map()

For a density plot it is useful to expand the populations used to include nearby large cities, such as Omaha:

iowa_pops <- filter(county_pops, x > -97 & x < -90 & y > 40 & y < 44)

Density estimates in tidy form:

dsi <- with(iowa_pops, kde2d(x, y, weights = pop,
                             n = 50, lims = c(-98, -89, 40, 44.5)))
srfci <- expand.grid(Lon = dsi$x, Lat = dsi$y)
srfci$dens <- as.vector(dsi$z)

A simple contour map:

ggplot(giowa) +
    geom_polygon(aes(long, lat, group = group), fill = NA, color = "grey") +
    geom_contour(aes(Lon, Lat, z = dens), color = "blue", na.rm = TRUE,
                 data = srfci) + coord_quickmap()

A filled contour map

pdiowa <- ggplot(giowa) +
    geom_polygon(aes(long, lat, group = group), fill = NA, color = "grey") +
    stat_contour(aes(Lon, Lat, z = dens, fill = ..level..),
                 color = "black", alpha = 0.2, na.rm = TRUE, data = srfci,
                 geom = "polygon")
pdiowa + coord_quickmap()

Using xlim and ylim arguments to coord_quickmap to trim the range:

pdiowa + coord_quickmap(xlim = c(-96.5,-90), ylim = c(40.5, 43.5))

Shape Files

The ESRI shapefile format is a widely used vector data format for geographic information systems.

Shapefiles are available from many sources; for example:

Other formats and data bases are also available. Some references:

Proportion of Population Without Health Insurance

This example is adapted from two blog posts (part 1, part 2).

The American Community Survey provides estimates of a number of features at the census tract level, including the estimated percentage of the population within each census tract without health insurance.

To show the data on a map we need census tract boundaries. A shapefile of these boundaries is available at

https://www2.census.gov/geo/tiger/GENZ2010/gz_2010_19_140_00_500k.zip

This file is also available locally.

Downloading an unpacking the shapefile data:

getDataDir <- function(dname) {
    data_url <- "http://homepage.divms.uiowa.edu/~luke/data"
    aname <- paste0(dname, ".tgz")
    if (! file.exists(dname)) {
        download.file(paste(data_url, aname, sep = "/"), aname)
        untar(aname)
        file.remove(aname)
    }
}
shape_file <- "gz_2010_19_140_00_500k"
shape_file_dir <- shape_file
getDataDir(shape_file_dir)

The data can be read with readOGR from rgdal. The result is a SpatialPolygonsDataFrame object:

library(rgdal)
library(ggplot2)
library(dplyr)
raw_tract <- readOGR(dsn = shape_file_dir, layer = shape_file)
## OGR data source with driver: ESRI Shapefile 
## Source: "/home/luke/writing/classes/uiowa/STAT4580/webpage/gz_2010_19_140_00_500k", layer: "gz_2010_19_140_00_500k"
## with 825 features
## It has 7 fields
class(raw_tract)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"

For use with ggplot we an use fortify to convert this object to a tidy form:

tract <- fortify(raw_tract, region="GEO_ID")
head(tract)
##        long      lat order  hole piece                   id
## 1 -94.70070 41.41758     1 FALSE     1 1400000US19001960100
## 2 -94.70063 41.46084     2 FALSE     1 1400000US19001960100
## 3 -94.70072 41.46725     3 FALSE     1 1400000US19001960100
## 4 -94.70071 41.46798     4 FALSE     1 1400000US19001960100
## 5 -94.70053 41.48973     5 FALSE     1 1400000US19001960100
## 6 -94.70048 41.49705     6 FALSE     1 1400000US19001960100
##                    group
## 1 1400000US19001960100.1
## 2 1400000US19001960100.1
## 3 1400000US19001960100.1
## 4 1400000US19001960100.1
## 5 1400000US19001960100.1
## 6 1400000US19001960100.1

The outlines of all Iowa cencus tracts:

outline_layer <- geom_polygon(aes(long, lat, group = group),
                              fill = NA, col = "black", size = 0.2)

ggplot(tract) + outline_layer + coord_quickmap()

To look at county level subsets it is useful to be able to subset a data frame on the form of id used, which is based on the county FIPS code.

county.fips.table <-
    mutate(maps::county.fips, county = sub(":.*", "", polyname))

getCountyFIPS <- function(county, state = "iowa") {
    key <- paste(tolower(state), tolower(county), sep = ",")
    idx <- match(key, county.fips.table$county)
    county.fips.table$fips[idx]
}

county_data <- function(tract, county, state = "iowa") {
    fips <- getCountyFIPS(county, state)
    pkey <- paste0("1400000US", fips)
    filter(tract, grepl(pkey, id))
}

getCountyFIPS("johnson")
## [1] 19103
head(county_data(tract, "johnson"))
##        long      lat order  hole piece                   id
## 1 -91.52066 41.66395 20824 FALSE     1 1400000US19103000100
## 2 -91.52120 41.66395 20825 FALSE     1 1400000US19103000100
## 3 -91.52118 41.66360 20826 FALSE     1 1400000US19103000100
## 4 -91.52152 41.66350 20827 FALSE     1 1400000US19103000100
## 5 -91.52281 41.66351 20828 FALSE     1 1400000US19103000100
## 6 -91.52277 41.66571 20829 FALSE     1 1400000US19103000100
##                    group
## 1 1400000US19103000100.1
## 2 1400000US19103000100.1
## 3 1400000US19103000100.1
## 4 1400000US19103000100.1
## 5 1400000US19103000100.1
## 6 1400000US19103000100.1

The census tracts for Johnson county:

ggplot(county_data(tract, "johnson")) + outline_layer + coord_quickmap()

And for Polk county, where Des Moines is located:

ggplot(county_data(tract, "polk")) + outline_layer + coord_quickmap()

The ACS data can be downloaded from

https://factfinder.census.gov/faces/nav/jsf/pages/download_center.xhtml#none

The five-year summary from 2015 is available locally and can be downloaded and unpacked with

acs_table <- "ACS_14_5YR_S2701"
acs_file <- paste0(acs_table, "/", acs_table, "_with_ann.csv")
getDataDir(acs_table)
acs_data <- read.csv(acs_file, stringsAsFactors = FALSE)

Reduce the data to the variables we need, with an id that matches the format from the shape file, and join with the census tract polygons:

acs_data <- select(acs_data,
                   id = GEO.id2, percent = HC03_EST_VC01, pop = HC01_EST_VC01)
acs_data <- mutate(acs_data, id = paste0("1400000US", id))
acs_data <- mutate(acs_data, percent = as.numeric(percent))
## Warning in rlang::eval_tidy(~as.numeric(percent), <environment>): NAs
## introduced by coercion

plot_data <- left_join(tract, acs_data)

Plots for Iowa:

ggplot(plot_data) +
    geom_polygon(aes(long, lat, group = group, fill = percent),
                 col = "black") +
    coord_quickmap() +
    scale_fill_distiller(palette = "Reds", direction = 1)

for Johnson county:

ggplot(county_data(plot_data, "johnson")) +
    geom_polygon(aes(long, lat, group = group, fill = percent),
                 col = "black") +
    coord_quickmap() +
    scale_fill_distiller(palette = "Reds", direction = 1)

and for Polk county:

ggplot(county_data(plot_data, "polk")) +
    geom_polygon(aes(long, lat, group = group, fill = percent),
                 col = "black") +
    coord_quickmap() +
    scale_fill_distiller(palette = "Reds", direction = 1)

For the state-wide map it may help to drop the census tract borders and add county borders:

ggplot(plot_data) +
    geom_polygon(aes(long, lat, group = group, fill = percent), col = NA) +
    geom_polygon(aes(long, lat, group = group),
                 fill = NA, color = "grey", data = giowa, size = 0.2) +
    coord_quickmap() +
    scale_fill_distiller(palette = "Reds", direction = 1)

Using geom_map

An alternative for ggplot maps is to use geom_map.

ggplot(acs_data, aes(map_id = id)) +
    geom_map(aes(fill = percent), map = tract) +
    with(tract, expand_limits(x = long, y = lat)) +
    coord_quickmap() +
    scale_fill_distiller(palette = "Reds", direction = 1)

A Thermometer Map

The map function can handle spatial polygon data frames directy:

map(raw_tract)

As these objects are data frames we can also use standard subset operations to extract the Johnson county polygons:

raw_jc <- raw_tract[grepl("19103", raw_tract$GEO_ID),]
map(raw_jc)

The coordinates function computes centroids for spatial polygon data frames

crd_jc <- as.data.frame(coordinates(raw_jc))
names(crd_jc) <- c("x", "y")
map(raw_jc)
points(crd_jc)

We can add the ACS data with a left_join:

crd_jc$id <- as.character(raw_jc$GEO_ID)
crd_jc <- left_join(crd_jc, acs_data)

This allows us to add thermometer symbols with the symbols function:

map(raw_jc)
with(crd_jc, symbols(x, y, thermometers=cbind(0.5, 3, percent / 100),
                     add = TRUE, inches = 0.3))


map(raw_jc)
with(crd_jc, symbols(x, y, thermometers=cbind(0.5, 3, percent / max(percent)),
                     add = TRUE, inches = 0.3))

A more effective display in this case may be a linked micromap.

A Linked Micromap

The micromap package provides several functions for creating linked micromaps.

This code will produce a linked micromap of the uninsured percentage and the cencus tract populations:

library(micromap)

maptab <- create_map_table(raw_jc, IDcolumn = "GEO_ID")
acs_data_jc <- county_data(acs_data, "johnson")

lmplot(stat.data = acs_data_jc, map.data = maptab,
       grouping = 5, ord.by = "percent",
       panel.types=c("labels", "dot", "dot", "map"),
       panel.data = c("id", "percent", "pop", NA),
       map.link=c("id", "ID"),
       panel.att = list(list(1, text.size = 0.6),
                        list(2, header = "Pct Uninsured"),
                        list(3, header = "Population")))

Leaflet Maps

Leaflet is a JavaScript library for interactive maps.

The leaflet package provides an R interface.

The leaflet package can work with spatial point and polygon data or the data objects returned by the map function.

We can use the raw_tract spatial polygon data frame merged with the ACS data. The merge function can handle the spatial polygon data:

lfdata <- merge(raw_tract, by.x = "GEO_ID", acs_data, by.y = "id", all.x = TRUE)

The base leaflet plot object:

library(leaflet)
lfplot <- leaflet(lfdata)

Evaluating these expressions in the R interpreter opens map windows in the browser:

addPolygons(lfplot,  fillColor = ~colorQuantile("Reds", percent)(percent))

addPolygons(lfplot,  fillColor = ~colorNumeric("Reds", percent)(percent))

addPolygons(lfplot,  fillColor = ~colorNumeric("Reds", percent)(percent),
            weight = 1, fillOpacity = 0.9)

This code produces an embedded leaflet widget:

addPolygons(lfplot,  fillColor = ~colorNumeric("Reds", percent)(percent),
            weight = 0.5, color = "black", fillOpacity = 0.9, opacity = 1.0,
            highlightOptions = highlightOptions(color = "white",
                                                weight = 2,
                                                bringToFront = TRUE),
            label = ~lapply(paste0("pct: ", percent, "<br/>", "pop: ", pop),
                            htmltools::HTML))