Population and Size of Cities Ca. 1800

Data used by Playfair on population and size of some major European cities around 1800 is available in a file at http://www.stat.uiowa.edu/~luke/data/Playfair.

This file can be read using read.table. We can read it directly from the web:

Playfair <- read.table("http://www.stat.uiowa.edu/~luke/data/Playfair")

If you want to be able to work on an Rmarkdown file when you are not connected to the internet, it is a good idea to download a local copy of the data if you don't already have one and then use the local copy:

if (! file.exists("Playfair.dat"))
    download.file("http://www.stat.uiowa.edu/~luke/data/Playfair",
                  "Playfair.dat")

You can hide this chunk with the chunk option include = FALSE.

Using the local file:

Playfair <- read.table("Playfair.dat")
names(Playfair)
## [1] "population" "diameter"
head(Playfair, 2)
##           population diameter
## Edinburgh         60    9.144
## Stockholm         63    9.652

This data frame isn’t in tidy form if we want to be able to use the city names as a variable, since read.table stores them as the row names.

One approach to tidying this data frame is to copy the row names into a new variable and then remove the row names:

Playfair$city <- rownames(Playfair)
rownames(Playfair) <- NULL
head(Playfair, 2)
##   population diameter      city
## 1         60    9.144 Edinburgh
## 2         63    9.652 Stockholm
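
The same two steps can be wrapped in a small helper that also places the new variable first. This is only a sketch: the function name rownames_to_var is hypothetical, and the two-row data frame is typed in from the output shown above rather than read from the file.

```r
# Hypothetical helper: copy the row names into a regular column named `var`,
# drop the row names, and put the new column first.
rownames_to_var <- function(df, var = "city") {
    out <- df
    out[[var]] <- rownames(df)
    rownames(out) <- NULL
    out[c(var, names(df))]
}

# Two rows copied from the head() output above, with cities as row names.
pf <- data.frame(population = c(60, 63),
                 diameter = c(9.144, 9.652),
                 row.names = c("Edinburgh", "Stockholm"))
pf_tidy <- rownames_to_var(pf)
names(pf_tidy)
```

Putting the city first is a matter of taste; it matches the column order produced by the skip-the-header approach.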

Another option is to read the data as three variables by skipping the first row and then adding the variable names:

Playfair <- read.table("Playfair.dat", skip = 1, stringsAsFactors = FALSE)
names(Playfair)
## [1] "V1" "V2" "V3"
names(Playfair) <- c("city", "population", "diameter")
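
Skipping the header and naming the variables can also be done in one call by passing col.names to read.table. The sketch below reads from a small inline stand-in for the file so that it runs without the download. The stand-in assumes a whitespace-separated layout with a two-name header line, and its two rows are taken from the output shown earlier.

```r
# Inline stand-in for Playfair.dat (layout assumed): a header line with two
# names, then rows that start with the city name.
txt <- "population diameter
Edinburgh 60 9.144
Stockholm 63 9.652"

# skip = 1 drops the header line; col.names supplies all three names at once.
pf <- read.table(text = txt, skip = 1,
                 col.names = c("city", "population", "diameter"),
                 stringsAsFactors = FALSE)
str(pf)
```

With the real file, the text argument would be replaced by the file name "Playfair.dat".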

Useful checks:

str can help:

str(Playfair)
## 'data.frame':    22 obs. of  3 variables:
##  $ city      : chr  "Edinburgh" "Stockholm" "Florence" "Genoa" ...
##  $ population: int  60 63 75 80 80 80 90 120 130 140 ...
##  $ diameter  : num  9.14 9.65 10.16 10.67 10.16 ...

One way to find the number of lines in the file:

length(readLines("Playfair.dat"))
## [1] 23

The number of lines in the file should be one more than the number of rows, since the first line holds the variable names. It is not a bad idea to put a check in your file:

stopifnot(nrow(Playfair) + 1 == length(readLines("Playfair.dat")))
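
Counting the fields on each line is another possible check, and it also shows why read.table treats the city names as row names: when the first line has one fewer field than the rest, read.table takes it as a header and uses the first column for the row names. The sketch below uses a temporary stand-in file with an assumed layout (a two-name header and two rows from the output above), not the real Playfair.dat.

```r
# Write a small stand-in file: header with two names, rows with three fields.
tmp <- tempfile(fileext = ".dat")
writeLines(c("population diameter",
             "Edinburgh 60 9.144",
             "Stockholm 63 9.652"), tmp)

# Number of whitespace-separated fields on each line of the file.
nf <- count.fields(tmp)
nf

# With no header argument, read.table infers the header from the field counts
# and stores the first column as row names.
pf <- read.table(tmp)
rownames(pf)

# The line-count check from above, applied to the stand-in file.
stopifnot(nrow(pf) + 1 == length(readLines(tmp)))
```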

City Temperatures

The website https://www.timeanddate.com/weather/ provides current temperatures for a number of cities around the world. Values from January 23, 2019, were saved in a file you can download from http://www.stat.uiowa.edu/~luke/data/citytemps.dat.
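
The download-if-missing pattern used for the Playfair data applies here as well. The sketch below wraps it in a helper; the name fetch_if_missing is hypothetical. So that it runs without a network connection, the demonstration points at a temporary file that already exists, which means no download is attempted.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` is missing.
fetch_if_missing <- function(url, dest) {
    if (! file.exists(dest))
        download.file(url, dest)
    invisible(dest)
}

# Demonstration without network access: the destination already exists,
# so the helper leaves it untouched.
dest <- tempfile(fileext = ".dat")
writeLines("city temp", dest)
fetch_if_missing("http://www.stat.uiowa.edu/~luke/data/citytemps.dat", dest)
readLines(dest)
```

In an Rmarkdown file the destination would be "citytemps.dat" in the working directory, so the file is fetched only the first time the document is knit.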

citytemps <- read.table("citytemps.dat", header=TRUE)
dim(citytemps)
## [1] 141   2
head(citytemps)
##          city temp
## 1       Accra   82
## 2 Addis Ababa   61
## 3    Adelaide   89
## 4     Algiers   60
## 5      Almaty   25
## 6       Amman   50

Barley

The barley data set available in the lattice package records total yield in bushels per acre for 10 varieties at 6 experimental sites in Minnesota in each of two years.

library(lattice)
dim(barley)
## [1] 120   4
head(barley)
##      yield   variety year            site
## 1 27.00000 Manchuria 1931 University Farm
## 2 48.86667 Manchuria 1931          Waseca
## 3 27.43334 Manchuria 1931          Morris
## 4 39.93333 Manchuria 1931       Crookston
## 5 32.96667 Manchuria 1931    Grand Rapids
## 6 28.96667 Manchuria 1931          Duluth
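
One way to confirm the layout described above (10 varieties, 6 sites, 2 years) is to cross-tabulate the three factors. This is a minimal sketch using xtabs from base R; the variable name counts is chosen here for illustration.

```r
library(lattice)

# Count observations for each variety/site/year combination.
counts <- xtabs(~ variety + site + year, data = barley)

# Number of levels of each factor.
dim(counts)

# TRUE if every combination appears exactly once, i.e. the layout is complete.
all(counts == 1)
```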

Diamonds

The diamonds data set available in the ggplot2 package contains prices and other attributes of almost 54,000 diamonds.

library(ggplot2)
dim(diamonds)
## [1] 53940    10
head(diamonds)
## # A tibble: 6 x 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48