# Background

• R is a language, or an environment, for data analysis and visualization.

• R is derived from the S language developed at AT&T Bell Laboratories.

• R was originally developed for teaching at the University of Auckland, New Zealand, by Ross Ihaka and Robert Gentleman.

• R is now maintained by an international group of about 20 statisticians and computer scientists.

• A great strength of R is the large number of extension packages that have been developed; the number available on CRAN recently reached 10,000.

# Basic Usage

• Interactive R uses a command line interface (CLI).

• The interface runs a read-evaluate-print loop (REPL).

• A simple interaction with the R interpreter:

> 1 + 2
[1] 3

• Values can be assigned to variables using the left-arrow operator <-:

> x <- c(1, 3, 5)
> x
[1] 1 3 5

• Basic arithmetic operations work element-wise on vectors:

> x + x
[1]  2  6 10
> 2 * x
[1]  2  6 10
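As a further sketch of element-wise arithmetic, the lines below combine x with a second vector. The name y and its values are introduced here purely for illustration.

```r
# x is the vector from the interaction above; y is a hypothetical second vector
x <- c(1, 3, 5)
y <- c(10, 20, 30)

x + y    # element-wise sum: 11 23 35
x * y    # element-wise product: 10 60 150
x^2      # each element squared: 1 9 25
sum(x)   # reduces the vector to a single value: 9
```

Because the operations apply to whole vectors at once, no explicit loop is needed for this kind of calculation.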

# A Simple Scatter Plot

with(faithful,
     plot(eruptions, waiting,
          xlab = "Eruption time (min)",
          ylab = "Waiting time to next eruption (min)"))

# Fitting a Linear Regression

fit <- with(faithful, lm(waiting ~ eruptions))
fit
##
## Call:
## lm(formula = waiting ~ eruptions)
##
## Coefficients:
## (Intercept)    eruptions
##       33.47        10.73
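The fitted model object can be queried further. The following is a minimal sketch, assuming the built-in faithful data set; the 3.5-minute eruption and the name new_obs are hypothetical values chosen for illustration.

```r
# Same model as above, written with the data argument (an equivalent form)
fit <- lm(waiting ~ eruptions, data = faithful)

# Named vector holding the intercept and the slope
coef(fit)

# Predicted waiting time after a hypothetical 3.5-minute eruption
new_obs <- data.frame(eruptions = 3.5)
predict(fit, newdata = new_obs)

# Fuller report: standard errors, R-squared, residual summary
summary(fit)
```

With the coefficients printed above, the prediction works out to roughly 33.47 + 10.73 * 3.5, or about 71 minutes.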

# Adding the Regression Line to the Plot

with(faithful,
     plot(eruptions, waiting,
          xlab = "Eruption time (min)",
          ylab = "Waiting time to next eruption (min)"))
abline(coef(fit), col = "red", lwd = 3)

# Packages and Package Libraries

• Extension modules and data sets are often made available in packages.

• Collections of installed packages are stored in folders called libraries.

• .libPaths() will show you the libraries your R process will search.

• The library function makes a package from these libraries available in the current session.

• You can install packages using the install.packages function or the Install Packages item in the RStudio Tools menu.
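The functions named above might be combined as in the following sketch. The helper use_package is a hypothetical name introduced here, not part of R, and stats4 is used only because it ships with R, so nothing is downloaded.

```r
# Libraries (folders) searched by this R process
.libPaths()

# Names of the packages installed in those libraries
installed <- rownames(installed.packages())
head(installed)

# Hypothetical helper: install a package from CRAN only if it is missing,
# then attach it. Installation needs a network connection.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

use_package("stats4")  # ships with R, so no installation is triggered
```

For a CRAN package such as ggplot2, the same call would first install the package if it is not yet present in any library.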

# A Useful Package: ggplot2

• The ggplot2 package provides a powerful alternative to the base graphics system.

• The geyser example can be done in ggplot2 like this:

library(ggplot2)
p <- ggplot(faithful, aes(x = eruptions, y = waiting))
p + geom_point() + geom_smooth(method = "lm", se = FALSE)

# Contrast to Point-and-Click Interfaces

• Even simple tasks require learning some of the R language.

• Once you can do simple tasks, you have learned some of the R language.

• More complicated tasks become easier.

• Even very complicated tasks become possible.

# R and Reproducibility

• Analyses in R are carried out by running code describing the tasks to perform.

• This code can be

  • audited to make sure the analysis is right
  • replayed to make sure the results are reproducible
  • reused after changes in the data or on new data

• Literate data analysis tools like R Markdown provide support for this.
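The ideas of replaying and reusing code can be sketched with the geyser regression. The function name fit_waiting and the cutoff of 3 minutes are assumptions made here for illustration only.

```r
# Hypothetical helper: the whole analysis written as one function of the data
fit_waiting <- function(data) {
  lm(waiting ~ eruptions, data = data)
}

# Replay: two runs on the same data give the same coefficients
fit1 <- fit_waiting(faithful)
fit2 <- fit_waiting(faithful)
all.equal(coef(fit1), coef(fit2))

# Reuse: the same code applied to changed data, here only the longer eruptions
long_eruptions <- subset(faithful, eruptions > 3)
coef(fit_waiting(long_eruptions))

# Record the R version and attached packages alongside the results
sessionInfo()
```

Keeping the analysis in a function or script, rather than in a sequence of menu selections, is what makes it possible to audit each step and rerun it later.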