---
title: "Assignment 8 Notes"
output:
  html_document:
    toc: yes
---

```{r global_options, include=FALSE}
knitr::opts_chunk$set(collapse=TRUE)
```

## 1. Air Pollution Data

When there is a clear dependent variable, that variable should go on
the vertical axis; here that is `ozone.level`. If you use `ggpairs`,
it is a good idea to put the dependent variable last so that the bottom
row shows the dependent variable on the vertical axis against each
predictor variable:

```{r}
library(SemiPar)
data(calif.air.poll)
library(GGally)
ggpairs(calif.air.poll[c(2 : 4, 1)])
```

Conditioning on inversion base height shows an increasing relation
between ozone level and inversion base temperature; the slope
decreases with increasing inversion height.

```{r}
library(lattice)
xyplot(ozone.level ~ inversion.base.temp |
           equal.count(inversion.base.height, 9, overlap = 0),
       type = c("p", "smooth"), data = calif.air.poll, col.line = "red")
```

The top two height panels both contain points with heights of 5000.

## 2. Olive Oils

```{r, message = FALSE}
library(dplyr)
library(ggplot2)
library(GGally)
olives <- read.csv("http://homepage.divms.uiowa.edu/~luke/data/olives.csv")
```

Focus on the northern region:

```{r}
olivesN <- filter(olives, Region == "North")
olivesN <- droplevels(olivesN)
```

A parallel coordinates plot of all the values suggests looking more
closely at `oleic`, `stearic`, and `linolenic`:

```{r}
ggparcoord(olivesN, 3:10, groupColumn = "Area", scale = "uniminmax")
```

The plot of `stearic` against `oleic` shows that the Umbria oils all
have `oleic` values above 7870:

```{r}
ggplot(olivesN) +
    geom_point(aes(oleic, stearic, color = Area)) +
    geom_vline(aes(xintercept = 7870), linetype = 2)
```

Among the oils with `oleic > 7870`, all of the Umbria oils, and only the
Umbria oils, have values of `stearic < 230` and `linolenic > 15`:

```{r}
ggplot(filter(olivesN, oleic > 7870)) +
    geom_point(aes(linolenic, stearic, color = Area)) +
    geom_vline(aes(xintercept = 15), linetype = 2) +
    geom_hline(aes(yintercept = 230), linetype = 2)
```
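
The separation read off the last plot can also be checked with a cross-tabulation. The following is a minimal sketch, not part of the original notes: `rule_table` is a hypothetical helper, its default cutoffs are the ones quoted above, and the `toy` data frame contains made-up rows used only to show the shape of the output.

```r
# Hypothetical helper: among rows with oleic above `oleic_min`, tabulate Area
# against whether the row satisfies stearic < stearic_max and
# linolenic > linolenic_min.
rule_table <- function(d, oleic_min = 7870, stearic_max = 230,
                       linolenic_min = 15) {
    high <- d[d$oleic > oleic_min, , drop = FALSE]
    in_box <- high$stearic < stearic_max & high$linolenic > linolenic_min
    table(Area = as.character(high$Area),
          in_box = factor(in_box, levels = c(FALSE, TRUE)))
}

# Made-up rows for illustration only; these are NOT values from olives.csv.
toy <- data.frame(
    Area = c("Umbria", "Umbria", "Other", "Other"),
    oleic = c(7950, 7990, 7900, 7600),
    stearic = c(200, 210, 250, 220),
    linolenic = c(30, 25, 20, 10)
)
tab <- rule_table(toy)
print(tab)
```

Calling `rule_table(olivesN)` on the northern oils should, if the statement above holds, put every Umbria oil in the `TRUE` column and every other area in the `FALSE` column.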