\documentclass[11pt]{article}
\setlength{\topmargin}{-0.25in}
\setlength{\oddsidemargin}{0.0in}
\setlength{\evensidemargin}{0.0in}
\setlength{\textwidth}{5.75in}
\setlength{\textheight}{8.5in}
\setlength{\parindent}{0in}
\setlength{\parsep}{0in}
\setlength{\topsep}{0in}
\setlength{\parskip}{1.2ex}
\usepackage{Sweave}

\begin{document}

\large
\begin{verbatim}
Name: ___________________________________
\end{verbatim}

\begin{center}
{\bf Computing in Statistics}, STAT:5400\\
Midterm 2, Fall 2017 \\
\end{center}

\normalsize
\vskip 0.2 in

\section{Simulation study}

The {\tt prop.test} function, with the Yates correction argument set to FALSE ({\tt correct = FALSE}), computes large-sample normal-theory confidence intervals. With appropriate adjustment to the numeric arguments, it can also compute plus-four method intervals.

Conduct a simulation study to compare the true coverage probabilities of large-sample normal-theory intervals versus plus-four method intervals when the nominal coverage is 90\%, the sample size is 25, and the true success probability is 0.16. Use a large enough number of replicate datasets that the standard error of your estimated coverage probabilities will be no greater than 0.0025. Time how long it takes to run your simulation study.

Submit:
\begin{itemize}
\item Your R code.
\item Your R output containing the results of the simulation study.
\item R output showing how long it took to run the simulation study.
\item A brief paragraph interpreting the results.
\end{itemize}

<<>>=
doit <- function(n, conf, se, truep, seed) {
    # interval from prop.test for one (successes, trials) pair
    getInt <- function(v) {
        prop.test(v[1], v[2], conf.level = conf, correct = FALSE)$conf.int
    }

    # how many replicate datasets, based on required standard error:
    # the se of an estimated proportion is at most sqrt(0.25 / S)
    S <- ceiling(1 / (4 * se^2))

    # generate data
    set.seed(seed)
    mydat <- rbinom(S, size = n, prob = truep)

    # one column per replicate: row 1 = lower limit, row 2 = upper limit
    normal <- apply(cbind(mydat, rep(n, S)), 1, getInt)
    plus4 <- apply(cbind(mydat + 2, rep(n + 4, S)), 1, getInt)

    # does each interval cover the true success probability?
    normalcov <- truep > normal[1, ] & truep < normal[2, ]
    plus4cov <- truep > plus4[1, ] & truep < plus4[2, ]

    coverage <- c(mean(normalcov), mean(plus4cov))
    names(coverage) <- c("normal", "plus4")
    coverage
}

system.time(coverage <- doit(25, .90, .0025, 0.16, 17))
print(coverage)
@

This is one of the fairly rare conditions under which the plus-four method performs more poorly than the large-sample, normal-theory method. The plus-four interval is actually anti-conservative here: its actual coverage is lower than its nominal coverage.

However, {\tt prop.test} is not computing the textbook interval $\hat{p} \pm z \sqrt{\hat{p} (1 - \hat{p}) / n}$. With {\tt correct = FALSE} it returns the Wilson score interval, obtained by inverting the score test. If you code the textbook computations yourself, you will see both methods coming in right around 90\% coverage!

<<>>=
mydat <- rbinom(40000, 25, 0.16)
z <- qnorm(0.95)

# textbook interval phat +/- z * se for y successes in n trials
computeInt <- function(y, n) {
    phat <- y / n
    se <- sqrt(phat * (1 - phat) / n)
    mult <- c(-1, 1)
    phat + mult * z * se
}

normInts <- plus4Ints <- matrix(0, nrow = 40000, ncol = 2)
for (i in 1:40000) {
    normInts[i, ] <- computeInt(mydat[i], 25)
    plus4Ints[i, ] <- computeInt(mydat[i] + 2, 29)
}

normcov <- mean(.16 > normInts[, 1] & .16 < normInts[, 2])
plus4cov <- mean(.16 > plus4Ints[, 1] & .16 < plus4Ints[, 2])
print(c(normcov, plus4cov))
@

\section{Root-finding}

The function
\[ f(x) \ = \ x^4 - 2 x^3 + 1 \]
has two real roots in the interval $[0.5, 2.0]$. Use the {\tt uniroot} function in such a way that you are able to find both roots.

Submit:
\begin{itemize}
\item Your R code.
\item R output containing results for each root.
\item Any other R output that helped you do this.
\end{itemize}

<<>>=
myfunc <- function(x) {x^4 - 2 * x^3 + 1}

# plot the function to see where it changes sign
x <- seq(0.5, 2.0, 0.01)
plot(x, myfunc(x), type = "l")

# f is positive at 0.5, negative at 1.5, and positive at 2.0,
# so each subinterval brackets one of the two roots
result1 <- uniroot(myfunc, c(0.5, 1.5))
result2 <- uniroot(myfunc, c(1.5, 2.0))
result1
result2
@

\section{Relational database structures}

A journal editor wishes to store information about the papers published in her journal and the authors of those papers. Below are the attributes that she wishes to store. Develop a relational database structure, in third normal form, for these data. For each table, indicate any primary and/or foreign keys.

Note that the same person could be an author of more than one paper, and a paper may have more than one author. This is a many-to-many relationship, so a linking table will be needed.

\begin{verbatim}
Paper table
-----------
Paper ID (primary key)
Paper title
Paper volume number
Paper issue number
Paper starting page number

Author table
------------
Author ID (primary key)
Author name
Author affiliation
Author email

Linking table
-------------
(Primary key is combination of the two fields)
Paper ID (foreign key)
Author ID (foreign key)
\end{verbatim}

\end{document}