\documentclass[11pt]{article}
\setlength{\topmargin}{-0.25in}
\setlength{\oddsidemargin}{0.0in}
\setlength{\evensidemargin}{0.0in}
\setlength{\textwidth}{5.75in}
\setlength{\textheight}{8.5in}
\setlength{\parindent}{0in}
\setlength{\parsep}{0in}
\setlength{\topsep}{0in}
\setlength{\parskip}{1.2ex}
\usepackage{Sweave}
\begin{document}
\large
\begin{verbatim}
Name: ___________________________________
\end{verbatim}
\begin{center}
{\bf Computing in Statistics}, STAT:5400\\
Midterm 2, Fall 2015 \\
\end{center}
\normalsize
\vskip 0.2 in

Submit your answers in the ICON drop box as an .Rnw file and a .pdf file produced using Sweave, with your name as author. If you can't get your .Rnw file to compile, submit it anyway and include your R output in a separate text file. In your document, have a named section for each problem and, where needed, a numbered list of answers to multipart questions. You don't have to type any other text except where needed to answer a question.

\section{Simulation studies}

Given a random sample of $n$ observations $y_1, \ y_2, \ \ldots , y_n$, the maximum likelihood estimator of the variance of the population from which the sample was drawn is
\[
m \ = \ \frac{ \sum(y_i - \bar{y})^2 }{n}
\]
This estimator is known to be biased. The unbiased estimator is
\[
s^2 \ = \ \frac{ \sum(y_i - \bar{y})^2 }{n-1}
\]
Carry out a simulation study to compare these two estimators with respect to {\em mean squared error} for the case when
\begin{itemize}
\item the true population distribution is Gamma with shape parameter 8 and rate parameter 2, and
\item the sample size is $n = 10$.
\end{itemize}
Set your number of replicate datasets large enough that the standard errors of your mean-squared-error estimates are no larger than 0.04.
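As a quick sanity check (not required by the problem), the benchmark quantities can be computed analytically: a Gamma distribution with shape $\alpha$ and rate $\beta$ has variance $\alpha / \beta^2$, so here $\sigma^2 = 8/2^2 = 2$; and since $E(m) = \frac{n-1}{n}\sigma^2$, the bias of the MLE at $n = 10$ is $-\sigma^2/n = -0.2$. The simulation estimates below can be checked against these values.

<<>>=
## Analytic benchmarks for the simulation that follows
shape <- 8
rate <- 2
n <- 10
truevar <- shape / rate^2   # Gamma variance: shape / rate^2 = 2
biasmle <- -truevar / n     # bias of the MLE m: -sigma^2 / n = -0.2
c(truevar = truevar, biasmle = biasmle)
@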
<<>>=
set.seed(9)
n <- 10     # sample size
S <- 1000   # number of replicate datasets
## one replicate dataset per row
mygammas <- matrix(rgamma(n * S, 8, 2), nrow = S)
ssq <- apply(mygammas, 1, var)      # unbiased estimator s^2
mle <- ((n - 1) / n) * ssq          # maximum likelihood estimator m
truevar <- 8 / (2^2)                # true variance: shape / rate^2 = 2
ssqse <- (ssq - truevar)^2          # squared errors for s^2
stderrssq <- sqrt(var(ssqse) / S)   # standard error of the MSE estimate
mlese <- (mle - truevar)^2          # squared errors for m
stderrmle <- sqrt(var(mlese) / S)
results <- rbind(c(mean(ssqse), mean(mlese)),
                 c(stderrssq, stderrmle))
rownames(results) <- c("Mean Squared Errors", "Standard Errors of MSE")
colnames(results) <- c("s^2", "mle")
print(results)
@

The standard errors exceed the 0.04 target by a factor of about 2.5, so we rerun with at least $2.5^2 = 6.25$ times as many replicate datasets.

<<>>=
set.seed(9)
n <- 10
S <- 6250
mygammas <- matrix(rgamma(n * S, 8, 2), nrow = S)
ssq <- apply(mygammas, 1, var)
mle <- ((n - 1) / n) * ssq
truevar <- 8 / (2^2)
ssqse <- (ssq - truevar)^2
stderrssq <- sqrt(var(ssqse) / S)
mlese <- (mle - truevar)^2
stderrmle <- sqrt(var(mlese) / S)
results <- rbind(c(mean(ssqse), mean(mlese)),
                 c(stderrssq, stderrmle))
rownames(results) <- c("Mean Squared Errors", "Standard Errors of MSE")
colnames(results) <- c("s^2", "mle")
print(results)
@

\section{Jackknife and Bootstrap}

Use your first simulated dataset from the previous problem as your data for this problem. The statistic of interest is the unbiased estimator of variance ($s^2$ as defined above).
\begin{enumerate}
\item Use the jackknife to estimate the bias of $s^2$ in your gamma dataset of size 10. You may either use the {\tt jackknife} function in the {\tt bootstrap} package or use other code to implement the jackknife. Show your R code and its output.
\item Next, use the nonparametric bootstrap for the same purpose. You may either use the {\tt boot} function in the {\tt boot} package or other code to implement the nonparametric bootstrap. Show your R code and its output.
\item It is often said that the jackknife is an approximation to the bootstrap. Comment briefly on which procedure performs better in this problem.
\end{enumerate}

<<>>=
library(boot)
library(bootstrap)
data <- mygammas[1, ]
## jackknife estimate of the bias of s^2
jackout <- jackknife(data, var)
print(jackout)
## nonparametric bootstrap: the statistic must accept an index vector
myvar <- function(x, ind) { var(x[ind]) }
bootout <- boot(data, myvar, R = 1999)
print(bootout)
@

Because $s^2$ is an unbiased estimator, the jackknife's bias estimate of exactly 0 is correct; the jackknife removes bias exactly for quadratic statistics such as $s^2$. The bootstrap's bias estimate is a nonzero Monte Carlo quantity, so in this problem the jackknife performs better than the bootstrap.

\section{Root-finding}

Consider the function
\[
f(x) \ = \ x^4 - 6 x^3 + 5 x^2 - x - 1
\]
\begin{enumerate}
\item Plot this function on the interval (1,7). Show your R code and the plot.
\item Explain briefly why the bisection algorithm could be used to find the root of this function on the interval (1,7).

$f$ is a polynomial, hence continuous, and $f(1) = -2 < 0$ while $f(7) = 580 > 0$, so by the intermediate value theorem $f$ has a root in $(1,7)$, which bisection is guaranteed to bracket.
\item Find the root of this function on the interval (1,7) in a two-step process:
\begin{enumerate}
\item First, use the bisection algorithm to find an interval of width not greater than 0.75 that contains the root.
\item Then use {\tt uniroot} or any other root-finding function of your choice to find the root to within $\pm 0.00001$.
\end{enumerate}
\end{enumerate}

<<fig=TRUE>>=
f <- function(x) x^4 - 6 * x^3 + 5 * x^2 - x - 1
print(f(1))   # negative at the left endpoint
print(f(7))   # positive at the right endpoint
x <- seq(1, 7, by = 0.1)
plot(x, f(x), type = "l")
## bisection() is defined in the separate file bisection.R
source("bisection.R")
bisection.out <- bisection(f, 1, 7, 0.75, 15)
newinterval <- c(bisection.out$a, bisection.out$b)
print(newinterval)
print(uniroot(f, newinterval, tol = 0.00001))
@

\end{document}