\documentclass[11pt]{article}
\setlength{\topmargin}{-0.25in}
\setlength{\oddsidemargin}{0.0in}
\setlength{\evensidemargin}{0.0in}
\setlength{\textwidth}{5.75in}
\setlength{\textheight}{8.5in}
\setlength{\parindent}{0in}
\setlength{\parsep}{0in}
\setlength{\topsep}{0in}
\setlength{\parskip}{1.2ex}
\usepackage{Sweave}
\begin{document}
\large
\begin{verbatim}
Name: ___________________________________
\end{verbatim}
\begin{center}
{\bf Computing in Statistics}, STAT:5400\\
Midterm 2, Fall 2015 \\
\end{center}
\normalsize
\vskip 0.2 in

Submit your answers in the ICON drop box as an .Rnw file and a .pdf file produced using Sweave, with your name as author. If you can't get your .Rnw file to compile, submit it anyway and include your R output in a separate text file. In your document, have a named section for each problem and, where needed, a numbered list of answers to multipart questions. You don't have to type any other text except where needed to answer a question.

\section{Simulation studies}

Given a random sample of $n$ observations $y_1, \ y_2, \ \ldots , y_n$, the maximum likelihood estimator of the variance of the population from which the sample was drawn is
\[
m \ = \ \frac{ \sum(y_i - \bar{y})^2 }{n}
\]
This estimator is known to be biased. The unbiased estimator is
\[
s^2 \ = \ \frac{ \sum(y_i - \bar{y})^2 }{n-1}
\]
Carry out a simulation study to compare these two estimators with respect to {\em mean squared error} for the case when
\begin{itemize}
\item the true population distribution is Gamma with shape parameter 8 and rate parameter 2, and
\item the sample size is $n = 10$.
\end{itemize}
Set your number of replicate datasets large enough that the standard errors of your mean-squared-error estimates are no larger than 0.04.
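As a quick sanity check (not required by the problem), the benchmark quantities can be computed analytically: a Gamma distribution with shape $\alpha$ and rate $\beta$ has variance $\alpha / \beta^2$, so here $\sigma^2 = 8/2^2 = 2$; and since $E(m) = \frac{n-1}{n}\sigma^2$, the bias of the MLE at $n = 10$ is $-\sigma^2/n = -0.2$. The simulation estimates below can be checked against these values.

<<>>=
## Analytic benchmarks for the simulation that follows
shape <- 8
rate <- 2
n <- 10
truevar <- shape / rate^2   # Gamma variance: shape / rate^2 = 2
biasmle <- -truevar / n     # bias of the MLE m: -sigma^2 / n = -0.2
c(truevar = truevar, biasmle = biasmle)
@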
<<>>=
set.seed(9)
n <- 10     # sample size
S <- 1000   # number of replicate datasets
## one replicate dataset per row
mygammas <- matrix(rgamma(n * S, 8, 2), nrow = S)
ssq <- apply(mygammas, 1, var)      # unbiased estimator s^2
mle <- ((n - 1) / n) * ssq          # maximum likelihood estimator m
truevar <- 8 / (2^2)                # true variance: shape / rate^2 = 2
ssqse <- (ssq - truevar)^2          # squared errors for s^2
stderrssq <- sqrt(var(ssqse) / S)   # standard error of the MSE estimate
mlese <- (mle - truevar)^2          # squared errors for m
stderrmle <- sqrt(var(mlese) / S)
results <- rbind(c(mean(ssqse), mean(mlese)),
                 c(stderrssq, stderrmle))
rownames(results) <- c("Mean Squared Errors", "Standard Errors of MSE")
colnames(results) <- c("s^2", "mle")
print(results)
@

The standard errors exceed the 0.04 target by a factor of about 2.5, so we rerun with at least $2.5^2 = 6.25$ times as many replicate datasets.

<<>>=
set.seed(9)
n <- 10
S <- 6250
mygammas <- matrix(rgamma(n * S, 8, 2), nrow = S)
ssq <- apply(mygammas, 1, var)
mle <- ((n - 1) / n) * ssq
truevar <- 8 / (2^2)
ssqse <- (ssq - truevar)^2
stderrssq <- sqrt(var(ssqse) / S)
mlese <- (mle - truevar)^2
stderrmle <- sqrt(var(mlese) / S)
results <- rbind(c(mean(ssqse), mean(mlese)),
                 c(stderrssq, stderrmle))
rownames(results) <- c("Mean Squared Errors", "Standard Errors of MSE")
colnames(results) <- c("s^2", "mle")
print(results)
@

\section{Jackknife and Bootstrap}

Use your first simulated dataset from the previous problem as your data for this problem. The statistic of interest is the unbiased estimator of variance ($s^2$ as defined above).
\begin{enumerate}
\item Use the jackknife to estimate the bias of $s^2$ in your gamma dataset of size 10. You may either use the {\tt jackknife} function in the {\tt bootstrap} package or use other code to implement the jackknife. Show your R code and its output.
\item Next, use the nonparametric bootstrap for the same purpose. You may either use the {\tt boot} function in the {\tt boot} package or other code to implement the nonparametric bootstrap. Show your R code and its output.
\item It is often said that the jackknife is an approximation to the bootstrap. Comment briefly on which procedure performs better in this problem.
\end{enumerate}

<<>>=
library(boot)
library(bootstrap)
data <- mygammas[1, ]
## jackknife estimate of the bias of s^2
jackout <- jackknife(data, var)
print(jackout)
## nonparametric bootstrap: the statistic must accept an index vector
myvar <- function(x, ind) { var(x[ind]) }
bootout <- boot(data, myvar, R = 1999)
print(bootout)
@

Because $s^2$ is an unbiased estimator, the jackknife's bias estimate of exactly 0 is correct; the jackknife removes bias exactly for quadratic statistics such as $s^2$. The bootstrap's bias estimate is a nonzero Monte Carlo quantity, so in this problem the jackknife performs better than the bootstrap.

\section{Root-finding}

Consider the function
\[
f(x) \ = \ x^4 - 6 x^3 + 5 x^2 - x - 1
\]
\begin{enumerate}
\item Plot this function on the interval (1,7). Show your R code and the plot.
\item Explain briefly why the bisection algorithm could be used to find the root of this function on the interval (1,7).

$f$ is a polynomial, hence continuous, and $f(1) = -2 < 0$ while $f(7) = 580 > 0$, so by the intermediate value theorem $f$ has a root in $(1,7)$, which bisection is guaranteed to bracket.
\item Find the root of this function on the interval (1,7) in a two-step process:
\begin{enumerate}
\item First, use the bisection algorithm to find an interval of width not greater than 0.75 that contains the root.
\item Then use {\tt uniroot} or any other root-finding function of your choice to find the root to within $\pm 0.00001$.
\end{enumerate}
\end{enumerate}

<<fig=TRUE>>=
f <- function(x) x^4 - 6 * x^3 + 5 * x^2 - x - 1
print(f(1))   # negative at the left endpoint
print(f(7))   # positive at the right endpoint
x <- seq(1, 7, by = 0.1)
plot(x, f(x), type = "l")
## bisection() is defined in the separate file bisection.R
source("bisection.R")
bisection.out <- bisection(f, 1, 7, 0.75, 15)
newinterval <- c(bisection.out$a, bisection.out$b)
print(newinterval)
print(uniroot(f, newinterval, tol = 0.00001))
@

\end{document}