5 MLE Computation

In mathematical statistics we predominantly, if not completely, restrict ourselves to consider models where the estimators permit closed form expressions. This serves a purpose as it allows for the focus to be on the statistical principles involved, and prevents diverting attention to the computational aspects of calculating the MLE that arise in more realistic models. Also, such courses do not consider data that are truncated or censored; such data are in fact the more common ones in medicine/biostatistics and also in actuarial science. In this note I will expose you to some tools in R that may be useful in computing the MLE in such situations.

5.1 Gamma with unknown shape

This example deals with the Data Set B of the text which is supposed to represent twenty workers compensation medical benefit claims. The data is presented in the tables below - I demonstrate the use of two packages to nicely tabulate your data.

library("kableExtra")
med_claims <- c(27, 82, 115, 126, 155, 161, 243, 294, 340, 384, 457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 15743);
med_claims_df<-data.frame("Medical Claims"=med_claims <- c(27, 82, 115, 126, 155, 161, 243, 294, 340, 384, 457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 15743));
med_claims_df%>%
  kable() %>%
  kable_styling() %>%
  scroll_box(width = "100px", height = "200px")

Medical.Claims
27
82
115
126
155
161
243
294
340
384
457
680
855
877
974
1193
1340
1884
2558
15743

library(xtable);
tab<-xtable(t(med_claims_df),caption="Data Set B: Workers Compensation Medical Claims",table.placement="H",type="html",digits=c(0));
print(tab, type="html")

Data Set B: Workers Compensation Medical Claims
	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20
Medical.Claims	27	82	115	126	155	161	243	294	340	384	457	680	855	877	974	1193	1340	1884	2558	15743

The gamma family of distributions is chosen as a model for this data, and the goal is to estimate the unknown shape and scale parameters. Just to fix the parametrization, the density is given by \[ f_{\alpha,\theta}(x)=\frac{1}{\Gamma(\alpha)\theta^\alpha} e^{-(x/\theta)} x^{\alpha-1 },\quad x\geq 0; \] the parameter space is given by $(0,\infty)^2$. We note that, the log-likelihood is given by, \[ l(\alpha,\theta):=-n\log\Gamma(\alpha) - n\alpha\log\theta - \frac{1}{\theta}\sum_{i=1}^n x_i +(\alpha-1)\sum_{i=1}^n \log x_i. \] Note that \[ \frac{\partial }{\partial \theta}l(\alpha,\theta) =-\frac{n\alpha}{\theta} +\frac{1}{\theta^2}\sum_{i=1}^n x_i, \] and equating it to zero yields, \[ \hat{\theta}_{MLE}= \frac{\bar{x}}{\hat{\alpha}_{MLE}}. \] This is so as \[ \frac{\partial^2 }{\partial \theta^2}l(\alpha,\theta)\Bigg\vert_{(\alpha=\hat{\alpha}_{MLE},\theta=\hat{\theta}_{MLE})} =\frac{n\hat{\alpha}_{MLE}}{\hat{\theta}_{MLE}^2} -\frac{2}{\hat{\theta}_{MLE}^3}\sum_{i=1}^n x_i=\frac{n\hat{\alpha}_{MLE}}{\hat{\theta}_{MLE}^3} \left(\hat{\theta}_{MLE}-\frac{2\bar{x}}{\hat{\alpha}_{MLE}}\right) = -\frac{n\hat{\alpha}_{MLE}}{\hat{\theta}_{MLE}^2} <0. \]

While we have an expression for $\hat{\theta}_{MLE}$, it involves an unknown - $\hat{\alpha}_{MLE}$. The nice thing about it is that it still helps reduce the optimization problem from one on $\mathbb{R}_+^2$ to $\mathbb{R}_+$. This is so because we reduce the optimization problem to that of maximizing \[ \begin{aligned} l(\alpha,\bar{x}/\alpha)&:=-n\log\Gamma(\alpha) - n\alpha\log\frac{\bar{x}}{\alpha} - \frac{\alpha}{\bar{x}}\sum_{i=1}^n x_i +(\alpha-1)\sum_{i=1}^n \log x_i\\ &=-n\log\Gamma(\alpha) - n\alpha\log{\bar{x}}+ n\alpha\log{\alpha} - n\alpha + (\alpha-1)\sum_{i=1}^n \log x_i. \end{aligned} \]

This can be done by using the optimize function as follows:

# Sufficient Statistics
  xbar<-mean(med_claims);
  sum_logx<-sum(log(med_claims));
  n<-length(med_claims);
# Loglikelihood
  like<-function(a){
    -n*lgamma(a)-n*a*(log(xbar/a)+1)+(a-1)*sum_logx
  }
opt<-optimize(like,c(0.1,4),maximum=TRUE);
# MLE for alpha, theta and the value of the loglikelihood - See Example 13.4 in the text  
round(c(opt$maximum,xbar/opt$maximum,opt$objective),6)

## [1]    0.556165 2561.111354 -162.293403

Good to check by plotting the log-likelihood with $\theta=\bar{x}/\alpha$:

$The log-likelihood with $\theta=\bar{x}/\alpha$$

The log-likelihood with $\theta=\bar{x}/\alpha$

plot(a<-(1:200)/100,sapply(a,like),type="l",ylab=expression(l(alpha,bar(x)/alpha)),xlab=expression(paste(alpha)),xaxt="n");
axis(1,at=round(c(0,opt$maximum,1,1.5,2),3))
lines(rep(opt$maximum,201),-160-40*(0:200)/100,col=2)

An alternate method is to find the root of \[ \begin{aligned} \frac{\partial}{\partial\alpha}l(\alpha,\bar{x}/\alpha)&= -n\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - n\log{\bar{x}}+ n\log{\alpha} + \sum_{i=1}^n \log x_i\\ &=n\left(-\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - \log{\bar{x}}+ \log{\alpha} + \frac{1}{n}\sum_{i=1}^n \log x_i\right) \end{aligned} \] Some comments on the expression within the brackets:

Note that the expression is scale invariant i.e. we can wolog assume that $\theta=1$ (Why?). Now observe that for a $Gamma(\alpha,1)$ random variable $X$,

\[ \mathbb{E}\left({log(X)}\right)=\frac{\Gamma'(\alpha)}{\Gamma(\alpha)},\quad \hbox{and} \quad \mathbb{E}\left({X}\right)=\alpha. \]

The above comment implies that we seek a root of (with both sides being non-negative by Jensen’s inequality)

\[ \log{\mathbb{E}_\alpha\left({X}\right)} -\mathbb{E}_\alpha\left({log(X)}\right) = \left(\log{\bar{x}} - \frac{1}{n}\sum_{i=1}^n \log x_i \right) \]

Finally we note that by Jensen’s inequality (as $log$ is a concave function) both sides of the above equation are non-negative. We below use these comments to implement another algorithm to find $\hat{\alpha}_{MLE}$ - this uses the digamma function.

# The rhs
  rhs<-log(xbar)-sum_logx/n;
# The lhs - using the digamma function
  like_der<-function(a){
    log(a)-digamma(a)-rhs
  }
amle<-uniroot(like_der,interval=c(0.01,4))$root
# MLE for alpha, theta
round(c(amle,xbar/amle),6)

## [1]    0.556173 2561.075759

Good to check by plotting the LHS:

$\log{\mathbb{E}_\alpha\left({X}\right)} -\mathbb{E}_\alpha\left({\log(X)}\right)$

plot(a<-(10:300)/100,sapply(a,like_der)+rhs,type="l",xlab=expression(paste(alpha)),ylab="",yaxt="n",xaxt="n")
#plot(a<-(1:200)/100,sapply(a,like),type="l",xlab=expression(paste(alpha)),xaxt="n");
axis(2,at=round(c(rhs,3,6,8),2))
lines((1:300)/100,rep(rhs,300),col=2)
lines(rep(amle,9),0:8,col=3)
axis(1,at=round(c(0,opt$maximum,1.5,3),3))

5.2 Grouped Data

In this example we will look at the analysis of Data Set C in the text which represents 227 payments from a general liability insurance policy block. The data set is given below (see Example 13.5 of the text).

library("kableExtra")
genliab_df<-data.frame("Lower End Point"=c(0,7500,17500,32500,67500,125000,300000),"Upper End Point"=c(7500,17500,32500,67500,125000,300000,Inf), "Payments"=c(99,42,29,28,17,9,3));
genliab_df%>%
  kable(caption="General Liability Payments - Data Set C") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),full_width=F,position="float_right")

Table 5.1: General Liability Payments - Data Set C

Lower.End.Point	Upper.End.Point	Payments
0	7500	99
7500	17500	42
17500	32500	29
32500	67500	28
67500	125000	17
125000	300000	9
300000	Inf	3

This data set is an example of a grouped data set. We fit an exponential model to this. First, we realize that since the exponential is a scale family, we can change the unit to $\$10,000$, estimate the scale parameter $\theta$ and scale it back by a $10,000$. To find the MLE we need the likelihood - we write a function and plot it below.

loglike<-function(theta){
  sum(genliab_df$Payments*log(exp(-genliab_df$Lower.End.Point/(theta*10000))-exp(-genliab_df$Upper.End.Point/(theta*10000))))
}
# Calculate some rough estimate for the range of optimization
sum((genliab_df$Lower.End.Point+genliab_df$Upper.End.Point)[-7]/20000*genliab_df$Payments[-7])/224;

## [1] 2.933036

c("MLE of theta",round(mletheta<-optimize(loglike,c(0.5,5),maximum=TRUE)$maximum*10000,2))

## [1] "MLE of theta" "29720.74"

Good to check by plotting the log-likelihood:

Exp. loglikelihood - Dataset C

plot(a<-(100:400)/100,sapply(a,loglike),type="l",ylab="Log Likelihood",xlab=expression(paste(theta/10000)),xaxt="n");
axis(1,at=round(c(0,1,2,mletheta/10000,4),3))
lines(rep(mletheta/10000,11),-1000*(0:10),col=2)