MAXIMUM LIKELIHOOD FITTING of MULTINOMIAL-POISSON HOMOGENEOUS (MPH) MODELS for CONTINGENCY TABLES using MPH.FIT

Author:   Joseph B. Lang,   Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242 USA   (September 27, 2002, updated April 15, 2005, February 7, 2007)

Document Description:   Herein, you will find supporting documentation for the R program mph.fit.  The program mph.fit computes maximum likelihood estimates and model assessment statistics for the broad class of   multinomial-Poisson homogeneous (MPH) contingency table models.   This document includes, or provides links to, the following information:

 primary references intro to MPH models numerical examples software availability implementation instructions program description

PRIMARY REFERENCES:

Lang, J.B (2004).  "Multinomial-Poisson Homogeneous Models for Contingency Tables,"   Annals of Statistics, 32, 340-383 .

Lang, J.B. (2005). "Homogeneous Linear Predictor Models for Contingency Tables,"   JASA, 100, 121-134

Lang, J.B. (2002, 2007). "Maximum Likelihood Fitting of Multinomial-Poisson Homogeneous (MPH) Models for Contingency Tables using  MPH.FIT."  online html document.  [This document.]

Lang, J.B. (2002, 2007).  "Multinomial-Poisson Homogeneous and Homogeneous Linear Predictor Models: A Brief Description,"  online html document.

Lang, J.B. (2002, 2007). "ML Fitting of MPH Models using MPH.FIT: Numerical Examples."  online html document.

INTRODUCTION TO MPH MODELS:

Definition of an MPH Model:  Let y  and m be  the vectors of observed and expected counts in a contingency table.  Suppose that

(1) y is a realization of a random vector Y, which comprises independent Poisson components and/or independent  multinomial components,      and

(2) the expected counts satisfy the constraints in h(m) = 0, where h(.) is sufficiently smooth and homogeneous relative to the sampling plan.

Then the corresponding contingency table model is an MPH model.

To better understand the syntax used in the mph.fit program, the reader is urged to  read through (at least one of) the primary reference documents cited above.   In particular, the fourth cited reference gives a brief description of the necessary concepts.  You will find, among other things, a discussion of the sampling plan triple (Z, ZF, n) and the definition of the data cell probabilities p=p(m).   You also will find a formal definition of "homogeneous relative to the sampling plan."

Examples of MPH models:

Homogeneous Linear Predictor (HLP)  models of the form L(m) = Xb.   [Note: HLP models can be re-expressed as h(m) = UTL(m) = 0, where U is an orthogonal complement of X.]

Generalized loglinear models of the form ClogMm = Xb are typically homogeneous linear predictor models. (Includes loglinear, logit, multivariate logit  models, etc.)

Linear probability models, such as mean response models.

Marginal cumulative probit and logit models.

Gini trend and Kappa trend  (and other trend) models.

Zero-order linear predictor models of Grizzle, Starmer, and Koch (Biometrics69).  These have the form F(p) = Xb, where p = p(m) is the vector of cell probabilities and F is some sufficiently smooth response function.

Etc....

Homogeneous Constraint Models of the form h(m) = 0.

p1 = p2 = p3,   where pi's are components of  p = p(m), the vector of cell probabilities.

p1+ = p+1,  where pij's are the components of p = p(m), the vector of two-way cell probabilities.   [Note: in a 2x2 table, this constraint is equivalent to marginal homogeneity.]

AUC(m) - const = 0,  where AUC(m) = P(B > A) + 0.5*P(B=A)  can be interpreted as the 'area under an ROC curve.'    Here A and B are categorical responses.

orij = 1,   where the  orij = orij(m) are the local odds ratios.

Etc...

EXAMPLE  1. Mean Response Model (colds in children)
EXAMPLE  2. Dispersion Linear Trend Model (Danish marital status)
EXAMPLE  3. Marginal Homogeneity Model (eye grade)
EXAMPLE  4. Loglinear Models under Different Sampling Plans (bike helmet use)
EXAMPLE  5. Loglinear Models of Independence (bike helmet use)
EXAMPLE  6. Logit Models under Different Sampling Plans (bike helmet use)
EXAMPLE  7. Sparse Table Example (fine tuning the fitting algorithm)
EXAMPLE  8. Given Marginal Distributions and Odds Ratios, Compute Joint Distribution in a Two-Way Table.
EXAMPLE  9. Model Constraint:  ROC Area Equal to a Constant
EXAMPLE 10. Marginal Homogeneity of Multidimensional Contingency Tables
EXAMPLE 11. Contingency Tables with Given Marginals.
EXAMPLE 12. Testing for (Conditional) Pairwise Independence
EXAMPLE 13. Kappa Regression using Direct and Indirect Smoothing Models.
EXAMPLE 14. Marginal Cumulative Probit Model (in Conjunction with Loglinear Association Model)
EXAMPLE 15. Marginal Cumulative Logit and Linear-by-Linear Loglinear Association Model (Generalized Loglinear and Indirect Smoothing Model Specification)

SOFTWARE AVAILABILITY:

To receive a free copy of the R programs referred to in this document, contact me (Joseph B. Lang) at joseph-lang@uiowa.edu.   [Note:  The statistical package R is freeware that is available at http://cran.r-project.org; its syntax is very similar to that of the commercial software S-plus.]

IMPLEMENTATION of  mph.fit and  accompanying R programs

You should receive two files, mph.Rcode.txt and mph.supp.Rcode.txt The files can be directly sourced into R,  e.g. source("mph.Rcode.txt")  and source("mph.supp.Rcode.txt")

The file mph.Rcode.txt contains the programs mph.fit, mph.summary, num.deriv.fct, and create.U. The main ML fitting program mph.fit uses num.deriv.fct and create.U The program mph.summary is used to create   a useful summary of the object created by mph.fit.

The file mph.supp.Rcode.txt is not a necessary component for the full implementation of mph.fit, but it does include some programs that are useful for MPH modeling.  These programs are block.fct, create.C.LOR, Marg.fct, pop, and rmult.

PROGRAM DESCRIPTION:

The following lines are the comments found at the top of the file mph.Rcode.txt These comments currently serve as user documentation. See also the primary references above.

```#
# Sourcing this R file (> source("mph.Rcode.txt") ) results in the creation
# of the following four functions:
#
#        mph.fit,   mph.summary,   num.deriv.fct,   create.U
#
######################   Begin mph.fit ##########################################
mph.fit  <- function(y, Z=matrix(1,length(y),1), ZF=Z, h.fct=NULL,derht.fct=NULL,
L.fct=NULL,derLt.fct=NULL,X=NULL,maxiter=100,step=1,
norm.diff.conv=1e-5,norm.score.conv=1e-5,
y.eps=0,iter.orig=5, m.initial=y)
{
# mph.fit, version 2.0
# Originally Created: August 25, 2000 (last revised:  2/7/07)
#
# Author:   Joseph B. Lang,
#           Dept. of Statistics and Actuarial Science
#           Univ. of Iowa, Iowa City, IA 52242
#           joseph-lang@uiowa.edu
#
# References:  Lang, J.B. (2004).  "Multinomial-Poisson Homogeneous
#                 Models for Contingency Tables," Annals of Statistics,
#                 32, 340-383.
#
#              Lang, J.B. (2005). "Homogeneous Linear Predictor Models
#                for Contingency Tables,"  JASA, 100, 121-134
#
# DESCRIPTION:
#   This program computes ML estimates, standard errors, and fit
#   statistics for contingency table models of the general form
#       (1) Systematic Component:     h(m) = 0,
#   where m is the vector of expected counts and h is any
#   sufficiently smooth (continuous second derivatives)
#   constraint function that satisfies a non-restrictive
#   homogeneity condition.
#     The ML estimates are based on the observed contingency
#   table counts y,  where y can be specified as coming from
#   a wide variety of sampling models; in particular, y can
#   be a realization of any MP (multinomial-Poisson)
#   random vector that is characterized by a population matrix
#   Z and a sampling constraint matrix ZF.  The matrix Z specifies
#   the stratification sampling plan and ZF specifies which strata
#   sample sizes are fixed. Note that ZF'y = n gives the vector of
#   fixed sample sizes. (Non-fixed sample sizes are assumed
#   to be realizations of Poisson random variables.)  We write
#      (2) Random Component:  y <- Y ~ MP_Z(m|ZF,n).
#
#   In sum,  `mph.fit'  computes ML estimates, standard errors, and fit
#   statistics for models of the form
#
#      (1)  h(m) = 0
#      (2)  y <- Y ~ MP_Z(m|ZF,n),
#
#   where h is any sufficiently smooth constraint function that
#   satisfies a non-restrictive homogeneity condition.  These
#   models are called "multinomial-Poisson homogeneous (MPH)" models.
#   See Lang (Annals04 or JASA05) for details.
#
#   Multinomial-Poisson Homogeneous Linear Predictor (HLP)
#   Model Fitting...
#
#   An important subclass of MPH models is the class of multinomial-
#   Poisson homogeneous linear predictor (MPHLP or simply HLP) models,
#   which have systematic components of the form L(m) = X beta, or
#   equivalently, h(m) = U'L(m) = 0,  where U is an orthogonal
#   complement of X.  The link L and the design matrix X of an HLP model
#   must satisfy the following non-restrictive conditions (see Lang JASA05)
#
#      (1) h(m) = U'L(m) is sufficiently smooth and homogeneous
#      (2) L(m) = a(g) + L(p), where g = Z'm,  p = D^{-1}(ZZ'm)m and
#      (3) a(g1)-a(g2) = a(g1/g2) - a(1).
#
#   HLP Model Remarks:
#           1.  The vector  p = D^{-1}(ZZ'm)m  contains cell probabilities
#               as determined by the stratification plan.
#
#           2.  The link function L(.) is generally many-to-one!  This
#               implies that the class of HLP models is very broad.
#
#           3.  Most reasonable generalized loglinear models, which
#               have the form L(m) = C log Mm  = X beta, satisfy
#               conditions (1)-(3) and hence are HLP models.
#               As an example, loglinear models log m = X beta are
#               HLP models if the range space of X contains the
#               range space of Z.
#
#           4.  Any sufficiently smooth link L(.) that is only a function
#               of m through p, say L(m) = F(p), satisfies conditions (1),
#               (2), and (3).    This class of models, L(m) = F(p) = X beta,
#               was considered in Grizzle, Starmer, and Koch (1969),
#               for product-multinomial sampling.
#
#               When F(p) is many-to-one, these models have typically been
#               fitted using WLS.  With `mph.fit' these models can be
#               easily fitted via ML.
#
#           5.  `mph.fit'  allows for special handling of HLP model
#               fitting.  The user need only specify the link function L
#               and the design matrix X; the corresponding constraint
#               function h is created within the program and need not
#               be created by the user.
#
#
# Subroutines:    `mph.fit' uses two other short subroutines:
#                 `create.U' and `num.deriv.fct'.
#
# Complementary Function:  The function `mph.summary' can be used in
#                          conjunction with `mph.fit'.  It computes
#                          and prints a collection of summary statistics
#                          of the fitted MPH model given in `mph.out',
#                          which is the result of `mph.fit'.  That is,
#                          mph.out <- mph.fit(y,Z,ZF,...)
#
# Computational Note:
#        For computational efficiency (in particular, to speed up
#        multiplications involving diagonal matrices), the expression
#        (A*c(x)) is used in place of diag(c(x))%*%A
#
#             1. "<-" replaces "_" to reflect changes in R, version 1.8.1
#             2. Input variable  "chscore.criterion" added.  This variable
#                is used to restrict the updating move so that the
#                norm of the score changes by no more than a specified
#                amount.  This restriction is used to fine tune the
#                iterative algorithm so that convergence is more likely.
#        Version 1.3 Update:
#             1. Allows user to input initial estimates of m.  See input
#                variate m.initial.
#             2. 1.3.1.  Fixed bug regarding warning message about original
#                        counts NOT being used...
#                   if ((iter <  iter.orig) & (y.eps != 0)) changed to
#                   if ((iter <= iter.orig) & (y.eps != 0))
#        Version 2.0 Update:
#             1. Z = matrix(1,length(y),1) is now the default; i.e. the
#                         "single strata setting" is the default.
#             2. Iterations are fine-tuned a bit so the algorithm is
#                    more robust, especially in sparse count settings.
#                    The 'chscore.criterion' has been replaced.
#             3. h.fct, L.fct, and X now have NULL as the default value.
#                    is.null checks are now made in the code.
#
# INPUT
#   Required Input:
#            y = vector of table counts
#
#   Optional Input:
#            Z     = population matrix describing the stratification
#                    scheme for sampling.
#                    Default:  Z = matrix(1,length(y),1),  i.e.
#                     "single strata setting."
#            ZF    = sampling constraint matrix tells which of the strata
#                    sample sizes are fixed.  Note: Non-fixed sample sizes are
#                    assumed to be realizations of Poisson random variables.
#                    (default: ZF=Z, i.e. "product-multinomial" sampling)
#            h.fct = constraint function, e.g. h.fct <- function(m) {m-m}
#                    If h.fct is not equal to the constant 0 (the default), it
#                    must be a function of the single variable m, and it
#                    must return a column vector.
#                    (default: If L.fct is not specified and h.fct=NULL then
#                              h.fct is set equal to 0; i.e. no constraints are used.)
#            derht.fct = function that computes analytic derivative of
#                        the transpose of the constraint function h(.) with
#                        respect to m.
#                        If h(.) maps from Rp to Rq (i.e. there are q
#                        constraints), then derht.fct returns a
#                        pxq matrix of partial derivatives.
#                        (default: derht.fct=NULL.  This means that the
#                        derivative is calculated numerically.)
#            L.fct = HLP model  link function
#                    e.g. L.fct <- function(m) {log(m)}
#                    L.fct must be a function of the single variable m
#                    and must return a column vector  (default: L.fct=NULL; i.e.
#                    link functioni L is not specified or used.)
#            derLt.fct = function that computes analytic derivative of
#                        the transpose of the link function L(.) with
#                        respect to m.
#                        If L(.) maps from Rp to Rq (i.e. there are q
#                        link components), then derLt.fct returns a
#                        pxq matrix of partial derivatives.
#                        (default: derLt.fct =NULL, i.e. by default this
#                        derivative is calculated numerically.)
#            X     = HLP model design matrix (default: X = NULL; i.e. it is
#                        left unspecified and unused.)
#            maxiter = maximum number of iterations (default: maxiter=100)
#            step  = step-size value  (default: step=1)
#            norm.diff.conv  = convergence criteria value; see norm.diff
#                              in the Output section
#                              (default: norm.diff.conv=1e-5)
#            norm.score.conv = convergence criteria value; see norm.score in
#                              the Output section
#                              (default: norm.score.conv = 1e-5)
#            y.eps  = non-negative constant to be temporarily added
#                     to the original counts in y  (default: y.eps=0)
#            iter.orig = iteration at which the original counts will
#                        be used (default: iter.orig=5).
#            chscore.criterion =  the maximum multiplicative change
#                     allowed for ratio
#                         norm.score.new/norm.score.old
#                     in the iterative updating.
#                     (default: chscore.criterion = 2)
#            m.initial = initial estimate of m (default: m.initial=y)
#                     See Input Note 7 below.
#
#    Input Notes:
#                1. POPULATION MATRIX.
#                   The population matrix Z comprises `0's and `1's and
#                   is of dimension cxK, where c is the number of counts
#                   in y and K is the number of strata or populations.
#                   Thus, the rows correspond to observation numbers and
#                   the columns correspond to the strata.  A `1' in
#                   row i of column j implies that the ith count came
#                   from the jth stratum.  Note that Z will have exactly
#                   one `1' in each row, and at least one `1' in each
#                   column.  The population matrix Z = matrix(1,length(y),1),
#                   which is a column vector of `1's, implies that all
#                   the counts came from the same, and only, stratum.
#
#                2. SAMPLING CONSTRAINT MATRIX.
#                   The sampling constraint matrix ZF is of dimension
#                   cxf, where f <= K, or ZF=0.  For non-zero ZF, the columns
#                   are a subset of the columns in population matrix Z.
#                   If the jth column of Z is included in ZF, then the
#                   jth stratum sample size is considered fixed, otherwise,
#                   if the jth column of Z is NOT included in ZF, the
#                   jth stratum sample size is taken to be a realization
#                   of a Poisson random variable.  When ZF=0, all of the
#                   stratum sample sizes are taken to be realizations of
#                   Poisson random variables.  The default, ZF=Z, means
#                   that all of the stratum sample sizes are fixed; this
#                   is the (product-)multinomial setting.  Note that
#                   ZF'y = n is the vector of fixed sample sizes.
#
#                3. MODEL WITH NO CONSTRAINTS.
#                   The model with no constraints can be specifed using
#                   h.fct = 0; i.e. h.fct is not of type "function" at all,
#                   it is equal to the constant 0.  The no-constraint
#                   model is the default when h.fct=L.fct=NULL.
#
#                4. HLP MODEL SPECIFICATION.
#                   For HLP models, both L.fct and X must be specified.
#                   The constraint function h.fct is typically left
#                   unspecified for HLP models.  If h.fct is left
#                   unspecified it is created within the program as
#                   h(m) = U'L(m), where U is an orthogonal
#                   complement of X.  If h.fct is specified, however,
#                   it overrides the constraint h(m) = U'L(m) = 0.
#
#                5. EXTENDED ML ESTIMATES.
#                   When ML estimates are non-existent owing to
#                   zero counts, the algorithm will not converge;
#                   in particular, norm.diff will not converge to
#                   zero, instead it tends to level off at some constant
#                   positive value.  This is because at least one ML
#                   fitted value is 0, which on the log scale is
#                   log(0)=-Inf, and the log-scale iterates slowly
#                   move toward -Inf.  One solution to this problem is
#                   to set the convergence value norm.diff.conv
#                   to some large number so only the score convergence
#                   criterion is used.  In this case, the algorithm
#                   often converges to a solution that can be viewed
#                   as an extended ML estimate, for which 0 estimates
#                   are allowed.
#
#                6. CONVERGENCE PROBLEMS / FINE TUNING ITERATIONS.
#                   When ML estimates exist, but there are
#                   non-convergence problems (perhaps caused by zero
#                   counts), a modification to the tuning
#                   parameters `step', `y.eps',
#                   and/or `iter.orig' will often lead to convergence.
#
#                   With zero counts, it might help to set
#                   y.eps = 0.1 and iter.orig=5 (the default). This tells
#                   the program to initially use y+y.eps rather than
#                   the original counts y.  At iteration 5=iter.orig,
#                   after the algorithm has had time to move toward a
#                   non-boundary solution, the original counts are again
#                   used.
#
#                   To further mitigate non-convergence problems,
#                   the parameter `step' can be set to a smaller value
#                   (default: step=1) so the iterates do not change as much.
#
#                7. The initial estimate of m is actually
#                      m.initial + y.eps + 0.01*((m.initial+y.eps)==0)
#                   The program defaults are m.initial=y and y.eps=0.
#                   Note: If m.initial > 0 and y.eps=0, then the
#                         initial estimate of m is simply m.initial.
#
#
# OUTPUT
#            y         = vector of counts used in the algorithm
#                        for ML estimation.  Usually, this vector
#                        is identical to the observed table counts
#            m         = vector of ML fitted values
#            covm      = approximate covariance matrix of fitted values
#            p         = vector of cell probability ML estimates
#            covp      = approximate covariance matrix of cell probability estimators
#            lambda    = vector of Lagrange multiplier ML estimates
#            covlambda = approximate covariance matrix of multiplier estimators
#            resid     = vector of raw residuals (i.e. observed minus fitted counts)
#            presid    = vector of Pearson residuals
#            covresid  = approximate covariance matrix of raw residuals
#            Gsq       = likelihood ratio stat for testing Ho: h(m)=0 vs. Ha:not Ho
#            Xsq       = Pearson's score stat (same as Lagrange multiplier stat)
#                        for testing Ho: h(m)=0 vs. Ha: not H0.
#            Wsq       = Generalized Wald stat for testing Ho: h(m)=0 vs.
#                        Ha: not H0.
#            df        = degrees of freedom associated with Gsq and Xsq (df= dim(h))
#            beta      = vector of HLP model  parameter ML estimates
#            covbeta   = approximate covariance matrix of HLP
#                        model parameter estimators
#            L         = vector of HLP model link ML estimates
#            Lobs      = vector of HLP model observed link values, L(y).
#            covL      = approximate covariance matrix of HLP model
#                        (L(obs) – L(fitted))/ase(L(obs)-L(fitted)).
#            iter      = number of update iterations performed
#            norm.diff = distance between last and second last `theta' iterates
#                        (`theta' is the vector of log fitted values and
#                        Lagrange multipliers)
#            norm.score= distance between the score vector and zero
#            h.fct     = constraint function used in algorithm
#            derht.fct = analytic function used in algorithm that computes
#                        derivative of transpose of constraint function h
#            L.fct     = link function used in algorithm
#            derLt.fct = analytic function used in algorithm that computes
#                        derivative of transpose of link function L
#            X         = HLP model design matrix used in algorithm
#            U         = orthogonal complement of design matrix X
#            Z         = population matrix used in algorithm
#            ZF        = sampling constraint matrix used in algorithm
#            version   = `mph.fit' version number
#            warn.message = message stating whether or not the original
#                           counts were used.
#
#    Output Notes
#                 1.  ITERATION HISTORY.
#                     An iteration history is printed out by default.  A single
#                     line of the history looks like the following:
#                     "iter= 18 norm.diff= 3.574936e-08 norm.score= 1.901705e-15"
#                     Here, iter is the number of update iterations performed.
#                     The number in [] gives the number of step size changes
#                     required owing to the chscore.criterion.
#                     norm.diff and norm.score are defined above.
#                     Finally, the time elapsed is output.
#                     Note:  For the model with no restrictions (h.fct <- 0)
#                     the chscore.criterion is not used, so there
#                     are no step size changes.
#
#                 2.  STORING and VIEWING RESULTS.
#                     To store the results of mph.fit, issue a command
#                     like the following example
#                        >  results <-  mph.fit(y,Z,ZF,h.fct=h.fct)
#                     Use program "mph.summary"  to view the results
#                     of mph.fit.   Specifically, if the results of mph.fit
#                     are saved in object "results",  submit the command
#                     mph.summary(results)  [or mph.summary(results,T) or
#                     mph.summary(results,T,T) depending on how much of
#                     the output you need to see.]
#
#                 3.  The output objects beta, covbeta, L, covL, and
#                     Lresid will be set to "NA" unless an HLP model is
#                     specified (i.e. L.fct and X are input)
#```
` `

Please do not distribute this documentation or the R program files without express permission of author, Joseph B. Lang  ( joseph-lang@uiowa.edu)

Page Last Updated: 02/07/2007 11:58 PM -0600,  Joseph B. Lang