Multinomial-Poisson Homogeneous (MPH) and Homogeneous Linear Predictor (HLP) Models: A Brief Description

Author Information:  Joseph B. Lang, Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242  USA   < joseph-lang@uiowa.edu,   http://www.stat.uiowa.edu/~jblang

Citation Information:  Lang, J.B.  "Multinomial-Poisson Homogeneous (MPH) and Homogeneous Linear Predictor (HLP) Models: A Brief Description,"  Online HTML Document, 02/07/2007 11:59 PM -0600, <http://www.stat.uiowa.edu/~jblang/mph.fitting/mph.hlp.description.htm>.

Primary References:

Lang, J.B (2004).  "Multinomial-Poisson Homogeneous Models for Contingency Tables,"   Annals of Statistics, 32, 340-383 .

Lang, J.B. (2005). "Homogeneous Linear Predictor Models for Contingency Tables,"   JASA, 100, 121-134

Lang, J.B. (2002, 2007). "Maximum Likelihood Fitting of Multinomial-Poisson Homogeneous (MPH) Models for Contingency Tables using  MPH.FIT."  online html document.  

Lang, J.B. (2002, 2007).  "Multinomial-Poisson Homogeneous and Homogeneous Linear Predictor Models: A Brief Description,"  online html document.   [This document.]

Lang, J.B. (2002, 2007). "ML Fitting of MPH Models using MPH.FIT: Numerical Examples."  online html document.


An MPH model is characterized by an independent sampling plan triple (Z,ZF,n) and a sufficiently smooth constraint h(m) = 0, where m is the vector of expected table counts and h(.) is homogeneous relative to the sampling plan.  [Remark: "sufficiently smooth" means second order derivatives are continuous.]

The independent sampling plan (Z, ZF, n) generates sufficient cell counts Y = (Y1, Y2, ..., Yc),  where Yi = # of outcomes of type i,  i=1,2,...,c,    in the following way:

Description of the Sampling Plan Triple (Z, ZF, n)

Z is a cxK population matrix with  (ith row, kth column) elements  Zik that satisfy (1)   Zik  in {0,1}.   Each of the K columns corresponds to a stratum (aka population) and each of the c rows corresponds to an outcome type.   

If Zik = 1 then Yi = # of type i outcomes in a random sample of size Nk  from stratum k. 

If Zik = 0 then there are no type i outcomes in stratum k.

It is assumed that  each outcome type occurs in one and only one stratum and each stratum has at least one outcome type.  These assumptions are   equivalent to assuming that (2) Zi+  = 1 and (3) Z+k  >= 1. They imply that the components in Y are defined simply as Yi = #(type i outcomes).   [Remark: Any matrix Z that satifies properties (1),(2), and (3) is called a population matrix.]

ZF is a cxf (f <= K)  sampling constraint matrix, which comprises columns of Z or is a zero matrix. 

If the kth column in Z is included in ZF then the sample size Nk = nk, with probability one,

If the kth column in Z is NOT included in ZF then  the sample size  Nk ~ Poisson(δk). 

The collection of  f  fixed sample sizes is denoted n, and the collection of K-f unknown expected sample sizes is denoted δ.  It will be useful to denote the entire collection of K expected sample sizes by γ = E(N) = E(ZTY).  [Note that ZFTY = n with probability one.]

Data Cell Probabilities.  Let pi = P(type i outcome from stratum ki),  where the ki th stratum is the one that includes outcomes of  type i.   Note that pi is a  conditional probability--given you sample from stratum ki,    the chance of a type i outcome is pi.    Note that the vector of conditional probabilities p satisfies the constraint ZTp = 1.  The definition of the so-called data cell probabilities  p depends on the sampling plan.  To see this more explicitly, define the unconditional outcome probabilities as Pi = P(type i outcome).  Then it follows that the data cell probabilities and the unconditional outcome probabilities are related according to     p = D-1(ZZTP)P.

Multinomial-Poisson Random Vectors

Assuming that the K random samples are independent and the sample sizes Nk are independent of the outcome types, it follows that Y comprises independent Poisson and/or multinomial random vectors, with parameters (p,γ).  It can be shown that the vector of counts Y is sufficient for the data parameters p and γ.

We say that Y has a multinomial-Poisson distribution generated by sampling plan (Z,ZF,n),  with expected sample sizes γ and data cell probabilities p.  We write Y ~ MP*Z(γ,p|ZF,n).

The Expected Count Parameterization

It is straightforward to see that the distribution of Y can be re-parameterized in terms of  the expected counts m=E(Y)=D(Zγ)p.  In fact,   the mapping

R(γ,p) = D(Zγ)p = m is one-to-one from {(γ, p): γ > 0, p > 0,  ZTp = 1}  onto {m: m > 0},

with inverse defined as  R-1(m) =   (ZTm,  D-1(ZZTm)m)  = (γ, p).

When using the expected count parameterization, we write Y ~ MPZ(m|ZF,n).

Reminder:  m  and (p, γ)  are functionally related.    For example, when we refer to p, we implicitly refer to m through p = p(m) = D-1(ZZTm)m and when we refer to m, we implicitly refer to (p,γ) through m = m(p,γ) = D(Zγ)p.

Examples of MP Distributions.

Full-Multinomial.  Suppose a random sample of fixed size 50 is taken from a single population, and suppose that there are four possible outcome types.  The four counts in Y = (Y1, Y2, Y3, Y4) have the following distribution:

Y ~ MP*Z(γ, p| ZF, n) ~ MPZ(m| ZF, n),     where

Z = ZF = [1,1,1,1]T,   n = 50,  γ = n,  pi = P(type i outcome), mi = npi.

[More commonly,  Y ~ mult(n=50, p1, p2, p3, p4).]

Full-Poisson.  Suppose a random sample of random size N~Poisson(δ) is taken from a single population, and suppose that there are four possible outcome types.  The four counts in Y = (Y1, Y2, Y3, Y4) have the following distribution:

Y ~ MP*Z(γ, p| ZF, n) ~ MPZ(m| ZF, n),     where

Z =  [1,1,1,1]T,   ZF = 0,  n = 0,  γ = δ,   pi = P(type i outcome),  mi = δpi.

[More commonly, Yi indep Poisson(mi=δpi), i=1,2,3,4.]

Product-Multinomial. Suppose two independent random samples of fixed sizes 30 and 20 are taken from two populations, and suppose that  population 1 has outcome types "(1,1)" and "(1,2)" and population 2 has outcome types "(2,1)" and "(2,2)"   for a total of four distinct outcome types.  The four counts in Y = (Y11, Y12, Y21, Y22) have the following distribution:

Y ~ MP*Z(γ, p| ZF, n) ~ MPZ(m| ZF, n),     where

Z =  [1,1,0,0 / 0,0,1,1]T,   ZF = Z,   n = (n1, n2) =  (30,20),  γ = n,  pij = P(type (i,j) outcome from population i),   mij = nipij.

[More commonly, (Yi1, Yi2) indep ~ mult(ni, pi1, pi2),  i = 1, 2.]

Product-Poisson. Suppose two independent random samples of random sizes N1 ~ Poisson(δ1) and N2 ~ Poisson(δ2)  are taken from two populations, and suppose that  population 1 has outcome types "(1,1)" and "(1,2)" and population 2 has outcome types "(2,1)" and "(2,2)"   for a total of four distinct outcome types.  The four counts in Y = (Y11, Y12, Y21, Y22) have the following distribution:

Y ~ MP*Z(γ, p| ZF, n) ~ MPZ(m| ZF, n),     where

Z =  [1,1,0,0 / 0,0,1,1]T,   ZF = 0,   n = 0,  γ = [δ1, δ2]T,   pij = P(type (i,j) outcome from population i),  mij = δipij.

[More commonly,  Yij indep ~ Poisson(mij=δipij),  i, j=1,2.]

Remark:  Note that for the Poisson cases, the expected counts m are explicitly written in terms of the expected sample sizes and data cell probabilities.  This is important when you are deciding which probabilities (or more generally which estimands) can be estimated for a given sampling scheme.  For example, if you were only given that Yij indep ~ Poisson(mij), you could NOT tell whether the data could be used to estimate P(type (i,j) outcome).

Definition of a Z-Homogeneous Function

Let Z be a population matrix and let h() be a function from {x:x>0} to a subset of Ru.   The function h() is said to be Z-homogeneous of orders t  if  h(D(Zb)x) = G(b)h(x), for all b > 0 and all x > 0,  where G(b) = diag{ bt(i)v(i) : i=1,...,u}.  Here, v(i) are members of {1,...,K} and t(i) are any real numbers.  A function is homogeneous relative to sampling plan (Z,ZF,n) if and only if it is  Z-homogenous.

Sufficient (but not necessary) condition for Z-homogeneity.  If h(m) = h*(p(m)) then h is Z-homogeneous.   In words, if h is only a function of the expected counts m through the cell probabilities p then h is Z-homogeneous.

Necessary (but not sufficient) condition for Z-homogeneity.  If h is Z-homogeneous then h(m) = 0  if and only if h(p(m)) = 0.  In words, if h is Z-homogeneous then constraining m via h(m) = 0 is equivalent to constraining p via h(p) = 0.

Example:  The function p(x) = D-1(ZZTx)x  is  Z-homogeneous of order 0.  

Proof:    p(D(Zb)x) = D-1(ZZTD(Zb)x)D(Zb)x = D-1(ZZTx)D-1(Zb)D(Zb)x = p(x),

where the second equality follows from properties of population matrix Z.

Summary Definition of an MPH Model

An MPH model for observed count  y has the form

y ← Y ~ MPZ(m|ZF, n),  h(m) = 0,  where h is sufficiently smooth and Z-homogeneous.

 

Example 1.  MPH Models Specified using Constraints.

Consider the sampling plan (Z,ZF,n), where 

Z =

1

0

 

ZF  =

1

n= n1= 100

1

0

1

0

1

0

0

1

0

This sampling plan implies that a random sample of fixed size n1 =100 is taken from Stratum 1 and a random sample of random size N2 ~ Poisson(δ2) is taken from Stratum 2.  Stratum 1 includes outcomes of types "1" and "2" and Stratum 2 includes outcomes of types "3" and "4."

The sufficient counts  in Y = (Y1, Y2, Y3, Y4)  ~ MPZ(m|ZF,n)  have three independent components with the following distributions...

(Y1, Y2)  ~ mult(n1, p1, p2),   Y3  ~ Pois(δ2p3),   Y4  ~ Pois(δ2p4),    where

p1 = P(type 1 outcome from Stratum 1),    p2  = P(type 2 outcome from Stratum 1),  p3 = P(type 3 outcome from Stratum 2),  p4  = P(type 4 outcome from Stratum 2).   

Note that p1 + p2  = 1  and p3 + p4  = 1,   i.e. ZTp = 1.

The expected sample sizes are γ = (n1, δ2)  = (100, δ2) and the expected counts are  m = D(Zγ)p;  i.e.

m1  = n1p1,   m2 = n1p2,   m3  = δ2p3,   and   m4 = δ2p4

You can easily verify  that p = D-1(ZZTm)m.

Consider the hypothesis p1 = p3,  or equivalently  p1/p2 = p3/p4,  or equivalently (m1m4) /(m2m3).    All three of these constraints, viewed as functions of m, are homogeneous relative to the sampling plan.

For example,  consider h(m) = p1 - p3  = m1/(m1+m2)  - m3/(m3+m4).  Now, h(D(Zb)x) = b1x1/(b1x1+b1x2) - b2x3/(b2x3 + b2x4) = h(x), so h is Z-homogenous of order 0.

It follows that  Y ~ MPZ(m| ZF, n),  h(m) = p1-p3 = 0,  is an MPH model.

Consider the hypothesis p1 = p2,  or equivalently m1 = m2.   Both of these constraints, viewed as functions of m, are homogeneous relative to the sampling plan.  The constraint h(m) = p1 - p2 = m1/(m1+m2) - m2/(m1+m2)  is Z-homogeneous of order 0 and the constraint h(m) = m1  - m2 is Z-homogeneous of order 1.

It follows that  Y ~ MPZ(m| ZF, n),  h(m) = m1  - m2 = 0,  is an MPH model.

Regarding Multi-Dimensional Contingency Tables:   Think of the four outcome types as the possible cross-classifications of two dichotomous random variables A and B.  Specifically,  relabel the outcomes as  "1"=(1,1), "2"=(1,2), "3"=(2,1), and "4"=(2,2).  Then Y = (Y11, Y12,Y21, Y22) and   pij  = P(type (i,j) outcome from stratum i) = P(type (i,j) outcome given (i,1) or (i,2)) =  P(B=j|A=i).  The hypothesis (using the old labels), p1 = p3, is (using the new labels), p11 = p21, or    P(B=1|A=1) = P(B=1|A=2).  That is, this is the hypothesis of independence between A and B.    We point this relabeling  out here to emphasize that MPH models as developed above are quite generally applicable to multi-dimensional contingency tables.   (See Example 4 below for an illustration.)

Characterization of Homogeneous Linear Predictor (HLP) Models

HLP models are important special case examples of MPH models.

An HLP model is characterized by a sampling plan triple (Z,ZF,n) and a constraint of the form   L(m) = Xb   (i.e.  h(m) = UTL(m) = 0,  where U is an orthogonal complement of X).     The constraint satisfies the following conditions:

(1) L(m) = a(γ) + L(p),  where a(γ1) - a(γ2) = a(γ1/γ2) - a(1),    and

(2) h(m) = UTL(m) is sufficiently smooth and  Z-homogeneous.

Remark: Recall that  m = expected counts, γ = expected sample sizes, and p = data cell probabilities.  These parameters are functionally related.  For example, m = D(Zγ)p  and p = D-1(ZZTm)m.

Example 2.  Loglinear Models

Consider the sampling plan (Z, ZF, n) and the  loglinear model L(m) = log(m) = Xb.

Question: Is this loglinear model an HLP model?  

(1) L(m) = log(m) = log(D(Zγ)p) = log(Zγ) + log(p) = Zlog(γ) + log(p).  Also, Zlog(γ1) - Zlog(γ2) = Zlog(γ1/γ2) - Zlog(1).  Thus, condition (1) is satisfied.

(2) h(x) = UTlog(x).   This is a sufficiently smooth function.  Moreover,  h(D(Zb)x) =  UTlog(D(Zb)x) = UT[Zlog(b) + log(x)] = UTlog(x), provided UTZ = 0 (or equivalently, the column space of X contains the column space of Z). 

Answer: Thus,  the loglinear model is an HLP model, provided the column space of X contains the column space of the population matrix Z

Example 3. Zero-Order Linear Predictor Models

Consider the sampling plan (Z, ZF, n) and the model L(m) =  F(p) = Xb, where F is a sufficiently smooth function of p.  Recall that p = p(m) = D-1(ZZTm)m.

Question: Is this an HLP model?  

(1)  L(p) = L(D-1(ZZTm)m) = F(p(D-1(ZZTm)m)) = F(p(m)) = L(m).   [The third equality follows because p() is Z-homogeneous of order 0.]  Thus, L(m) = a(γ) + L(p), with a() equal to the zero function, which  satisfies a(γ1) - a(γ2) = a(γ1/γ2) - a(1).

(2) Note that h(m) = UTL(m) is sufficiently smooth because p() and F() are sufficiently smooth.  Moreoever,  because p() is Z-homogeneous of order 0,  so are  L()  and h().

Answer: Thus,  smooth 0-order link models  F(p) = Xb are HLP models.  Note that there are no restrictions on the design matrix X.

Example 4.  Mean Response Model for 2x3 Table

Suppose that the following 2x3 contingency table is observed:

A B
1 2 3
1 34 32 21
2 10 24 20

The counts y = (34, 32, 21,10, 24,20) are assumed to be the result of two independent random samples, one of fixed size N1=n1 =  87 from the population with A=1 and one of random size  N2 ~ Poisson(δ2) from the population with A=2.  It follows that  y is a realization of Y = (Y11, Y12, Y13, Y21, Y22, Y23) ~ MPZ(m|ZF, n),   where

Z = [1,1,1,0,0,0 // 0,0,0,1,1,1]T,   ZF = [1,1,1,0,0,0]T,  and n = n1 = 87.  This means that

(Y11, Y12,Y13) ~ mult(n1, p11, p12, p13),   Y21 ~ Pois(δ2p21),  Y22 ~ Pois(δ2p22),  Y23 ~ Pois(δ2p23),

and the four components are independent.  The data cell probabilities are defined as pij = P(outcome type (i,j) from population i) = P(B=j|A=i).  The expected counts are mij= γipij,   where γ1 = n1  is known and γ2 = δ2 is unknown.

Consider the mean response model

Mi = b(0) + b(A)i,    where  Mi  = 1*pi1 + 2*pi2 + 3*pi3  and, for identifiability, b(A)1 = 0.

This has the generic form F(p) = Xb,  where F is sufficiently smooth.  By the previous example (Example 3),  we know that the mean response model

y ← Y  ~ MPZ(m|ZF, n),  F(p) = Xb      is an HLP model. 

Question:  How can I fit this mean response model using maximum likelihood? 

Answer: Use mph.fit, an R program written by Joseph B. Lang.  For information about this program, go to the author's home web page <http://www.stat.uiowa.edu/~jblang>  and follow the software link.

mph.fit (ver 1.0) input code and output  for this mean response model.   

Important:  The link function of an HLP model must be defined in terms of the expected counts m.  Therefore, define the link as L(m) = F(p(m)).

y <- scan()
34 32 21
10 24 20

Z <- scan()
1 0
1 0
1 0
0 1
0 1
0 1

Z <- matrix(Z,6,2,byrow=T)

ZF <- Z[,1]

L.fct <- function(m) {
   p <- diag(c(1/Z%*%t(Z)%*%m))%*%m
   mean1 <- 1*p[1] + 2*p[2] + 3*p[3]
   mean2 <- 1*p[4] + 2*p[5] + 3*p[6]
   rbind(mean1,mean2)
}

X <- scan()
1 0
1 1

X <- matrix(X,2,2,byrow=T)

a <- mph.fit(y,Z,ZF,L.fct=L.fct,X=X)
mph.summary(a)

OVERALL GOODNESS OF FIT: TEST of   Ho: h(m)=0 vs. Ha: not Ho...
    Likelihood Ratio Stat (df= 0 ):  Gsq =  0
    Pearson's Score Stat  (df= 0 ):  Xsq =  0
    Generalized Wald Stat (df= 0 ):  Wsq =  0


LINEAR PREDICTOR MODEL RESULTS...
           BETA StdErr(BETA)   Z-ratio     p-value
beta1 1.8505747   0.08372478 22.103070 0.000000000
beta2 0.3346105   0.12908462  2.592179 0.009537007

      OBS LINK  ML LINK  StdErr(L) LINK RESID
link1 1.850575 1.850575 0.08372478          0
link2 2.185185 2.185185 0.09824968          0


CONVERGENCE STATISTICS...
    iterations = 2
    norm.diff  = 5.38103e-07
    norm.score = 1.2662e-12
    Original counts used.

FITTING PROGRAM USED:  mph.fit, version 1.0, 6/5/02 
From the output, we see that 
b(0)hat = 1.851  (ase=0.084),   and  b(A)2hat = 0.335 (ase=0.129).

The observed mean response (2.185) corresponding to A=2 is statistically higher than the observed mean response (1.851) corresponding to A=1. 

For other MPH model fitting examples, click here.


Page Last Updated: 02/07/2007 11:59 PM -0600,  Joseph B. Lang