MPH and HLP Descriptions

Multinomial-Poisson Homogeneous (MPH) and Homogeneous Linear Predictor (HLP) Models: A Brief Description

Author Information: Joseph B. Lang, Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242 USA < joseph-lang@uiowa.edu, http://www.stat.uiowa.edu/~jblang >

Citation Information: Lang, J.B. "Multinomial-Poisson Homogeneous (MPH) and Homogeneous Linear Predictor (HLP) Models: A Brief Description," Online HTML Document, 02/07/2007 11:59 PM -0600, <http://www.stat.uiowa.edu/~jblang/mph.fitting/mph.hlp.description.htm>.

Primary References:

Lang, J.B. (2004). "Multinomial-Poisson Homogeneous Models for Contingency Tables," Annals of Statistics, 32, 340-383.

Lang, J.B. (2005). "Homogeneous Linear Predictor Models for Contingency Tables," JASA, 100, 121-134.

Lang, J.B. (2002, 2007). "Maximum Likelihood Fitting of Multinomial-Poisson Homogeneous (MPH) Models for Contingency Tables using MPH.FIT," online HTML document.

Lang, J.B. (2002, 2007). "Multinomial-Poisson Homogeneous and Homogeneous Linear Predictor Models: A Brief Description," online HTML document. [This document.]

Lang, J.B. (2002, 2007). "ML Fitting of MPH Models using MPH.FIT: Numerical Examples," online HTML document.

An MPH model is characterized by an independent sampling plan triple (Z,Z_F,n) and a sufficiently smooth constraint h(m) = 0, where m is the vector of expected table counts and h(.) is homogeneous relative to the sampling plan. [Remark: "sufficiently smooth" means second order derivatives are continuous.]

The independent sampling plan (Z, Z_F, n) generates sufficient cell counts Y = (Y₁, Y₂, ...,Y_c), where Y_i = # of outcomes of type i, i=1,2,...,c, in the following way:

Description of the Sampling Plan Triple (Z, Z_F, n)

Z is a c×K population matrix with (ith row, kth column) elements Z_ik that satisfy (1) Z_ik in {0,1}. Each of the K columns corresponds to a stratum (aka population) and each of the c rows corresponds to an outcome type.

If Z_ik = 1 then Y_i = # of type i outcomes in a random sample of size N_k from stratum k.

If Z_ik = 0 then there are no type i outcomes in stratum k.

It is assumed that each outcome type occurs in one and only one stratum and each stratum has at least one outcome type. These assumptions are equivalent to assuming that (2) Z_i+ = 1 and (3) Z_+k >= 1, where Z_i+ is the ith row sum and Z_+k is the kth column sum of Z. They imply that the components in Y are defined simply as Y_i = #(type i outcomes). [Remark: Any matrix Z that satisfies properties (1), (2), and (3) is called a population matrix.]
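Properties (1), (2), and (3) are easy to check numerically. The following is a minimal R sketch; the helper name is_population_matrix and the two example matrices are illustrative assumptions and are not part of mph.fit.

```r
# Check properties (1)-(3) of a candidate population matrix Z.
# is_population_matrix is a hypothetical helper name, not part of mph.fit.
is_population_matrix <- function(Z) {
  Z <- as.matrix(Z)
  all(Z %in% c(0, 1)) &&      # (1) every entry is 0 or 1
    all(rowSums(Z) == 1) &&   # (2) Z_i+ = 1: each outcome type lies in exactly one stratum
    all(colSums(Z) >= 1)      # (3) Z_+k >= 1: each stratum has at least one outcome type
}

# Two strata with two outcome types each: a valid population matrix
Z_ok <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))

# Outcome type 3 is assigned to both strata, which violates property (2)
Z_bad <- cbind(c(1, 1, 1, 0), c(0, 0, 1, 1))

is_population_matrix(Z_ok)
is_population_matrix(Z_bad)
```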

Z_F is a c×f (f <= K) sampling constraint matrix, which comprises columns of Z or is a zero matrix.

If the kth column in Z is included in Z_F then the sample size N_k = n_k, with probability one.

If the kth column in Z is NOT included in Z_F then the sample size N_k ~ Poisson(δ_k).

The collection of f fixed sample sizes is denoted n, and the collection of K-f unknown expected sample sizes is denoted δ. It will be useful to denote the entire collection of K expected sample sizes by γ = E(N) = E(Z^TY). [Note that Z_F^TY = n with probability one.]

Data Cell Probabilities. Let p_i = P(type i outcome from stratum k_i), where stratum k_i is the one that includes outcomes of type i. Note that p_i is a conditional probability: given that you sample from stratum k_i, the chance of a type i outcome is p_i. The vector of conditional probabilities p therefore satisfies the constraint Z^Tp = 1. The definition of the so-called data cell probabilities p depends on the sampling plan. To see this more explicitly, define the unconditional outcome probabilities as P_i = P(type i outcome). Then the data cell probabilities and the unconditional outcome probabilities are related according to p = D^-1(ZZ^TP)P, where D(x) denotes the diagonal matrix with the components of x on the diagonal and D^-1(x) is its inverse.
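As a small numerical illustration of p = D^-1(ZZ^TP)P, the R sketch below uses a made-up vector of unconditional probabilities P for four outcome types in two strata. The numbers are assumptions chosen only for illustration.

```r
# Two strata: outcome types 1,2 in stratum 1 and types 3,4 in stratum 2
Z <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))

# Unconditional outcome probabilities (made-up values that sum to 1)
P <- c(0.1, 0.3, 0.2, 0.4)

# Z Z^T P repeats each stratum total across the cells of that stratum,
# so dividing P by it elementwise gives p = D^-1(ZZ^TP)P
stratum_total <- c(Z %*% t(Z) %*% P)   # 0.4 0.4 0.6 0.6
p <- P / stratum_total

p                # 0.25, 0.75, 1/3, 2/3
c(t(Z) %*% p)    # Z^T p = 1: probabilities sum to one within each stratum
```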

Multinomial-Poisson Random Vectors

Assuming that the K random samples are independent and the sample sizes N_k are independent of the outcome types, it follows that Y comprises independent Poisson and/or multinomial random vectors, with parameters (p,γ). It can be shown that the vector of counts Y is sufficient for the data parameters p and γ.

We say that Y has a multinomial-Poisson distribution generated by sampling plan (Z,Z_F,n), with expected sample sizes γ and data cell probabilities p. We write Y ~ MP^*_Z(γ,p|Z_F,n).
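One way to make the sampling plan concrete is to simulate from it. The R sketch below draws a single table under a plan with one fixed-size stratum and one Poisson stratum. The function name r_mp, its arguments, and the parameter values are assumptions introduced here for illustration; only the base R functions rpois and rmultinom are used.

```r
# Draw one table Y under sampling plan (Z, Z_F, n).
# r_mp and its arguments are illustrative names, not part of mph.fit:
#   fixed[k] = TRUE  if column k of Z is in Z_F (N_k = gamma[k] with probability one)
#   fixed[k] = FALSE if N_k ~ Poisson(gamma[k])
r_mp <- function(Z, fixed, gamma, p) {
  y <- numeric(nrow(Z))
  for (k in seq_len(ncol(Z))) {
    cells <- which(Z[, k] == 1)                        # outcome types in stratum k
    N_k <- if (fixed[k]) gamma[k] else rpois(1, gamma[k])
    y[cells] <- rmultinom(1, size = N_k, prob = p[cells])
  }
  y
}

set.seed(1)
Z <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))
y <- r_mp(Z, fixed = c(TRUE, FALSE), gamma = c(100, 80),
          p = c(0.3, 0.7, 0.5, 0.5))
sum(y[1:2])   # always 100, because Z_F^T Y = n with probability one
```

Drawing N_k from a Poisson distribution and then allocating the N_k outcomes multinomially is the standard construction that yields independent Poisson cell counts within a Poisson stratum.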

The Expected Count Parameterization

It is straightforward to see that the distribution of Y can be re-parameterized in terms of the expected counts m=E(Y)=D(Zγ)p. In fact,   the mapping

R(γ,p) = D(Zγ)p = m is one-to-one from {(γ, p): γ > 0, p > 0, Z^Tp = 1} onto {m: m > 0},

with inverse defined as R^-1(m) =   (Z^Tm, D^-1(ZZ^Tm)m) = (γ, p).

When using the expected count parameterization, we write Y ~ MP_Z(m|Z_F,n).

Reminder: m and (p, γ) are functionally related.    For example, when we refer to p, we implicitly refer to m through p = p(m) = D^-1(ZZ^Tm)m and when we refer to m, we implicitly refer to (p,γ) through m = m(p,γ) = D(Zγ)p.
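The two parameterizations can be converted back and forth in a few lines of R. This is a minimal sketch; the function names R_map and R_inv and the parameter values are assumptions made for illustration.

```r
# m = R(gamma, p) = D(Z gamma) p
R_map <- function(gamma, p, Z) c(Z %*% gamma) * p

# R^-1(m) = (Z^T m, D^-1(Z Z^T m) m) = (gamma, p)
R_inv <- function(m, Z) {
  list(gamma = c(t(Z) %*% m),
       p     = m / c(Z %*% t(Z) %*% m))
}

Z     <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))
gamma <- c(100, 80)               # expected sample sizes (made-up values)
p     <- c(0.3, 0.7, 0.5, 0.5)    # data cell probabilities, Z^T p = 1

m    <- R_map(gamma, p, Z)        # 30 70 40 40
back <- R_inv(m, Z)               # recovers gamma and p
```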

Examples of MP Distributions.

Full-Multinomial. Suppose a random sample of fixed size 50 is taken from a single population, and suppose that there are four possible outcome types. The four counts in Y = (Y₁, Y₂, Y₃, Y₄) have the following distribution:

Y ~ MP^*_Z(γ, p| Z_F, n) ~ MP_Z(m| Z_F, n),     where

Z = Z_F = [1,1,1,1]^T,   n = 50, γ = n, p_i = P(type i outcome), m_i = np_i.

[More commonly, Y ~ mult(n=50, p₁, p₂, p₃, p₄).]

Full-Poisson. Suppose a random sample of random size N~Poisson(δ) is taken from a single population, and suppose that there are four possible outcome types. The four counts in Y = (Y₁, Y₂, Y₃, Y₄) have the following distribution:

Y ~ MP^*_Z(γ, p| Z_F, n) ~ MP_Z(m| Z_F, n),     where

Z = [1,1,1,1]^T,   Z_F = 0, n = 0, γ = δ,   p_i = P(type i outcome), m_i = δp_i.

[More commonly, Y_i indep Poisson(m_i=δp_i), i=1,2,3,4.]

Product-Multinomial. Suppose two independent random samples of fixed sizes 30 and 20 are taken from two populations, and suppose that population 1 has outcome types "(1,1)" and "(1,2)" and population 2 has outcome types "(2,1)" and "(2,2)"   for a total of four distinct outcome types. The four counts in Y = (Y₁₁, Y₁₂, Y₂₁, Y₂₂) have the following distribution:

Y ~ MP^*_Z(γ, p| Z_F, n) ~ MP_Z(m| Z_F, n),     where

Z = [1,1,0,0 / 0,0,1,1]^T,   Z_F = Z,   n = (n₁, n₂) = (30,20), γ = n, p_ij = P(type (i,j) outcome from population i),   m_ij = n_ip_ij.

[More commonly, (Y_i1, Y_i2) indep ~ mult(n_i, p_i1, p_i2), i = 1, 2.]

Product-Poisson. Suppose two independent random samples of random sizes N₁ ~ Poisson(δ₁) and N₂ ~ Poisson(δ₂) are taken from two populations, and suppose that population 1 has outcome types "(1,1)" and "(1,2)" and population 2 has outcome types "(2,1)" and "(2,2)"   for a total of four distinct outcome types. The four counts in Y = (Y₁₁, Y₁₂, Y₂₁, Y₂₂) have the following distribution:

Y ~ MP^*_Z(γ, p| Z_F, n) ~ MP_Z(m| Z_F, n),     where

Z = [1,1,0,0 / 0,0,1,1]^T,   Z_F = 0,   n = 0, γ = [δ₁, δ₂]^T,   p_ij = P(type (i,j) outcome from population i), m_ij = δ_ip_ij.

[More commonly, Y_ij indep ~ Poisson(m_ij=δ_ip_ij), i, j=1,2.]

Remark: Note that for the Poisson cases, the expected counts m are explicitly written in terms of the expected sample sizes and data cell probabilities. This is important when you are deciding which probabilities (or more generally which estimands) can be estimated for a given sampling scheme. For example, if you were only given that Y_ij indep ~ Poisson(m_ij), you could NOT tell whether the data could be used to estimate P(type (i,j) outcome).

Definition of a Z-Homogeneous Function

Let Z be a population matrix and let h() be a function from {x: x > 0} to a subset of R^u. The function h() is said to be Z-homogeneous of orders t = (t(1),...,t(u)) if h(D(Zb)x) = G(b)h(x), for all b > 0 and all x > 0, where G(b) = diag{b_v(i)^t(i): i=1,...,u}; that is, the ith diagonal element of G(b) is the v(i)th component of b raised to the power t(i). Here, the v(i) are members of {1,...,K} and the t(i) are any real numbers. A function is homogeneous relative to sampling plan (Z,Z_F,n) if and only if it is Z-homogeneous.

Sufficient (but not necessary) condition for Z-homogeneity. If h(m) = h^*(p(m)) then h is Z-homogeneous.   In words, if h is only a function of the expected counts m through the cell probabilities p then h is Z-homogeneous.

Necessary (but not sufficient) condition for Z-homogeneity. If h is Z-homogeneous then h(m) = 0 if and only if h(p(m)) = 0. In words, if h is Z-homogeneous then constraining m via h(m) = 0 is equivalent to constraining p via h(p) = 0.

Example: The function p(x) = D^-1(ZZ^Tx)x is Z-homogeneous of order 0.

Proof:    p(D(Zb)x) = D^-1(ZZ^TD(Zb)x)D(Zb)x = D^-1(ZZ^Tx)D^-1(Zb)D(Zb)x = p(x),

where the second equality follows from properties of population matrix Z.
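The second equality can be spelled out as follows. This is a sketch that uses only properties (1)-(3) of a population matrix; the index k(i), denoting the single stratum that contains outcome type i, is notation introduced here.

```latex
\begin{align*}
(ZZ^{T})_{ij} &= \sum_{k=1}^{K} Z_{ik}Z_{jk} = \mathbf{1}\{k(i) = k(j)\},
  \qquad (Zb)_i = b_{k(i)}, \\
\bigl(ZZ^{T}D(Zb)x\bigr)_i &= \sum_{j:\,k(j)=k(i)} b_{k(j)}\,x_j
  = b_{k(i)} \sum_{j:\,k(j)=k(i)} x_j = (Zb)_i\,(ZZ^{T}x)_i, \\
D\bigl(ZZ^{T}D(Zb)x\bigr) &= D(Zb)\,D(ZZ^{T}x)
  \;\Longrightarrow\;
  D^{-1}\bigl(ZZ^{T}D(Zb)x\bigr) = D^{-1}(ZZ^{T}x)\,D^{-1}(Zb).
\end{align*}
```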

Summary Definition of an MPH Model

An MPH model for observed count y has the form

y ← Y ~ MP_Z(m|Z_F, n), h(m) = 0, where h is sufficiently smooth and Z-homogeneous.

Example 1. MPH Models Specified using Constraints.

Consider the sampling plan (Z,Z_F,n), where

Z = [1,1,0,0 / 0,0,1,1]^T,   Z_F = [1,1,0,0]^T,   and   n = n₁ = 100.

This sampling plan implies that a random sample of fixed size n₁ =100 is taken from Stratum 1 and a random sample of random size N₂ ~ Poisson(δ₂) is taken from Stratum 2. Stratum 1 includes outcomes of types "1" and "2" and Stratum 2 includes outcomes of types "3" and "4."

The sufficient counts in Y = (Y₁, Y₂, Y₃, Y₄) ~ MP_Z(m|Z_F,n) have three independent components with the following distributions...

(Y₁, Y₂) ~ mult(n₁, p₁, p₂),   Y₃ ~ Pois(δ₂p₃),   Y₄ ~ Pois(δ₂p₄),    where

p₁ = P(type 1 outcome from Stratum 1),    p₂ = P(type 2 outcome from Stratum 1), p₃ = P(type 3 outcome from Stratum 2), p₄ = P(type 4 outcome from Stratum 2).

Note that p₁ + p₂ = 1 and p₃ + p₄ = 1,   i.e. Z^Tp = 1.

The expected sample sizes are γ = (n₁, δ₂) = (100, δ₂) and the expected counts are m = D(Zγ)p; i.e.

m₁ = n₁p₁,   m₂ = n₁p₂,   m₃ = δ₂p₃,   and   m₄ = δ₂p₄.

You can easily verify that p = D^-1(ZZ^Tm)m.

Consider the hypothesis p₁ = p₃, or equivalently p₁/p₂ = p₃/p₄, or equivalently (m₁m₄)/(m₂m₃) = 1. All three of these constraints, viewed as functions of m, are homogeneous relative to the sampling plan.

For example, consider h(m) = p₁ - p₃ = m₁/(m₁+m₂) - m₃/(m₃+m₄). Now, h(D(Zb)x) = b₁x₁/(b₁x₁+b₁x₂) - b₂x₃/(b₂x₃ + b₂x₄) = h(x), so h is Z-homogeneous of order 0.

It follows that Y ~ MP_Z(m| Z_F, n), h(m) = p₁-p₃ = 0, is an MPH model.

Consider the hypothesis p₁ = p₂, or equivalently m₁ = m₂. Both of these constraints, viewed as functions of m, are homogeneous relative to the sampling plan. The constraint h(m) = p₁ - p₂ = m₁/(m₁+m₂) - m₂/(m₁+m₂) is Z-homogeneous of order 0 and the constraint h(m) = m₁ - m₂ is Z-homogeneous of order 1.

It follows that Y ~ MP_Z(m| Z_F, n), h(m) = m₁ - m₂ = 0, is an MPH model.
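The homogeneity orders claimed in this example can be checked numerically. In the R sketch below, the test point x and the scaling vector b are arbitrary positive values chosen for illustration. The third function, h_bad, is an added counterexample that does not appear in the text above: it mixes counts from the two strata, so no single factor b_v^t reproduces the scaling.

```r
Z <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))

h0    <- function(m) m[1] / (m[1] + m[2]) - m[3] / (m[3] + m[4])  # p1 - p3
h1    <- function(m) m[1] - m[2]                                  # m1 - m2
h_bad <- function(m) m[1] - m[3]   # illustrative counterexample, mixes strata

x <- c(3, 5, 2, 7)          # arbitrary positive test point
b <- c(4, 0.5)              # arbitrary positive stratum scalings
scaled <- c(Z %*% b) * x    # D(Zb) x

all.equal(h0(scaled), h0(x))           # TRUE: order 0, G(b) = 1
all.equal(h1(scaled), b[1] * h1(x))    # TRUE: order 1, G(b) = b_1

# h_bad(D(Zb)x) = b1*x1 - b2*x3 matches none of h_bad(x), b1*h_bad(x), b2*h_bad(x)
c(h_bad(scaled), h_bad(x), b[1] * h_bad(x), b[2] * h_bad(x))
```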

Regarding Multi-Dimensional Contingency Tables:   Think of the four outcome types as the possible cross-classifications of two dichotomous random variables A and B. Specifically, relabel the outcomes as "1"=(1,1), "2"=(1,2), "3"=(2,1), and "4"=(2,2). Then Y = (Y₁₁, Y₁₂,Y₂₁, Y₂₂) and   p_ij = P(type (i,j) outcome from stratum i) = P(type (i,j) outcome given (i,1) or (i,2)) = P(B=j|A=i). The hypothesis (using the old labels), p₁ = p₃, is (using the new labels), p₁₁ = p₂₁, or    P(B=1|A=1) = P(B=1|A=2). That is, this is the hypothesis of independence between A and B.    We point this relabeling out here to emphasize that MPH models as developed above are quite generally applicable to multi-dimensional contingency tables.   (See Example 4 below for an illustration.)

Characterization of Homogeneous Linear Predictor (HLP) Models

HLP models are an important special case of MPH models.

An HLP model is characterized by a sampling plan triple (Z,Z_F,n) and a constraint of the form L(m) = Xb   (i.e. h(m) = U^TL(m) = 0, where U is an orthogonal complement of X).     The constraint satisfies the following conditions:

(1) L(m) = a(γ) + L(p), where a(γ₁) - a(γ₂) = a(γ₁/γ₂) - a(1),    and

(2) h(m) = U^TL(m) is sufficiently smooth and Z-homogeneous.

Remark: Recall that m = expected counts, γ = expected sample sizes, and p = data cell probabilities. These parameters are functionally related. For example, m = D(Zγ)p and p = D^-1(ZZ^Tm)m.

Example 2. Loglinear Models

Consider the sampling plan (Z, Z_F, n) and the loglinear model L(m) = log(m) = Xb.

Question: Is this loglinear model an HLP model?

(1) L(m) = log(m) = log(D(Zγ)p) = log(Zγ) + log(p) = Zlog(γ) + log(p). Also, Zlog(γ₁) - Zlog(γ₂) = Zlog(γ₁/γ₂) - Zlog(1). Thus, condition (1) is satisfied.

(2) h(x) = U^Tlog(x).   This is a sufficiently smooth function. Moreover, h(D(Zb)x) = U^Tlog(D(Zb)x) = U^T[Zlog(b) + log(x)] = U^Tlog(x), provided U^TZ = 0 (or equivalently, the column space of X contains the column space of Z).

Answer: Thus, the loglinear model is an HLP model, provided the column space of X contains the column space of the population matrix Z.
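The role of the condition U^TZ = 0 can be seen numerically. The R sketch below builds an orthogonal complement U from a complete QR decomposition and compares a design matrix whose column space contains that of Z with one whose column space does not. The helper orth_complement, the two design matrices, and the test values are assumptions made for illustration, and the helper assumes X has full column rank.

```r
# Orthogonal complement of col(X) from a complete QR decomposition (base R).
# orth_complement is a hypothetical helper; it assumes X has full column rank.
orth_complement <- function(X) {
  Q <- qr.Q(qr(X), complete = TRUE)
  Q[, (ncol(X) + 1):nrow(X), drop = FALSE]
}

Z <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))
h <- function(m, U) c(t(U) %*% log(m))     # h(m) = U^T log(m)

x <- c(3, 5, 2, 7); b <- c(4, 0.5)         # arbitrary positive values
scaled <- c(Z %*% b) * x                   # D(Zb) x

# Case 1: col(X) contains col(Z), so U^T Z = 0 and h(D(Zb)x) = h(x)
X1 <- cbind(Z, c(1, 0, 1, 0))
U1 <- orth_complement(X1)
max(abs(t(U1) %*% Z))                      # numerically zero
all.equal(h(scaled, U1), h(x, U1))

# Case 2: col(X) does not contain col(Z), and the invariance fails
X2 <- cbind(1, c(1, 0, 1, 0))
U2 <- orth_complement(X2)
max(abs(t(U2) %*% Z))                      # not zero
isTRUE(all.equal(h(scaled, U2), h(x, U2)))
```

In Case 1 the single column of U1 is proportional to (1, -1, -1, 1), so h(m) is proportional to the log odds ratio log(m₁m₄/(m₂m₃)).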

Example 3. Zero-Order Linear Predictor Models

Consider the sampling plan (Z, Z_F, n) and the model L(m) = F(p) = Xb, where F is a sufficiently smooth function of p. Recall that p = p(m) = D^-1(ZZ^Tm)m.

Question: Is this an HLP model?

(1) L(p) = L(D^-1(ZZ^Tm)m) = F(p(D^-1(ZZ^Tm)m)) = F(p(m)) = L(m).   [The third equality follows because p() is Z-homogeneous of order 0.] Thus, L(m) = a(γ) + L(p), with a() equal to the zero function, which satisfies a(γ₁) - a(γ₂) = a(γ₁/γ₂) - a(1).

(2) Note that h(m) = U^TL(m) is sufficiently smooth because p() and F() are sufficiently smooth. Moreover, because p() is Z-homogeneous of order 0, so are L() and h().

Answer: Thus, smooth 0-order link models F(p) = Xb are HLP models. Note that there are no restrictions on the design matrix X.

Example 4. Mean Response Model for 2x3 Table

Suppose that the following 2x3 contingency table is observed:

          B
A      1    2    3
1     34   32   21
2     10   24   20

The counts y = (34, 32, 21, 10, 24, 20) are assumed to be the result of two independent random samples, one of fixed size N₁ = n₁ = 87 from the population with A=1 and one of random size N₂ ~ Poisson(δ₂) from the population with A=2. It follows that y is a realization of Y = (Y₁₁, Y₁₂, Y₁₃, Y₂₁, Y₂₂, Y₂₃) ~ MP_Z(m|Z_F, n),   where

Z = [1,1,1,0,0,0 / 0,0,0,1,1,1]^T,   Z_F = [1,1,1,0,0,0]^T, and n = n₁ = 87. This means that

(Y₁₁, Y₁₂,Y₁₃) ~ mult(n₁, p₁₁, p₁₂, p₁₃),   Y₂₁ ~ Pois(δ₂p₂₁), Y₂₂ ~ Pois(δ₂p₂₂), Y₂₃ ~ Pois(δ₂p₂₃),

and the four components are independent. The data cell probabilities are defined as p_ij = P(outcome type (i,j) from population i) = P(B=j|A=i). The expected counts are m_ij= γ_ip_ij,   where γ₁ = n₁ is known and γ₂ = δ₂ is unknown.

Consider the mean response model

M_i = b(0) + b(A)_i,    where M_i = 1*p_i1 + 2*p_i2+ 3*p_i3 and, for identifiability, b(A)₁ = 0.

This has the generic form F(p) = Xb, where F is sufficiently smooth. By the previous example (Example 3), we know that the mean response model

y ← Y ~ MP_Z(m|Z_F, n), F(p) = Xb      is an HLP model.

Question: How can I fit this mean response model using maximum likelihood?

Answer: Use mph.fit, an R program written by Joseph B. Lang. For information about this program, go to the author's home web page <http://www.stat.uiowa.edu/~jblang> and follow the software link.

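mph.fit (ver 1.0) input code and output for this mean response model.

Important: The link function of an HLP model must be defined in terms of the expected counts m. Therefore, define the link as L(m) = F(p(m)).

# Observed counts, ordered (1,1), (1,2), (1,3), (2,1), (2,2), (2,3)
y <- c(34, 32, 21,
       10, 24, 20)

# Population matrix: column k flags the outcome types in stratum A = k
Z <- matrix(c(1, 0,
              1, 0,
              1, 0,
              0, 1,
              0, 1,
              0, 1), 6, 2, byrow = TRUE)

# Sampling constraint matrix: only stratum A = 1 has a fixed sample size
ZF <- Z[, 1]

# Link L(m) = F(p(m)): the mean response within each stratum
L.fct <- function(m) {
   p <- diag(c(1 / (Z %*% t(Z) %*% m))) %*% m   # p(m) = D^-1(ZZ^Tm)m
   mean1 <- 1*p[1] + 2*p[2] + 3*p[3]
   mean2 <- 1*p[4] + 2*p[5] + 3*p[6]
   rbind(mean1, mean2)
}

# Design matrix for M_i = b(0) + b(A)_i with b(A)_1 = 0
X <- matrix(c(1, 0,
              1, 1), 2, 2, byrow = TRUE)

a <- mph.fit(y, Z, ZF, L.fct = L.fct, X = X)
mph.summary(a)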

OVERALL GOODNESS OF FIT: TEST of   Ho: h(m)=0 vs. Ha: not Ho...
    Likelihood Ratio Stat (df= 0 ):  Gsq =  0
    Pearson's Score Stat  (df= 0 ):  Xsq =  0
    Generalized Wald Stat (df= 0 ):  Wsq =  0


LINEAR PREDICTOR MODEL RESULTS...
           BETA StdErr(BETA)   Z-ratio     p-value
beta1 1.8505747   0.08372478 22.103070 0.000000000
beta2 0.3346105   0.12908462  2.592179 0.009537007

      OBS LINK  ML LINK  StdErr(L) LINK RESID
link1 1.850575 1.850575 0.08372478          0
link2 2.185185 2.185185 0.09824968          0


CONVERGENCE STATISTICS...
    iterations = 2
    norm.diff  = 5.38103e-07
    norm.score = 1.2662e-12
    Original counts used.

FITTING PROGRAM USED:  mph.fit, version 1.0, 6/5/02

From the output, we see that

b(0)hat = 1.851  (ase=0.084),   and  b(A)₂hat = 0.335 (ase=0.129).

The fitted model is saturated (df = 0), so the ML links equal the observed links. The observed mean response (2.185) corresponding to A=2 is statistically significantly higher than the observed mean response (1.851) corresponding to A=1 (Z-ratio = 2.59, p-value = 0.0095).
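Because this model is saturated, the printed estimates can be reproduced by hand from the observed proportions. The R sketch below is a closed-form check and not a description of how mph.fit computes its results. It assumes the usual delta-method standard error sqrt(Var(score)/N_i) for each stratum mean, using the observed stratum total N_i; for the Poisson stratum this assumption is justified here only by its agreement with the output above. All variable names are illustrative.

```r
y <- c(34, 32, 21, 10, 24, 20)
Z <- cbind(c(1, 1, 1, 0, 0, 0), c(0, 0, 0, 1, 1, 1))
scores <- c(1, 2, 3)

N     <- c(t(Z) %*% y)                  # observed stratum totals: 87, 54
p_hat <- y / c(Z %*% t(Z) %*% y)        # observed p = D^-1(ZZ^Ty)y
P     <- matrix(p_hat, 2, 3, byrow = TRUE)

M    <- c(P %*% scores)                 # observed mean responses
V    <- c(P %*% scores^2) - M^2         # within-stratum variance of the score
se_M <- sqrt(V / N)                     # delta-method standard errors (assumed form)

beta    <- c(M[1], M[2] - M[1])         # b(0) and b(A)_2
se_beta <- c(se_M[1], sqrt(sum(se_M^2)))  # the two samples are independent
z       <- beta / se_beta

round(cbind(beta, se_beta, z, p.value = 2 * pnorm(-abs(z))), 4)
```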

For other MPH model fitting examples, see the "Numerical Examples" document listed in the Primary References.

Page Last Updated: 02/07/2007 11:59 PM -0600, Joseph B. Lang