MultinomialPoisson Homogeneous (MPH) and Homogeneous Linear Predictor (HLP) Models: A Brief Description
Author Information: Joseph B. Lang, Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242 USA < josephlang@uiowa.edu, http://www.stat.uiowa.edu/~jblang >
Citation Information: Lang, J.B. "MultinomialPoisson Homogeneous (MPH) and Homogeneous Linear Predictor (HLP) Models: A Brief Description," Online HTML Document, 02/07/2007 11:59 PM 0600, <http://www.stat.uiowa.edu/~jblang/mph.fitting/mph.hlp.description.htm>.
Primary References:
Lang, J.B (2004). "MultinomialPoisson Homogeneous Models for Contingency Tables," Annals of Statistics, 32, 340383 .
Lang, J.B. (2005). "Homogeneous Linear Predictor Models for Contingency Tables," JASA, 100, 121134
Lang, J.B. (2002, 2007). "Maximum Likelihood Fitting of MultinomialPoisson Homogeneous (MPH) Models for Contingency Tables using MPH.FIT." online html document.
Lang, J.B. (2002, 2007). "MultinomialPoisson Homogeneous and Homogeneous Linear Predictor Models: A Brief Description," online html document. [This document.]
Lang, J.B. (2002, 2007). "ML Fitting of MPH Models using MPH.FIT: Numerical Examples." online html document.
An MPH model is characterized by an independent sampling plan triple (Z,Z_{F},n) and a sufficiently smooth constraint h(m) = 0, where m is the vector of expected table counts and h(.) is homogeneous relative to the sampling plan. [Remark: "sufficiently smooth" means second order derivatives are continuous.]
The independent sampling plan (Z, Z_{F}, n) generates sufficient cell counts Y = (Y_{1}, Y_{2}, ...,_{ }Y_{c}), where Y_{i} = # of outcomes of type i, i=1,2,...,c, in the following way:
Description of the Sampling Plan Triple (Z, Z_{F}, n)
Z is a cxK population matrix with (ith row, kth column) elements Z_{ik} that satisfy (1) Z_{ik} in {0,1}. Each of the K columns corresponds to a stratum (aka population) and each of the c rows corresponds to an outcome type.
If Z_{ik} = 1 then Y_{i} = # of type i outcomes in a random sample of size N_{k } from stratum k.
If Z_{ik} = 0 then there are no type i outcomes in stratum k.
It is assumed that each outcome type occurs in one and only one stratum and each stratum has at least one outcome type. These assumptions are equivalent to assuming that (2) Z_{i+ } = 1 and (3) Z_{+k } >= 1. They imply that the components in Y are defined simply as Y_{i} = #(type i outcomes). [Remark: Any matrix Z that satifies properties (1),(2), and (3) is called a population matrix.]
Z_{F} is a cxf (f <= K) sampling constraint matrix, which comprises columns of Z or is a zero matrix.
If the kth column in Z is included in Z_{F} then the sample size N_{k} = n_{k}, with probability one,
If the kth column in Z is NOT included in Z_{F} then the sample size N_{k }~ Poisson(δ_{k}).
The collection of f fixed sample sizes is denoted n, and the collection of Kf unknown expected sample sizes is denoted δ. It will be useful to denote the entire collection of K expected sample sizes by γ = E(N) = E(Z^{T}Y). [Note that Z_{F}^{T}Y = n with probability one.]
Data Cell Probabilities. Let p_{i} = P(type i outcome from stratum k_{i}), where the k_{i }th stratum is the one that includes outcomes of type i. Note that p_{i} is a conditional probabilitygiven you sample from stratum k_{i}, the chance of a type i outcome is p_{i}. Note that the vector of conditional probabilities p satisfies the constraint Z^{T}p = 1. The definition of the socalled data cell probabilities p depends on the sampling plan. To see this more explicitly, define the unconditional outcome probabilities as P_{i} = P(type i outcome). Then it follows that the data cell probabilities and the unconditional outcome probabilities are related according to p = D^{1}(ZZ^{T}P)P.
MultinomialPoisson Random Vectors
Assuming that the K random samples are independent and the sample sizes N_{k} are independent of the outcome types, it follows that Y comprises independent Poisson and/or multinomial random vectors, with parameters (p,γ). It can be shown that the vector of counts Y is sufficient for the data parameters p and γ.
We say that Y has a multinomialPoisson distribution generated by sampling plan (Z,Z_{F},n), with expected sample sizes γ and data cell probabilities p. We write Y ~ MP^{*}_{Z}(γ,pZ_{F},n).
The Expected Count Parameterization
It is straightforward to see that the distribution of Y can be reparameterized in terms of the expected counts m=E(Y)=D(Zγ)p. In fact, the mapping
R(γ,p) = D(Zγ)p = m is onetoone from {(γ, p): γ > 0, p > 0, Z^{T}p = 1} onto {m: m > 0},
with inverse defined as R^{1}(m) = (Z^{T}m, D^{1}(ZZ^{T}m)m) = (γ, p).
When using the expected count parameterization, we write Y ~ MP_{Z}(mZ_{F},n).
Reminder: m and (p, γ) are functionally related. For example, when we refer to p, we implicitly refer to m through p = p(m) = D^{1}(ZZ^{T}m)m and when we refer to m, we implicitly refer to (p,γ) through m = m(p,γ) = D(Zγ)p.
Examples of MP Distributions.
FullMultinomial. Suppose a random sample of fixed size 50 is taken from a single population, and suppose that there are four possible outcome types. The four counts in Y = (Y_{1}, Y_{2}, Y_{3}, Y_{4}) have the following distribution:
Y ~ MP^{*}_{Z}(γ, p Z_{F}, n) ~ MP_{Z}(m Z_{F}, n), where
Z = Z_{F} = [1,1,1,1]^{T}, n = 50, γ = n, p_{i} = P(type i outcome), m_{i} = np_{i}.
[More commonly, Y ~ mult(n=50, p_{1}, p_{2}, p_{3}, p_{4}).]
FullPoisson. Suppose a random sample of random size N~Poisson(δ) is taken from a single population, and suppose that there are four possible outcome types. The four counts in Y = (Y_{1}, Y_{2}, Y_{3}, Y_{4}) have the following distribution:
Y ~ MP^{*}_{Z}(γ, p Z_{F}, n) ~ MP_{Z}(m Z_{F}, n), where
Z = [1,1,1,1]^{T}, Z_{F} = 0, n = 0, γ = δ, p_{i} = P(type i outcome), m_{i} = δp_{i}.
[More commonly, Y_{i} indep Poisson(m_{i}=δp_{i}), i=1,2,3,4.]
ProductMultinomial. Suppose two independent random samples of fixed sizes 30 and 20 are taken from two populations, and suppose that population 1 has outcome types "(1,1)" and "(1,2)" and population 2 has outcome types "(2,1)" and "(2,2)" for a total of four distinct outcome types. The four counts in Y = (Y_{11}, Y_{12}, Y_{21}, Y_{22}) have the following distribution:
Y ~ MP^{*}_{Z}(γ, p Z_{F}, n) ~ MP_{Z}(m Z_{F}, n), where
Z = [1,1,0,0 / 0,0,1,1]^{T}, Z_{F} = Z, n = (n_{1}, n_{2}) = (30,20), γ = n, p_{ij} = P(type (i,j) outcome from population i), m_{ij} = n_{i}p_{ij}.
[More commonly, (Y_{i1}, Y_{i2}) indep ~ mult(n_{i}, p_{i1}, p_{i2}), i = 1, 2.]
ProductPoisson. Suppose two independent random samples of random sizes N_{1} ~ Poisson(δ_{1}) and N_{2} ~ Poisson(δ_{2}) are taken from two populations, and suppose that population 1 has outcome types "(1,1)" and "(1,2)" and population 2 has outcome types "(2,1)" and "(2,2)" for a total of four distinct outcome types. The four counts in Y = (Y_{11}, Y_{12}, Y_{21}, Y_{22}) have the following distribution:
Y ~ MP^{*}_{Z}(γ, p Z_{F}, n) ~ MP_{Z}(m Z_{F}, n), where
Z = [1,1,0,0 / 0,0,1,1]^{T}, Z_{F} = 0, n = 0, γ = [δ_{1}, δ_{2}]^{T}, p_{ij} = P(type (i,j) outcome from population i), m_{ij} = δ_{i}p_{ij}.
[More commonly, Y_{ij} indep ~ Poisson(m_{ij}=δ_{i}p_{ij}), i, j=1,2.]
Remark: Note that for the Poisson cases, the expected counts m are explicitly written in terms of the expected sample sizes and data cell probabilities. This is important when you are deciding which probabilities (or more generally which estimands) can be estimated for a given sampling scheme. For example, if you were only given that Y_{ij} indep ~ Poisson(m_{ij}), you could NOT tell whether the data could be used to estimate P(type (i,j) outcome).
Definition of a ZHomogeneous Function
Let Z be a population matrix and let h() be a function from {x:x>0} to a subset of R^{u}. The function h() is said to be Zhomogeneous of orders t if h(D(Zb)x) = G(b)h(x), for all b > 0 and all x > 0, where G(b) = diag{_{ }b^{t(i)}_{v(i)}^{ }: i=1,...,u}. Here, v(i) are members of {1,...,K} and t(i) are any real numbers. A function is homogeneous relative to sampling plan (Z,Z_{F},n) if and only if it is Zhomogenous.
Sufficient (but not necessary) condition for Zhomogeneity. If h(m) = h^{*}(p(m)) then h is Zhomogeneous. In words, if h is only a function of the expected counts m through the cell probabilities p then h is Zhomogeneous.
Necessary (but not sufficient) condition for Zhomogeneity. If h is Zhomogeneous then h(m) = 0 if and only if h(p(m)) = 0. In words, if h is Zhomogeneous then constraining m via h(m) = 0 is equivalent to constraining p via h(p) = 0.
Example: The function p(x) = D^{1}(ZZ^{T}x)x is Zhomogeneous of order 0.
Proof: p(D(Zb)x) = D^{1}(ZZ^{T}D(Zb)x)D(Zb)x = D^{1}(ZZ^{T}x)D^{1}(Zb)D(Zb)x = p(x),
where the second equality follows from properties of population matrix Z.
Summary Definition of an MPH Model
An MPH model for observed count y has the form
y ← Y ~ MP_{Z}(mZ_{F}, n), h(m) = 0, where h is sufficiently smooth and Zhomogeneous.
Example 1. MPH Models Specified using Constraints.
Consider the sampling plan (Z,Z_{F},n), where

1 
0 

Z_{F } = 
1 
n= n_{1}= 100 
1 
0 
1 

0 
1 
0 

0 
1 
0 
This sampling plan implies that a random sample of fixed size n_{1} =100 is taken from Stratum 1 and a random sample of random size N_{2} ~ Poisson(δ_{2}) is taken from Stratum 2. Stratum 1 includes outcomes of types "1" and "2" and Stratum 2 includes outcomes of types "3" and "4."
The sufficient counts in Y = (Y_{1}, Y_{2}, Y_{3}, Y_{4}) ~ MP_{Z}(mZ_{F},n) have three independent components with the following distributions...
(Y_{1}, Y_{2}) ~ mult(n_{1}, p_{1}, p_{2}), Y_{3 } ~ Pois(δ_{2}p_{3}), Y_{4 } ~ Pois(δ_{2}p_{4}), where
p_{1} = P(type 1 outcome from Stratum 1), p_{2 } = P(type 2 outcome from Stratum 1), p_{3} = P(type 3 outcome from Stratum 2), p_{4 } = P(type 4 outcome from Stratum 2).
Note that p_{1} + p_{2 } = 1 and p_{3} + p_{4 } = 1, i.e. Z^{T}p = 1.
The expected sample sizes are γ = (n_{1}, δ_{2}) = (100, δ_{2}) and the expected counts are m = D(Zγ)p; i.e.
m_{1 } = n_{1}p_{1}, m_{2} = n_{1}p_{2}, m_{3 } = δ_{2}p_{3}, and m_{4} = δ_{2}p_{4}.
You can easily verify that p = D^{1}(ZZ^{T}m)m.
Consider the hypothesis p_{1} = p_{3}, or equivalently p_{1}/p_{2} = p_{3}/p_{4}, or equivalently (m_{1}m_{4})_{ }/(m_{2}m_{3}). All three of these constraints, viewed as functions of m, are homogeneous relative to the sampling plan.
For example, consider h(m) = p_{1}  p_{3 } = m_{1}/(m_{1}+m_{2})  m_{3}/(m_{3}+m_{4}). Now, h(D(Zb)x) = b_{1}x_{1}/(b_{1}x_{1}+b_{1}x_{2})  b_{2}x_{3}/(b_{2}x_{3} + b_{2}x_{4}) = h(x), so h is Zhomogenous of order 0.
It follows that Y ~ MP_{Z}(m Z_{F}, n), h(m) = p_{1}p_{3} = 0, is an MPH model.
Consider the hypothesis p_{1} = p_{2, } or equivalently m_{1} = m_{2}. Both of these constraints, viewed as functions of m, are homogeneous relative to the sampling plan. The constraint h(m) = p_{1 } p_{2} = m_{1}/(m_{1}+m_{2})  m_{2}/(m_{1}+m_{2}) is Zhomogeneous of order 0 and the constraint h(m) = m_{1 }  m_{2} is Zhomogeneous of order 1.
It follows that Y ~ MP_{Z}(m Z_{F}, n), h(m) = m_{1 }  m_{2} = 0, is an MPH model.
Regarding MultiDimensional Contingency Tables: Think of the four outcome types as the possible crossclassifications of two dichotomous random variables A and B. Specifically, relabel the outcomes as "1"=(1,1), "2"=(1,2), "3"=(2,1), and "4"=(2,2). Then Y = (Y_{11}, Y_{12},Y_{21}, Y_{22}) and p_{ij } = P(type (i,j) outcome from stratum i) = P(type (i,j) outcome given (i,1) or (i,2)) = P(B=jA=i). The hypothesis (using the old labels), p_{1} = p_{3}, is (using the new labels), p_{11} = p_{21}, or P(B=1A=1) = P(B=1A=2). That is, this is the hypothesis of independence between A and B. We point this relabeling out here to emphasize that MPH models as developed above are quite generally applicable to multidimensional contingency tables. (See Example 4 below for an illustration.)
Characterization of Homogeneous Linear Predictor (HLP) Models
HLP models are important special case examples of MPH models.
An HLP model is characterized by a sampling plan triple (Z,Z_{F},n) and a constraint of the form L(m) = Xb (i.e. h(m) = U^{T}L(m) = 0, where U is an orthogonal complement of X). The constraint satisfies the following conditions:
(1) L(m) = a(γ) + L(p), where a(γ_{1})  a(γ_{2}) = a(γ_{1}/γ_{2})  a(1), and
(2) h(m) = U^{T}L(m) is sufficiently smooth and Zhomogeneous.
Remark: Recall that m = expected counts, γ = expected sample sizes, and p = data cell probabilities. These parameters are functionally related. For example, m = D(Zγ)p and p = D^{1}(ZZ^{T}m)m.
Example 2. Loglinear Models
Consider the sampling plan (Z, Z_{F}, n) and the loglinear model L(m) = log(m) = Xb.
Question: Is this loglinear model an HLP model?
(1) L(m) = log(m) = log(D(Zγ)p) = log(Zγ) + log(p) = Zlog(γ) + log(p). Also, Zlog(γ_{1})  Zlog(γ_{2}) = Zlog(γ_{1}/γ_{2})  Zlog(1). Thus, condition (1) is satisfied.
(2) h(x) = U^{T}log(x). This is a sufficiently smooth function. Moreover, h(D(Zb)x) = U^{T}log(D(Zb)x) = U^{T}[Zlog(b) + log(x)] = U^{T}log(x), provided U^{T}Z = 0 (or equivalently, the column space of X contains the column space of Z).
Answer: Thus, the loglinear model is an HLP model, provided the column space of X contains the column space of the population matrix Z.
Example 3. ZeroOrder Linear Predictor Models
Consider the sampling plan (Z, Z_{F}, n) and the model L(m) = F(p) = Xb, where F is a sufficiently smooth function of p. Recall that p = p(m) = D^{1}(ZZ^{T}m)m.
Question: Is this an HLP model?
(1) L(p) = L(D^{1}(ZZ^{T}m)m) = F(p(D^{1}(ZZ^{T}m)m)) = F(p(m)) = L(m). [The third equality follows because p() is Zhomogeneous of order 0.] Thus, L(m) = a(γ) + L(p), with a() equal to the zero function, which satisfies a(γ_{1})  a(γ_{2}) = a(γ_{1}/γ_{2})  a(1).
(2) Note that h(m) = U^{T}L(m) is sufficiently smooth because p() and F() are sufficiently smooth. Moreoever, because p() is Zhomogeneous of order 0, so are L() and h().
Answer: Thus, smooth 0order link models F(p) = Xb are HLP models. Note that there are no restrictions on the design matrix X.
Example 4. Mean Response Model for 2x3 Table
Suppose that the following 2x3 contingency table is observed:
A B 1 2 3 1 34 32 21 2 10 24 20 The counts y = (34, 32, 21,10, 24,20) are assumed to be the result of two independent random samples, one of fixed size N_{1}=n_{1} = 87 from the population with A=1 and one of random size N_{2} ~ Poisson(δ_{2}) from the population with A=2. It follows that y is a realization of Y = (Y_{11}, Y_{12}, Y_{13}, Y_{21}, Y_{22}, Y_{23}) ~ MP_{Z}(mZ_{F}, n), where
Z = [1,1,1,0,0,0 // 0,0,0,1,1,1]^{T}, Z_{F} = [1,1,1,0,0,0]^{T}, and n = n_{1} = 87. This means that
(Y_{11}, Y_{12},Y_{13}) ~ mult(n_{1}, p_{11}, p_{12}, p_{13}), Y_{21} ~ Pois(δ_{2}p_{21}), Y_{22} ~ Pois(δ_{2}p_{22}), Y_{23} ~ Pois(δ_{2}p_{23}),
and the four components are independent. The data cell probabilities are defined as p_{ij} = P(outcome type (i,j) from population i) = P(B=jA=i). The expected counts are m_{ij}= γ_{i}p_{ij}, where γ_{1} = n_{1 } is known and γ_{2} = δ_{2} is unknown.
Consider the mean response model
M_{i} = b(0) + b(A)_{i}, where M_{i } = 1*p_{i1} + 2*p_{i2 }+ 3*p_{i3 } and, for identifiability, b(A)_{1} = 0.
This has the generic form F(p) = Xb, where F is sufficiently smooth. By the previous example (Example 3), we know that the mean response model
y ← Y ~ MP_{Z}(mZ_{F}, n), F(p) = Xb is an HLP model.
Question: How can I fit this mean response model using maximum likelihood?
Answer: Use mph.fit, an R program written by Joseph B. Lang. For information about this program, go to the author's home web page <http://www.stat.uiowa.edu/~jblang> and follow the software link.
mph.fit (ver 1.0) input code and output for this mean response model.
Important: The link function of an HLP model must be defined in terms of the expected counts m. Therefore, define the link as L(m) = F(p(m)).
y < scan() 34 32 21 10 24 20 Z < scan() 1 0 1 0 1 0 0 1 0 1 0 1 Z < matrix(Z,6,2,byrow=T) ZF < Z[,1] L.fct < function(m) { p < diag(c(1/Z%*%t(Z)%*%m))%*%m mean1 < 1*p[1] + 2*p[2] + 3*p[3] mean2 < 1*p[4] + 2*p[5] + 3*p[6] rbind(mean1,mean2) } X < scan() 1 0 1 1 X < matrix(X,2,2,byrow=T) a < mph.fit(y,Z,ZF,L.fct=L.fct,X=X)mph.summary(a) OVERALL GOODNESS OF FIT: TEST of Ho: h(m)=0 vs. Ha: not Ho... Likelihood Ratio Stat (df= 0 ): Gsq = 0 Pearson's Score Stat (df= 0 ): Xsq = 0 Generalized Wald Stat (df= 0 ): Wsq = 0 LINEAR PREDICTOR MODEL RESULTS... BETA StdErr(BETA) Zratio pvalue beta1 1.8505747 0.08372478 22.103070 0.000000000 beta2 0.3346105 0.12908462 2.592179 0.009537007 OBS LINK ML LINK StdErr(L) LINK RESID link1 1.850575 1.850575 0.08372478 0 link2 2.185185 2.185185 0.09824968 0 CONVERGENCE STATISTICS... iterations = 2 norm.diff = 5.38103e07 norm.score = 1.2662e12 Original counts used. FITTING PROGRAM USED: mph.fit, version 1.0, 6/5/02From the output, we see thatb(0)hat = 1.851 (ase=0.084), and b(A)_{2}hat = 0.335 (ase=0.129).The observed mean response (2.185) corresponding to A=2 is statistically higher than the observed mean response (1.851) corresponding to A=1.
For other MPH model fitting examples, click here.
Page Last Updated: 02/07/2007 11:59 PM 0600, Joseph B. Lang