This software is intended to be useful
in planning statistical studies. It is not intended to be used
for analysis of data that have already been collected.
Each selection provides a graphical
interface for studying the power of one or more tests. They include sliders
(convertible to number-entry fields) for varying parameters, and a
provision for graphing one variable against another.
Each dialog window also
offers a Help
menu (on Macs, the Options and Help menus are added at the top of the
screen). Please read the Help menus before
contacting me with questions.
The "Balanced ANOVA" selection
provides another dialog with a list of several popular experimental
designs, plus a provision for specifying your own model.
The dialogs open in separate
windows. If you're
running this on an Apple Macintosh, the applets' menus are added to the
screen menubar -- so, for example, you will see
two "Help" menus there!
You may also download this
software to run it
on your own PC.
These require a web
browser capable of running Java applets (version 1.3 or higher). If you
do not see a selection list above, chances are that you either have
disabled Java, or you have an outdated implementation of Java. In
the latter case, you need to download and install the JRE plug-in from www.oracle.com/technetwork/java/.
Due to a
compatibility bug, many plug-ins size the applet window before allowing
for an additional strip with a security warning; to
compensate, drag the bottom of
the window downward a bit.
Please read this comment
I receive quite a few questions that start with something like this:
"I'm not much of a stats person, but I tried [details...] -- am I doing it right?"
Please compare this with:
"I don't know much about heart surgery, but my wife is suffering from ... and I plan to operate ... can you advise me?"
Folks, just because you can plug numbers into a program doesn't change
the fact that if you don't know what you're doing, you're almost
guaranteed to get meaningless results -- if not dangerously misleading
ones. Statistics really is like rocket science; it isn't easy, even to
us who have studied it for a long time. Anybody who thinks it's
easy surely lacks a deep enough knowledge to understand why it isn't!
If your scientific integrity matters, and statistics is a mystery to
you, then you need expert help. Find a statistician in your company or
at a nearby university, and talk to her face-to-face if possible. It
may well cost money. It's worth it.
If you're blocked by a security setting
You may get into a situation, especially after
updating Java, where this applet is blocked by a security setting in your system. In that
case, in order to run the applets, you need to add this site to the list of exceptions to
your security rules. To do this in Windows, find "Configure Java" on the Start
menu, go to the security tab, and near the bottom, click
on the "Edit site list" button, and add "https://homepage.divms.uiowa.edu" to the list. Then things should work.
If you have questions about sample size or power, I suggest going to Cross Validated,
which is frequented by a large number of people. The answer you need may already be available -- so do a search first!
If you use this software in preparing a research paper, grant proposal,
or other publication, I would appreciate your acknowledging it by
citing it in the references. Here is a suggested bibliography
entry in APA or "author (date)" style:
Lenth, R. V.
(2006-9). Java Applets
for Power and Sample Size [Computer software]. Retrieved month
day, year, from
This form of the citation is appropriate whether you run it online
(give the date you ran it) or the stand-alone version (give the date
you downloaded it).
to run locally
The file piface.jar may be
downloaded so that you can run these applications locally. [Note: Some mail software
(thinking it is smarter than you) renames this file piface.zip.
If this happens, simply rename it piface.jar; do not
unzip the file.]
You may also want the icon file piface.ico
if you put it on your desktop or a toolbar. You
will need to have the Java Runtime Environment (JRE) or the Java
Development Kit (JDK) installed on your system. You probably
already have it; but if not, these are available for free download for
several platforms from Sun.
If you have JDK or JRE version 1.2 or later, then you can probably run the
application just by double-clicking on piface.jar.
Alternatively, you may run it from the command line in a terminal or DOS window, using
a command like
java -jar piface.jar
This will bring up a selector list similar to the one in this web
page. A particular dialog can also be run directly from the
command line, if you know its name (can be discovered by browsing piface.jar
with a zip file utility such as WinZip).
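Because a .jar file is an ordinary zip archive, the class names can also be listed with a short script instead of a zip utility. The following is a minimal Python sketch, not part of the software itself; the function name list_classes is made up for illustration, and the sketch cannot tell you which of the listed classes are runnable dialogs.

```python
import zipfile


def list_classes(jar_path):
    """Return the fully qualified class names stored in a jar (zip) archive.

    Inner and anonymous classes (names containing '$') are skipped,
    since they are not meant to be launched directly.
    """
    suffix = ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name[: -len(suffix)].replace("/", ".")
            for name in jar.namelist()
            if name.endswith(suffix) and "$" not in name
        )


# Hypothetical usage, assuming piface.jar is in the current directory:
#     for class_name in list_classes("piface.jar"):
#         print(class_name)
```

The printed names use dots rather than slashes, which is the form the java command expects for a class name.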
For example, the two-sample t-test
dialog may be run using
What formula(s) do you use in these calculations?
In most cases, power is an exact calculation based on the
distributional situation in question. Typically it is a
probability associated with a non-central distribution. In a few
cases, an approximation is used, and is labeled as such. Sample
sizes are calculated using root-finding methods in conjunction with
power calculations. There are usually not nice neat
formulas. That's why we need this software.
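To illustrate the idea of pairing a power function with a root-finding search, here is a minimal Python sketch. It is not the code used by the applets: to stay within the standard library it assumes a two-sided, two-sample z test with a known standard deviation, so the power comes from the normal distribution rather than a noncentral t. The names z_test_power and sample_size are made up for this example.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def z_test_power(n, delta, sigma, alpha=0.05):
    """Power of a two-sided, two-sample z test with n per group.

    Assumes sigma is known and equal in both groups (a simplification).
    """
    se = sigma * (2.0 / n) ** 0.5          # std. error of the mean difference
    z_crit = _STD.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = delta / se                     # true difference in SE units
    return _STD.cdf(shift - z_crit) + _STD.cdf(-shift - z_crit)


def sample_size(delta, sigma, target=0.80, alpha=0.05):
    """Smallest n per group whose power reaches the target.

    There is no closed form being inverted here: the function brackets
    the answer by doubling n, then bisects on the integers.
    """
    lo = 2
    if z_test_power(lo, delta, sigma, alpha) >= target:
        return lo
    hi = 2 * lo
    while z_test_power(hi, delta, sigma, alpha) < target:
        lo, hi = hi, 2 * hi
    # Invariant: power(lo) < target <= power(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if z_test_power(mid, delta, sigma, alpha) >= target:
            hi = mid
        else:
            lo = mid
    return hi


print(sample_size(delta=0.5, sigma=1.0))
```

The search works because power increases with n for this test; for the t and F tests, the same search would wrap a noncentral-distribution power function instead.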
on using a particular applet
I am willing to provide
minimal support if you truly don't understand what inputs are
required. However, each applet has a help menu, and I do ask
that you carefully read that before you e-mail me with questions.
need consulting help
I am providing this software for free, but I do not have time to also
answer substantive questions on power/sample size for your research
project. If you need statistical advice on your research,
contact a statistical consultant; and if you want expert advice, you
should expect to pay for it. Most universities with statistics
departments or statistics programs also offer a consulting
service. If you think your research is important, then it is
important to get good advice on the statistical design
and analysis (do this before collecting data).
to do... Retrospective power ... Cohen's effect sizes
I recommend against these (see Advice
below). I have been asked why the Options menu in every
applet has links for retrospective power and Cohen effect sizes.
It seems to some to be placing undue emphasis on methods I don't
like. The technical answer to the question is that these menu items are
inherited from a base class, along with some other things (e.g., the
graphics capabilities). The other answer is that people ask
about this all the time, in
spite of everything I say on this site. If you follow those
links, you get explanations of why not to do it. I'm
proud of the dialog for retrospective power.
This software is made available as-is, with no guarantees; use it at
your own risk. I welcome comments on bugs, additional
capabilities you'd like to see, etc.
questions
If you have read the
above FAQs, and still find it
appropriate to contact me, my e-mail address is firstname.lastname@example.org.
Here are two
very wrong things that people try to do with my software:
Retrospective power (a.k.a. observed power, post hoc power). You've got the data,
did the analysis, and did not achieve "significance." So you compute
power retrospectively to see if the test was powerful enough or
not. This is an empty question. Of course it wasn't
powerful enough -- that's why the result isn't significant. Power
calculations are useful for design, not analysis.
(Note: These comments refer to power computed based on
observed effect size and sample size. Considering a different
sample size is obviously prospective in nature. Considering a
different effect size might make sense, but probably what you really
need to do instead is an equivalence test; see Hoenig and Heisey, 2001.)
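One way to see why retrospective power is an empty question: for a given test and significance level, power computed at the observed effect size is just a transformation of the p-value, so it cannot add information (this is the point made by Hoenig and Heisey). The following minimal Python sketch assumes a one-sided z test with known standard deviation, chosen only to keep the arithmetic simple; observed_power is a made-up name.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def observed_power(p_value, alpha=0.05):
    """'Observed power' of a one-sided z test, computed from the p-value alone.

    The observed z statistic is recovered from the p-value, then treated
    as if it were the true effect (which is what retrospective power does).
    """
    z_obs = _STD.inv_cdf(1 - p_value)   # observed test statistic
    z_crit = _STD.inv_cdf(1 - alpha)    # rejection threshold
    return _STD.cdf(z_obs - z_crit)


for p in (0.05, 0.10, 0.20, 0.50):
    print(p, round(observed_power(p), 3))
```

A result sitting exactly at p = alpha has observed power one half, and every non-significant result has less; no sample size or standard deviation enters the calculation.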
T-shirt effect sizes
("small", "medium", and "large"). This is an elaborate way to
arrive at the same sample size that has been used in past social
science studies of large, medium, and small size (respectively).
The method uses a standardized effect size as the goal. Think
about it: for a "medium" effect size, you'll choose the same n regardless of the
reliability of your instrument, or the narrowness or diversity of your
subjects. Clearly, important considerations are being ignored
here. "Medium" is definitely not the message!
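A small numerical illustration of this point, as a hedged Python sketch: it uses the textbook normal-approximation formula n = 2((z_{1-alpha/2} + z_{power}) / d)^2 per group for a two-sided, two-sample comparison, which is an assumption made for brevity and not the method used by the applets. The name n_per_group and the raw difference of 5 units are made up for the example.

```python
import math
from statistics import NormalDist


def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate n per group at standardized effect size d
    (two-sided, two-sample comparison, normal approximation)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * (z_sum / d) ** 2)


# A "medium" effect (d = 0.5) yields a single n, whatever the instrument.
medium_n = n_per_group(0.5)

# Targeting a raw difference of 5 units instead: n now depends on how
# noisy the measurements are (sigma), as it should.
raw_delta = 5.0
for sigma in (5.0, 10.0, 20.0):
    print("sigma =", sigma,
          "| n for raw goal:", n_per_group(raw_delta / sigma),
          "| n for 'medium':", medium_n)
```

The right-hand column never changes, which is exactly the problem: the standardized goal silently redefines the scientific target whenever the measurement variability changes.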
Here are three
very right things you can do:
Use power prospectively for planning future studies. Software such
as is provided on this website is useful for determining an appropriate
sample size, or for evaluating a planned study to see if it is likely
to yield useful information.
Put science before statistics. It is easy to get caught up in
statistical significance and such; but studies should be designed to
meet scientific goals, and you need to keep those in sight at all times
(in planning and
analysis). The appropriate inputs to power/sample-size
calculations are effect sizes that are deemed clinically important,
based on careful considerations of the underlying scientific (not
statistical) goals of the study. Statistical considerations are
used to identify a plan that is effective in meeting scientific goals
-- not the other way around.
Investigators tend to try to answer all the world's questions with one
study. However, you usually cannot do a definitive study in one
step. It is far better to work incrementally. A pilot study
helps you establish procedures, understand and protect against things
that can go wrong, and obtain variance estimates needed in determining
sample size. A pilot study with 20-30 degrees of freedom for
error is generally quite adequate for obtaining reasonably reliable
Many funding agencies require a power/sample-size section in grant
proposals. Following the above guidelines is good for
your chances of being funded. You will have established that you
have thought through the scientific issues, that your procedures are
sound, and that you have a defensible sample size based on realistic
variance estimates and scientifically tenable effect-size
To read more, please see the following references:
Lenth, R. V. (2001), ``Some Practical Guidelines for
Sample Size Determination,'' The American Statistician, 55,
Hoenig, John M. and Heisey, Dennis M. (2001), ``The Abuse of
Power: The Pervasive Fallacy of Power Calculations for Data Analysis,''
The American Statistician, 55,
An earlier draft of the Lenth reference above is _here_,
and a shorter summary of some comments I made in a panel discussion at
the 2000 Joint Statistical Meetings in Indianapolis is _here_.
Additional brief comments, prepared as a handout for my
presentation at the 2001 Joint Statistical Meetings in Atlanta, are _here_.
For your amusement (or despair), here is a video I found that shows how not to ask a statistician about sample-size:
(Thanks to Susan Geyer, Morsani College of Medicine, Health Informatics Institute,
University of South Florida -- aka
JavaMama926. The URL is http://www.youtube.com/watch?v=PbODigCZqL8.
Dr. Geyer created this video for use in a workshop she teaches at the American Society of Hematology's Clinical Research Training Institute.)
Most computations are ``exact'' in the sense that they are based on
exact formulas for sample size, power, etc. The exception is
Satterthwaite approximations; see below.
Even with exact formulas, computed values are inexact, as are all
double-precision floating-point computations. Many computations (e.g., for
noncentral distributions) require summing one or more series, and there
is a serious tradeoff between speed and accuracy. The error bound
set for cdfs is 1E-8 or smaller, and for quantiles the bound is
Actual errors can be much larger due to accumulated errors or other
Quantiles, for example, are computed by numerically solving an equation
involving the cdf; thus, in extreme cases, a small error in the cdf can
create a large error in the quantile.
A warning (typically, ``too many iterations'') is generated when an
error bound is not detected to have been achieved. However, in
the case of quantile computations, no warning message is generated for
extreme quantiles. If you want a power of .9999 at
you can expect the computed
sample size to not be accurate to the nearest
If you specify reasonable criteria, the answers will be pretty reliable.
Some of the dialogs (two-sample t, mixed ANOVA) implement Satterthwaite
approximations when certain combinations of inputs require an error term
to be constructed. These are of course not exact, even in their
formulation. Moreover, the Satterthwaite degrees of freedom are
used as-is in computing power from a noncentral t or
noncentral F distribution, and this introduces errors
that could be large in some cases.
In the two-sample t setting, I'd expect the worst errors
when there is a huge imbalance in sample sizes and/or variances. In
the dialogs for mixed ANOVA models (either F tests or
comparisons/contrasts), I expect these errors to get worse as more
variance components are involved, especially when one or more of them
is given negative weight.
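For the two-sample t case, the usual Satterthwaite (Welch) formula for the approximate degrees of freedom can be sketched in a few lines of Python. This is the standard textbook formula; whether the dialogs compute it in exactly this form is an assumption, and the name satterthwaite_df is made up.

```python
def satterthwaite_df(s1, n1, s2, n2):
    """Satterthwaite approximate degrees of freedom for a two-sample t test.

    s1, s2 are the group standard deviations; n1, n2 the group sizes.
    """
    v1 = s1 ** 2 / n1   # variance of the first sample mean
    v2 = s2 ** 2 / n2   # variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Balanced design with equal SDs: df equals the pooled value n1 + n2 - 2.
print(satterthwaite_df(1.0, 10, 1.0, 10))

# Heavy imbalance with the larger SD in the small group: df collapses
# toward n2 - 1, far below n1 + n2 - 2.
print(satterthwaite_df(1.0, 100, 3.0, 5))
```

The result is generally not an integer, and it always lies between the smaller of n1 - 1 and n2 - 1 and the pooled value n1 + n2 - 2; the imbalanced case is where the approximation has the most work to do.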
The views and opinions expressed in this page are strictly
those of the page author.
The contents of this page have not been approved by Mathematical Sciences,
the College of Liberal Arts, or The University of Iowa.