\documentclass[titlepage]{article}
\newcommand{\macbold}[1]{{\bf #1}}
\newcommand{\NIL}{{\tt NIL}}
\newcommand{\COMMAND}[1]{COMMAND-#1}
\newenvironment{lispdoc}%
{\begin{list}{}{\setlength{\leftmargin}{0 in}}}%
{\end{list}}
\newcommand{\docitem}[2]{\item #1 \hfill [#2]\\}
\newcommand{\fdocitem}[1]{\docitem{#1}{Function}\index{#1}}
\newcommand{\mdocitem}[1]{\docitem{#1}{Macro}\index{#1}}
\newcommand{\odocitem}[1]{\docitem{#1}{Object Method}\index{messages:#1}}
\newcommand{\dcode}[1]{{\tt #1}}
\usepackage{graphicx}
\usepackage{html}
\begin{latexonly}
\special{header=mac68.pro}
\end{latexonly}
\newcommand{\paramref}[1]{{\em #1}}
\newcommand{\param}[1]{$<$\paramref{#1}$>$}
\newcommand{\ncaption}[2]{\parbox[t]{3.8in}{\caption{#1}\label{#2}}}
\newcommand{\wcaption}[2]{\caption{#1}\label{#2}}
\def\displaycode{\trivlist \tt \item[]}
\setlength{\textwidth}{6in}
\setlength{\textheight}{8.5in}
\setlength{\topmargin}{-0.25in}
\setlength{\oddsidemargin}{0.25in}
\setlength{\evensidemargin}{0.25in}
\title{
XLISP-STAT\\
A Statistical Environment Based on \\
the XLISP Language\\
{\large (Version 2.0)}}
\author{
Luke Tierney\\
School of Statistics \\
University of Minnesota}
\makeindex
\begin{document}
\begin{titlepage}
\begin{center}
\vspace*{.5in}
{\Huge \bf XLISP-STAT}
\vspace{.125in}
{\huge A Statistical Environment Based on}
\vspace{.125in}
{\huge the XLISP Language}
\vspace{.125in}
{\large (Version 2.0)}
\vspace{.75in}
{\Large by}
\vspace{.0625in}
{\Large Luke Tierney}
\vspace{1.25in}
\centerline{\includegraphics{postscript/umlogo.ps}}
{\Large University of Minnesota}
\vspace{.0625in}
{\Large School of Statistics}
\vspace{.0625in}
{\Large Technical Report Number 528}
\vspace{.75in}
{\Large July 1989}
\end{center}
\end{titlepage}
\tableofcontents
\newpage
\section*{Preface}
\addcontentsline{toc}{subsection}{Preface}
XLISP-STAT is a statistical environment built on top of the XLISP
programming language. This document is intended to be a tutorial
introduction to the basics of XLISP-STAT. It is written primarily for
the Apple Macintosh version, but most of the material applies to other
versions as well; some points where other versions differ are outlined
in an appendix. The first three sections contain the information you
will need to do elementary statistical calculations and plotting. The
fourth section introduces some additional methods for generating and
modifying data. The fifth section describes some additional features
of the Macintosh user interface that may be helpful. The remaining
sections deal with more advanced topics, such as interactive plots,
regression models, and writing your own functions. All sections are
organized around examples, and most contain some suggested exercises
for the reader.
This document is not intended to be a complete manual. However,
documentation for many of the commands that are available is given in
the appendix. Brief help messages for these and other commands are
also available through the interactive help facility described in
Section \ref{Shortcuts.Help} below.
XLISP itself is a high-level programming language developed by David
Betz and made available for unrestricted, non-commercial use. It is a
dialect of Lisp, most closely related to the Common Lisp dialect.
XLISP also contains some extensions to Lisp to support object-oriented
programming. These facilities have been modified in XLISP-STAT to
implement the screen menus, plots and regression models. Several
excellent books on Common Lisp are available. One example is Winston
and Horn \cite{WinstonHorn}. A book on XLISP itself has recently been
published. Unfortunately it is based on XLISP 1.7, which differs
significantly from XLISP 2.0, the basis of XLISP-STAT 2.0.
XLISP-STAT was originally developed for the Apple Macintosh. It is now
also available for UNIX systems using the {\em X11}\/ window system,
for {\em Sun}\/ workstations under the {\em SunView}\/ window system,
and, with only rudimentary graphics, for generic 4.[23]BSD UNIX
systems. The Macintosh version of XLISP-STAT was developed and
compiled using the Lightspeed C compiler from Think Technologies, Inc.
The Macintosh user interface is based on Paul DuBois' TransSkel and
TransEdit libraries. Some of the linear algebra and probability
functions are based on code given in Press, Flannery, Teukolsky and
Vetterling
\cite{CRecipes}. Regression computations are carried out using the sweep
algorithm as described in Weisberg \cite{Multreg}.
This tutorial has borrowed several ideas from Gary Oehlert's {\em
MacAnova User's Guide}\/ \cite{Macanova}. Many of the on-line help
entries have been adopted directly or with minor modifications from
the {\em Kyoto Common Lisp System}. Most of the examples used in this
tutorial have been taken from Devore and Peck \cite{DevorePeck}. Many
of the functions added to XLISP-STAT were motivated by similar
functions in the {\em S} statistical environment
\cite{oldSbook,newSbook}.
The present version of XLISP-STAT, Version 2.0, seems to run fairly
comfortably on a Mac II or Mac Plus with 2MB of memory, but is a bit
cramped with only 1MB. It will not run in less than 1Mb of memory. The
program will occasionally bomb with an ID=28 if it gets into a
recursion that is too deep for the Macintosh stack to handle. On a 1MB
Mac it may also bomb with an ID=15 if too much memory has been used
for the segment loader to be able to bring in a required code segment.
Development of XLISP-STAT was supported in part by grants of an Apple
Macintosh Plus computer and hard disk and a Macintosh II computer from
the MinneMac Project at the University of Minnesota, by a single
quarter leave granted to the author by the University of Minnesota, by
grant DMS-8705646 from the National Science Foundation, and by a
research contract with Bell Communications Research.
\subsection*{Using this Tutorial}
The best way to learn about a new computer program is usually to use
it. You will get most out of this tutorial if you read it at your
computer and work through the examples yourself. To make this easier
the named data sets used in this tutorial have been stored on the file
\dcode{tutorial.lsp} in the \dcode{Data} folder of the Macintosh
distribution disk. To read in this file select the \macbold{Load} item
on the \macbold{File} menu. This will bring up an \macbold{Open File}
dialog window. Use this dialog to open the \dcode{Data} folder on the
distribution disk. Now select the file \dcode{tutorial.lsp} and press
the \macbold{Open} button. The file will be loaded and some variables
will be defined for you.
\footnote{On a UNIX system you can use the function \dcode{load-data} to
load the tutorial data. After starting up XLISP-STAT enter the expression
\dcode{(load-data "tutorial")}, followed by a {\em return}.}
\subsection*{Why XLISP-STAT Exists}
There are three primary reasons behind my decision to produce the
XLISP-STAT environment. The first is to provide a vehicle for
experimenting with dynamic graphics and for using dynamic graphics in
instruction. Second, I wanted to be able to experiment with an
environment supporting functional data, such as mean functions in
nonlinear regression models and prior density and likelihood functions
in Bayesian analyses. Finally, I was interested in exploring the use
of object-oriented programming ideas for building and analyzing
statistical models. I will discuss each of these points in a little
more detail in the following paragraphs.
The development of high resolution graphical computer displays has
made it possible to consider the use of dynamic graphics for
understanding higher-dimensional structure. One of the earliest
examples is the real time rotation of a three dimensional point cloud
on a screen -- an effort to use motion to recover a third dimension
from a two dimensional display. Other techniques that have been
developed include {\em brushing}\/ a scatterplot -- highlighting points
in one plot and seeing where the corresponding points fall in other
plots. A considerable amount of research has been done in this area,
see for example the discussion in Becker and Cleveland
\cite{BeckerCleveland} and the papers reproduced in Cleveland
and McGill\cite{ClevelandMcGill}. However most of the software
developed to date has been developed on specialized hardware, such as
the TTY 5620 terminal or Lisp machines. As a result, very few
statisticians have had an opportunity to experiment with dynamic
graphics first hand, and still fewer have had access to an environment
that would allow them to implement dynamic graphics ideas of their
own. Several commercial packages for microcomputers now contain some
form of dynamic graphics, but most do not allow users to customize
their plots or develop functions for producing specialized plots, such
as dynamic residual plots. XLISP-STAT provides at least a partial
solution to these problems. It allows the user to modify a scatter
plot with Lisp functions and provides means for modifying the way in
which a plot responds to mouse actions. It is also possible to add
functions written in C to the program. On the Macintosh this has to be
done by adding to the source code. On some unix systems it is also
possible to compile and dynamically load code written in C or FORTRAN.
An integrated environment for statistical calculations and graphics is
essential for developing an understanding of the uses of dynamic
graphics in statistics and for developing new graphical techniques.
Such an environment must essentially be a programming language. Its
basic data types must include types that allow groups of numbers --
data sets -- to be manipulated as entire objects. But in model-based
analyses numerical data are only part of the information being used.
The remainder is the model itself. Sometimes a model is easily
characterized by specifying a set of numbers. A normal linear
regression model with $i. i. d.$ errors might be described by the
number of covariates, the coefficients and the error variance. On the
other hand, in many cases it is easier to specify a model by
specifying a function. To specify a normal nonlinear regression model,
for example, one might specify the mean function. If our language is
to allow us to specify this function within the language itself then
the language must support a functional data type with full rights: It
has to be possible to define functions that manipulate functions,
return functions, apply functions to arguments, etc.. The choice I
faced was to define a language from scratch or use an existing
language. Because of the complexity of issues involved in functional
programming I decided to use a dialect of a well understood functional
language, Lisp. The syntax of Lisp is somewhat unfamiliar to most
users of statistical packages, but it is easy to learn and several
good tutorials are available in local book stores. I considered the
possibility of using Lisp to write a top level interface with a more
``natural'' syntax, but I did not see any way of doing this without
complicating access to some of the more powerful features of Lisp or
running into some of the pitfalls of functional programming. I
therefore decided to retain the basic Lisp top level syntax. To make
the manipulation of numerical data sets easier I have redefined the
arithmetic operators and basic numerical functions to work on lists
and arrays of data.
Having decided to use Lisp as the basis for my environment XLISP was a
natural choice for several reasons. It has been made available for
unrestricted, non-commercial use by its author, David Betz. It is
small (for a Lisp system), its source code is available in C, and it
is easily extensible. Finally, it includes support for object-oriented
programming. Object-oriented programming has received considerable
attention in recent years and is particularly natural for use in
describing and manipulating graphical objects. It may also be useful
for the analysis of statistical data and models. A collection of data
and assumptions may be represented as an {\em object}. The model
object can then be examined and modified by sending it {\em messages}.
Many different kinds of models will answer similar questions, thus
fitting naturally into an {\em inheritance structure}. XLISP-STAT's
implementation of linear and nonlinear regression models as {\em
objects}, with nonlinear regression inheriting many of its {\em
methods} from linear regression, is a first, primitive attempt to
exploit this programming technique in statistical analysis.
\subsection*{Availability}
Source code for XLISP-STAT for {\em X11}, {\em Sun}, 4.[23]BSD UNIX
and Macintosh versions and executables for the Macintosh are available
free of charge for non-commercial use. You should, however, be
prepared to bear the cost of copying, for example by supplying a disk
or tape and a stamped mailing envelope. You can also obtain the source
code and Macintosh executables by anonymous {\em ftp}\/ over the
internet from {\em umnstat.stat.umn.edu} (128.101.51.1).
A version for the Mac II requiring the MC68881 coprocessor is
available as well. This version is compiled to access the coprocessor
directly and will therefore not run on a machine without the
coprocessor. Numerical computations with this version are about 7 to
10 times faster on a Mac II than without the direct coprocessor
access.
\subsection*{Disclaimer}
XLISP-STAT is an experimental program. It has not been extensively
tested. The University of Minnesota, the School of Statistics, the
author of the statistical extensions and the author of XLISP take no
responsibility for losses or damages resulting directly or indirectly
from the use of this program.
XLISP-STAT is an evolving system. Over time new features will be
introduced, and existing features that do not work may be changed.
Every effort will be made to keep XLISP-STAT consistent with the
information in this tutorial, but if this is not possible the help
information should give accurate information about the current use of
a commend.
\newpage
\section{Starting and Finishing}
You should have the program XLISP-STAT on a Macintosh disk. XLISP-STAT
needs to have several files available for it to work properly. These
files are
\footnote{The file \dcode{xlisp.help} is optional. It may be
replaced by a reduced file \dcode{xlisp.help.small} or it may be
omitted entirely. If it is not present interactive help will not be
available.}:
\begin{center}{\bf
init.lsp\\
common.lsp\\
help.lsp\\
objects.lsp\\
menus.lsp\\
statistics.lsp\\
dialogs.lsp\\
graphics.lsp\\
graphics2.lsp\\
regression.lsp\\
xlisp.help}
\end{center}
Before starting XLISP-STAT you should make sure that these files are
in the same folder as the XLISP-STAT application.
To start XLISP-STAT double click on its icon. The program will need a
little time to start up and read in the files mentioned above. When
XLISP-STAT is ready the text in its command window will look something
like this\footnote{On a Macintosh with limited memory a dialog
warning about memory restrictions may be appear at this point. On a
Mac II it takes about a minute to load these files; on a Mac Plus or
an SE it takes about 3.5 minutes.}:
\begin{verbatim}
XLISP version 2.0, Copyright (c) 1988, by David Betz
XLISP-STAT version 2.0 , Copyright (c) 1988, by Luke Tierney.
Several files will be loaded; this may take a few minutes.
; loading "init.lsp"
; loading "common.lsp"
; loading "help.lsp"
; loading "objects.lsp"
; loading "menus.lsp"
; loading "statistics.lsp"
; loading "dialogs.lsp"
; loading "graphics.lsp"
; loading "graphics2.lsp"
; loading "regression.lsp"
>
\end{verbatim}
The final ``\verb+>+'' in the window is the XLISP-STAT prompt. Any
characters you type while the command window is active will be added
to the line after the final prompt. When you hit a {\em return},
XLISP-STAT will try to interpret what you have typed and will print a
response. For example, if you type a 1 and hit {\em return}\/ then
XLISP-STAT will respond by simply printing a 1 on the following line
and then give you a new prompt:
\begin{verbatim}
> 1
1
>
\end{verbatim}
If you type an {\em expression} like \dcode{(+ 1 2)}, then XLISP-STAT
will print the result of evaluating the expression and give you a new
prompt:
\begin{verbatim}
> (+ 1 2)
3
>
\end{verbatim}
As you have probably guessed, this expression means that the numbers 1
and 2 are to be added together. The next section will give more
details on how XLISP-STAT expressions work. In this tutorial I will
always show interactions with the program as I have done here: The
``\verb+>+'' prompt will appear before lines you should type.
XLISP-STAT will supply this prompt when it is ready; you should not
type it yourself. In later sections I will omit the new prompt
following the result in order to save space.
Now that you have seen how to start up XLISP-STAT it is a good idea to
make sure you know how to get out. As with many Macintosh programs the
easiest way to get out is to choose the \index{quit} \macbold{Quit}
command from the \macbold{File} menu. You can also use the command key
shortcut \COMMAND{Q}, or you can type the expression
\begin{verbatim}
> (exit)
\end{verbatim}
\index{exit}
Any one of these methods should cause the program to exit and return
you to the Finder.
\newpage
\section{Introduction to Basics}
Before we can start to use XLISP-STAT for statistical work we need to
learn a little about the kind of data XLISP-STAT uses and about how
the XLISP-STAT {\em listener}\/ and {\em evaluator}\/ work.
\subsection{Data}
XLISP-STAT works with two kinds of data: {\em simple
data}\/\index{simple data} and {\em compound data}\/\index{compound
data}. Simple data are numbers
\begin{verbatim}
1 ; an integer
-3.14 ; a floating point number
#C(0 1) ; a complex number (the imaginary unit)
\end{verbatim}
logical values
\begin{verbatim}
T ; true
nil ; false
\end{verbatim}
strings (always enclosed in double quotes)
\begin{verbatim}
"This is a string 1 2 3 4"
\end{verbatim}
and symbols (used for naming things; see the following section)
\begin{verbatim}
x
x12
12x
this-is-a-symbol
\end{verbatim}
Compound data are lists\index{list}
\begin{verbatim}
(this is a list with 7 elements)
(+ 1 2 3)
(sqrt 2)
\end{verbatim}
or \index{vector} vectors
\begin{verbatim}
#(this is a vector with 7 elements)
#(1 2 3)
\end{verbatim}
Higher dimensional arrays are another form of compound data; they will
be discussed below in Section \ref{Arrays}.
All the examples given above can be typed directly into the command
window as they are shown here. The next subsection describes what
XLISP-STAT will do with these expressions.
\subsection{The Listener and the Evaluator}
A session with XLISP-STAT basically consists of a conversation between
you and the {\em listener}. The listener is the window into which you
type your commands. When it is ready to receive a command it gives you
a prompt. At the prompt you can type in an expression. You can use the
mouse or the {\em backspace} key to correct any mistakes you make
while typing in your expression. When the expression is complete and
you type a {\em return}\/ the listener passes the expression on to the
{\em evaluator}. The evaluator evaluates the expression and returns
the result to the listener for printing. \footnote{It is possible to
make a finer distinction. The {\em reader} takes a string of
characters from the listener and converts it into an expression. The
{\em evaluator} evaluates the expression and the {\em printer}
converts the result into another string of characters for the listener
to print. For simplicity I will use {\em evaluator} to describe the
combination of these functions.} The evaluator is the heart of the
system.
The basic rule to remember in trying to understand how the evaluator
works is that everything is evaluated. Numbers and strings evaluate to
themselves:
\begin{verbatim}
> 1
1
> "Hello"
"Hello"
>
\end{verbatim}
Lists are more complicated. Suppose you type the list
\dcode{(+ 1 2 3)} at the listener. This list has four
elements: the symbol \dcode{+}\index{+} followed by the numbers 1, 2
and 3. Here is what happens:
\begin{verbatim}
> (+ 1 2 3)
6
>
\end{verbatim}
A list is evaluated as a function application. The first element
is a symbol representing a function, in this case the symbol \dcode{+}
representing the addition function. The remaining elements are the
arguments. Thus the list in the example above is interpreted to mean
``Apply the function \dcode{+} to the numbers 1, 2 and 3''.
Actually, the arguments to a function are always evaluated before the
function is applied. In the previous example the arguments are all
numbers and thus evaluate to themselves. On the other hand, consider
\begin{verbatim}
> (+ (* 2 3) 4)
10
>
\end{verbatim}
The evaluator has to evaluate the first argument to the function
\dcode{+} before it can apply the function.
Occasionally you may want to tell the evaluator {\em not}\/ to evaluate
something. For example, suppose we wanted to get the evaluator to
simply return the list \dcode{(+ 1 2)} back to us, instead of
evaluating it. To do this we need to {\em quote} our
list:\index{quote}
\begin{verbatim}
> (quote (+ 1 2))
(+ 1 2)
>
\end{verbatim}
\dcode{quote} is not a function. It does not obey the rules of function
evaluation described above: Its argument is not evaluated.
\dcode{quote} is called a {\em special form} -- special because it has
special rules for the treatment of its arguments. There are a few
other special forms that we will need; I will introduce them as they
are needed. Together with the basic evaluation rules described here
these special forms make up the basics of the Lisp language. The
special form \dcode{quote} is used so often that a shorthand notation
has been developed, a single quote before the expression you want to
quote:
\begin{verbatim}
> '(+ 1 2) ; single quote shorthand
\end{verbatim}
This is equivalent to \dcode{(quote (+ 1 2))}. Note that there is no
matching quote following the expression.
By the way, the semicolon ``\dcode{;}'' is the Lisp comment character.
Anything you type after a semicolon up to the next time you hit a
{\em return}\/ is ignored by the evaluator.
\subsection*{Exercises}
For each of the following expressions try to predict what the
evaluator will return. Then type them in, see what happens and try to
explain any differences.
\begin{enumerate}
\item \dcode{(+ 3 5 6)}
\item \dcode{(+ (- 1 2) 3)}
\item \dcode{'(+ 3 5 6)}
\item \dcode{'( + (- 1 2) 3)}
\item \dcode{(+ (- (* 2 3) (/ 6 2)) 7)}
\item \dcode{'x}
\end{enumerate}
Remember, to quit from XLISP-STAT choose \macbold{Quit} from the
\macbold{File} menu or type \dcode{(exit)}.
\newpage
\section{Elementary Statistical Operations}
\label{Elementary}
This section introduces some of the basic graphical and numerical
statistical operations that are available in XLISP-STAT.
\subsection{First Steps}
\label{Elementary.First}
Statistical data usually consists of groups of numbers. Devore and Peck
\cite[Exercise 2.11]{DevorePeck} describe an experiment in which 22
consumers reported the number of times they had purchased a product
during the previous 48 week period. The results are given as a table:
\begin{center} \begin{tabular}{rrrrrrrrrrr}
0 & 2 & 5 & 0 & 3 & 1 & 8 & 0 &
3 & 1 & 1\\
9 & 2 & 4 & 0 & 2 &
9 & 3 & 0 & 1 & 9 & 8
\end{tabular}
\end{center}
To examine this data in XLISP-STAT we represent it as a list of
numbers using the \dcode{list}\index{list} function:
\begin{verbatim}
> (list 0 2 5 0 3 1 8 0 3 1 1 9 2 4 0 2 9 3 0 1 9 8)
(0 2 5 0 3 1 8 0 3 1 1 9 2 4 0 2 9 3 0 1 9 8)
>
\end{verbatim}
Note that the numbers are separated by white space (spaces, tabs or even
returns), not commas.
The \dcode{mean}\index{mean} function can be used to compute the
average of a list of numbers. We can combine it with the \dcode{list}
function to find the average number of purchases for our sample:
\begin{verbatim}
> (mean (list 0 2 5 0 3 1 8 0 3 1 1 9 2 4 0 2 9 3 0 1 9 8))
3.227273
>
\end{verbatim}
The median\index{median} of these numbers can be computed as
\begin{verbatim}
> (median (list 0 2 5 0 3 1 8 0 3 1 1 9 2 4 0 2 9 3 0 1 9 8))
2
>
\end{verbatim}
It is of course a nuisance to have to type in the list of 22 numbers
every time we want to compute a statistic for the sample. To avoid
having to do this I will give this list a name using the
\dcode{def}\index{def} special form
\footnote{\dcode{def} acts like a special form, rather than a function,
since its first argument is not evaluated (otherwise you would have to
quote the symbol). Technically \dcode{def} is a macro, not a special
form, but I will not worry about this distinction in this tutorial.
\dcode{def} is closely related to the standard Lisp special forms
\dcode{setf} and \dcode{setq}. The advantage of using \dcode{def} is that
it adds your variable name to a list of \dcode{def}'ed variables that
you can retrieve using the function \dcode{variables}. If you use
\dcode{setf} or \dcode{setq} there is no easy way to find variables
you have defined, as opposed to ones that are predefined. \dcode{def}
always affects top level symbol bindings, not local bindings. It can
not be used in function definitions to change local bindings.}:
\begin{verbatim}
> (def purchases (list 0 2 5 0 3 1 8 0 3 1 1 9 2 4 0 2 9 3 0 1 9 8))
PURCHASES
>
\end{verbatim}
Now the symbol \dcode{purchases} has a value\index{symbol
value}\index{value} associated with it: Its value is our list of 22
numbers. If you give the symbol \dcode{purchases} to the evaluator
then it will find the value of this symbol and return that value:
\begin{verbatim}
> purchases
(0 2 5 0 3 1 8 0 3 1 1 9 2 4 0 2 9 3 0 1 9 8)
>
\end{verbatim}
We can now easily compute various numerical descriptive statistics for
this data set:
\begin{verbatim}
> (mean purchases)
3.227273
> (median purchases)
2
> (standard-deviation purchases)
3.279544
> (interquartile-range purchases)
3.5
>
\end{verbatim}
\index{standard-deviation}
\index{interquartile-range}
XLISP-STAT also supports elementwise arithmetic\index{elementwise
arithmetic} operations on lists of numbers. For example, we can add 1
to each of the purchases:\index{+}
\begin{verbatim}
> (+ 1 purchases)
(1 3 6 1 4 2 9 1 4 2 2 10 3 5 1 3 10 4 1 2 10 9)
>
\end{verbatim}
and after adding 1 we can compute the natural logarithms of the
results:
\index{log}
\begin{verbatim}
> (log (+ 1 purchases))
(0 1.098612 1.791759 0 1.386294 0.6931472 2.197225 0 1.386294 0.6931472
0.6931472 2.302585 1.098612 1.609438 0 1.098612 2.302585 1.386294 0
0.6931472 2.302585 2.197225)
>
\end{verbatim}
\subsection*{Exercises}
For each of the following expressions try to predict what the
evaluator will return. Then type them in, see what happens and try to
explain any differences.
\begin{enumerate}
\item \dcode{(mean (list 1 2 3))}
\item \dcode{(+ (list 1 2 3) 4)}
\item \dcode{(* (list 1 2 3) (list 4 5 6))}
\item \dcode{(+ (list 1 2 3) (list 4 5))}
\end{enumerate}
\subsection{Summary Statistics and Plots}
\label{Elementary.Summary}
Devore and Peck \cite[page 54, Table 10]{DevorePeck} give
precipitation levels recorded during the month of March in the
Minneapolis - St. Paul area over a 30 year period. Let's enter these
data into XLISP-STAT with the name \dcode{precipitation}:
\begin{verbatim}
> (def precipitation
(list .77 1.74 .81 1.20 1.95 1.20 .47 1.43 3.37 2.20 3.30
3.09 1.51 2.10 .52 1.62 1.31 .32 .59 .81 2.81 1.87
1.18 1.35 4.75 2.48 .96 1.89 .90 2.05))
PRECIPITATION
>
\end{verbatim}
In typing this expression I have hit the {\em return}\/ and {\em
tab}\/ keys a few times in order to make the typed expression easier
to read. The tab key indents the next line to a reasonable point to
make the expression more readable.
The \dcode{histogram}\index{histogram} and
\dcode{boxplot}\index{boxplot} functions can be used to obtain
graphical representations of this data set:
\begin{verbatim}
> (histogram precipitation)
#
> (boxplot precipitation)
#
>
\end{verbatim}
\begin{figure}
\centering
%\vspace{2.2in}
\centerline{\includegraphics{postscript/fig1.ps}}
\ncaption{Histogram of precipitation levels.}{Histogram}
\end{figure}
Each of these commands should cause a window with the appropriate
graph to appear on your screen. The windows should look something
like Figures \ref{Histogram} and \ref{Boxplot1}.
\begin{figure}\centering
%\vspace{3.8in}
\centerline{\includegraphics{postscript/fig2.ps}}
\ncaption{Boxplot of precipitation levels.}{Boxplot1}
\end{figure}
Note that as each graph appears it becomes the active window. To get
XLISP-STAT to accept further commands you have to click on the
XLISP-STAT listener window. You will have to click on the listener
window between the two commands shown here.
The two functions return results that are printed something like this:
\begin{verbatim}
#
\end{verbatim}
These result will be used later to identify the window containing the
plot. For the moment you can ignore them.
When you have several plot windows open you might want to close the
listener window so you can rearrange the plots more easily. You can do
this by clicking in the listener window's close box. You can later
re-open the listener window by selecting the \macbold{Show XLISP-STAT}
item on the \macbold{Command} menu.
Here are some numerical summaries:
\begin{verbatim}
> (mean precipitation)
1.685
> (median precipitation)
1.47
> (standard-deviation precipitation)
1.0157
> (interquartile-range precipitation)
1.145
>
\end{verbatim}
The distribution of this data set is somewhat skewed to the right.
Notice the separation between the mean and the median. You might want
to try a few simple transformations to see if you can symmetrize the
data. Square root and log transformations can be computed using the
expressions
\begin{verbatim}
(sqrt precipitation)
\end{verbatim}
and
\begin{verbatim}
(log precipitation).
\end{verbatim}
You should look at plots of the data to see if these transformations
do indeed lead to a more symmetric shape. The means and medians of
the transformed data are
\begin{verbatim}
> (mean (sqrt precipitation))
1.243006
> (median (sqrt precipitation))
1.212323
> (mean (log precipitation))
0.3405517
> (median (log precipitation))
0.384892
>
\end{verbatim}
The boxplot function can also be used to produce parallel
boxplots\index{parallel boxplot} of two or more samples. It will do so
if it is given a list of lists as its argument instead of a single
list. As an example, let's use this function to compare serum total
cholesterol values for samples of rural and urban Guatemalans (Devore
and Peck \cite[page 19, Example 3]{DevorePeck}):
\begin{verbatim}
> (def urban (list 184 196 217 284 184 236 189 206 179 170 205 190
204 330 217 242 222 242 249 241))
URBAN
> (def rural (list 166 146 144 204 158 143 158 180 223 194 194 175
171 155 143 145 131 181 148 144 220 129))
RURAL
>
\end{verbatim}
The parallel boxplot is obtained by
\begin{verbatim}
> (boxplot (list urban rural))
#
>
\end{verbatim}
and is shown in Figure \ref{Boxplot2}; the urban group is on the left.
\begin{figure}\centering
%\vspace{3.8in}
\centerline{\includegraphics{postscript/fig3.ps}}
\ncaption{Parallel box plots of cholesterol levels for urban and
rural guatemalans.}{Boxplot2}
\end{figure}
\subsection*{Exercises}
The following exercises involve examples and problems from Devore and
Peck \cite{DevorePeck}. The data sets are in files in the folder
\macbold{Data} on the XLISP-STAT distribution disk and can be read in
using the \index{load}\macbold{Load} item in the \macbold{File} menu
or using the \dcode{load} function (see Section \ref{Shortcuts.Load}
below). To use the \macbold{Load} item on the \macbold{File} menu
select this item from the menu. This will bring up an \macbold{Open
File} dialog window. Use this dialog to open the \macbold{Data} folder
on the distribution disk. Now select one of the \dcode{.lsp} files
(\dcode{car-prices.lsp} for the first exercise) and press the
\macbold{Open} button. The file will be loaded and some variables will
be defined for you. Loading file \dcode{car-prices.lsp} will define
the single variable \dcode{car-prices}. Loading file
\dcode{heating.lsp} will define two variables, \dcode{gas-heat} and
\dcode{electric-heat}.\footnote{On UNIX systems use the function
\dcode{load-data}. For example, evaluating the expression
\dcode{(load-data "car-prices")} should load the file
\dcode{car-prices.lsp}.}
\begin{enumerate}
\item
Devore and Peck \cite[page 18, Example 2]{DevorePeck} give advertised
prices for a sample of 50 used Japanese subcompact cars. Obtain some
plots and summary statistics for this data. Experiment with some
transformations of the data as well. The data set is called
\dcode{car-prices} in the file \dcode{car-prices.lsp}. The prices are
given in units of \$1000; thus the price 2.39 represents \$2390. The
data have been sorted by their leading digit.
\item
In Exercise 2.40 Devore and Peck \cite{DevorePeck} give heating costs
for a sample of apartments heated by gas and a sample of apartments
heated by electricity. Obtain plots and summary statistics for these
samples separately and look at a parallel box plot for the two
samples. These data sets are called \dcode{gas-heat} and
\dcode{electric-heat} in the file \dcode{heating.lsp}.
\end{enumerate}
\subsection{Two Dimensional Plots}
\label{Elementary.TwoDPlots}
Many single samples are actually collected over time. The
precipitation data set used above is an example of this kind of data.
In some cases it is reasonable to assume that the observations are
independent of one another, but in other cases it is not. One way to
check the data for some form of serial correlation or trend is to plot
the observations against time, or against the order in which they were
obtained. I will use the \dcode{plot-points} function to produce a
scatterplot of the precipitation data versus time. The
\dcode{plot-points} \index{plot-points} function is called as
\begin{verbatim}
(plot-points x-variable y-variable)
\end{verbatim}
Our $y$-variable will be \dcode{precipitation}, the variable we
defined earlier. As our $x$-variable we would like to use a sequence
of integers from 1 to 30. We could type these in ourselves, but there
is an easier way. The function \dcode{iseq},
\index{dcode} short for {\em integer-sequence}, generates a list of
consecutive integers between two specified values. The general form
for a call to this function is
\begin{verbatim}
(iseq start end).
\end{verbatim}
To generate the sequence we need we use
\begin{verbatim}
(iseq 1 30).
\end{verbatim}
Thus to generate the scatter plot we type
\begin{verbatim}
> (plot-points (iseq 1 30) precipitation)
#
>
\end{verbatim}
and the result will look like Figure \ref{Scatterplot1}.
\begin{figure}\centering
%\vspace{3.8in}
\centerline{\includegraphics{postscript/fig4.ps}}
\ncaption{Scatterplot of precipitation levels against
time.}{Scatterplot1}
\end{figure}
There does not appear to be much of a pattern to the data; an
independence assumption may be reasonable.
Sometimes it is easier to see temporal patterns in a plot if the
points are connected by lines. Try the above command with
\dcode{plot-points} replaced by
\dcode{plot-lines}.
The \dcode{plot-lines} \index{plot-lines} function can also be used to
construct graphs of functions. Suppose you would like a plot of
$\sin(x)$ from $-\pi$ to $+\pi$. The constant $\pi$ is predefined as
the variable \dcode{pi}\index{pi}. You can construct a list of $n$
equally spaced real numbers between $a$ and $b$ using the expression
\index{rseq}
\begin{verbatim}
(rseq a b n).
\end{verbatim}
Thus to draw the plot of $\sin(x)$ using 50 equally spaced points type
\begin{verbatim}
> (plot-lines (rseq (- pi) pi 50) (sin (rseq (- pi) pi 50)))
#
>
\end{verbatim}
The plot should look like Figure \ref{Lineplot1}.
\begin{figure}\centering
%\vspace{3.8in}
\centerline{\includegraphics[height=3.25in]{postscript/fig5.ps}}
\ncaption{A plot of $\sin(x)$.}{Lineplot1}
\end{figure}
Scatterplots are of course particularly useful for examining the
relationship between two numerical observations taken on the same
subject. Devore and Peck \cite[Exercise 2.33]{DevorePeck} give data
for HC and CO emission recorded for 46 automobiles. The results can be
placed in two variables, \dcode{hc} and \dcode{co}, and these variable
can then be plotted against one another with the \dcode{plot-points}
function:
\begin{verbatim}
> (def hc (list .5 .46 .41 .44 .72 .83 .38 .60 .83 .34 .37 .87
.65 .48 .51 .47 .56 .51 .57 .36 .52 .58 .47 .65
.41 .39 .55 .64 .38 .50 .73 .57 .41 1.02 1.10 .43
.41 .41 .52 .70 .52 .51 .49 .61 .46 .55))
HC
> (def co (list 5.01 8.60 4.95 7.51 14.59 11.53 5.21 9.62 15.13
3.95 4.12 19.00 11.20 3.45 4.10 4.74 5.36 5.69
6.02 2.03 6.78 6.02 5.22 14.67 4.42 7.24 12.30
7.98 4.10 12.10 14.97 5.04 3.38 23.53 22.92 3.81
1.85 2.26 4.29 14.93 6.35 5.79 4.62 8.43 3.99 7.47))
CO
> (plot-points hc co)
#
>
\end{verbatim}
The resulting plot is shown in Figure \ref{Scatterplot2}.
\begin{figure}\centering
%\vspace{3.8in}
\centerline{\includegraphics[height=3.25in]{postscript/fig6.ps}}
\ncaption{Plot of HC against CO.}{Scatterplot2}
\end{figure}
\subsection*{Exercises}
\begin{enumerate}
\item
Draw a graph of the function $f(x) = 2x + x^{2}$ between -2 and 3.
\item
Devore and Peck \cite[Exercise 4.2]{DevorePeck} give the age and CPK
concentration, a measure of metabolic activity, recorded for 18 cross
country skiers during a relay race. These data are in the variables
\dcode{age} and \dcode{cpk} in the file \dcode{metabolism.lsp}. Plot
the data and describe any relationship you observe between age and CPK
concentration.
\end{enumerate}
\subsection{Plotting Functions}
Plotting the sine function in the previous section was a bit
cumbersome. As an alternative we can use the function
\dcode{plot-function} \index{plot-function} to plot a function of one
argument over a specified range. We can plot the sine function using
the expression
\begin{verbatim}
(plot-function (function sin) (- pi) pi)
\end{verbatim}
The expression \dcode{(function sin)} is needed to extract the
function associated with the symbol \dcode{sin}. Just using
\dcode{sin} will not work. The reason is that a symbol in Lisp can
have both a {\em value}, perhaps set using \dcode{def}, and a {\em
function definition} at the same time.
\footnote{As an aside, a Lisp symbol can be thought of as a ``thing''
with four cells. These cells contain the symbol's print name, its
value, its function definition, and its property list. Lisp symbols
are thus much more like physical entities than variable identifiers in
FORTRAN or C.} This may seem a bit cumbersome at first, but it has one
great advantage: Typing an innocent expression like
\begin{verbatim}
(def list '(2 3 4))
\end{verbatim}
will not destroy the \dcode{list} function.
Extracting a function definition from a symbol is done almost as often
as quoting an expression, so again a simple shorthand notation is
available. The expression
\begin{verbatim}
#'sin
\end{verbatim}
is equivalent to the expression \dcode{(function sin)}. The short
form \dcode{\#'} is usually pronounced {\em sharp-quote}. Using this
abbreviation the expression for producing the sine plot can be written
as
\begin{verbatim}
(plot-function #'sin (- pi) pi).
\end{verbatim}
\newpage
\section{More on Generating and Modifying Data}
This section briefly summarizes some techniques for generating random
and systematic data.
\subsection{Generating Random Data}
XLISP-STAT has several functions for generating pseudo-random numbers.
For example, the expression
\begin{verbatim}
(uniform-rand 50)
\end{verbatim}
will generate a list of 50 independent uniform random variables. The
functions \dcode{normal-rand} and \dcode{cauchy-rand} work similarly.
Other generating functions require additional arguments to specify
distribution parameters. Here is a list of the functions available for
dealing with probability distributions:
\begin{center}
\begin{tabular}{llll}
\tt normal-cdf & \tt normal-quant & \tt normal-rand & \tt normal-dens\\
\tt cauchy-cdf & \tt cauchy-quant & \tt cauchy-rand & \tt cauchy-dens\\
\tt beta-cdf & \tt beta-quant & \tt beta-rand & \tt beta-dens\\
\tt gamma-cdf & \tt gamma-quant & \tt gamma-rand & \tt gamma-dens\\
\tt chisq-cdf & \tt chisq-quant & \tt chisq-rand & \tt chisq-dens\\
\tt t-cdf & \tt t-quant & \tt t-rand & \tt t-dens\\
\tt f-cdf & \tt f-quant & \tt f-rand & \tt f-dens\\
\\
\tt binomial-cdf & \tt binomial-quant & \tt binomial-rand & \tt binomial-pmf\\
\tt poisson-cdf & \tt poisson-quant & \tt poisson-rand & \tt poisson-pmf\\
\\
\tt bivnorm-cdf
\end{tabular}
\end{center}
More information on the required arguments is given in the appendix
in Section \ref{Appendix.Statistical}. The discrete quantile functions
{\tt binomial-quant} and {\tt poisson-quant} return values of a left
continuous inverse of the cdf. The pmf's for these distributions are
only defined for integer arguments. The quantile functions and random
variable generators for the beta, gamma, $\chi^{2}$, t and F
distributions are presently calculated by inverting the cdf and may
be a bit slow.
The state of the internal random number generator can be ``randomly''
reseeded, and the current value of the generator state can be saved. The
mechanism used is the standard Common Lisp mechanism. The current random
state is held in the variable \dcode{*random-state*}. The function
\dcode{make-random-state} can be used to set and save the state. It
takes an optional argument. If the argument is \NIL\ or omitted
\dcode{make-random-state} returns a copy of the current value of
\dcode{*random-state*}. If the argument is a state object a copy of it
is returned. If the argument is \dcode{t} a new, ``randomly''
initialized state object is produced and returned.
\footnote{The generator used is Marsaglia's portable generator from
the {\em Core Math Libraries} distributed by the National Bureau of
Standards. A state object is a vector containing the state information of
the generator. ``Random'' reseeding occurs off the system clock.}
\subsection{Generating Systematic Data}
We have already used the functions \dcode{iseq} and \dcode{rseq} to generate
equally spaced sequences of integers and real numbers. The function
\dcode{repeat} \index{repeat} is useful for generating sequences with a
particular pattern. The general form of a call to \dcode{repeat} is
\begin{verbatim}
(repeat list pattern)
\end{verbatim}
\dcode{pattern} must be either a single number or a list of numbers of
the same length as \dcode{list}. If \dcode{pattern} is a single number
then \dcode{repeat} simply repeats \dcode{list} \dcode{pattern} times.
For example
\begin{verbatim}
> (repeat (list 1 2 3) 2)
(1 2 3 1 2 3)
\end{verbatim}
If \dcode{pattern} is a list then each element of \dcode{list} is
repeated the number of times indicated by the corresponding element of
\dcode{pattern}. For example
\begin{verbatim}
> (repeat (list 1 2 3) (list 3 2 1))
(1 1 1 2 2 3)
\end{verbatim}
In Section \ref{MorePlots.Scatmat} below I generate the variables
\dcode{density} and \dcode{variety} by typing them in directly. Using
the \dcode{repeat} function we could have generated them like this:
\begin{verbatim}
(def density (repeat (repeat (list 1 2 3 4) (list 3 3 3 3)) 3))
(def variety (repeat (list 1 2 3) (list 12 12 12)))
\end{verbatim}
\subsection{Forming Subsets and Deleting Cases}
\label{MoreData.Subsets}
The \dcode{select} \index{dcode} function allows you to select a single element
or a group of elements from a list or vector. For example, if we define
\dcode{x} by
\begin{verbatim}
(def x (list 3 7 5 9 12 3 14 2))
\end{verbatim}
then \dcode{(select x i)} will return the $i$-th element of
\dcode{x}. Lisp, like the language C but in contrast to FORTRAN,
numbers elements of list and vectors starting at zero. Thus the
indices for the elements of \dcode{x} are 0, 1, 2, 3, 4, 5, 6, 7
\index{index base}. So
\begin{verbatim}
> (select x 0)
3
> (select x 2)
5
\end{verbatim}
To get a group of elements at once we can use a list of indices
instead of a single index:
\begin{verbatim}
> (select x (list 0 2))
(3 5)
\end{verbatim}
If you want to select all elements of \dcode{x} except element 2 you
can use the expression
\begin{verbatim}
(remove 2 (iseq 0 7))
\end{verbatim}
\index{remove} as the second argument to the function \dcode{select}:
\begin{verbatim}
> (remove 2 (iseq 0 7))
(0 1 3 4 5 6 7)
> (select x (remove 2 (iseq 0 7)))
(3 7 9 12 3 14 2)
\end{verbatim}
Another approach is to use the logical function \dcode{/=} \index{/=}
(meaning not equal) to tell you which indices are not equal to 2. The
function \dcode{which} \index{which} can then be used to return a list
of all the indices for which the elements of its argument are not \NIL:
\begin{verbatim}
> (/= 2 (iseq 0 7))
(T T NIL T T T T T)
> (which (/= 2 (iseq 0 7)))
(0 1 3 4 5 6 7)
> (select x (which (/= 2 (iseq 0 7))))
(3 7 9 12 3 14 2)
\end{verbatim}
This approach is a little more cumbersome for deleting a single element,
but it is more general. The expression
\dcode{(select x (which (< 3 x)))},
for example, returns all elements in \dcode{x} that are greater than 3:
\begin{verbatim}
> (select x (which (< 3 x)))
(7 5 9 12 14)
\end{verbatim}
\subsection{Combining Several Lists}
At times you may want to combine several short lists into a single
longer list. This can be done using the \dcode{append}
\index{append} function. For example, if you have three variables
\dcode{x}, \dcode{y} and \dcode{z} constructed by the expressions
\begin{verbatim}
(def x (list 1 2 3))
(def y (list 4))
(def z (list 5 6 7 8))
\end{verbatim}
then the expression
\begin{verbatim}
(append x y z)
\end{verbatim}
will return the list
\begin{verbatim}
(1 2 3 4 5 6 7 8).
\end{verbatim}
\subsection{Modifying Data}
So far when I have asked you to type in a list of numbers I have been
assuming that you will type the list correctly. If you made an error
you had to retype the entire \dcode{def} expression. Since you can use
cut--and--paste this is really not too serious. However it would be
nice to be able to replace the values in a list after you have typed
it in. The \dcode{setf} \index{setf} special form is used for this.
Suppose you would like to change the 12 in the list \dcode{x} used in
the Section \ref{MoreData.Subsets} to 11. The expression
\begin{verbatim}
(setf (select x 4) 11)
\end{verbatim}
will make this replacement:
\begin{verbatim}
> (setf (select x 4) 11)
11
> x
(3 7 5 9 11 3 14 2)
\end{verbatim}
The general form of \dcode{setf} is
\begin{verbatim}
(setf form value)
\end{verbatim}
where \dcode{form} is the expression you would use to select a single
element or a group of elements from \dcode{x} and \dcode{value} is the
value you would like that element to have, or the list of the values
for the elements in the group. Thus the expression
\begin{verbatim}
(setf (select x (list 0 2)) (list 15 16))
\end{verbatim}
changes the values of elements 0 and 2 to 15 and 16:
\begin{verbatim}
> (setf (select x (list 0 2)) (list 15 16))
(15 16)
> x
(15 7 16 9 11 3 14 2)
\end{verbatim}
A note of caution is needed here. Lisp symbols are merely labels for
different items. When you assign a name to an item with the
\dcode{def} command you are not producing a new item. Thus
\begin{verbatim}
(def x (list 1 2 3 4))
(def y x)
\end{verbatim}
means that \dcode{x} and \dcode{y} are two different names for the
same thing. As a result, if we change an element of (the item
referred to by) \dcode{x} with \dcode{setf} then we are also changing
the element of (the item referred to by) \dcode{y}, since both
\dcode{x} and \dcode{y} refer to the same item. If you want to make
a copy of \dcode{x} and store it in \dcode{y} before you make changes
to \dcode{x} then you must do so explicitly using, say, the
\dcode{copy-list} \index{copy-list} function. The expression
\begin{verbatim}
(def y (copy-list x))
\end{verbatim}
will make a copy of \dcode{x} and set the value of \dcode{y} to that
copy. Now \dcode{x} and \dcode{y} refer to different items and
changes to \dcode{x} will not affect \dcode{y}.
\newpage
\section{Some Useful Shortcuts}
\label{Shortcuts}
This section describes some additional features of XLISP-STAT that you
may find useful.
\subsection{Getting Help}
\label{Shortcuts.Help}
On line help is available for many of the functions in XLISP-STAT
\footnote{The on line help file may not be available on a single disk
version that includes a system file. Alternatively, there may be a
small help file available that does not contain documentation for all
functions.}.
As an example, here is how you would get help \index{help}
for the function \dcode{median}: \begin{verbatim} > (help 'median)
MEDIAN [function-doc]
Args: (x)
Returns the median of the elements of X.
NIL
>
\end{verbatim}
Note the quote in front of \dcode{median}. \dcode{help} is itself a
function, and its argument is the symbol representing the function
\dcode{median}. To make sure \dcode{help} receives the symbol, not
the value of the symbol, you need to quote the symbol.
If you are not sure about the name of a function you may still be able
to get some help. Suppose you want to find out about functions related
to the normal distribution. Most such functions will have ``norm'' as
part of their name. The expression \index{help*}
\begin{verbatim}
(help* 'norm)
\end{verbatim}
will print the help information for all symbols whose names contain
the string ``norm'':
\begin{verbatim}
> (help* 'norm)
------------------------------------------------------------------------------
Sorry, no help available on NORM
------------------------------------------------------------------------------
Sorry, no help available on NORM-2
------------------------------------------------------------------------------
Sorry, no help available on NORMAL
------------------------------------------------------------------------------
NORMAL-CDF [function-doc]
Args: (x)
Returns the value of the standard normal distribution function at X.
Vectorized.
------------------------------------------------------------------------------
NORMAL-DENS [function-doc]
Args: (x)
Returns the density at X of the standard normal distribution. Vectorized.
------------------------------------------------------------------------------
NORMAL-QUANT [function-doc]
Args (p)
Returns the P-th quantile of the standard normal distribution. Vectorized.
------------------------------------------------------------------------------
NORMAL-RAND [function-doc]
Args: (n)
Returns a list of N standard normal random numbers. Vectorized.
------------------------------------------------------------------------------
NIL
>
\end{verbatim}
The symbols \dcode{norm}, \dcode{norm-2} and \dcode{normal} do not have
function definitions and thus there is no help available for them. The
term {\em vectorized} in a function's documentation means the function
can be applied to arguments that are lists or arrays; the result will be a
list or array of the results of applying the function to each element of
its arguments. \footnote{This process of applying a function elementwise
to its arguments is called {\em mapping}.}
A related term appearing in the documentation of some functions is {\em
vector reducing}. A function is vector reducing if it is applied
recursively to its arguments until a single number results. The
functions \dcode{sum}, \dcode{prod}, \dcode{max} and \dcode{min} are
vector reducing.
The first time a help function is used will take some time -- the help
file is scanned and the positions of all entries in the file are
recorded. Subsequent calls will be faster.
Let me briefly explain the notation used in the information printed by
\dcode{help} to describe the arguments a function expects
\footnote{The notation used corresponds to the specification of the argument
lists in Lisp function definitions. See Section \ref{Fundefs} for more
information on defining functions.}. Most functions expect a fixed
set of arguments, described in the help message by a line like
\begin{verbatim} Args: (x y z)
\end{verbatim}
Some functions can take one or more optional arguments. The arguments
for such a function might be listed as
\begin{verbatim}
Args: (x &optional y (z t))
\end{verbatim}
This means that \dcode{x} is required and \dcode{y} and \dcode{z} are
optional. If the function is named \dcode{f}, it can be called as
\dcode{(f x-val)}, \dcode{(f x-val y-val)} or \dcode{(f x-val y-val z-val)}.
The list \dcode{(z t)} means that if \dcode{z} is not supplied its
default value is \dcode{T}. No explicit default value is specified for
\dcode{y}; its default value is therefore \dcode{NIL}. The arguments
must be supplied in the order in which they are listed. Thus if you
want to give the argument \dcode{z} you must also give a value for
\dcode{y}.
Another form of optional argument is the {\em keyword argument}. The
histogram function for example takes arguments
\begin{verbatim}
Args: (data &key (title "Histogram"))
\end{verbatim}
The \dcode{data} argument is required, the \dcode{title} argument is
an optional keyword argument. The default title is
\dcode{"Histogram"}. If you want to create a histogram of the
purchases data set used in Section \ref{Elementary.First} with title
\dcode{"Purchases"} use the expression
\begin{verbatim}
(histogram purchases :title "Purchases")
\end{verbatim}
Thus to give a value for a keyword argument you give the keyword
\footnote{Note that the keyword \dcode{:title} has not been quoted.
{\em Keyword symbols}, symbols starting with a colon, are somewhat
special. When a keyword symbol is created its value is set to itself.
Thus a keyword symbol effectively evaluates to itself and does not
need to be quoted.} for the argument, a symbol consisting of a colon
followed by the argument name, and then the value for the argument. If
a function can take several keyword arguments then these may be
specified in any order, following the required and optional arguments.
Finally, some functions can take an arbitrary number of arguments. This
is denoted by a line like
\begin{verbatim}
Args: (x &rest args)
\end{verbatim}
The argument \dcode{x} is required, and zero or more additional
arguments can be supplied.
In addition to providing information about functions \dcode{help} also
gives information about data types and certain variables. For example,
\begin{verbatim}
> (help 'complex)
COMPLEX [function-doc]
Args: (realpart &optional (imagpart 0))
Returns a complex number with the given real and imaginary parts.
COMPLEX [type-doc]
A complex number
NIL
>\end{verbatim}
shows the function and type documentation for {\tt complex}, and
\begin{verbatim}
> (help 'pi)
PI [variable-doc]
The floating-point number that is approximately equal to the ratio of the
circumference of a circle to its diameter.
NIL
> \end{verbatim}
shows the variable documentation for {\tt pi}\footnote{Actually {\tt pi}
represents a constant, produced with \dcode{defconst}. Its value can't be changed
by simple assignment.}.
\subsection{Listing and Undefining Variables}
After you have been working for a while you may want to find out what
variables you have defined (using \dcode{def}). The function
\dcode{variables} \index{variables} will produce a listing:
\begin{verbatim}
> (variables)
CO
HC
RURAL
URBAN
PRECIPITATION
PURCHASES
NIL
>
\end{verbatim}
If you are working on a 1Mb Macintosh you may occasionally want to free up some
space by getting rid of some variables you no longer need. You can do this
using the \dcode{undef} \index{undef} function:
\begin{verbatim}
> (undef 'co)
CO
> (variables)
HC
RURAL
URBAN
PRECIPITATION
PURCHASES
NIL
>
\end{verbatim}
\subsection{More on the XLISP-STAT Listener}
Because of the large number of parentheses involved, Lisp expressions
can be hard to read and type correctly. To make it easier to type
readable, correct expressions the listener window on the Macintosh has
the following features: \begin{itemize}
\item
Typing a closing parenthesis flashes the matching opening parenthesis.
\item
You can move the cursor with the arrow keys, the mouse or the {\em
backspace}\/ key to any position in the current input expression, not
just within the last line.
\item
If the current expression is more than one line long, hitting the {\em
tab}\/ key in any line but the first indents the line to its appropriate
position according to (more or less) standard rules for Lisp code
indentation.
\item
If the insertion point is at the end of the current expression hitting
the {\em enter}\/ key is equivalent to hitting a {\em return}. If the
insertion point is not at the end of the expression, hitting {\em
enter}\/ moves the insertion point to the end of the expression.
\end{itemize}
These four features should make typing expressions correctly much
easier. In particular, in translating mathematical formulas to Lisp it
sometimes seems that you have to do things backwards. Using these
features you can build up your expression from the inside out
\footnote{
To support these features the listener checks the current expression
each time you type a {\em return}\/ to see if it has a complete
expression. If so, the expression is passed to the reader and the
evaluator. If not, you can continue typing. There are some heuristics
involved here, and an expression with lots of quotes and comments may
cause trouble, but it seems to work. Redefining the read table in
major ways may not work as you might expect since some knowledge of
standard Lisp syntax is built in to the listener.}.
XLISP 2.0 provides a simple {\em command history}\/ mechanism. The
symbols {\tt -}, {\tt *}, {\tt **}, {\tt ***}, {\tt +}, {\tt ++}, and
{\tt +++} are used for this purpose. The top level reader binds these
symbols as follows:
\begin{center}
\begin{tabular}{rl}
{\tt -} & the current input expression\\
{\tt +} & the last expression read\\
{\tt ++} & the previous value of {\tt +}\\
{\tt +++} & the previous value of {\tt ++}\\
{\tt *} & the result of the last evaluation\\
{\tt **} & the previous value of {\tt *}\\
{\tt ***} & the previous value of {\tt **}
\end{tabular}
\end{center}
The variables {\tt *}, {\tt **} and {\tt ***} are probably most
useful. For example, if you construct a plot but forget to assign the
resulting plot object to a variable you can recover it using one of the
history variables: \begin{verbatim}
> (histogram (normal-rand 50))
#
> (def w *)
W
> w
#
>
\end{verbatim}
The symbol {\tt W} now has the histogram object as its value and can
be used to modify the plot, as described in Section
\ref{MorePlots.Modifying} below.
Like most interactive systems, XLISP needs a system for dynamically
managing memory. The system used by XLISP is to grab memory out of a
fixed bin until the bin is exhausted. At that point the system pauses
to reclaim memory that is no longer being used. This process, called
{\em garbage collection}, will occasionally cause the system to freeze
up for about a second. When the system garbage collects the Macintosh
cursor changes to a trash bag.
Occasionally a calculation will take too long, or it will appear to
have gotten stuck in some kind of loop. If you want to interrupt
\index{interrupt} the calculation {\em hold down}\/ the COMMAND key and
the PERIOD. This should return you to the listener. You must continue
to hold down the key until the calculation stops.
\subsection{Loading Files}
\label{Shortcuts.Load}
The data for the examples and exercises in this tutorial have been
stored on files with names ending in \dcode{.lsp}. On the
XLISP-STAT distribution disk they can be found in the folder
\dcode{Data}. Any variables you save (see the next subsection for
details) will also be saved on files of this form. The data in these
files can be read into XLISP-STAT with the \dcode{load} function. To
load \index{load} a file named \dcode{randu.lsp} type the expression
\begin{verbatim}
(load "randu.lsp")
\end{verbatim}
or just
\begin{verbatim}
(load "randu")
\end{verbatim}
If you give \dcode{load} a name that does not end in \dcode{.lsp} then load
will add this suffix. Alternatively, you can use the \macbold{Load}
command in the \macbold{File} menu. After loading a file using the
{\bf File} menu {\bf Load} item the system does not print a prompt.
Instead it prints a message indicating that the load is done:
\begin{verbatim}
> ; loading "temp.lsp"
; finished loading "temp.lsp"
\end{verbatim}
\subsection{Saving Your Work}
If you want to record a session with XLISP-STAT you can do so using the
\dcode{dribble} \index{dribble} function. The expression
\begin{verbatim}
(dribble "myfile")
\end{verbatim}
starts a recording. All expressions typed by you and all results typed
by XLISP-STAT will be entered into the file named \dcode{myfile}.
The expression
\begin{verbatim}
(dribble)
\end{verbatim}
stops the recording. Note that \dcode{(dribble "myfile")} starts a new
file by the name \dcode{myfile}. If you already have a file by that
name its contents will be lost. Thus you can't use dribble to toggle
on and off recording to a single file. You can also turn dribbling on
and off using the \macbold{Dribble} item on the \macbold{Command}
menu.
\dcode{dribble} only records text that is typed, not plots. However,
you can use the standard Macintosh shortcut COMMAND-SHIFT-3 to save a
MacPaint image of the current screen. You can also choose the
\macbold{Copy} command from the \macbold{Edit} menu, or its command
key shortcut \COMMAND{C}, while a plot window is the active window to
save the contents of the plot window to the clip board\index{clip
board}. You can then open the scrap book from the apple menu and paste
the plot into the scrap book.
Variables you define in XLISP-STAT only exist for the duration of the
current session. If you quit from XLISP-STAT, or the program crashes,
your data will be lost. To preserve your data you can use the
\dcode{savevar} \index{savevar} function. This function allows you to
save one or more variables into a file. Again a new file is created
and any existing file by the same name is destroyed. To save the
variable \dcode{precipitation} in a file called
\dcode{precipitation.lsp} type
\begin{verbatim}
(savevar 'precipitation "precipitation")
\end{verbatim}
Do not add the \dcode{.lsp} suffix yourself; \dcode{savevar} will
supply it. To save the two variables \dcode{precipitation} and
\dcode{purchases} in the file \dcode{examples.lsp} type
\footnote{I have used a quoted list \dcode{'(purchases precipitation)}
in this expression to pass the list of symbols to the \dcode{savevar}
function. A longer alternative would be the expression \dcode{(list
'purchases 'precipitation).}}.
\begin{verbatim}
(savevar '(purchases precipitation) "examples")
\end{verbatim}
The files \dcode{precipitation.lsp} and \dcode{examples.lsp} now
contain a set of expression that, when read in with the \dcode{load}
command, will recreate the variables \dcode{precipitation} and
\dcode{purchases}. You can look at these files with an editor like
MacWrite or the XLISP-STAT editor and you can prepare files with your
own data by following these examples.
\subsection{The XLISP-STAT Editor}
\label{Shortcuts.Editor}
The Macintosh version of XLISP-STAT includes a simple editor for
preparing data files and function definitions. To edit an existing
file select the \macbold{Open Edit} item on the \macbold{File} menu;
to start a new file select \macbold{New Edit}. Other commands on the
\macbold{File} menu can be used to save your file and to revert back
to the saved version. The editor can only handle text files of less
than 32K characters. As in the listener window, hitting the {\em tab}
key in any line but the first of a multiline expression will indent
the expression to a reasonable point. The editor also allows you to
select a section of text representing one or more Lisp expressions and
have these evaluated. To do this select the expressions you want to
evaluate and then choose \macbold{Eval Selection} from the
\macbold{Edit} menu. The returned values are not available, so this is
only useful for producing side effects, such as defining variables or
functions.
\subsection{Reading Data Files}
\index{reading data}
The data files we have used so far in this tutorial have been
processed to contain XLISP-STAT expressions. XLISP-STAT also provides
two functions for reading raw data files. The simpler of the two is
\dcode{read-data-file}. \index{ read- data-file} The expression
\begin{verbatim}
(read-data-file file)
\end{verbatim}
where \dcode{file} is a string representing the name of the data file,
returns a list of all the items in the file. Items can be separated by
any amount of white space, but not by commas or other punctuation
marks. Items can be any valid Lisp expressions. In particular they
can be numbers, strings or symbols. The list can then be manipulated
into the appropriate form within XLISP-STAT.
The function \dcode{read-data-columns} \index{read-data-columns} is
provided for reading data files in which each row represents a case
and each column a variable. The expression
\begin{verbatim}
(read-data-columns file cols)
\end{verbatim}
will return a list of \dcode{cols} lists, each representing a column
of the file. Note that this function determines the column structure
from the value of \dcode{cols}, not from the structure of the file.
The entries of \dcode{file} can be as for \dcode{read-data-file}.
These two functions should be adequate for most purposes. If you have
to read a file that does not fit into the form considered here you can
use the raw file handling functions of XLISP.
\subsection{User Initialization File}
After loading in all its program files and before giving you your
first prompt XLISP-STAT looks to see if there is a file named
\dcode{statinit.lsp} \index{statinit.lsp} in the startup folder. If
there is one it will be loaded. You can use this file to load any
data sets you would like to have available or to define functions of
your own.
\newpage
\section{More Elaborate Plots}
\label{MorePlots}
The plots described so far were designed for exploring the
distribution of a single variable and the relationship among two
variables. This section describes some plots and techniques that are
useful for exploring the relationship among three or more variables.
The techniques and plots described in the first four subsections are
simple, requiring only simple commands to the listener and mouse
actions. The fifth subsection introduces the concept of a {\em plot
object} and the idea of {\em sending messages} to an object from the
listener window. These ideas are used to add lines and curves to
scatter plots, and the basic concepts of objects and messages will be
used again in the next section on regression models. The final
subsection shows how Lisp iteration can be used to construct a dynamic
simulation -- a plot movie.
\subsection{Spinning Plots}
\label{MorePlots.Spinplot}
If we are interested in exploring the relationship among three
variables then it is natural to imagine constructing a three
dimensional scatterplot of the data. Of course we can only see a two
dimensional projection of the plot on a computer screen -- any depth
that you might be able to perceive by looking at a wire model of the
data is lost. One approach to try to recover some of this depth
perception is to rotate the points around some axis. The
\dcode{spin-plot} \index{spin-plot} function allows you to construct a
rotatable three dimensional plot.
As an example let's look a some data collected to examine the
relationship between a phosphate absorption index and the amount of
extractable iron and aluminum in a sediment (Devore and
Peck \cite[page 509, Example 6]{DevorePeck}). The data can
be entered with the expressions
\begin{verbatim}
(def iron (list 61 175 111 124 130 173 169 169 160 224 257 333 199))
(def aluminum (list 13 21 24 23 64 38 33 61 39 71 112 88 54))
(def absorption (list 4 18 14 18 26 26 21 30 28 36 65 62 40))
\end{verbatim}
The expression
\begin{verbatim}
(spin-plot (list absorption iron aluminum))
\end{verbatim}
produces the plot on the left in Figure \ref{Spinplot1}.
\begin{figure}\centering
%\vspace{3.8in}
\centerline{\includegraphics{postscript/fig7.ps}}
\wcaption{Two views of a rotatable plot of data on iron content,
aluminum content and phosphate absorption in sediment
samples.}{Spinplot1}
\end{figure}
The argument to \dcode{spin-plot} is a list of three lists or vectors,
representing the $x$, $y$ and $z$ variables. The $z$ axis is
initially pointing out of the screen. You can rotate the plot by pointing
at one of the \dcode{Pitch}, \dcode{Roll} or \dcode{Yaw} squares and
pressing the mouse button. By rotating the plot you can see that the
points seem to fall close to a plane. The plot on the right of Figure
\ref{Spinplot1} shows the data viewed along the plane. A linear model
should describe this data quite well.
As a second example, with the data defined by
\begin{verbatim}
(def strength (list 14.7 48.0 25.6 10.0 16.0 16.8 20.7 38.8
16.9 27.0 16.0 24.9 7.3 12.8))
(def depth (list 8.9 36.6 36.8 6.1 6.9 6.9 7.3 8.4 6.5 8.0 4.5 9.9 2.9 2.0))
(def water (list 31.5 27.0 25.9 39.1 39.2 38.3 33.9 33.8
27.9 33.1 26.3 37.8 34.6 36.4))
\end{verbatim}
(Devore and Peck\cite[Problem 12.18]{DevorePeck}) the expression
\begin{verbatim}
(spin-plot (list water depth strength)
:variable-labels (list "Water" "Depth" "Strength"))
\end{verbatim}
\begin{figure}\centering
%\vspace{3.8in}
\centerline{\includegraphics{postscript/fig8.ps}}
\ncaption{Rotatable plot of measurements on permafrost
samples.}{Spinplot2}
\end{figure}
produces a plot that can be rotated to produce the view in Figure
\ref{Spinplot2}. These data concern samples of thawed permafrost soil.
\dcode{strength} is the shear strength, and \dcode{water} is the
percentage water content. \dcode{depth} is the depth at which the
sample was taken. The plot shows that a linear model will not fit well.
Devore and Peck \cite{DevorePeck} suggest fitting a model with
quadratic terms to this data.
\index{spin-plot:changing origin}
The function \dcode{spin-plot} also accepts the additional keyword
argument \dcode{scale}. If \dcode{scale} is \dcode{T}, the default,
then the data are centered at the midranges of the three variables,
and all three variables are scaled to fit the plot. If \dcode{scale}
is \NIL\ the data are assumed to be scaled between -1 and 1, and the
plot is rotated about the origin. Thus if you want to center your plot
at the means of the variables and scale all observations by the same
amount you can use the expression
\begin{verbatim}
(spin-plot (list (/ (- water (mean water)) 20)
(/ (- depth (mean depth)) 20)
(/ (- strength (mean strength)) 20))
:scale nil)
\end{verbatim}
Note that the \dcode{scale} keyword argument is given using the
corresponding keyword symbol, the symbol \dcode{scale} preceded by a
colon.
Rotation speed can be changed using the plot menu or the keyboard
equivalents \COMMAND{F}\ for Faster and \COMMAND{S}\ for Slower.
Depth cuing and showing of the axes are controlled by items on the
plot menu.
If you click the mouse in one of the \dcode{Pitch}, \dcode{Roll} or
\dcode{Yaw} squares while holding down the {\em shift}\/ key the plot
will start to rotate and continue to rotate after the mouse button has
been released.
\subsection*{Exercises}
\begin{enumerate}
\item
An upstate New York business machine company used to include a random
number generator called RANDU in its software library. This generator
was supposed to produce numbers that behaved like $i. i. d.$ uniform
random variables. The data set \dcode{randu} in the file
\dcode{randu.lsp} in the \macbold{Data} folder consists of a list of
three lists of numbers. These lists are consecutive triples of numbers
produced by RANDU. Use \dcode{spin-plot} to see if you can spot any
unusual features in this sample.
\end{enumerate}
\subsection{Scatterplot Matrices}
\label{MorePlots.Scatmat}
Another approach to graphing a set of variables is to look at a matrix
of all possible pairwise scatterplots of the variables. The
\dcode{scatterplot-matrix}\index{scatterplot-matrix} function will
produce such a plot. The data
\begin{verbatim}
(def hardness (list 45 55 61 66 71 71 81 86 53 60 64 68 79 81 56
68 75 83 88 59 71 80 82 89 51 59 65 74 81 86))
(def tensile-strength (list 162 233 232 231 231 237 224 219 203 189
210 210 196 180 200 173 188 161 119 161
151 165 151 128 161 146 148 144 134 127))
(def abrasion-loss (list 372 206 175 154 136 112 55 45 221 166 164
113 82 32 228 196 128 97 64 249 219 186
155 114 341 340 284 267 215 148))
\end{verbatim}
were produced in a study of the abrasion loss in rubber tires and the
expression
\begin{verbatim}
(scatterplot-matrix (list hardness tensile-strength abrasion-loss)
:variable-labels
(list "Hardness" "Tensile Strength" "Abrasion Loss"))
\end{verbatim}
produces the scatterplot matrix in Figure \ref{Scatmat1}.
\begin{figure}\center
%\vspace{3.47in}
\centerline{\includegraphics{postscript/fig9.ps}}
\ncaption{Scatterplot matrix of abrasion loss data.}{Scatmat1}
\end{figure}
The plot of \dcode{abrasion-loss} against \dcode{tensile-strength}
gives you an idea of the joint variation in these two variables. But
\dcode{hardness} varies from point to point as well. To get an
understanding of the relationship among all three variables it would
be nice to be able to fix \dcode{hardness} at various levels and look
at the way the plot of \dcode{abrasion-loss} against
\dcode{tensile-strength} changes as you change these levels. You can
do this kind of exploration in the scatterplot matrix by using the two
highlighting techniques {\em selecting}\/ \index{selecting} and {\em
brushing}\index{brushing}.
\begin{itemize}
\item[]
{\em Selecting}. Your plot is in the selecting mode when the cursor is
an arrow. This is the default setting. In this mode you can select a
point by clicking the mouse on top of it. To select a group of points
drag a selection rectangle around the group. If the group does not fit
in a rectangle you can build up your selection by holding down the
{\em shift}\/ key as you click or drag. If you click without the {\em
shift}\/ key any existing selection will be unselected; when the {\em
shift} key is down selected points remain selected.
\item[]
{\em Brushing}. You can enter the brushing mode by choosing
\macbold{Mouse Mode...} from the \macbold{Scatmat} menu and selecting
\macbold{Brushing} from the dialog box that is popped up. In this mode
the cursor will look like a paint brush and a dashed rectangle, the
{\em brush}, will be attached to your cursor. As you move the brush
across the plot points in the brush will be highlighted. Points
outside of the brush will not be highlighted unless they are marked as
selected. To select points in the brushing mode (make their
highlighting permanent) hold the mouse button down as you move the
brush.
\end{itemize}
In the plot in Figure \ref{Scatmat2}
\begin{figure}\centering
%\vspace{3.47in}
\centerline{\includegraphics{postscript/fig10.ps}}
\ncaption{Scatterplot matrix with middle hardness values
highlighted.}{Scatmat2}
\end{figure}
the points within the middle of the \dcode{hardness} range have been
highlighted using a long, thin brush (you can change the size of your
brush using the \macbold{Resize Brush} command on the \macbold{Scatmat}
menu). In the plot of \dcode{abrasion-loss} against
\dcode{tensile-strength} you can see that the highlighted points seem
to follow a curve. If you want to fit a model to this data this
suggests fitting a model that accounts for this curvature.
A scatterplot matrix is also useful for examining the relationship
between a quantitative variable and several categorical variables.
In the data \begin{verbatim}
(def yield (list 7.9 9.2 10.5 11.2 12.8 13.3 12.1 12.6 14.0 9.1 10.8 12.5
8.1 8.6 10.1 11.5 12.7 13.7 13.7 14.4 15.5 11.3 12.5 14.5
15.3 16.1 17.5 16.6 18.5 19.2 18.0 20.8 21 17.2 18.4 18.9 ))
(def density (list 1 1 1 2 2 2 3 3 3 4 4 4 1 1 1 2 2 2 3 3 3 4 4 4
1 1 1 2 2 2 3 3 3 4 4 4))
(def variety (list 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2
2 2 2 3 3 3 3 3 3 3 3 3 3 3 3))
\end{verbatim}
(Devore and Peck \cite[page 595, Example 14]{DevorePeck}) the
yield of tomato plants is recorded for an experiment run at four
different planting densities and using three different varieties. In
the plot in Figure \ref{Scatmat3}
\begin{figure}\centering
%\vspace{3.47in}
\centerline{\includegraphics{postscript/fig11.ps}}
\ncaption{Scatterplot matrix for tomato yield data with points from
the third variety highlighted.}{Scatmat3}
\end{figure}
a long, thin brush has been used to highlight the points in the third
variety. If there is no interaction between the varieties and the
density then the shape of the highlighted points should move
approximately in parallel as the brush is moved from one variety to
another.
Like \dcode{spin-plot}, the function \dcode{scatterplot-matrix} also
accepts the optional keyword argument \dcode{scale}.
\subsection*{Exercises}
\begin{enumerate}
\item
Devore and Peck \cite[Exercise 13.62]{DevorePeck} describe an
experiment to determine the effect of oxygen concentration on
fermentation end products. Four oxygen concentrations and two types of
sugar were used. The data are
\begin{verbatim}
(def ethanol (list .59 .30 .25 .03 .44 .18 .13 .02 .22 .23 .07
.00 .12 .13 .00 .01))
(def oxygen (list 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4))
(def sugar (list 1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2 ))
\end{verbatim}
and are on file the \dcode{oxygen.lsp}. Use a scatterplot matrix to
examine these data.
\item
Use scatterplot matrices to examine the data in the examples and
exercises of Section \ref{MorePlots.Spinplot} above.
\end{enumerate}
\subsection{Interacting with Individual Plots}
Rotating plots and scatterplot matrices are interactive plots. Simple
scatter plots also allow some interaction: If you select the
\macbold{Show Labels} option in the plot menu a label will appear next
to a highlighted point. You can use either the selecting or the
brushing mode to highlight points. The default labels are of the form
``0'', ``1'', \ldots~. (In Lisp it is conventional to start numbering
indices with 0, rather than 1.) Most plotting functions allow you to
supply a list of case labels using the \dcode{:point-labels} keyword.
Another option, useful in viewing large data sets, is to remove a
subset of the points from your plot. This can be done by selecting
the points you want to remove and then choosing \macbold{Remove
Selection} from the plot menu. The plot can then be rescaled using the
\macbold{Rescale Plot} option. Alternatively, the \macbold{Focus on
Selection} menu item removes all unselected points from the plot.
When a set of points is selected in a plot you can change the symbol
used to display the points using the \macbold{Selection Symbol} item.
On systems with color monitors you can set the color of selected
points with the \macbold{Selection Color} item.
You can save the indices of the selected points in a variable by
choosing the \macbold{Selection...} item in the plot menu. A dialog
will ask you for a name for the selection. When no points are
selected you can use the \macbold{Selection...} menu item to specify
the indices of the points to select. A dialog will ask you for an
expression for determining the selection to use. The expression can
be any Lisp expression that evaluates to a list of indices.
\subsection{Linked Plots}
When you brush or select in a scatterplot matrix you are looking at
the interaction of a set of separate scatterplots\footnote{According
to Stuetzle \cite{Stuetzle} the idea to link several plots was first
suggested by McDonald \cite{McDonaldThesis}.}. You can construct your
own set of interacting plots by choosing the \index{link view}
\macbold{Link View} option from the menus of the plots you want to
link. For example, using the data from Exercise 1 in Section
\ref{MorePlots.Scatmat} we can put \dcode{ethanol} and \dcode{oxygen}
in a scatterplot and \dcode{sugar} in a histogram. If we link these
two plots then selecting one of the two sugar groups in the histogram
highlights the corresponding points in the scatterplot, as shown in
Figure \ref{ScatterplotHist}.
\begin{figure}\centering
%\vspace{3.47in}
\centerline{\includegraphics{postscript/fig12.ps}}
\wcaption{Scatterplot and histogram with points from one sugar group
highlighted.}{ScatterplotHist}
\end{figure}
If you want to be able to select the points with particular labels
you can use the \dcode{name-list} function to generate a window with a
list of names in it. This window can be linked with any plot, and
selecting a name in a name list will then highlight the
corresponding points in the linked plots. You can use the
\dcode{name-list} \index{name-list} function with a numerical argument;
e. g. \begin{verbatim}
(name-list 10)
\end{verbatim}
will generate a list with the names ``0'' , \ldots, ``9'', or
you can give it a list of case labels of your own.
\subsection*{Exercise}
Try to use linked scatter plots and histograms on the data in the
examples and exercises of Sections \ref{MorePlots.Spinplot} and
\ref{MorePlots.Scatmat}.
\subsection{Modifying a Scatter Plot}
\label{MorePlots.Modifying}
After producing a scatterplot of a data set we might like to add a
line, a regression line for example, to the plot. As an example,
Devore and Peck \cite[page 105, Example 2]{DevorePeck} describe a
data set collected to examine the effect of bicycle lanes on drivers
and bicyclists. The variables given by
\begin{verbatim}
(def travel-space (list 12.8 12.9 12.9 13.6 14.5 14.6 15.1 17.5 19.5 20.8))
(def separation (list 5.5 6.2 6.3 7.0 7.8 8.3 7.1 10.0 10.8 11.0))
\end{verbatim}
represent the distance between the cyclist and the roadway center line
and the distance between the cyclist and a passing car, respectively,
recorded in ten cases. A regression line fit to these data, with
\dcode{separation} as the dependent variable, has a slope of 0.66 and an
intercept of -2.18. Let's see how to add this line to a scatterplot of
the data.
We can use the expression
\begin{verbatim}
(plot-points travel-space separation)
\end{verbatim}
to produce a scatterplot of these points. To be able to add a line to
the plot, however, we must be able to refer to it within XLISP-STAT.
To accomplish this let's assign the result returned by the
\dcode{plot-points} function to a symbol
\footnote{The result returned by \dcode{plot-points} is printed something
like \dcode{\#}. This
is not the value returned by the function, just its printed
representation . There are several other data types that are printed
this way; {\em file streams}, as returned by the \dcode{open} function,
are one example. For the most part you can ignore these printed
results. There is one unfortunate feature, however: the form
\dcode{\#<...>} means that there is no printed form of this data type
that the Lisp reader can understand. As a result, if you forget to give
your plot a name you can't cut and paste the result into a
\dcode{def} expression -- you have to redo the plot or use the history
mechanism.}:
\begin{verbatim}
(def myplot (plot-points travel-space separation))
\end{verbatim}
The result returned by plot-points is an XLISP-STAT {\em object}
\index{object}. To use an object you have to send it {\em messages}.
\index{message} This is done using the \dcode{send} function, as in
the expression
\begin{verbatim}
(send object message argument1 ... )
\end{verbatim}
I will use the expression
\begin{verbatim}
(send myplot :abline -2.18 0.66)
\end{verbatim}
to tell \dcode{myplot} to add the graph of a line $a + b x$, with $a =
-2.18$ and $b = 0.66$, to itself. The {\em message selector} is
\index{plot messages:abline} \dcode{:abline}, the numbers -2.18 and
0.66 are the arguments. The message consists of the selector
and the arguments. Message selectors are always Lisp keywords; that is,
they are symbols that start with a colon. Before typing in this
expression you might want to resize and rearrange the listener window
so you can see the plot and type commands at the same time. Once you
have resized the listener window you can easily switch between the
small window and a full size window by clicking in the zoom box at the
right corner of the window title bar. The resulting plot is shown
in Figure \ref{Scatterplot4}
\begin{figure}\centering
%\vspace{3.47in}
\centerline{\includegraphics{postscript/fig13.ps}}
\ncaption{Scatterplot of bicycle data with fitted line.}{Scatterplot4}
\end{figure}
Scatter plot objects understand a number of other messages. One other
message is the \index{plot messages:help} \dcode{:help} message
\footnote{To keep things simple I will use the term {\em message}
to refer to a message corresponding to a message selector.}:
\begin{verbatim}
> (send myplot :help)
> (send scatterplot-proto :help)
SCATTERPLOT-PROTO
Scatterplot.
Help is available on the following:
:ABLINE :ACTIVATE :ADD-FUNCTION :ADD-LINES :ADD-METHOD :ADD-MOUSE-MODE
:ADD-POINTS :ADD-SLOT :ADD-STRINGS :ADJUST-HILITE-STATE
:ADJUST-POINT-SCREEN-STATES :ADJUST-POINTS-IN-RECT :ADJUST-TO-DATA
:ALL-POINTS-SHOWING-P :ALL-POINTS-UNMASKED-P :ALLOCATE
:ANY-POINTS-SELECTED-P :APPLY-TRANSFORMATION :BACK-COLOR :BRUSH
:BUFFER-TO-SCREEN :CANVAS-HEIGHT :CANVAS-WIDTH :CLEAR :CLEAR-LINES
:CLEAR-MASKS :CLEAR-POINTS :CLEAR-STRINGS ....
\end{verbatim}
The list of topics will be the same for all scatter plots but will be
somewhat different for rotating plots, scatterplot matrices or
histograms.
The \index{plot messages:clear} \dcode{:clear} message, as its name
suggests, clears the plot and allows you to build up a new plot from
scratch. Two other useful messages are \index{plot
messages:add-points} \dcode{:add-points} and \index{plot
messages:add-lines} \dcode{:add-lines}. To find out how to use them we
can use the \dcode{:help} message with \dcode{:add-points} or
\dcode{:add-lines} as arguments:
\begin{verbatim}
> (send myplot :help :add-points)
:ADD-POINTS
Method args: (points &key point-labels (draw t))
Adds points to plot. POINTS is a list of sequences, POINT-LABELS a list of
strings. If DRAW is true the new points are added to the screen.
NIL
> (send myplot :help :add-lines)
:ADD-LINES
Method args: (lines &key type (draw t))
Adds lines to plot. LINES is a list of sequences, the coordinates of the line
starts. TYPE is normal or dashed. If DRAW is true the new lines are added to the
screen.
NIL
>
\end{verbatim}
The plot produced above shows some curvature in the data. A
regression of \dcode{separation} on a linear and a quadratic term in
\dcode{travel-space} produces estimates of -16.41924 for the constant,
2.432667 as the coefficient of the linear term and -0.05339121 as the
coefficient of the quadratic term. Let's use the \dcode{:clear},
\dcode{:add-points} and \dcode{:add-lines} messages to change \dcode{myplot} to
show the data along with the fitted quadratic model. First we use the
expressions
\begin{verbatim}
(def x (rseq 12 22 50))
(def y (+ -16.41924 (* 2.432667 x) (* -0.05339121 (* x x))))
\end{verbatim}
to define \dcode{x} as a grid of 50 equally spaced points between 12 and 22 and
\dcode{y} as the fitted response at these \dcode{x} values. Then the expressions
\begin{verbatim}
(send myplot :clear)
(send myplot :add-points travel-space separation)
(send myplot :add-lines x y)
\end{verbatim}
change myplot to look like Figure \ref{Scatterplot5}.
\begin{figure}\centering
%\vspace{3.47in}
\centerline{\includegraphics{postscript/fig14.ps}}
\ncaption{Scatterplot of bicycle data with fitted curve.}{Scatterplot5}
\end{figure}
Of course we could have used \dcode{plot-points} to get a new plot and just
modified that plot with \dcode{:add-lines}, but the approach used here
allowed us to try out all three messages.
\subsection{Dynamic Simulations}
\label{MorePlots.Dynamic}
As another illustration of what you can do by modifying existing plots let's
construct a \index{dynamic simulation} dynamic simulation -- a movie -- to
examine the variation in the shape of histograms of samples from a standard
normal distribution. To start off, use the expression
\begin{verbatim}
(def h (histogram (normal-rand 20)))
\end{verbatim}
to construct a single histogram and save its plot object as \dcode{h}. The
\dcode{:clear} message is available for histograms as well. As you can
see from its help message
\begin{verbatim}
> (send h :help :clear)
:CLEAR
Message args: (&key (draw t))
Clears the plot data. If DRAW is nil the plot is redrawn; otherwise its
current screen image remains unchanged.
NIL
>
\end{verbatim}
the \dcode{:clear} message takes an optional keyword argument. If this
argument is \NIL\ then the plot will not actually be redrawn until
some other event causes it to be redrawn. This is useful for dynamic
simulations. Rearrange and resize your windows until you can see the
histogram window even when the listener window is the active window.
Then type the expression\index{dotimes}
\footnote{\dcode{dotimes} is one of several Lisp looping constructs. It is
a special form with the syntax \dcode{(dotimes (var count) expr ....)}.
The loop is repeated \dcode{count} times, with \dcode{var} bound to 0, 1,
\ldots, \dcode{count} - 1. Other looping constructs are \dcode{dolist},
\dcode{do} and \dcode{do*}.}
\begin{verbatim}
(dotimes (i 50)
(send h :clear :draw nil)
(send h :add-points (normal-rand 20)))
\end{verbatim}
This should produce a sequence of 50 histograms, each based on a
sample of size 20. By giving the keyword argument \dcode{draw} with
value \NIL\ to the \dcode{:clear} message you have insured that each
histogram stays on the screen until the next one is ready to be drawn.
Try the example again without this argument and see what difference it
makes.
Programmed dynamic simulations may provide another approach to viewing
the relation among several variables. As a simple example, suppose we
want to examine the relation between the variables \dcode{abrasion-loss}
and \dcode{hardness} introduced in Section \ref{MorePlots.Scatmat} above.
Let's start with a histogram of \dcode{abrasion-loss} produced by the
expression
\begin{verbatim}
(def h (histogram abrasion-loss))
\end{verbatim}
The messages \dcode{:point-selected} \index{plot messages:point-selected},
\dcode{:point-hilited} \index{plot messages:point-hilited} and
\dcode{:point-showing} \index{plot messages:point-showing} are
particularly useful for dynamic simulations. Here is the help information
for \dcode{:point-selected} in a histogram:
\begin{verbatim}
> (send h :help :point-selected)
:POINT-SELECTED
Method args: (point &optional selected)
Sets or returns selection status (true or NIL) of POINT. Sends
:ADJUST-POINT-SCREEN-STATES message if states are set. Vectorized.
NIL
>
\end{verbatim}
Thus you can use this message to determine whether a point is
currently selected and also to select or unselect it. Again
rearrange the windows so you can see the histogram while typing in the
listener window and type the expression\index{dolist}
\begin{verbatim}
(dolist (i (order hardness))
(send h :point-selected i t)
(send h :point-selected i nil))
\end{verbatim}
The expression \dcode{(order hardness)} \index{order} produces the
list of indices of the ordered values of hardness. Thus the first
element of the result is the index of the smallest element of
hardness, the second element is the index of the second smallest
element of hardness, etc.. The loop moves through each of these
indices and selects and unselects the corresponding point.
The result on the screen is very similar to the result of brushing a
\dcode{hardness} histogram linked to an \dcode{abrasion-loss} histogram
from left to right. The drawback of this approach is that it is harder to
write an expression than to use a mouse. On the other hand, when brushing
with a mouse you tend to focus your attention on the plot you are brushing,
rather than on the other linked plots. When you run a dynamic simulation
you do not have to do anything while the simulation is running and can
therefore concentrate fully on the results.
An intermediate solution is possible: You can set up a dialog window
with a scroll bar that will run through the indices in the list
\dcode{(order hardness)}, selecting the corresponding point as it is
scrolled. An example in Section \ref{Fundefs} will show you how to do this.
Like most Lisp systems XLISP-STAT is not ideally suited to real time
simulations because of the need for garbage collection, to reclaim
dynamically allocated storage. This is the reason that the simulations
in this section will occasionally pause for a second or two.
Nevertheless, in a simple simulation like the last one each iteration
may still be too fast for you to be able to pick up any pattern. To
slow things down you can add some extra busy work to each iteration.
For example, you could use
\begin{verbatim}
(dolist (i (order hardness))
(send h :point-selected i t)
(dotimes (i 100))
(send h :point-selected i nil))
\end{verbatim}
in place of the previous expression.
\newpage
\section{Regression}
\label{Regression}
Regression models have been implemented using XLISP-STAT's object and
message sending facilities. These were introduced above in Section
\ref{MorePlots.Modifying}. You might want to review that section
briefly before reading on.
Let's fit a simple regression \index{regression} model to the bicycle
data of Section \ref{MorePlots.Modifying}. The dependent variable is
\dcode{separation} and the independent variable is \dcode{travel-space}. To
form a regression model use the \dcode{regression-model}
\index{regression-model} function:
\begin{verbatim}
> (regression-model travel-space separation)
Least Squares Estimates:
Constant -2.182472 (1.056688)
Variable 0 0.6603419 (0.06747931)
R Squared: 0.922901
Sigma hat: 0.5821083
Number of cases: 10
Degrees of freedom: 8
#
>
\end{verbatim}
The basic syntax for the \dcode{regression-model} function is
\begin{verbatim}
(regression-model x y)
\end{verbatim}
For a simple regression \index{simple regression} \dcode{x} can be a single
list or vector. For a multiple regression \index{multiple regression}
\dcode{x} can be a list of lists or vectors or a matrix. The
\dcode{regression-model} function also takes several optional keyword
arguments. The most important ones are \dcode{:intercept},
\dcode{:print}, and \dcode{:weights}. Both \dcode{:intercept} and
\dcode{:print} are \dcode{T} by default. To get a model without an
intercept \index{intercept} use the expression
\begin{verbatim}
(regression-model x y :intercept nil)
\end{verbatim}
To form a weighted regression model use the expression
\begin{verbatim}
(regression-model x y :weights w)
\end{verbatim}
where \dcode{w} is a list or vector of weights the same length as
\dcode{y}. In a weighted model the variances of the errors are assumed
to be inversely proportional to the weights \dcode{w}.
The \dcode{regression-model} function prints a very simple summary of the
fit model and returns a model object as its result. To be able to examine
the model further assign the returned model object to a variable using
an expression like
\footnote{Recall from Section \ref{MorePlots.Modifying} that
\dcode{\#} is the
printed representation of the model object returned by
\dcode{regression-model}. Unfortunately you can't cut and paste it
into the \dcode{def}, but of course you can cut and paste the
\dcode{regression-model} expression or use the history mechanism.}
\begin{verbatim}
(def bikes (regression-model travel-space separation :print nil))
\end{verbatim}
I have given the keyword argument \dcode{:print nil} to suppress the
printing of the summary, since we have already seen it. To find out what
messages are available use the \dcode{:help} message:
\begin{verbatim}
> (send bikes :help)
REGRESSION-MODEL-PROTO
Normal Linear Regression Model
Help is available on the following:
:ADD-METHOD :ADD-SLOT :BASIS :CASE-LABELS :COEF-ESTIMATES :COEF-STANDARD-ERRORS
:COMPUTE :COOKS-DISTANCES :DELETE-METHOD :DELETE-SLOT :DF :DISPLAY :DOC-TOPICS
:DOCUMENTATION :EXTERNALLY-STUDENTIZED-RESIDUALS :FIT-VALUES :GET-METHOD
:HAS-METHOD :HAS-SLOT :HELP :INCLUDED :INTERCEPT :INTERNAL-DOC :ISNEW
:LEVERAGES :METHOD-SELECTORS :NEW :NUM-CASES :NUM-COEFS :NUM-INCLUDED
:OWN-METHODS :OWN-SLOTS :PARENTS :PLOT-BAYES-RESIDUALS :PLOT-RESIDUALS
:PRECEDENCE-LIST :PREDICTOR-NAMES :PRINT :R-SQUARED :RAW-RESIDUALS
:RESIDUAL-SUM-OF-SQUARES :RESIDUALS :RESPONSE-NAME :RETYPE :SAVE :SHOW
:SIGMA-HAT :SLOT-NAMES :SLOT-VALUE :STUDENTIZED-RESIDUALS :SUM-OF-SQUARES
:SWEEP-MATRIX :TOTAL-SUM-OF-SQUARES :WEIGHTS :X :X-MATRIX :XTXINV :Y PROTO
NIL
>
\end{verbatim}
Many of these messages are self explanatory, and many have already been
used by the \dcode{:display} message, which \dcode{regression-model} sends
to the new model to print the summary. As examples let's try the
\dcode{:coef-estimates} \index{regression messages:coef-estimates} and
\dcode{:coef-standard-errors} \index{regression
messages:coef-standard-errors} messages
\footnote{Ordinarily the entries in the lists returned by these messages
correspond simply to the intercept, if one is included in the model,
followed by the independent variables as they were supplied to
\dcode{regression-model}. However, if degeneracy is detected during
computations some variables will not be used in the fit; they will be
marked as {\em aliased}\/ in the printed summary. The indices of the
variables used can be obtained by the \dcode{:basis} message; the
entries in the list returned by \dcode{:coef-estimates} correspond to
the intercept, if appropriate, followed by the coefficients of the
elements in the basis. The messages
\dcode{:x-matrix} and \dcode{:xtxinv} are similar in that they use only the
variables in the basis.}:
\begin{verbatim}
> (send bikes :coef-estimates)
(-2.182472 0.6603419)
> (send bikes :coef-standard-errors)
(1.056688 0.06747931)
>
\end{verbatim}
The \dcode{:plot-residuals} \index{regression messages:plot-residuals}
message will produce a residual plot\index{residual plot}. To find out
what residuals \index{residuals} are plotted against let's look at the help
information:
\begin{verbatim}
> (send bikes :help :plot-residuals)
:PLOT-RESIDUALS
Message args: (&optional x-values)
Opens a window with a plot of the residuals. If X-VALUES are not supplied
the fitted values are used. The plot can be linked to other plots with the
link-views function. Returns a plot object.
NIL
>
\end{verbatim}
Using the expressions
\begin{verbatim}
(plot-points travel-space separation)
(send bikes :plot-residuals travel-space)
\end{verbatim}
\begin{figure}\centering
%\vspace{3.47in}
\centerline{\includegraphics{postscript/fig15.ps}}
\wcaption{Linked raw data and residual plots for the bicycles
example.}{Scatterplot6} \end{figure}
we can construct two plots of the data as shown in Figure
\ref{Scatterplot6}. By linking the plots we can use the mouse to identify
points in both plots simultaneously. A point that stands out is observation
6 (starting the count at 0, as usual).
The plots both suggest that there is some curvature in the data; this
curvature is particularly pronounced in the residual plot if you
ignore observation 6 for the moment. To allow for this curvature we
might try to fit a model with a quadratic term in \dcode{travel-space}:
\begin{verbatim}
> (def bikes2 (regression-model (list travel-space (^ travel-space 2))
separation))
Least Squares Estimates:
Constant -16.41924 (7.848271)
Variable 0 2.432667 (0.9719628)
Variable 1 -0.05339121 (0.02922567)
R Squared: 0.9477923
Sigma hat: 0.5120859
Number of cases: 10
Degrees of freedom: 7
BIKES2
>
\end{verbatim}
I have used the exponentiation function ``\verb+^+'' to compute the
square of \dcode{travel-space}. Since I am now forming a multiple
regression model \index{multiple regression} the first argument to
\dcode{regression-model} is a list of the \dcode{x} variables.
You can proceed in many directions from this point. If you want to
calculate Cook's distances \index{} for the observations you can first
compute internally studentized residuals \index{studentized residuals}
as
\begin{verbatim}
(def studres (/ (send bikes2 :residuals)
(* (send bikes2 :sigma-hat)
(sqrt (- 1 (send bikes2 :leverages))))))
\end{verbatim}
Then Cook's distances are obtained as
\footnote{The / function is used here with three arguments.
The first argument is divided by the second, and the result is then
divided by the third. Thus the result of the expression (/ 6 3 2) is
1.}
\begin{verbatim}
> (* (^ studres 2)
(/ (send bikes2 :leverages) (- 1 (send bikes2 :leverages)) 3))
(0.166673 0.00918596 0.03026801 0.01109897 0.009584418 0.1206654 0.581929
0.0460179 0.006404474 0.09400811)
\end{verbatim}
The seventh entry -- observation 6, counting from zero -- clearly
stands out.
Another approach to examining residuals for possible outliers is to
use the Bayesian residual plot \index{Bayesian residual plot} proposed
by Chaloner and Brant \cite{ChalonerBrant}, which can be obtained
using the message \dcode{:plot-bayes-residuals}\index{regression
messages:plot-bayes-residuals}. The expression \dcode{(send bikes2
:plot-bayes-residuals)} produces the plot in Figure
\ref{Scatterplot7}.
\begin{figure}\centering
%\vspace{3.47in}
\centerline{\includegraphics{postscript/fig16.ps}}
\ncaption{Bayes residual plot for bicycle data.}{Scatterplot7}
\end{figure}
The bars represent mean $\pm 2SD$ of the posterior distribution of the
actual realized errors, based on an improper uniform prior distribution on
the regression coefficients. The $y$ axis is in units of $\hat{\sigma}$.
Thus this plot suggests the probability that point 6 is three or more
standard deviations from the mean is about 3\%; the probability that it is
at least two standard deviations from the mean is around 50\%.
Several other methods are available for residual and case analysis.
These include \dcode{:studentized-residuals} and
\dcode{:cooks-distances}, which we could have used above instead of
calculating these quantities from their definitions. Another useful
message is \dcode{:included}, which can be used to change the cases to
be used in estimating a model. Further details on these messages are
available in their help information.
\subsection*{Exercises}
\begin{enumerate}
\item
Using the variables \dcode{absorption}, \dcode{iron} and \dcode{aluminum}
introduced in Section \ref{MorePlots.Spinplot} above construct and examine
a model with \dcode{absorption} as the dependent variable.
\item
Using the variables \dcode{abrasion-loss}, \dcode{tensile-strength} and
\dcode{hardness} introduced in Section \ref{MorePlots.Scatmat} above
construct and examine a model with \dcode{abrasion-loss} as the dependent
variable. \end{enumerate}
\newpage
\section{Defining Your Own Functions and Methods}
\label{Fundefs}
This section gives a brief introduction to programming XLISP-STAT.
The most basic programming operation is to define a new function.
Closely related is the idea of defining a new method for an object.
\footnote{The discussion in this section only scratches the surface of
what you can do with functions in the XLISP language. To see more examples
you can look at the files that are loaded when XLISP-STAT starts up.
For more information on options of function definition, macros, etc. see the
XLISP documentation and the books on Lisp mentioned in the references.}
\subsection{Defining Functions}
You can use the XLISP language to define functions of your own. Many
of the functions you have been using so far are written in this
language. The special form used for defining functions is called
\dcode{defun}\index{defun}. The simplest form of the \dcode{defun}
syntax is
\begin{verbatim}
(defun fun args expression)
\end{verbatim}
where \dcode{fun} is the symbol you want to use as the function name,
\dcode{args} is the list of the symbols you want to use as arguments,
and \dcode{expression} is the body of the function. Suppose for
example that you want to define a function to delete a case from a
list. This function should take as its arguments the list and the
index of the case you want to delete. The body of the function can be
based on either of the two approaches described in Section
\ref{MoreData.Subsets} above. Here is one approach:
\begin{verbatim}
(defun delete-case (x i)
(select x (remove i (iseq 0 (- (length x) 1))) ) )
\end{verbatim}
I have used the function \dcode{length} in this definition to
determine the length of the argument \dcode{x}. Note that none of the
arguments to \dcode{defun} are quoted: \dcode{defun} is a special form
that does not evaluate its arguments.
Unless the functions you define are very simple you will probably want
to define them in a file and load the file into XLISP-STAT with the
\dcode{load} command. You can put the functions in the
\dcode{statinit.lsp} file or include in \dcode{statinit.lsp}
a \dcode{load} command that will load another file. The version of
XLISP-STAT for the Macintosh includes a simple editor that can be used
from within XLISP-STAT. The editor is described briefly in Section
\ref{Shortcuts.Editor} above.
You can also write functions that send messages to objects. Here is a
function that takes two regression models, assumed to be nested, and
computes the $F$ statistic for comparing these models:
\begin{verbatim}
(defun f-statistic (m1 m2)
"Args: (m1 m2)
Computes the F statistic for testing model m1 within model m2."
(let ((send ss1 (m1 :sum-of-squares))
(send df1 (m1 :df))
(send ss2 (m2 :sum-of-squares))
(send df2 (m2 :df)))
(/ (/ (- ss1 ss2) (- df1 df2)) (/ ss2 df2))))
\end{verbatim}
This definition uses the Lisp \dcode{let} \index{let} construct to
establish some local {\em variable bindings}. The variables \dcode{ss1},
\dcode{df1}, \dcode{ss2} and \dcode{df2} are defined in terms of the two
model arguments \dcode{m1} and \dcode{m2}, and are then used to compute
the $F$ statistic. The string following the argument list is a
{\em documentation string}. When a documentation string is present in a
\dcode{defun} expression \dcode{defun} will install it so the \dcode{help}
function will be able to retrieve it.
\subsection{Anonymous Functions}
\label{Fundefs.Anonymous}
Suppose you would like to plot the function $f(x) = 2x + x^{2}$ over the
range $-2 \leq x \leq 3$. We can do this by first defining a function
\dcode{f} and then using \dcode{plot-function}:
\begin{verbatim}
(defun f (x) (+ (* 2 x) (^ x 2)))
(plot-function #'f -2 3)
\end{verbatim}
This is not too hard, but it nevertheless involves one unnecessary step:
naming the function \dcode{f}. You probably won't need this function
again; it is a throw-away function defined only to allow you to give it
to \dcode{plot-function} as an argument. It would be nice if you could
just give \dcode{plot-function} an expression that constructs the function
you want. Here is such an expression:
\begin{verbatim}
(function (lambda (x) (+ (* 2 x) (^ x 2))))
\end{verbatim}
There are two steps involved. The first is to specify the definition of
your function. This is done using a {\em lambda expression}, in this case
\begin{verbatim}
(lambda (x) (+ (* 2 x) (^ x 2)))
\end{verbatim}
A lambda expression is a list starting with the symbol \dcode{lambda},
followed by the list of arguments and the expressions making up the
body of the function. The lambda expression is only a definition, it
is not yet a function, a piece of code that can be applied to
arguments. The special form \dcode{function} takes the lambda list and
constructs such a function. The result can be saved in a variable or
it can be passed on to another function as an argument. For our
plotting problem you can use the single expression
\begin{verbatim}
(plot-function (function (lambda (x) (+ (* 2 x) (^ x 2)))) -2 3)
\end{verbatim}
or, using the \dcode{\#'} short form,
\begin{verbatim}
(plot-function #'(lambda (x) (+ (* 2 x) (^ x 2))) -2 3)
\end{verbatim}
Since the function used in these expressions is never named it is
sometimes called an {\em anonymous function}.
You can also construct a rotating plot of a function of two variables
using the function \dcode{spin-function}. As an example, the
expression
\begin{verbatim}
(spin-function #'(lambda (x y) (+ (^ x 2) (^ y 2))) -1 1 -1 1)
\end{verbatim}
\begin{figure}\centering
%\vspace{3.47in}
\centerline{\includegraphics{postscript/fig17.ps}}
\ncaption{Rotatable plot of $f(x,y) = x^{2} + y^{2}$.}{Spinplot3}
\end{figure}
constructs a plot of the function $f(x, y) = x^{2} + y^{2}$ over the
range $-1 \leq x \leq 1, -1 \leq y \leq 1$ using a $6 \times 6$ grid.
The number of grid points can be changed using the {\tt :num-points}
keyword. The result is shown in Figure \ref{Spinplot3}. Again it convenient
to use a lambda expression to specify the function to be plotted.
There are a number of other situations in which you might want to pass
a function on as an argument without first going through the trouble
of thinking up a name for the function and defining it using
\dcode{defun}. A few additional examples are given in the next
subsection.
\subsection{Some Dynamic Simulations}
In Section \ref{MorePlots.Dynamic} we used a loop to control a dynamic
simulation in which points in a histogram of one variable were selected and
deselected in the order of a second variable. Let's look at how to run the
same simulation using a {\em slider}\/ to control the simulation.
A slider is a modeless dialog box containing a scroll bar and a value
display field. As the scroll bar is moved the displayed value is changed
and an action is taken. The action is defined by an {\em action function}
given to the scroll bar, a function that is called with one value, the
current slider value, each time the value is changed by the user. There
are two kinds of sliders, sequence sliders and interval sliders. Sequence
sliders take a sequence (a list or a vector) and scroll up and down the
sequence. The displayed value is either the current sequence element or
the corresponding element of a display sequence. An interval slider
dialog takes an interval, divides it into a grid and scrolls through the
grid. By default a grid of around 30 points is used; the exact number and
the interval end points are adjusted to give nice printed values. The
current interval point is displayed in the display field.
For our example let's use a sequence slider to scroll through the elements
of the \dcode{hardness} list in order and select the corresponding element
of \dcode{abrasion-loss}. The expression
\begin{verbatim}
(def h (histogram abrasion-loss))
\end{verbatim}
sets up a histogram and saves its plot object in the variable \dcode{h}.
The function \dcode{sequence-slider-dialog} takes a list or vector and
opens a sequence slider to scroll through its argument. To do
something useful with this dialog we need to give it an action function
as the value of the keyword argument \dcode{:action}. The function should
take one argument, the current value of the sequence controlled by the
slider. The expression
\begin{verbatim}
(sequence-slider-dialog (order hardness) :action
#'(lambda (i)
(send h :unselect-all-points)
(send h :point-selected i t)))
\end{verbatim}
sets up a slider for moving the selected point in the \dcode{abrasion-loss}
histogram along the order of the \dcode{hardness} variable. The histogram
and scroll bar are shown in Figure \ref{Dynaplot1}.
\begin{figure}\centering
%\vspace{3.8in}
\centerline{\includegraphics{postscript/fig18a.ps}}
\wcaption{Slider-controlled animation of a histogram.}{Dynaplot1}
\end{figure}
The action function is called every time the slider is moved. It is
called with the current element of the sequence
\dcode{(order hardness)}, the index of the point to select. It clears all
current selections and then selects the point specified in the call from
the slider. The body of the function is almost identical to the body of
the \dcode{dotimes} loop used in Section \ref{MorePlots.Dynamic}. The
slider is thus an interactive, graphically controlled version of this
loop.
As another example, suppose we would like to examine the effect of
changing the exponent in a Box-Cox power transformation
\begin{displaymath}
h(x) = \left\{
\begin{array}{cl}
\frac{\textstyle x^{\lambda} - 1}{\textstyle \lambda}
& \mbox{if $\lambda \not= 0$}\\
\\
\log(x)
& \mbox{otherwise}
\end{array}
\right.
\end{displaymath}
on a probability plot. As a first step we might define a function to
compute the power transformation and normalize the result to fall
between zero and one:
\begin{verbatim}
(defun bc (x p)
(let* ((bcx (if (< (abs p) .0001) (log x) (/ (^ x p) p)))
(min (min bcx))
(max (max bcx)))
(/ (- bcx min) (- max min))))
\end{verbatim}
This definition uses the \dcode{let*}\index{let*} form to establish
some local variable bindings. The \dcode{let*} form is like the
\dcode{let} form used above except that \dcode{let*} defines its
variables sequentially, allowing a variable to be defined in terms of
other variables already defined in the \dcode{let*} expression;
\dcode{let} on the other hand creates its assignments in parallel. In
this case the variables \dcode{min} and \dcode{max} are defined in
terms of the variable \dcode{bcx}.
Next we need a set of positive numbers to transform. Let's use a
sample of thirty observations from a $\chi^{2}_{4}$ distribution and
order the observations:
\begin{verbatim}
(def x (sort-data (chisq-rand 30 4)))
\end{verbatim}
The normal quantiles of the expected uniform order statistics are given by
\begin{verbatim}
(def r (normal-quant (/ (iseq 1 30) 31)))
\end{verbatim}
and a probability plot of the untransformed data is constructed using
\begin{verbatim}
(def myplot (plot-points r (bc x 1)))
\end{verbatim}
Since the power used is 1 the function \dcode{bc} just rescales the data.
There are several ways to set up a slider dialog to control the power
parameter. The simplest approach is to use the function
\dcode{interval-slider-dialog}:
\begin{verbatim}
(interval-slider-dialog (list -1 2)
:points 20
:action #'(lambda (p)
(send myplot :clear nil)
(send myplot :add-points r (bc x p))))
\end{verbatim}
\dcode{interval-slider-dialog} takes a list of two numbers, the lower and
upper bounds of an interval. The action function is called with the
current position in the interval.
This approach works fine on a Mac II but may be a bit slow on a Mac Plus
or a Mac SE. An alternative is to pre-compute the transformations for a
list of powers and then use the pre-computed values in the display. For
example, using the powers defined in
\begin{verbatim}
(def powers (rseq -1 2 16))
\end{verbatim}
we can compute the transformed data for each power and save the result as
the variable \dcode{xlist}:
\begin{verbatim}
(def xlist (mapcar #'(lambda (p) (bc x p)) powers))
\end{verbatim}
The function \dcode{mapcar} is one of several {\em mapping
functions}\/ available in Lisp. The first argument to \dcode{mapcar}
is a function. The second argument is a list. \dcode{mapcar} takes
the function, applies it to each element of the list and returns the
list of the results
\footnote{\dcode{mapcar} can be given several lists after the function.
The function must take as many arguments as there are lists.
\dcode{mapcar} will apply the function using the first element of
each list, then using the second element, and so on, until one of the lists
is exhausted, and return a list of the results.}.
Now we can use a sequence slider to move up and down the
transformed values in \dcode{xlist}:
\begin{verbatim}
(sequence-slider-dialog xlist
:display powers
:action #'(lambda (x)
(send myplot :clear nil)
(send myplot :add-points r x)))
\end{verbatim}
Note that we are scrolling through a list of lists and the element passed
to the action function is thus the list of current transformed values. We
would not want to see these values in the display field on the slider, so
I have used the keyword argument \dcode{:display} to specify an alternative
display sequence, the powers used in the transformation.
\subsection{Defining Methods}
When a message is sent to an object the object system will use the object
and the method selector to find the appropriate piece of code to execute.
Different objects may thus respond differently to the same message. A
linear regression model and a nonlinear regression model might both respond
to a \dcode{:coef-estimates} message but they will execute different code
to compute their responses.\index{methods}\index{methods:defining}
The code used by an object to respond to a message is called a {\em
method}. Objects are organized in a hierarchy in which objects {\em
inherit}\/ from other objects. If an object does not have a method of
its own for responding to a message it will use a method inherited
from one of its ancestors. The \dcode{send} function will move up the
{\em precedence list}\/ of an object's ancestors until a method for a
message is found.
Most of the objects encountered so far inherit directly from {\em
prototype}\/ objects. Scatterplots inherit from
\dcode{scatterplot-proto}, histograms from \dcode{histogram-proto},
regression models from \dcode{regres\-sion-model-proto}. Prototypes
are just like any other objects. They are essentially {\em typical}\/
versions of a certain kind of object that define default behavior.
Almost all methods are owned by prototypes. But any object can own a
method, and in the process of debugging a new method it is often
better to attach the method to a separate object constructed for that
purpose instead of the prototype.
To add a method to an object you can use the \dcode{defmeth}
macro. As an example, in Section \ref{Regression} we calculated Cook's
distances \index{Cook's distance} for a regression model. If you find that
you are doing this very frequently then you might want to define a
\dcode{:cooks-distances} \index{messages:cooks-distances} method.
The general form of a method definition is:
\begin{verbatim}
(defmeth object :new-method arg-list body)
\end{verbatim}
\dcode{object} is the object that is to own the new method. In the case of
regression models this can be either a specific regression model or the
prototype that all regression models inherit from,
\dcode{regression-model-proto}. The argument \dcode{:new-method} is the
message selector for your method; in our case this would be
\dcode{:cooks-distances}. The argument \dcode{arg-list} is the list of
argument names for your method, and \dcode{body} is one or more expressions
making up the body of the method. When the method is used each of
these expressions will be evaluated, in the order in which they appear.
Here is an expression that will install the \dcode{:cooks-distances} method:
\begin{verbatim}
(defmeth regression-model-proto :cooks-distances ()
"Message args: ()
Returns Cooks distances for the model."
(let* ((leverages (send self :leverages))
(studres (/ (send self :residuals)
(* (send self :sigma-hat) (sqrt (- 1 leverages)))))
(num-coefs (send self :num-coefs)))
(* (^ studres 2) (/ leverages (- 1 leverages) num-coefs)))))
\end{verbatim}
The variable \dcode{self} refers to the object that is receiving the
message. This definition is close to the definition of this method supplied
in the file \dcode{regression.lsp}.
\subsection{Plot Methods}
All plot activities are accomplished by sending messages to plot
objects. For example, every time a plot needs to be redrawn the
system sends the plot the \dcode{:redraw} message. By defining a new
method for this message you can change the way a plot is drawn.
Similarly, when the mouse is moved or clicked in a plot the plot is
sent the \dcode{:do-mouse} message. Menu items also send messages to
their plots when they are selected. If you are interested in modifying
plot behavior you may be able to get started by looking at the methods
defined in the graphics files loaded on start up. Further details
will be given in \cite{MyBook}.
\newpage
\section{Matrices and Arrays}
\label{Arrays}
\index{matrices} \index{arrays}
XLISP-STAT includes support for multidimensional arrays patterned
after the Common Lisp standard described in detail in Steele
\cite{CLtL}. The functions supported are listed in Section
\ref{Appendix.Arrays} of the appendix.
In addition to the standard Common Lisp array functions XLISP-STAT
also includes a number of linear algebra functions such as
\dcode{inverse}, \dcode{solve}, \dcode{transpose},
\dcode{chol-decomp}, \dcode{lu-decomp} and \dcode{sv-decomp}. These
functions are listed in the appendix as well.
An array is printed using the standard Common Lisp format. For
example, a 2 by 3 matrix with rows (1 2 3) and (4 5 6) is printed as
\begin{verbatim}
#2A((1 2 3)(4 5 6))
\end{verbatim}
The prefix \verb+#2A+ indicates that this is a two-dimensional array.
This form is not particularly readable, but it has the advantage that
it can be pasted into expressions and will be read as an array by the
XLISP reader.\footnote{You should quote an array if you type it in
using this form, as the value of an array is not defined.} For
matrices you can use the function \dcode{print-matrix} to get a
slightly more readable representation:
\begin{verbatim}
> (print-matrix '#2a((1 2 3)(4 5 6)))
#2a(
(1 2 3)
(4 5 6)
)
NIL
>
\end{verbatim}
The \dcode{select} function can be used to extract elements or
subarrays from an array. If \dcode{A} is a two dimensional array then
the expression
\begin{verbatim}
(select a 0 1)
\end{verbatim}
will return element 1 of row 0 of \dcode{A}. The expression
\begin{verbatim}
(select a (list 0 1) (list 0 1))
\end{verbatim}
returns the upper left hand corner of \dcode{A}.
\newpage
\section{Nonlinear Regression}
XLISP-STAT allows the construction of nonlinear, normal regression
models. The present implementation is experimental. The definitions
needed for nonlinear regression\index{nonlinear regression} are in the
file \dcode{nonlin.lsp} on the distribution disk. This file is not
loaded automatically at start up; you should load it now, using the
\macbold{Load} item on the
\macbold{File} menu or the \dcode{load} command, to carry out the
calculations in this section.
As an example, Bates and Watts \cite[A1.3]{BatesWatts} describe an
experiment on the relation between the velocity of an enzymatic
reaction, $y$, and the substrate concentration, $x$. The data for an
experiment in which the enzyme was treated with Puromycin are given by
\begin{verbatim}
(def x1 (list 0.02 0.02 0.06 0.06 .11 .11 .22 .22 .56 .56 1.1 1.1))
(def y1 (list 76 47 97 107 123 139 159 152 191 201 207 200))
\end{verbatim}
The Michaelis-Menten function
\begin{displaymath}
\eta(x) = \frac{\theta_{1}x}{\theta_{2}+x}
\end{displaymath}
often provides a good model for the dependence of velocity on
substrate concentration. Assuming the Michaelis-Menten function as the
mean velocity at a given concentration level the function \dcode{f1}
defined by
\begin{verbatim}
(defun f1 (b) (/ (* (select b 0) x1) (+ (select b 1) x1)))
\end{verbatim}
computes the list of mean response values at the points in
\dcode{x1} for a parameter list \dcode{b}. Using these definitions,
which are contained in the file \dcode{puromycin.lsp} in the
\dcode{Data} folder of the distribution disk, we can construct a
nonlinear regression model using the \dcode{nreg-model}
\index{nreg-model} function.
First we need initial estimates for the two model parameters.
Examining the expression for the Michaelis-Menten model shows that as
$x$ increases the function approaches an asymptote, $\theta_{1}$. The
second parameter, $\theta_{2}$, can be interpreted as the value of $x$
at which the function has reached half its asymptotic value. Using
these interpretations for the parameters and a plot constructed by the
expression
\begin{verbatim}
(plot-points x1 y1)
\end{verbatim}
shown in Figure \ref{MikeMentPlot} we can read off reasonable initial
estimates of 200 for $\theta_{1}$ and 0.1 for $\theta_{2}$. The
\dcode{nreg-model} function takes the mean function, the response vector
and an initial estimate of the parameters, computes more accurate
estimates using an iterative algorithm starting from the initial
guess, and prints a summary of the results. It returns a nonlinear
regression model object:
\footnote{Recall that the expression \dcode{\#'f1} is short for
\dcode{(function f1)} and is used for obtaining the function definition
associated with the symbol \dcode{f1}.}
\begin{figure}\centering
%\vspace{3.8in}
\centerline{\includegraphics{postscript/fig19.ps}}
\ncaption{Plot of reaction velocity against substrate concentration for
Puromycin experiment.}{MikeMentPlot}
\end{figure}
\begin{verbatim}
> (def puromycin (nreg-model #'f1 y1 (list 200 .1)))
Residual sum of squares: 7964.19
Residual sum of squares: 1593.16
Residual sum of squares: 1201.03
Residual sum of squares: 1195.51
Residual sum of squares: 1195.45
Residual sum of squares: 1195.45
Residual sum of squares: 1195.45
Residual sum of squares: 1195.45
Least Squares Estimates:
Parameter 0 212.684 (6.94715)
Parameter 1 0.0641213 (0.00828094)
R Squared: 0.961261
Sigma hat: 10.9337
Number of cases: 12
Degrees of freedom: 10
PUROMYCIN
>
\end{verbatim}
The function \dcode{nreg-model} also takes several keyword arguments,
including \dcode{:epsilon} to specify a convergence criterion and
\dcode{:count-limit}, a limit on the number of iterations. By default these
values are .0001 and 20, respectively. The algorithm for fitting the
model is a simple Gauss-Newton algorithm\index{Gauss-Newton algorithm}
with backtracking; derivatives are computed numerically.
To see how you can analyze the model further you can send
\dcode{puromycin} the \dcode{:help} message. The result is very
similar to the help information for a linear regression model. The
reason is simple: nonlinear regression models have been implemented as
objects, with the nonlinear regression model prototype
\dcode{nreg-model-proto} inheriting from the linear regression model
prototype. The internal data, the method for computing estimates, and
the method of computing fitted values have been modified for nonlinear
models, but the other methods remain unchanged. Once the model has
been fit the Jacobian of the mean function at the estimated parameter
values is used as the $X$ matrix. In terms of this definition most of
the methods for linear regression, such as the methods
\dcode{:coef-standard-errors} and \dcode{:leverages}, still make
sense, at least as first order asymptotic approximations.
In addition to the messages for linear regression models a nonlinear
regression model can respond to the messages
\begin{verbatim}
:COUNT-LIMIT
:EPSILON
:MEAN-FUNCTION
:NEW-INITIAL-GUESS
:PARAMETER-NAMES
:THETA-HAT
:VERBOSE
\end{verbatim}
\subsection*{Exercises}
\begin{enumerate}
\item
Examine the residuals of the \dcode{puromycin} model.
\item
The file \dcode{puromycin.lsp} also contains data \dcode{x2} and
\dcode{y2} and mean function \dcode{f2} for an experiment run without
the Puromycin treatment. Fit a model to this data and compare the
results to the experiment with Puromycin.
\end{enumerate}
\newpage
\section{One Way ANOVA}
XLISP-STAT allows the construction of normal one way analysis of
variance models. The definitions needed are in the file \dcode{oneway.lsp}
on the distribution disk. This file is not loaded automatically at start
up; you should load it now, using the \macbold{Load} item on the
\macbold{File} menu or the \dcode{load} command, to carry out the
calculations in this section.
As an example, suppose we would like to model the data on cholesterol
levels in rural and urban Guatemalans, examined in Section
\ref{Elementary.Summary}, as a one way ANOVA model. The boxplots we
obtained in Section \ref{Elementary.Summary} showed that the samples
were skewed and the center and spread of the urban sample were larger
than the center and spread of the rural sample. To compensate for
these facts I will use a normal ANOVA model for the logarithms of the
data:
\begin{verbatim}
> (def cholesterol (oneway-model (list (log urban) (log rural))))
Least Squares Estimates:
Group 0 5.377172 (0.03624821)
Group 1 5.099592 (0.03456131)
R Squared: 0.4343646
Sigma hat: 0.1621069
Number of cases: 42
Degrees of freedom: 40
Group Mean Square: 0.8071994 (1)
Error MeanSquare: 0.02627865 (40)
CHOLESTEROL
>
\end{verbatim}
The function \dcode{oneway-model} requires one argument, a list of the
lists or vectors representing the samples for the different groups. The
model \dcode{cholesterol} can respond to all regression messages as well as
a few new ones. The new ones are
\begin{verbatim}
:BOXPLOTS
:ERROR-MEAN-SQUARE
:ERROR-DF
:GROUP-DF
:GROUP-MEAN-SQUARE
:GROUP-NAMES
:GROUP-SUM-OF-SQUARES
:GROUPED-DATA
:STANDARD-DEVIATIONS
\end{verbatim}
\newpage
\section{Maximization and Maximum Likelihood Estimation}
\label{MaximumLikelihood}\index{maximization}
\index{maximization:Newton's method}
\index{maximization:Nelder-Mead simplex method}
XLISP-STAT includes two functions for maximizing functions of several
variables. The definitions needed are in the file \dcode{maximize.lsp}
on the distribution disk. This file is not loaded automatically at
start up; you should load it now, using the \macbold{Load} item on the
\macbold{File} menu or the \dcode{load} command, to carry out the
calculations in this section. The material in this section is somewhat
more advanced as it assumes you are familiar with the basic concepts
of maximum likelihood estimation.\index{maximum likelihood estimation}
Two functions are available for maximization. The first is
\dcode{newtonmax}\index{newtonmax}, which takes a function and a list
or vector representing an initial guess for the location of the
maximum and attempts to find the maximum using an algorithm based on
Newton's method\index{Newton's method} with backtracking. The
algorithm is based on the unconstrained minimization system described
in Dennis and Schnabel
\cite{DennisSchnabel}.
As an example, Cox and Snell \cite[Example T]{CoxSnell} describe data
collected on times (in operating hours) between failures of air
conditioning units on several aircraft. The data for one of the
aircraft can be entered as
\begin{verbatim}
(def x (90 10 60 186 61 49 14 24 56 20 79 84 44 59 29 118 25 156 310 76
26 44 23 62 130 208 70 101 208))
\end{verbatim}
A simple model for these data might be to assume the times between
failures are independent random variables with a common gamma
distribution. The density of the gamma distribution can be written as
\index{gamma distribution}
\begin{displaymath}
\frac{(\beta / \mu)(\beta x / \mu)^{\beta - 1} e^{- \beta x / \mu}}
{\Gamma(\beta)}
\end{displaymath}
where $\mu$ is the mean time between failures and $\beta$ is the gamma
shape parameter. The log likelihood for the sample is thus given by
\begin{displaymath}
n [\log(\beta) - \log(\mu) - \log(\Gamma(\beta))] +
\sum_{i=1}^{n} (\beta - 1) \log(\beta x_{i} / \mu) -
\sum_{i=1}^{n} \beta x_{i} / \mu.
\end{displaymath}
We can define a Lisp function to evaluate this log likelihood by
\begin{verbatim}
(defun gllik (theta)
(let* ((mu (select theta 0))
(beta (select theta 1))
(n (length x))
(bym (* x (/ beta mu))))
(+ (* n (- (log beta) (log mu) (log-gamma beta)))
(sum (* (- beta 1) (log bym)))
(sum (- bym)))))
\end{verbatim}
This definition uses the function \dcode{log-gamma} to evaluate
$\log(\Gamma(\beta))$. The data and function definition are contained
in the file \dcode{aircraft.lsp} in the \dcode{Data} folder of the
distribution disk.
Closed form maximum likelihood estimates are not available for the
shape parameter of this distribution, but we can use \dcode{newtonmax}
to compute estimates numerically.
\footnote{The maximizing value for $\mu$ is always the sample mean.
We could take advantage of this fact and reduce the problem to a one
dimensional maximization problem, but it is simpler to just maximize
over both parameters.}
We need an initial guess to use as a starting value in the
maximization. To get initial estimates we can compute the mean and
standard deviation of \dcode{x}
\begin{verbatim}
> (mean x)
83.5172
> (standard-deviation x)
70.8059
\end{verbatim}
and use method of moments estimates $\hat{\mu} = 83.52$ and
$\hat{\beta} = (\hat{\mu} / \hat{\sigma})^{2}$, calculated as
\begin{verbatim}
> (^ (/ (mean x) (standard-deviation x)) 2)
1.39128
\end{verbatim}
Using these starting values we can now maximize the log likelihood
function:
\begin{verbatim}
> (newtonmax #'gllik (list 83.5 1.4))
maximizing...
Iteration 0.
Criterion value = -155.603
Iteration 1.
Criterion value = -155.354
Iteration 2.
Criterion value = -155.347
Iteration 3.
Criterion value = -155.347
Reason for termination: gradient size is less than gradient tolerance.
(83.5173 1.67099)
\end{verbatim}
Some status information is printed as the optimization proceeds. You
can turn this off by supplying the keyword argument \dcode{:verbose}
with value \dcode{NIL}.
You might want to check that the gradient of the function is indeed
close to zero. If you do not have a closed form expression for the
gradient you can use \dcode{numgrad}\index{numgrad}\index{gradient} to
calculate a numerical approximation. For our example,
\begin{verbatim}
> (numgrad #'gllik (list 83.5173 1.67099))
(-4.07269e-07 -1.25755e-05)
\end{verbatim}
The elements of the gradient are indeed quite close to zero. You can
also compute the second derivative, or Hessian, matrix using
\dcode{numhess}\index{numhess}\index{Hessian matrix}. Approximate
standard errors of the maximum likelihood estimates are given by the
square roots of the diagonal entries of the inverse of the negative
Hessian matrix at the maximum:
\begin{verbatim}
> (sqrt (diagonal (inverse (- (numhess #'gllik (list 83.5173 1.67099))))))
(11.9976 0.402648)
\end{verbatim}
Instead of calculating the maximum using \dcode{newtonmax} and then
calculating the derivatives separately you can have \dcode{newtonmax}
return a list of the location of the maximum, the optimal function
value, the gradient and the Hessian by supplying the keyword argument
\dcode{:return-derivs} as \dcode{T}. \footnote{The function
\dcode{newtonmax} ordinarily uses numerical derivatives in its
computations. Occasionally this may not be accurate enough or may take
too long. If you have an expression for computing the gradient or the
Hessian then you can use these by having your function return a list
of the function value and the gradient, or a list of the function
value, the gradient and the Hessian matrix, instead of just returning
the function value.}
Newton's method assumes a function is twice continuously
differentiable. If your function is not smooth or you are having
trouble with \dcode{newtonmax} for some other reason you might try a
second maximization function,
\dcode{nelmeadmax}\index{nelmeadmax}\index{Nelder-Mead simplex
method}. \dcode{nelmeadmax} takes a function and an initial guess and
attempts to find the maximum using the Nelder-Mead simplex method as
described in Press, Flannery, Teukolsky and Vetterling
\cite{CRecipes}. The initial guess can consist of a simplex, a list of
$n+1$ points for an $n$-dimensional problem, or it can be a single
point, represented by a list or vector of $n$ numbers. If you specify
a single point you should also use the keyword argument \dcode{:size}
to specify as a list or vector of length $n$ the size in each
dimension of the initial simplex. This should represent the size of an
initial range in which the algorithm is to start its search for a
maximum. We can use this method in our gamma example:
\begin{verbatim}
> (nelmeadmax #'gllik (list 83.5 1.4) :size (list 5 .2))
Value = -155.603
Value = -155.603
Value = -155.603
Value = -155.587
Value = -155.53
Value = -155.522
...
Value = -155.347
Value = -155.347
Value = -155.347
Value = -155.347
(83.5181 1.6709)
\end{verbatim}
It takes somewhat longer than Newton's method but it does reach the
same result.
\subsection*{Exercises}
\begin{enumerate}
\item
The data set used in this example consists of sets of measurements for
ten aircraft. Data for five of the aricraft are contained in the
variable \dcode{failure-times} in the file \dcode{aircraft.lsp}. The
calculations of this section used the data for the second aricraft.
Examine the data for the remaining four aircraft.
\item
Use maximum likelihood estimation to fit a Weibull model to the data
used in this section.
\end{enumerate}
\newpage
\section{Approximate Bayesian Computations}
This section describes a set of tools for computing approximate
posterior moments and marginal densities in XLISP-STAT. The
definitions needed are in the file \dcode{bayes.lsp} on the
distribution disk. This file is not loaded automatically at start up;
you should load it now, using the \macbold{Load} item on the
\macbold{File} menu or the \dcode{load} command, to carry out the
calculations in this section. The material in this section is somewhat
more advanced as it assumes you are familiar with the basic concepts
of Bayesian inference.\index{Bayesian computing}\index{posterior
distributions}\index{posterior distributions:approximate moments}
\index{posterior distributions:approximate marginal densities}
The functions described in this section can be used to compute first
and second order approximations to posterior moments and
saddlepoint-like approximations to one dimensional marginal posterior
densities. The approximations, based primarily on the results in
\cite{TK,TKK1,TKK2}, assume the posterior density is smooth and
dominated by a single mode. The implementation is experimental and
may change in a number of ways in the near future.
Let's start with a simple example, a data set used to study the
relation between survival time (in weeks) of leukemia patients and
white blood cell count recorded for the patients at their entry into
the study \cite[Example U]{CoxSnell}. The data consists of two groups
of patients classified as AG positive and AG negative. The data for
the 17 AG positive patients, contained in the file
\dcode{leukemia.lsp} in the \dcode{Data} folder on the distribution
disk, can be entered as
\begin{verbatim}
(def wbc-pos (list 2300 750 4300 2600 6000 10500 10000 17000 5400 7000
9400 32000 35000 100000 100000 52000 100000))
(def times-pos (list 65 156 100 134 16 108 121 4 39 143 56 26 22 1 1 5 65))
\end{verbatim}
A high white blood cell count indicates a more serious stage of the
disease and thus a lower chance of survival.
A model often used for this data assumes the survival times are
exponentially distributed with a mean that is log linear in the
logarithm of the white blood cell count. For convenience I will scale
the white blood cell counts by 10000. That is, the mean survival time
for a patient with white blood cell count $\log(WBC_{i})$ is
\begin{displaymath}
\mu_{i} = \theta_{0} \exp\{-\theta_{1} x_{i}\},
\end{displaymath}
with $x_{i} = \log(WBC_{i} / 10000)$. The log likelihood function is thus
given by
\begin{displaymath}
\sum_{i=1}^{n} \theta_{1} x_{i}
-n \log(\theta_{0})
-\frac{1}{\theta_{0}} \sum_{i=1}^{n} y_{i} e^{\theta_{1} x_{i}},
\end{displaymath}
with $y_{i}$ representing the survival times. After computing the
transformed $WBC$ variable as
\begin{verbatim}
(def transformed-wbc-pos (- (log wbc-pos) (log 10000)))
\end{verbatim}
the log likelihood can be computed using the function
\begin{verbatim}
(defun llik-pos (theta)
(let* ((x transformed-wbc-pos)
(y times-pos)
(theta0 (select theta 0))
(theta1 (select theta 1))
(t1x (* theta1 x)))
(- (sum t1x)
(* (length x) (log theta0))
(/ (sum (* y (exp t1x)))
theta0))))
\end{verbatim}
I will look at this problem using a vague, improper prior distribution
that is constant over the range $\theta_{i} > 0$; thus the log
posterior density is equal to the log likelihood constructed above, up
to an additive constant. The first step is to construct a Bayes model
object using the function \dcode{bayes-model}\index{bayes-model}. This
function takes a function for computing the log posterior density and
an initial guess for the posterior mode, computes the posterior mode
by an iterative method starting with the initial guess, prints a short
summary of the information in the posterior distribution, and returns
a model object. We can use the function \dcode{llik-pos} to compute
the log posterior density, so all we need is an initial estimate for
the posterior mode. Since the model we are using assumes a linear
relationship between the logarithm of the mean survival time and the
transformed $WBC$ variable a linear regression of the logarithms of
the survival times on
\dcode{transformed-wbc-pos} should provide reasonable estimates. The
linear regression gives
\begin{verbatim}
> (regression-model transformed-wbc-pos (log times-pos))
Least Squares Estimates:
Constant 3.54234 (0.302699)
Variable 0 -0.817801 (0.214047)
R Squared: 0.4932
Sigma hat: 1.23274
Number of cases: 17
Degrees of freedom: 15
\end{verbatim}
so reasonable initial estimates of the mode are $\hat{\theta}_{0} =
\exp(3.5)$ and $\hat{\theta}_{1} = 0.8$. Now we can use these estimates
in the
\dcode{bayes-model} function:
\begin{verbatim}
> (def lk (bayes-model #'llik-pos (list (exp 3.5) .8)))
maximizing...
Iteration 0.
Criterion value = -90.8662
Iteration 1.
Criterion value = -85.4065
Iteration 2.
Criterion value = -84.0944
Iteration 3.
Criterion value = -83.8882
Iteration 4.
Criterion value = -83.8774
Iteration 5.
Criterion value = -83.8774
Iteration 6.
Criterion value = -83.8774
Reason for termination: gradient size is less than gradient tolerance.
First Order Approximations to Posterior Moments:
Parameter 0 56.8489 (13.9713)
Parameter 1 0.481829 (0.179694)
#
>
\end{verbatim}
It is possible to suppress the summary information by supplying
\dcode{NIL} as the value of the \dcode{:print} keyword argument.
The summary printed by \dcode{bayes-model} gives first order
approximations to the posterior means and standard deviations of the
parameters. That is, the means are approximated by the elements of
the posterior mode and the standard deviations by the square roots of
the diagonal elements of the inverse of the negative Hessian matrix of
the log posterior at the mode. These approximations can also be
obtained from the model by sending it the \dcode{:1stmoments} message:
\begin{verbatim}
> (send lk :1stmoments)
((56.8489 0.481829) (13.9713 0.179694))
\end{verbatim}
The result is a list of two lists, the means and the standard
deviations. A slower but more accurate second order approximation is
available as well. It can be obtained using the message
\dcode{:moments}:
\begin{verbatim}
> (send lk :moments)
((65.3085 0.485295) (17.158 0.186587))
\end{verbatim}
Both these messages allow you to compute moments for individual
parameters or groups of parameters by specifying an individual
parameter index or a list of indices:
\begin{verbatim}
> (send lk :moments 0)
((65.3085) (17.158))
> (send lk :moments (list 0 1))
((65.3085 0.485295) (17.158 0.186587))
\end{verbatim}
The first and second order approximations to the moments of
$\theta_{0}$ are somewhat different; in particular the mean appears to
be somewhat larger than the mode. This suggests that the posterior
distribution of this parameter is skewed to the right. We can confirm
this by looking at a plot of the approximate marginal posterior
density. The message \dcode{:margin1} takes a parameter index, and a
sequence of points at which to evaluate the density and returns as its
value a list of the supplied sequence and the approximate density
values at these points. This list can be given to \dcode{plot-lines}
to produce a plot of the marginal density:
\begin{verbatim}
> (plot-lines (send lk :margin1 0 (rseq 30 120 30)))
#
\end{verbatim}
The result is shown in Figure \ref{PMar0} and does indeed show some
skewness to the right.
\begin{figure}\centering
%\vspace{3.8in}
\centerline{\includegraphics{postscript/fig20.ps}}
\ncaption{Plot of marginal posterior density for $\theta_{0}$.}{PMar0}
\end{figure}
In addition to examining individual parameters it is also possible to
look at the posterior distribution of smooth functions of the
parameters.\footnote{The approximation methods assume these functions
are twice continuously differentiable; thus they can not be indicator
functions.} For example, you might want to ask what information the
data contains about the probability of a patient with $WBC = 50000$
surviving a year or more. This probability is given by
\begin{displaymath}
\frac{1}{\mu(x)}e^{-52 / \mu(x)},
\end{displaymath}
with
\begin{displaymath}
\mu(x) = \theta_{0} e^{-\theta_{1} x}
\end{displaymath}
and $x = \log(5)$, and can be computed by the function
\begin{verbatim}
(defun lk-sprob (theta)
(let* ((time 52.0)
(x (log 5))
(mu (* (select theta 0) (exp (- (* (select theta 1) x))))))
(exp (- (/ time mu)))))
\end{verbatim}
This function can be given to the \dcode{:1stmoments},
\dcode{:moments} and \dcode{:margin1} methods to approximate the
posterior moments and marginal posterior density of this function. For
the moments the results are
\begin{verbatim}
> (send lk :1stmoments #'lk-sprob)
((0.137189) (0.0948248))
> (send lk :moments #'lk-sprob)
((0.184275) (0.111182))
\end{verbatim}
with the difference again suggesting some positive skewness, and the
marginal density produced by the expression
\begin{verbatim}
(plot-lines (send lk :margin1 #'lk-sprob (rseq .01 .8 30)))
\end{verbatim}
is shown in Figure \ref{PMarP}. Based on this picture the data
suggests that this survival probability is almost certainly below 0.5,
but it is difficult to make a more precise statement than that.
\begin{figure}\centering
%\vspace{3.8in}
\centerline{\includegraphics{postscript/fig21.ps}}
\ncaption{Plot of marginal posterior density of the one year survival
probability of a patient with $WBC = 50000$.}{PMarP}
\end{figure}
The functions described in this section are based on the
optimization code described in the previous section. By default
derivatives are computed numerically. If you can compute derivatives
yourself you can have your log posterior function return a list of
the function value and the gradient or a list of the function value, the
gradient and the Hessian matrix.
\subsection*{Exercises}
\begin{enumerate}
\item
To be able to think about prior distributions for the two parameters
in this problem we need to try to understand what the parameters
represent. The parameter $\theta_{0}$ is fairly easy to understand: it
is the mean survival time for patients with $WBC = 10000$. The
parameter $\theta_{0}$ is a little more difficult to think about. In
represents the approximate percent difference in mean survival time
for patients with $WBC$ differing by one percent. Because of the minus
sign in the mean relationship, and the expected inverse relation
between $WBC$ and mean survival time, $\theta_{1}$ is expected to be
positive.
Consider an informative prior distribution that assumes the two
parameters {\em a priori}\/ independent, takes $\log(\theta_{0})$ to
be normally distributed with mean $\log(52)$ and standard deviation
$\log(2)$, and $\theta_{1}$ to be exponentially distributed with mean
$\mu = 5$. This prior is designed to represent an opinion that mean
survival time at $WBC = 10000$ should be around one year, but that
guess could easily be off by a factor of two either way. The
percentage change in the mean for a one percent change in $WBC$ should
be on the order of one to ten or so. Examine the posterior
distribution for this prior and compare your results to the results
for the vague prior used above.
\item
Construct and examine a posterior distribution for the parameters of
the gamma model based on the aircraft data of Section
\ref{MaximumLikelihood}.
\end{enumerate}
\newpage
\begin{thebibliography}{99}
\bibitem{BatesWatts}
{\sc Bates, D. M. and Watts, D. G.}, (1988), {\em Nonlinear Regression
Analysis and its Applications}, New York: Wiley.
\bibitem{oldSbook}
{\sc Becker, Richard A., and Chambers, John M., (1984)}, {\em S: An
Interactive Environment for Data Analysis and Graphics}, Belmont, Ca:
Wadsworth.
\bibitem{newSbook}
{\sc Becker, Richard A., Chambers, John M., and Wilks, Allan R.,
(1988)}, {\em The New S Language: A Programming Environment for Data
Analysis and Graphics}, Pacific Grove, Ca: Wadsworth.
\bibitem{BeckerCleveland}
{\sc Becker, Richard A., and William S. Cleveland}, (1987), ``Brushing
scatterplots,'' {\em Technometrics}, vol. 29, pp. 127-142.
\bibitem{BetzBYTE}
{\sc Betz, David}, (1985) ``An XLISP Tutorial,'' {\em BYTE}, pp 221.
\bibitem{Betz2.0}
{\sc Betz, David}, (1988), ``XLISP: An experimental object-oriented
programming language,'' Reference manual for XLISP Version 2.0.
\bibitem{ChalonerBrant}
{\sc Chaloner, Kathryn, and Brant, Rollin}, (1988) ``A Bayesian
approach to outlier detection and residual analysis,'' {\em
Biometrika}, vol. 75, pp. 651-660.
\bibitem{ClevelandMcGill}
{\sc Cleveland, W. S. and McGill, M. E.}, (1988) {\em Dynamic Graphics
for Statistics}, Belmont, Ca.: Wadsworth.
\bibitem{CoxSnell}
{\sc Cox, D. R. and Snell, E. J.}, (1981) {\em Applied Statistics:
Principles and Examples}, London: Chapman and Hall.
\bibitem{DennisSchnabel}
{\sc Dennis, J. E. and Schnabel, R. B.}, (1983), {\em Numerical
Methods for Unconstrained Optimization and Nonlinear Equations},
Englewood Cliffs, N.J.: Prentice-Hall.
\bibitem{DevorePeck}
{\sc Devore, J. and Peck, R.}, (1986), {\em Statistics, the Exploration
and Analysis of Data}, St. Paul, Mn: West Publishing Co.
\bibitem{McDonaldThesis}
{\sc McDonald, J. A.}, (1982), ``Interactive Graphics for Data
Analysis,'' unpublished Ph. D. thesis, Department of Statistics,
Stanford University.
\bibitem{Macanova}
{\sc Oehlert, Gary W.}, (1987), ``MacAnova User's Guide,'' Technical
Report 493, School of Statistics, University of Minnesota.
\bibitem{CRecipes}
{\sc Press, Flannery, Teukolsky and Vetterling}, (1988), {\em
Numerical Recipes in C}, Cambridge: Cambridge University Press.
\bibitem{CLtL}
{\sc Steele, Guy L.}, (1984), {\em Common Lisp: The Language},
Bedford, MA: Digital Press.
\bibitem{Stuetzle}
{\sc Stuetzle, W.}, (1987), ``Plot windows,'' {\em J. Amer. Statist.
Assoc.}, vol. 82, pp. 466 - 475.
\bibitem{MyBook}
{\sc Tierney, Luke}, (1990) {\em LISP-STAT: Statistical Computing and
Dynamic Graphics in Lisp}. Forthcoming.
\bibitem{TK}
{\sc Tierney, L. and J. B. Kadane}, (1986), ``Accurate approximations
for posterior moments and marginal densities,'' {\em J. Amer. Statist.
Assoc.}, vol. 81, pp. 82-86.
\bibitem{TKK1}
{\sc Tierney, Luke, Robert E. Kass, and Joseph B. Kadane}, (1989),
``Fully exponential Laplace approximations to expectations and
variances of nonpositive functions,'' {\em J. Amer. Statist. Assoc.},
to appear.
\bibitem{TKK2}
{\sc Tierney, L., Kass, R. E., and Kadane, J. B.}, (1989),
``Approximate marginal densities for nonlinear functions,'' {\em
Biometrika}, to appear.
\bibitem{Multreg}
{\sc Weisberg, Sanford}, (1982), ``MULTREG Users Manual,'' Technical
Report 298, School of Statistics, University of Minnesota.
\bibitem{WinstonHorn}
Winston, Patrick H. and Berthold K. P. Horn, (1988), {\em LISP}, 3rd Ed.,
New York: Addison-Wesley.
\end{thebibliography}
\newpage
\appendix
\section{XLISP-STAT on UNIX Systems}
This tutorial has dealt primarily with the Macintosh version of
XLISP-STAT. XLISP-STAT is also available on UNIX systems. If it has
been installed in a directory in your search path you should be able
to start it up by typing
\begin{verbatim}
xlispstat
\end{verbatim}
at the UNIX shell level. There are a few differences between the
Macintosh and UNIX versions. On UNIX systems:
\begin{itemize}
\item
UNIX versions of XLISP-STAT are designed to run on a standard terminal
and therefore do not provide parenthesis matching or indentation
support. If you use the {\em GNU}\/ \dcode{emacs} editior you can
obtain both these features by running XLISP-STAT from within
\dcode{emacs}. Otherwise, for editing files with \dcode{vi} you can use the
\dcode{-l} flag to get some Lisp editing support.
\item
To quit from the program type
\begin{verbatim}
(exit)
\end{verbatim}
On most systems you can also quit by typing a {\em Control-D}.
\item
You can interrupt\index{interrupt} a calculation that is taking too
long or was started in error by typing a {\em Control-C}.
\item
Data and example files are stored in the \dcode{Data} and
\dcode{Examples} subdirectories of the library tree. The functions
\dcode{load-data} and
\dcode{load-examples} will look in these directories, so
\begin{verbatim}
(load-data "tutorial")
\end{verbatim}
will load the data sets for the tutorial. Within XLISP-STAT the
variable \dcode{*default-path*} shows the root directory of the
library; you can look there if you want to examine the example files.
\item
The \dcode{require} function can be used to load program modules not
loaded at startup. To load the nonlinear regression module, for
example, use the expression
\begin{verbatim}
(require "nonlin")
\end{verbatim}
\end{itemize}
On basic UNIX systems the only graphics available are the functions
\dcode{plot-points} and \dcode{plot-lines}. These functions assume you
are using a {\em Tektronix}\/ terminal or emulator.
\subsection{XLISP-STAT Under the {\em X11}\/ Window System}
Window based graphics are available in XLISP-STAT on a workstation
running the {\em X11}\/ window system. Graphics under {\em X11}\/ are
fairly similar to Macintosh graphics as documented in this tutorial. A
few points to note:
\begin{itemize}
\item
Plot menus are popped up using a menu button in the top right corner
of a plot.
\item
In plot interaction you can use any of the mouse buttons. Normally
clicking any button in a plot unselects all selected points. To extend
a selection, or have a rotating plot continue rotating after the
button is released, hold down the shift key as you press any mouse
button.
\item
How plot windows are opened in response to a graphics command depends
on the window manager you are using. Under \dcode{uwm}, the default
window manager on many systems, a little corner will appear which you
can use to choose the position of your plot window.
\item
Slider dialog items are the only items that assume a three button
mouse. In the central part of the slider the right button increases
and the left button decreases the slider value. The middle button
drags the thumb indicator.
\item
Postscript images of plots can be saved by selecting the \macbold{Save
to File} item on a plot menu. The postscript file can then be printed
using a standard printing command.
\end{itemize}
\subsubsection{More Advanced {\em X11}\/ Features}
You can have XLISP-STAT use an alternate display for its graphics by
setting the \dcode{DISPLAY} environment variable before starting
XLISP-STAT. At present this is the only way to set an alternate
display. You can specify alternate fonts and a few other options
using the {\em X11}\/ resource management facilities. Resources
controlling appearance are
\begin{center}
\begin{tabular}{ll}
\tt xlisp*menu*titles: &
{\em on}\/ for a title on a menu, {\em off}\/ otherwise \\
\tt xlisp*menu*font: \\
\tt xlisp*dialog*font: \\
\tt xlisp*graph*font:
\end{tabular}
\end{center}
There are also a few experimental options controlling performance.
These are
\begin{center}
\begin{tabular}{ll}
\tt xlisp*graph*fastlines: & {\em on}\/ to use $0$ width lines\\
\tt xlisp*graph*fastsymbols: &
{\em on}\/ to use \dcode{DrawPoints} instead of bitmaps\\
\tt xlisp*graph*motionsync: &
{\em on}\/ to use \dcode{XSync} during mouse motion\\
\end{tabular}
\end{center}
By default all three options are are {\em on}. That seems to give the
best performance on a {\em Sun 3/50}. It may not be the best choice on other
workstations. You can also use the function \dcode{x11-options} to
change these three options from within XLISP-STAT. The
\dcode{fastlines} option will not take effect immediately when changed
this way but will affect the next plot created. The other two options
do take effect immediately.
\subsection{XLISP-STAT Under the {\em SunView}\/ Window System}
Window based graphics are also available when XLISP-STAT is run on a
{\em Sun}\/ console running \dcode{suntools}. Graphics under
\dcode{suntools} work like graphics on the Macintosh with the
following changes:
\begin{itemize}
\item
To close or resize plots or dialogs use the frame menu or the standard
\dcode{suntools} shortcuts (e.~g. to resize a plot window drag the
frame with the middle mouse button while holding down the {\em
control}\/ key).
\item
Plot menus are popped up by pressing the right mouse button in the
interior of the plot. Check marks do not appear on menu items so it
may not always be clear what state an item is in.
\item
Clicking and dragging on the Macintosh corresponds to clicking and
dragging with the left mouse button. Shift-clicking or shift-dragging
on the Macintosh (to extend a selection or cause a rotating plot to
continue spinning when the button is released) corresponds to using
the middle mouse button.
\item
Postscript images of plots can be saved by selecting the \macbold{Save
to File} item on a plot menu.
\end{itemize}
\subsection{Running UNIX Commands from XLISP-STAT}
The \dcode{system} function can be used to run UNIX commands from
within XLISP-STAT. This function takes a shell command string as its
argument and returns the shell exit code for the command. For example,
you can print the date using the UNIX \dcode{date} command:
\begin{verbatim}
> (system "date")
Wed Jul 19 11:06:53 CDT 1989
0
>
\end{verbatim}
The return value is 0, indicating successful completion of the UNIX command.
\subsection{Dynamic Loading and Foreign Function Calling}
\index{dynamic loading}\index{foreign function calls}
Some UNIX implementations of XLISP-STAT also provide a facility to
allow you to use your own C functions of FORTRAN subroutines from
within XLISP-STAT. The facility, patterned after the approach used in
the New {\em S}\/ language \cite{newSbook}, consists of the function
\dcode{dyn-load} for loading your code into a running XLISP-STAT
process and the functions \dcode{call-cfun}, \dcode{call-fsub} and
\dcode{call-lfun} for calling subroutines in the loaded code. The
\dcode{dyn-load} function requires one argument, a string containing
the name of a file to be linked and loaded. The file will be linked
with standard C libraries before loading. If you need it to be linked
with the standard FORTRAN libraries as well you can give the keyword
argument \dcode{:fortran} the value \dcode{T}. Finally, if you need to
link other libraries you can supply a string containing the library
flags you would specify in a linking command as the value of the
keyword argument \dcode{:libflags}. For example, to include the
library \dcode{cmlib} use the string \dcode{"-lcmlib"}.\footnote{There
may be slight differences in the implementation of \dcode{dyn-load} on
different systems. The help information for this function should give
information that is appropriate for your system.}
The function \dcode{call-cfun} takes a string identifying the C
function you want to call, followed by additional arguments for your C
function. The additional arguments must be integers, sequences of
integers, real numbers or sequences of real numbers. Pointers to
\dcode{int} or \dcode{double} data containing these values will be
passed to your routine. After your routine returns the contents of the
data referred to by these pointers are copied into lists and
\dcode{call-cfun} returns a list of these lists.
As an example, suppose the file \dcode{foo.c} contains the following C
function:
\begin{verbatim}
foo(n, x, sum)
int *n;
double *x, *sum;
{
int i;
for (i = 0, *sum = 0.0; i < *n; i++) {
*sum += x[i];
}
}
\end{verbatim}
After compiling the file to \dcode{foo.o} we can load it into
XLISP-STAT using the expression
\begin{verbatim}
(dyn-load "foo.o")
\end{verbatim}
We can then call the function \dcode{foo} using a list of real numbers
as the second argument. The function \dcode{float} can be used to
coerce numbers to reals:
\begin{verbatim}
> (call-cfun "foo" 5 (float (iseq 1 5)) 0.0)
((5) (1 2 3 4 5) (15))
\end{verbatim}
The third argument to \dcode{foo} has been used to return the result.
The function \dcode{call-fsub} is used for calling FORTRAN subroutines that
have been loaded dynamically. A FORTRAN subroutine analogous to the C
function \dcode{foo} might be written as
\begin{verbatim}
subroutine foo(n, x, sum)
integer n
double precision x(n), sum
integer i
sum = 0.0
do 10 i = 1, n
sum = sum + x(i)
10 continue
return
end
\end{verbatim}
After compiling and loading this routine it can be called using
\dcode{call-fsub}:
\begin{verbatim}
> (call-fsub "foo" 5 (float (iseq 1 5)) 0.0)
((5) (1 2 3 4 5) (15))
\end{verbatim}
Two C functions you may want to call from within your C functions are
\dcode{xscall\_alloc} and \dcode{xscall\_fail}. The function
\dcode{xscall\_alloc} is like \dcode{calloc}, except it insures the
allocated memory is garbage collected after the call to
\dcode{call-cfun} returns. The function \dcode{xscall\_fail} takes a
character string as its argument. It prints the string and signals an
error.
The function \dcode{call-lfun} can be used to call C functions written
using the internal XLISP conventions for obtaining arguments and
returning results. This allows you to accept any kinds of arguments.
Unfortunately, the source code is the only documentation for the
internal calling conventions.
A note of caution may be appropriate at this point. Dynamically loaded
code contains only the error checking you build into it. If a function
is not called with the proper arguments it will most likely cause
XLISP-STAT to crash, losing any variables you have not saved.
At present the number of arguments you can pass to C functions or
FORTRAN subroutines using \dcode{call-cfun} or \dcode{call-fsubr} is
limited to 15.
If dynloading is not available on your system you can still recompile
XLISP-STAT with files of your own added to the source code. The
functions \dcode{call-cfun}, \dcode{call-fsubr} and \dcode{call-lfun}
can be used to call your functions or subroutines in this case as
well.
\newpage
\section{Graphical Interface Tools}
One of the characteristic features of the Macintosh user interface is
the use of menus and dialogs for interacting with the computer.
XLISP-STAT allows you to construct your own menus and dialogs using
Lisp commands. This appendix gives a very brief introduction to
constructing menus and dialogs; further details will be given in
\cite{MyBook}. A few of the explanations and examples in this appendix
use Lisp concepts that have not been covered in this tutorial.
\subsection{Menus}
As an illustration I will outline how to construct a menu for sending
some simple messages to a regression model. I will make the convention
that there is a {\em current regression model}, the value of the
symbol \dcode{*current-model*}.\index{menus}
Menus are created by sending the \dcode{:new} message to the menu
prototype, \dcode{menu-proto}. The method for this message takes a
single argument, the menu title. We can use this message to set up our
menu:
\begin{verbatim}
> (setf model-menu (send menu-proto :new "Model"))
#
\end{verbatim}
Macintosh menus can be installed in and removed from the menu bar by sending
them the \dcode{:install} and \dcode{:remove} messages:
\begin{verbatim}
> (send model-menu :install)
NIL
> (send model-menu :remove)
NIL
\end{verbatim}
On other systems menus are {\em popped up}; this can be accomplished by
sending the \dcode{:popup} message to a menu. This message requires
two arguments, the $x$ and $y$ pixel coordinates of the left top
corner of the menu, measured from the left top corner of the screen.
Initially the menu has no items in it. Items are created using the
\dcode{menu-item-proto} prototype. The initialization method requires one
argument, the item's title, and takes several keyword arguments, including
\begin{itemize}
\item
\dcode{:action} -- a function to be called when the item is selected
\item
\dcode{:enabled} -- true by default
\item
\dcode{:key} -- a character to serve as the keyboard equivalent
\item
\dcode{:mark} -- \dcode{nil} (the default) or \dcode{t} to indicate a check
mark.
\end{itemize}
Analogous messages are available for changing these values in existing
menu items.
Suppose we would like to be able to use our menu to print a summary of
the current model or obtain a residual plot. We can construct two menu
items:
\begin{verbatim}
> (setf summary (send menu-item-proto :new "Summary" :action
#'(lambda () (send *current-model* :display))))
#
> (setf plot (send menu-item-proto :new "Plot Residuals" :action
#'(lambda () (send *current-model* :plot-residuals))))
#
\end{verbatim}
Suppose we have assigned the \dcode{bikes2} model of Section
\ref{Regression} to \dcode{*current-model*}. You can force an item's
action to be invoked by sending it the
\dcode{:do-action} message from the listener:
\begin{verbatim}
> (send summary :do-action)
Least Squares Estimates:
Constant -16.41924 (7.848271)
Variable 0 2.432667 (0.9719628)
Variable 1 -0.05339121 (0.02922567)
R Squared: 0.9477923
Sigma hat: 0.5120859
Number of cases: 10
Degrees of freedom: 7
NIL
\end{verbatim}
Ordinarily you will not send this message this way: the system sends
this message to the menu item when you select the item from a menu.
To add these items to the menu use the \dcode{:append-items} message:
\begin{verbatim}
> (send model-menu :append-items summary plot)
NIL
\end{verbatim}
You can also use the \dcode{:append-items} message to add items to a
plot menu. The menu associated with a plot can be obtained by sending
the plot the \dcode{:menu} message with no arguments.
You can enable and disable a menu item with the \dcode{:enabled}
message:
\begin{verbatim}
> (send summary :enabled)
T
> (send summary :enabled nil)
NIL
> (send summary :enabled t)
T
\end{verbatim}
\subsection{Dialogs}
Dialogs are similar to menus in that they are based on a dialog
prototype and dialog item prototypes. There are, however many more
variations. Fortunately most dialogs you need fall into one of several
categories and can be produced by custom dialog construction
functions.\index{dialogs}
\subsubsection{Modal Dialogs}
Modal dialogs are designed to ask specific questions and wait until
they receive a response. All other interaction is disabled until the
dialog is dismissed -- they place the system in {\em dialog mode}. Six
functions are available for producing some standard modal dialogs:
\begin{itemize}
\item
\dcode{(message-dialog \param{string})} -- presents a message with an
\macbold{OK} button; returns \dcode{nil} when the button is pressed.
\item
\dcode{(ok-or-cancel-dialog \param{string})} -- presents a message with
an \macbold{OK} and a \macbold{Cancel} button; returns \dcode{t} or
\dcode{NIL} according to the button pressed.
\item
\dcode{(choose-item-dialog \param{string} \param{string-list})} -- presents
a heading and a set of radio buttons for choosing one of the strings.
Returns the index of the selected string on \macbold{OK} or \dcode{nil}
on \macbold{Cancel}. Example:
\begin{verbatim}
> (choose-item-dialog "Dependent variable:" '("X" "Y" "Z"))
1
\end{verbatim}
\item
\dcode{(choose-subset-dialog \param{string} \param{string-list})} -- presents
a heading and a set of check boxes for indicating which items to select.
Returns a list of the list of selected indices on \macbold{OK} or \dcode{nil}
on \macbold{Cancel}. Example:
\begin{verbatim}
> (choose-subset-dialog "Independent variables:" '("X" "Y" "Z"))
((0 2))
\end{verbatim}
\item
\dcode{(get-string-dialog \param{prompt} [:initial \param{expr}])} --
presents a dialog with a prompt, an editable text field, an \macbold{OK} and
a \macbold{Cancel} button. The initial contents of the editable field is empty
or the \dcode{princ} formated version of \param{expr}. The result is a
string or \dcode{nil}. Example:
\begin{verbatim}
> (get-string-dialog "New variable label:" :initial "X")
"Tensile Strength"
\end{verbatim}
\item
\dcode{(get-value-dialog \param{prompt} [:initial \param{expr}])} --
like \dcode{get-string-dialog}, except
\begin{itemize}
\item
the initial value expression is converted to a string with \dcode{print}
formatting
\item
the result is interpreted as a lisp expression and is evaluated
\item
the result is a list of the value, or \dcode{nil}
\end{itemize}
\end{itemize}
On the Macintosh there are two additional dialogs for dealing with files:
\begin{itemize}
\item
\dcode{(open-file-dialog)} -- presents a standard \macbold{Open File} dialog
and returns a file name string or \dcode{nil}. Resets the working folder on
\macbold{OK}.
\item
\dcode{(set-file-dialog \dcode{prompt})} -- presents a standard \macbold{Save
File} dialog. Returns a file name string or \dcode{nil}. Resets the working
folder on \macbold{OK}.
\end{itemize}
\subsubsection{Modeless Dialogs}
Two functions for constructing custom modeless dialogs are available
also. They are the functions \dcode{interval-slider-dialog} and
\dcode{sequence-slider-dialog} introduced above in Section
\ref{Fundefs}.
\include{techappendix}
\include{techindex}
\end{document}