There are three primary reasons behind my decision to produce the XLISP-STAT environment. First, I wanted to provide a vehicle for experimenting with dynamic graphics and for using dynamic graphics in instruction. Second, I wanted to be able to experiment with an environment supporting functional data, such as mean functions in nonlinear regression models and prior density and likelihood functions in Bayesian analyses. Finally, I was interested in exploring the use of object-oriented programming ideas for building and analyzing statistical models. I will discuss each of these points in a little more detail in the following paragraphs.

The development of high-resolution graphical computer displays has
made it possible to consider the use of dynamic graphics for
understanding higher-dimensional structure. One of the earliest
examples is the real-time rotation of a three-dimensional point cloud
on a screen -- an effort to use motion to recover a third dimension
from a two-dimensional display. Other techniques that have been
developed include *brushing* a scatterplot -- highlighting points
in one plot and seeing where the corresponding points fall in other
plots. A considerable amount of research has been done in this area;
see, for example, the discussion in Becker and Cleveland [4] and the
papers reproduced in Cleveland and McGill [8]. However, most of the
software written to date has been developed on specialized hardware,
such as the TTY 5620 terminal or Lisp machines. As a result, very few
statisticians have had an opportunity to experiment with dynamic
graphics first hand, and still fewer have had access to an environment
that would allow them to implement dynamic graphics ideas of their
own. Several commercial packages for microcomputers now contain some
form of dynamic graphics, but most do not allow users to customize
their plots or develop functions for producing specialized plots, such
as dynamic residual plots. XLISP-STAT provides at least a partial
solution to these problems. It allows the user to modify a
scatterplot with Lisp functions and provides means for modifying the
way in which a plot responds to mouse actions. It is also possible to
add functions written in C to the program. On the Macintosh this has
to be done by adding to the source code. On some Unix systems it is
also possible to compile and dynamically load code written in C or
FORTRAN.
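
The idea behind brushing can be illustrated without any graphics code. The following is a minimal sketch in Python, not XLISP-STAT code; the function names `brush` and `highlighted` and the data are hypothetical and chosen only for illustration. It assumes that every plot displays the same cases, so a selection made in one plot is simply a list of case indices that any other plot can look up.

```python
# Linked brushing reduced to its data structure: all plots show the same
# cases, so a selection is a list of case indices shared between plots.
# All names and data here are hypothetical.

def brush(xs, ys, x_range, y_range):
    """Return indices of the cases whose (x, y) point lies in the brush rectangle."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if x_lo <= x <= x_hi and y_lo <= y <= y_hi]


def highlighted(xs, ys, selected):
    """Return the points another plot should highlight for the selected cases."""
    return [(xs[i], ys[i]) for i in selected]


# Three variables measured on the same five cases.
height = [150, 160, 170, 180, 190]
weight = [50, 58, 69, 80, 95]
age = [31, 24, 45, 38, 52]

# Brush a rectangle in the height/weight plot ...
selected = brush(height, weight, (165, 195), (60, 100))

# ... and find where the same cases fall in the age/weight plot.
linked = highlighted(age, weight, selected)
```

A real implementation would redraw the linked plots whenever the mouse moves the brush; the sketch shows only the bookkeeping that ties the plots together.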

An integrated environment for statistical calculations and graphics is essential for developing an understanding of the uses of dynamic graphics in statistics and for developing new graphical techniques. Such an environment must essentially be a programming language. Its basic data types must include types that allow groups of numbers -- data sets -- to be manipulated as entire objects. But in model-based analyses numerical data are only part of the information being used. The remainder is the model itself. Sometimes a model is easily characterized by specifying a set of numbers. A normal linear regression model with errors might be described by the number of covariates, the coefficients and the error variance. On the other hand, in many cases it is easier to specify a model by specifying a function. To specify a normal nonlinear regression model, for example, one might specify the mean function. If our language is to allow us to specify this function within the language itself, then the language must support a functional data type with full rights: it has to be possible to define functions that manipulate functions, return functions, apply functions to arguments, etc.

The choice I faced was to define a language from scratch or use an existing language. Because of the complexity of issues involved in functional programming, I decided to use a dialect of a well understood functional language, Lisp. The syntax of Lisp is somewhat unfamiliar to most users of statistical packages, but it is easy to learn and several good tutorials are available in local bookstores. I considered the possibility of using Lisp to write a top level interface with a more "natural" syntax, but I did not see any way of doing this without complicating access to some of the more powerful features of Lisp or running into some of the pitfalls of functional programming. I therefore decided to retain the basic Lisp top level syntax. To make the manipulation of numerical data sets easier I have redefined the arithmetic operators and basic numerical functions to work on lists and arrays of data.
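
Two ideas from this discussion, functions treated as ordinary data and arithmetic that works on whole data sets, can be sketched briefly. The example below is written in Python rather than Lisp and is only an illustration under stated assumptions: the helpers `vectorize` and `sum_of_squares`, the mean function `growth`, and the data are all hypothetical and do not reproduce XLISP-STAT's own operators.

```python
import math
import operator


def vectorize(op):
    """Turn a two-argument scalar function into one that also maps over lists.

    This is itself a function that takes a function and returns a function.
    Lists of unequal length are truncated to the shorter one by zip.
    """
    def elementwise(a, b):
        if isinstance(a, list) and isinstance(b, list):
            return [elementwise(u, v) for u, v in zip(a, b)]
        if isinstance(a, list):
            return [elementwise(u, b) for u in a]
        if isinstance(b, list):
            return [elementwise(a, v) for v in b]
        return op(a, b)
    return elementwise


add = vectorize(operator.add)
sub = vectorize(operator.sub)
mul = vectorize(operator.mul)


def sum_of_squares(mean_fn, x, y):
    """Return the residual sum of squares as a function of the parameters."""
    def objective(theta):
        fitted = [mean_fn(theta, xi) for xi in x]
        resid = sub(y, fitted)
        return sum(mul(resid, resid))
    return objective


# Hypothetical mean function for a nonlinear regression model.
def growth(theta, x):
    return theta[0] * math.exp(theta[1] * x)


x = [0.0, 1.0, 2.0]
y = [2.0, 2.0 * math.e, 2.0 * math.e ** 2]

# The model is specified by passing the mean function itself as an argument.
rss = sum_of_squares(growth, x, y)
```

Here `rss` is an ordinary value that can be handed to an optimizer or evaluated directly, for instance as `rss([2.0, 1.0])`, which is the kind of use a functional data type with full rights is meant to allow.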

Once I had decided to use Lisp as the basis for my environment, XLISP was a
natural choice for several reasons. It has been made available for
unrestricted, non-commercial use by its author, David Betz. It is
small (for a Lisp system), its source code is available in C, and it
is easily extensible. Finally, it includes support for object-oriented
programming. Object-oriented programming has received considerable
attention in recent years and is particularly natural for use in
describing and manipulating graphical objects. It may also be useful
for the analysis of statistical data and models. A collection of data
and assumptions may be represented as an *object*. The model
object can then be examined and modified by sending it *messages*.
Many different kinds of models will answer similar questions, thus
fitting naturally into an *inheritance structure*. XLISP-STAT's
implementation of linear and nonlinear regression models as
*objects*, with nonlinear regression inheriting many of its
*methods* from linear regression, is a first, primitive attempt to
exploit this programming technique in statistical analysis.
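
The inheritance relationship described above can be sketched as follows. This is a Python illustration, not XLISP-STAT's object system or its actual regression code: the class and method names are hypothetical, method calls stand in for messages, the linear model is restricted to a single covariate, and the nonlinear model takes its coefficients as given instead of fitting them.

```python
import math


class LinearRegression:
    """Simple linear regression y = b0 + b1 * x, fitted by least squares."""

    def __init__(self, x, y):
        self.x, self.y = list(x), list(y)
        n = len(self.x)
        x_bar = sum(self.x) / n
        y_bar = sum(self.y) / n
        sxx = sum((xi - x_bar) ** 2 for xi in self.x)
        sxy = sum((xi - x_bar) * (yi - y_bar)
                  for xi, yi in zip(self.x, self.y))
        slope = sxy / sxx
        self.coefficients = [y_bar - slope * x_bar, slope]

    def fitted_values(self):
        b0, b1 = self.coefficients
        return [b0 + b1 * xi for xi in self.x]

    def residuals(self):
        return [yi - fi for yi, fi in zip(self.y, self.fitted_values())]

    def residual_sum_of_squares(self):
        return sum(r * r for r in self.residuals())


class NonlinearRegression(LinearRegression):
    """Overrides only what differs; residual methods are inherited."""

    def __init__(self, mean_fn, x, y, coefficients):
        self.mean_fn = mean_fn
        self.x, self.y = list(x), list(y)
        # Assumption for this sketch: coefficients are supplied, not estimated.
        self.coefficients = list(coefficients)

    def fitted_values(self):
        return [self.mean_fn(self.coefficients, xi) for xi in self.x]


line = LinearRegression([0, 1, 2, 3], [1, 3, 5, 7])

curve = NonlinearRegression(
    lambda theta, x: theta[0] * math.exp(theta[1] * x),
    [0.0, 1.0], [2.0, 2.0 * math.e], [2.0, 1.0])
```

Both `line` and `curve` answer `residuals()` and `residual_sum_of_squares()`, although only the linear model defines them; the nonlinear model supplies just its own way of computing fitted values.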

Tue Jan 21 15:04:48 CST 1997