This note is a first attempt at outlining the issues and at starting a discussion on how to add name spaces to R.
[Whenever it says below "XYZ is true in R", this should be read as "it is my, possibly completely incorrect, impression that XYZ is true in R".]
Put another way, a name space mechanism would provide a way of
creating some static structure to the global variables used by a
package, static structure that protects the package functions from the
variation in the current dynamic global environment where loading
packages and attaching frames can lead to unintended name conflicts.
The base package is a case in point. Most functions that refer to a free variable exp, say, intend this to be the exp variable in the base package. Creating the function in a name space that uses the base package will ensure that this is the case.
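Present-day R environments can imitate this effect. The sketch below is a minimal illustration under stated assumptions, not the proposed mechanism: a hypothetical name space frame ns is created with new.env() and given baseenv() as its parent, so the free exp in f_ns resolves to base's exp even though a global definition masks it for the ordinary top-level function f_global. This simplified frame has no route back to the global environment, so the global fallback is not modeled.

```r
# A global definition that masks base's exp for ordinary top-level
# functions (hypothetical, for illustration only).
exp <- function(x) "masked"

# Free 'exp' is looked up dynamically, starting in the global environment.
f_global <- function(z) exp(z)

# Same body, enclosed in a hypothetical name space frame whose parent is
# the base environment, so the global search path is never consulted.
ns <- new.env(parent = baseenv())
f_ns <- function(z) exp(z)
environment(f_ns) <- ns

r_global <- f_global(0)  # "masked"
r_ns <- f_ns(0)          # 1, from base's exp
rm(exp)                  # remove the masking definition again
```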
Global variables need not all come from name spaces. If a function defined in a name space uses a global variable that is not found in the name space or any name spaces it uses, then the standard dynamic global environment is used.
Since name spaces are most useful for organizing packages, a simpler option would be to specify that each package has (at least optionally) an associated name space. Details of how to specify features of the name space could be merged with the current file structure of packages. I will assume this approach in the remainder of this note.
INDEX file in a package, or an additional EXPORTS file could be created.
The current DESCRIPTION file could be augmented by allowing one or more Imports lines, or an IMPORTS file could be added.
Optional features might include:
simpson from a package integrate but refer to this function in our package as integral.
Suppose we have a package mynorm with code in a file mynorm.R:
<mynorm/R/mynorm.R>=
c1 <- 1/sqrt(2 * pi)
lc1 <- log(c1)
phi <- function(z) c1 * exp(-0.5 * z^2)
lphi <- function(z) lc1 - 0.5 * z^2
my.dnorm <- function(x, mu = 0, sigma = 1, log = FALSE) {
    z <- (x - mu) / sigma
    if (log) lphi(z) - log(sigma)
    else phi(z) / sigma
}
my.pnorm <- function(x, mu = 0, sigma = 1) {
    z <- (x - mu) / sigma
    # integral is a free variable supplied by the imports
    integral(phi, -5, z)
}
We would like only my.pnorm and my.dnorm to be public, so the EXPORTS file would be
<mynorm/EXPORTS>=
my.dnorm
my.pnorm
Within the package we can use top-level definitions like those for c1, lc1, and phi and keep them private. The name space mechanism should make sure that they are not visible outside the name space and will not be obscured by other definitions of those symbols in the global name space, for example in loaded packages.
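The intended privacy can be approximated with present-day R environments. This sketch assumes a hypothetical internal frame internals and export frame exports: the package code is evaluated inside internals, and only names listed in the EXPORTS file are copied into exports. my.pnorm is left out because its free variable integral would have to come from an import. A later global definition of c1 does not affect my.dnorm.

```r
# Hypothetical internal frame for mynorm; its parent, the base
# environment, stands in for importing base.
internals <- new.env(parent = baseenv())

# Evaluate the package code in the internal frame, so top-level
# definitions land there and the functions are enclosed by it.
local({
  c1 <- 1 / sqrt(2 * pi)
  lc1 <- log(c1)
  phi <- function(z) c1 * exp(-0.5 * z^2)
  lphi <- function(z) lc1 - 0.5 * z^2
  my.dnorm <- function(x, mu = 0, sigma = 1, log = FALSE) {
    z <- (x - mu) / sigma
    if (log) lphi(z) - log(sigma) else phi(z) / sigma
  }
}, envir = internals)

# Export frame: only names from EXPORTS (my.pnorm omitted here).
exports <- new.env(parent = emptyenv())
for (name in c("my.dnorm")) {
  assign(name, get(name, envir = internals, inherits = FALSE), envir = exports)
}

c1 <- 100                               # a conflicting global definition
d0 <- exports$my.dnorm(0)               # still uses the private c1
ld0 <- exports$my.dnorm(0, log = TRUE)
```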
The IMPORTS file might look like
<mynorm/IMPORTS>=
base
integrate integral=simpson
The global variable integral is intended to refer to an integration rule, say the one implemented by a function simpson in a package integrate. Importing the base name space (which could be made the default) means we are specifying that the variables pi, exp, *, etc., refer to the variables defined in base, not any others that might exist in the global search path ahead of base.
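To make the proposed IMPORTS format concrete, here is a hypothetical parser; both the function name parse_imports and this reading of the format are assumptions. It treats a line holding only a package name (such as base) as importing all of that package's exports, an entry of the form local=original (such as integral=simpson) as a renamed import, and a bare name as an import under the same name.

```r
# Hypothetical parser for the IMPORTS format: one package per line,
# optionally followed by 'local=original' or plain 'name' entries.
parse_imports <- function(lines) {
  result <- list()
  for (line in lines) {
    fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]
    if (length(fields) == 0 || fields[1] == "") next  # skip blank lines
    pkg <- fields[1]
    specs <- fields[-1]
    if (length(specs) == 0) {
      result[[pkg]] <- NA  # NA stands for "all exports of pkg"
    } else {
      parts <- strsplit(specs, "=", fixed = TRUE)
      local_names <- vapply(parts, function(p) p[1], character(1))
      orig_names <- vapply(parts, function(p) p[length(p)], character(1))
      names(orig_names) <- local_names  # local name -> original name
      result[[pkg]] <- orig_names
    }
  }
  result
}

imps <- parse_imports(c("base", "integrate integral=simpson"))
```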
A function defined at top level as
<sample function>=
f <- function(x) x+1
and called as f(2) is evaluated in an environment that looks like this:
--------------
|    x = 2   |
--------------
| Global Env |
--------------
If the function is defined in a package name space foo that imports base and bar, then the evaluation environment for its body could be made to look like this:
-----------------
|      x = 2    |
-----------------
| foo internals |
-----------------
|  bar exports  |
-----------------
|      base     |
-----------------
|   Global Env  |
-----------------
That is, instead of giving the function a null environment, representing the global environment as the place to search for free variables, it is given an environment consisting of the internal frame for its name space foo, followed by the exports frame for bar and a frame representing the base package (which currently has its values stored in the SYMVALUE cell), and then the global environment.
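The chain in the second diagram can be mimicked with nested environments in present-day R. This is an approximation under stated assumptions: bar_exports, foo_internals, twice, offset and from_global are hypothetical names, and because the real base environment cannot be spliced into such a chain, base_frame holds copies of the two base functions the body needs. Each new.env() call takes the next frame down in the diagram as its parent, and setting environment(f) puts the call frame holding x directly on top of foo internals.

```r
# Stand-in for the base frame: copies of the two base functions the body
# needs. The real base environment cannot be spliced into a chain like
# this, so base_frame only approximates the diagram.
base_frame <- new.env(parent = globalenv())
for (name in c("+", "*")) {
  assign(name, get(name, envir = baseenv()), envir = base_frame)
}

# Export frame of a hypothetical package bar.
bar_exports <- new.env(parent = base_frame)
assign("twice", function(x) 2 * x, envir = bar_exports)

# Internal frame of a hypothetical name space foo.
foo_internals <- new.env(parent = bar_exports)
assign("offset", 1, envir = foo_internals)

# Calling f creates the frame holding x; its parent is foo_internals.
f <- function(x) twice(x) + offset + from_global
environment(f) <- foo_internals

# Not in any name space frame, so the search falls back to the Global Env.
assign("from_global", 10, envir = globalenv())
result <- f(2)   # twice(2) + offset + from_global
```

Here from_global is found only because the chain ends in the global environment, which matches the fallback to the dynamic global environment for variables not supplied by any name space.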
A variation would use a representation like
-----------------
|      x = 2    |
-----------------      -----------------
|  foo     -----|----->| foo internals |
-----------------      -----------------
|   Global Env  |      |  bar exports  |
-----------------      -----------------
                       |      base     |
                       -----------------
This way environments would only directly refer to a name space, not its imports structure. This would simplify save/load code.
This approach can be quite straightforward to implement or very difficult, depending on the level of mutability allowed for the name spaces. Implementation should be fairly simple if name spaces are made read-only once the .First.lib function of their package has been run.
Allowing reloading of packages would complicate matters but might be quite important.
assign, allow new bindings to be created, also by assign, and allow bindings to be removed by rm (here the base environment is exempt). Name spaces could have their mutability restricted by ruling out adding or removing bindings, or by ruling out any mutation at all. Restrictions could be applied to exported variables only or to both exported and internal variables.
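These levels of restriction map fairly directly onto locking functions available in present-day R (lockEnvironment and lockBinding). The sketch below applies them to a stand-in frame ns with hypothetical bindings internal_var and exported_var; it illustrates the levels only, not how a name space implementation would use them.

```r
# A stand-in name space frame with two bindings (hypothetical names).
ns <- new.env()
assign("internal_var", 1, envir = ns)
assign("exported_var", 2, envir = ns)

fails <- function(expr) inherits(try(expr, silent = TRUE), "try-error")

# Restriction 1: no bindings may be added or removed,
# but existing bindings may still be assigned.
lockEnvironment(ns)
assign("internal_var", 10, envir = ns)
add_failed <- fails(assign("new_var", 3, envir = ns))
rm_failed <- fails(rm("internal_var", envir = ns))

# Restriction 2: freeze only the exported binding.
lockBinding("exported_var", ns)
export_assign_failed <- fails(assign("exported_var", 20, envir = ns))
assign("internal_var", 100, envir = ns)   # internal bindings still mutable

# Restriction 3: no mutation at all.
lockEnvironment(ns, bindings = TRUE)
internal_assign_failed <- fails(assign("internal_var", 1000, envir = ns))
```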
Imposing restrictions has a number of benefits:
Currently an environment frame looks like this (chains in a hashed frame are analogous):
------------------    ------------------    ------
| name | val | --|--->| name | val | --|--->|.....
------------------    ------------------    ------
If name spaces are entirely immutable, then sharing can be based on creating new frames with identical values, and the same structure can be used for export frames. If name spaces do not allow addition or removal of bindings but do allow assignment to existing bindings, then sharing has to be at the binding level. A representation for export frames like
---------------          ---------------          -----
| name |  | --|--------->| name |  | --|--------->|.....
---------------          ---------------          -----
         |                        |
        \|/                      \|/
-----------------------  -----------------------
| orig. name | val |  |  | orig. name | val |  |
-----------------------  -----------------------
should be sufficient. Here the cells containing the actual binding and the original names would be the binding cells from the internal environment of the owning package name space.
It may still be desirable to mark export frames as immutable to prevent inadvertent assignments. Selective permission to assign could be made available to a reload function.
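Binding-level sharing with renaming can be emulated in present-day R with active bindings. This is a swapped-in technique, not the cell-sharing representation in the diagram. In the sketch, the hypothetical helper make_exports builds an export frame in which each exported name (such as integral) forwards reads to the owner's binding under its original name (such as simpson). A later assignment in the owner, as a reload might do, is then seen through the export frame, while writes through the export frame are refused.

```r
# Hypothetical helper: a read-only accessor for one binding of 'owner'.
forwarder <- function(owner, orig_name) {
  force(owner)
  force(orig_name)
  function(value) {
    if (!missing(value)) stop("export frames are read-only")
    get(orig_name, envir = owner, inherits = FALSE)
  }
}

# Hypothetical helper: 'spec' maps exported names to original names.
make_exports <- function(owner, spec) {
  exports <- new.env(parent = emptyenv())
  for (export_name in names(spec)) {
    makeActiveBinding(export_name, forwarder(owner, spec[[export_name]]), exports)
  }
  lockEnvironment(exports)   # no names can be added or removed
  exports
}

# Internal frame of a hypothetical package integrate.
internals <- new.env()
assign("simpson",
       function(f, a, b) (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b)),
       envir = internals)
lockEnvironment(internals)   # fixed set of bindings; values may still change

exports <- make_exports(internals, c(integral = "simpson"))
v1 <- get("integral", envir = exports)(function(x) x^2, 0, 1)  # about 1/3

# Reassigning the owner's binding (as a reload might) shows through.
assign("simpson", function(f, a, b) 0, envir = internals)
v2 <- get("integral", envir = exports)(function(x) x^2, 0, 1)

write_failed <- inherits(try(assign("integral", 1, envir = exports), silent = TRUE),
                         "try-error")
```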
Allowing name space environments to be fully mutable would be considerably more complicated to implement and would also raise some tricky semantic issues. For instance, in the example above, it is not clear what should happen if integral was imported as referring to simpson in name space integrate but simpson was then removed.
To allow a package name space to be constructed during loading, its internal name space frame must allow assignments that create bindings, at least until loading is complete. Restrictions could be imposed once .First.lib has been run. The export frame could be prepared in advance based on an EXPORTS file and locked against adding or removing bindings. A reload function would not be able to change the exports except perhaps if no other name space depends on them.
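The loading sequence just described might look as follows. This is a sketch under assumptions: load_namespace is a hypothetical function, the package code is passed as a quoted expression, and the point where a .First.lib hook would run is only marked by a comment. The export frame receives placeholder bindings from the EXPORTS list before loading and is then locked against adding or removing bindings; once loading is complete, the internal frame is frozen and each export is filled in and locked.

```r
# Hypothetical loader illustrating the sequence described above.
load_namespace <- function(code, export_names, parent = baseenv()) {
  # Export frame prepared in advance from the EXPORTS list: placeholder
  # bindings fix the set of names, then adding/removing is ruled out.
  exports <- new.env(parent = emptyenv())
  for (nm in export_names) assign(nm, NULL, envir = exports)
  lockEnvironment(exports)

  # While loading, the internal frame accepts new bindings.
  internals <- new.env(parent = parent)
  eval(code, envir = internals)
  # (A .First.lib-style hook would be run at this point.)

  # Loading complete: freeze the internal frame, then fill in and lock exports.
  lockEnvironment(internals, bindings = TRUE)
  for (nm in export_names) {
    assign(nm, get(nm, envir = internals, inherits = FALSE), envir = exports)
    lockBinding(nm, exports)
  }
  list(internals = internals, exports = exports)
}

ns <- load_namespace(quote({
  counter <- 0
  bump <- function() counter + 1
}), export_names = "bump")
```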
foo that uses bar should cause bar to be loaded but need not necessarily make bar part of the global name space.
With explicit declaration of all exports in an EXPORTS file, the loading of the corresponding package, or perhaps of parts of a package, could be done on demand as a kind of autoload.
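On-demand loading of a declared export can be approximated with promises in present-day R. In this sketch, delayedAssign binds simpson in a hypothetical frame lazy_exports to an unevaluated call of a hypothetical loader load_integrate, which runs only the first time simpson is looked up; load_count records how often loading actually happened.

```r
# Hypothetical loader for a package 'integrate'; load_count records
# how many times its code has actually been run.
load_count <- 0
load_integrate <- function() {
  load_count <<- load_count + 1
  list(simpson = function(f, a, b) (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b)))
}

# The export is declared up front but bound to a promise: the loader
# runs only when simpson is first looked up.
lazy_exports <- new.env()
delayedAssign("simpson", load_integrate()$simpson, assign.env = lazy_exports)

before <- load_count                                              # 0
v1 <- get("simpson", envir = lazy_exports)(function(x) x, 0, 2)   # first use runs the loader
v2 <- get("simpson", envir = lazy_exports)(function(x) x, 0, 2)   # promise already forced
after <- load_count                                               # 1
```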
Package loading could be split into two steps, load.library and attach.library. If library is called then
A list of package name spaces could be maintained as a weak list; that way once a package is not attached and no longer used by any other packages it can be garbage collected. Whether this should be done automatically or only as an option is not clear.
A reloading mechanism should be provided for package development and possibly for supporting purging large packages.
Some packages, data packages in particular, might not need a name space; some way to specify this should be provided (e.g. no EXPORTS file).
save. The save format would need to be augmented to include descriptions of the packages needed by saved functions. The loading of these packages could be deferred until needed or attempted at load time. One issue is how to properly handle the search path.
Compilation would also benefit from the ability to declare some bindings as constant, though this would probably be most useful for bindings in the base package. Making pi a constant might allow constant folding, and making exp a constant might allow some inlining.
Currently classes are specified by a class attribute consisting of strings. Method dispatch is based on concatenating generic function and class names and searching for a function by the resulting name. This search is currently done in .GlobalEnv (not quite true, but true for all practical purposes). With name spaces along the lines outlined here, this means that all methods for classes have to be exported and their export frames attached for them to be found by dispatching. It would be possible to have method search occur in the caller's environment or in the non-local part of the caller's environment (i.e. starting with the name space frames). This would almost allow the definition of private classes that are used entirely within a name space. But it would not allow the appropriate methods to be found if the members of a private class had generic functions called on them from outside the name space.
This situation is less than ideal, but is fairly simple to describe: classes and their methods must be considered global. We can probably live with this.
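The visibility problem can be seen with a toy dispatcher. The function dispatch below is hypothetical and only imitates the name-based lookup described above (generic.class for each class, then generic.default), searching from a given environment; it is not how UseMethod is implemented. A method defined only in a private name space frame is found when the search starts in that frame, but a search starting at the global environment falls through to the default method.

```r
# Hypothetical dispatcher imitating name-based method lookup:
# try generic.class for each class in turn, then generic.default,
# searching from the environment 'where'.
dispatch <- function(generic, obj, where) {
  for (cls in c(class(obj), "default")) {
    method <- paste(generic, cls, sep = ".")
    if (exists(method, envir = where, mode = "function")) {
      return(get(method, envir = where, mode = "function")(obj))
    }
  }
  stop("no applicable method for ", generic)
}

# A global default method (assigned explicitly into .GlobalEnv).
assign("describe.default", function(x) "default method", envir = globalenv())

# A private name space frame holding the method for class "secret".
private_ns <- new.env(parent = globalenv())
assign("describe.secret", function(x) "secret method", envir = private_ns)

obj <- structure(list(), class = "secret")
from_global <- dispatch("describe", obj, globalenv())  # "default method"
from_ns <- dispatch("describe", obj, private_ns)       # "secret method"
```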
foo::bar to access the exported variable bar in name space foo?