Maintaining Dynamic State in R

Luke Tierney
School of Statistics
University of Minnesota

Introduction

There are times when it is useful to set a globally accessible value for the duration of a computation. The mechanism most commonly used in R is to store the value in options and then use something like

op<-option()
on.exit(options(op))
options(abc = newvalue)
... some calculation ...
on.exit()

There are several problems with this approach:

It is a shallow binding approach that will not work well with concurrent computations.
It does not lend itself well to name space management.
it is coarse-grained since the on.exit mechanism applies to a function as a whole.
It seems a bit awkward and error prone (to me at least).

An alternative approach is to adapt the mechanism used in MzScheme. They call their dynamic variables parameters, but that is clearly not a good choice for R. I will call them dynamic variables.

This note presents a prototype implementation of dynamic variables. It could easily be merged into the internal R context mechanism to make it more efficient, and I do not think it would be too hard to implement in S-plus. The implementation is available as a package.

Interface

The interface consists of three functions, dynamic.variable, dynamic.bind, and dynamic.bind.list. dynamic.variable creates a new dynamic variable and gives it an initial binding with the single argument to dynamic.variable as its value. For example,

v <- dynamic.variable(1)

creates a dynamic variable with initial value 1.

Dynamic variable are in fact functions. Their current values are obtained by calling them with no argument, and the value of their current binding is changed by calling them with the new value as a an argument:

> v()
[1] 1
> v(2)
> v()
[1] 2

dynamic.bind is called as

dynamic.bind(expr, dv1 = a1, dv2 = a2, ...)

It evaluates expr and returns the result. During the evaluation, the dynamic variable dv1 is bound to the value of a1, dv2 to a2, and so on. After the evaluation of expr and before returning the result, the previous bindings of the dynamic variables are restored. For example,

> v()
[1] 2
> dynamic.bind(v(), v = 3)
[1] 3
> v()
[1] 2

Changing the value of a dynamic variable within a dynamic.bind only changes the current binding, not any enclosing bindings:

> v()
[1] 2
> dynamic.bind({cat("v =", v(),"\n"); v(4); cat("v =", v(), "\n")}, v = 3)
v = 3 
v = 4 
> v()
[1] 2

dynamic.bind.list is a lower level version of dynamic.bind that allows the variables to use to be computed. It is called as

dynamic.bind.list(expr, list.of.variables, lits.of values)

For example,

> v()
[1] 2
> dynamic.bind.list(v(), list(v), list(3))
[1] 3
> v()
[1] 2

dynamic.bind.list is most useful for implementing higher level functions like dynamic.bind.

By default, dynamic variables are created with a unique identifier that should insure that saving a variable in two different workspaces and restoring it again will produce the same variable from both workspaces. There may be times however, when we want to be able to create the same variable by evaluating two separate expressions. To achieve this, we can give the variable a name when we create it:

w <- dynamic.variable(1, name = "fred")

evaluating the constructor expression again will create the same variable:

> u <- dynamic.variable(1, name = "fred")
Warning message: 
dynamic variable "fred" already exists in: dynamic.variable(1, name = "fred") 
> w() 
[1] 1
> u()
[1] 1
> u(2)
> u()
[1] 2
> w()
[1] 2

This should be useful for dynamic variables defined in packages. It does raise a problem of possible name conflicts. It may be useful to provide some explicit support for dynamic variables as part of the namespace system; I need to think about that a bit. Until that is resolved, a useful convention would be to define a dynamic variable foo in package bar to have name bar::foo,

foo <- dynamic.variable(name = "bar::foo")

Some Examples

Providing Information While Loading Libraries

The .First.lib package initialization routine is given two arguments, libname and pkgname. Occasionally it is useful to also have the environment frame the package is being loaded into. This information can be computed, but this is awkward and would not work with the name space mechanism I proposed.

An alternative would be to provide a dynamic variable, perhaps

loading.package.frame <- dynamic.variable()

and then change the code in library for running the .First.lib function to

if (exists(".First.lib", envir = env, inherits = FALSE)) {
    firstlib <- get(".First.lib", envir = env, inherits = FALSE)
    tt <- try(dynamic.bind(firstlib(which.lib.loc, package),
              loading.package.frame = env))
    ...

A .First.lib function can then be defined as

.First.lib <- function(lib, pkg) {
    frame <- loading.package.frame()
    ...
}

Passing Callbacks To C or Fortran Code

Some C or Fortran code, for example optimization code, is designed to accept a function pointer as a callback. Using an R function as a callback is feasible if in addition a data pointer is accepted and passed to the callback. But some code does not provide for this. It is possible to pass the R function through a global C variable, but this awkward and will fail in a threaded context. An alternative is to define a dynamic variable to hold the user callback and provide a fixed R callback that uses this variable. The R interface would look something like

user.callback = dynamic.variable()

do.callback <- function(x) {
    fun <- user.callback()
    fun(x)
}

myopt <- function(fun, x)
    x <- as.double(x)
    n <- as.integer(length(x))
    dynamic.bind(.C("myopt", x, n),
                 user.callback = fun)

with the C code

double callback(double *x, int n) {
    /* call R function do.callback with argument vector x */
    ...
}

void myopt(double *x, int *n) {
    ...
    Copt(callback, x, *n)
    ...
}

Options as Dynamic Variables

Once we add threading support there are a number of options that should be specific to a thread. Even without threads, in the Netscape plug-in and in places like the running of finalizers or of event handlers we have separate bits of code that should be run under separate options settings. It would also be convenient to have a simple mechanism for temporarily changing an option setting for the duration of a calculation without having to manipulate the on.exit setting to insure that the options are properly reset on error.

One way to deal with this would be to represent options as dynamic variables. Thus options(foo=x) would create a new dynamic variable if the option does not exist and set it if one does exist. options() and getOption would retrieve the values of the dynamic variables.

Suppose we used this approach and had a function getOptionDynvar for retrieving the variable. Then we could, for example, use

showerr <- getOptionDynvar("show.error.messages")
dynamic.bind(try(expr), showerr = FALSE)

to evaluate a try expression with error printing turned off.

To make this sort of thing more convenient we could use dynamic.bind.list to define a function with.options to allow the previous example to use

with.options(try(expr), show.error.messages = FALSE)

Assuming a function get.options.variables that returns the dynamic variables corresponding to a vector of opitons names, we could define with.options something like

<with.options definition>=
with.options <- function(expr, ...) {
    values <- list(...)
    variables <- get.options.variables(names(values))
    dynamic.bind.list(expr, variables, values)
}

Defines with.options (links are to index).

Switching to using dynamic variables for options would require us to get rid of the .Options vector or to make it a sort of special object that prints like a vector and has its [, [[, $, and the corresponding assignment mechods defined to use the dynamic variables. I don't think this would be too hard to do, but I have not thought it through completely.

One trick for preserving the .Options variable would be to define it something like this:

<possible .Options definition>=
\begin{verbatim}
.Options <- local({
    optfun <- function() {
        assign(".Options", delay(optfun()), env=NULL)
        options()
    }
    delay(optfun())
})

Defines .Options (links are to index).

This installs a promise which reinstalls itself before returning the result of options().

Standard Connections as Dynamic Variables

The sink function redirects output to a specified connection. In a threaded environment this should only be done for the current thread; similarly for an event handler this should only affect the event handler context.

One way to manage the context where output is redirected would be to have dynamic variables representing the standard connections, say

input.connection <- dynamic.variable(getConnection(0))
output.connection <- dynamic.variable(getConnection(1))
error.connection <- dynamic.variable(getConnection(2))

sink(file) would then internally do the equivalent of

output.connection(file)

to change the dynamic binding of the output connection. Code writing output would get the connection to use by

con <- output.connection()
... write to con ...

sink also manages a stack of redirections. This could again be handled with a dynamic variable, since it would usually make sense for the redirection stack to be specific to a single thread or execution context.

Notes

Since dynamic variables are represented as functions stored in ordinary variables, they introduce no new name space management issues.

In a threaded context, the intent is that dynamic bindings created with dynamic.bind should only be visible in the current thread.

It might be useful to be able to mark a dynamic variable as bind-only. That is, changing the value with v(x) is not allowed but creating a new binding for v with dynamic.bind is permitted.

When a dynamic variable is stored in a workspace, the value saved is the initial global value. I'm not sure if it would be more appropriate to save the current value, or maybe have a mechanism for choosing, but at the moment this would be difficult to implement.

There is also a small possibility that restoring a dynamic variable from a saved workspace will fail.

Implementation

The implementation currently consists entirely of R code.

<dynvars.R>=
<global variables>
<internal functions>
<public functions>

The corresponding NAMESPACE file would be

<NAMESPACE>=
export(dynamic.variable, dynamic.bind, dynamic.bind.list)

The bindings of dynamic variables are stored in environments. Every dynamic variable has a global binding in the environment dynvars.database. This environment is created with attach/detach to insure that it is hashed.

<global variables>= (<-U)
dynvars.database <- local({ env <- attach(NULL); detach(2); env})

Defines dynvars.database (links are to index).

Dynamic bindings are created using deep binding by adding new environment frames onto an existing dynamic environment.

<internal functions>= (<-U) [D->]
new.dynamic.env <- function()
    eval(quote((function() environment())()), env=get.dynamic.env())

Defines new.dynamic.env (links are to index).

When dynamic.bind creates a new dynamic environment, it stores it in a variable with a reasonably unique name in its frame. The current dynamic environment is thus either the value of the first variable by this name found on the frame stack or the global dynamic environment:

<internal functions>+= (<-U) [<-D->]
get.dynamic.env <- function() {
    name <- "__DYNVAR_ENV__"
    n <- sys.nframe()
    if (n > 1)
        for (i in (n-1):1) {
            env <- sys.frame(i)
            if (exists(name, env = env))
                return(get(name, env = env))
        }
    dynvars.database
}

Defines get.dynamic.env (links are to index).

dynamic.bind first forces the evaluation of the ... argument to insure that its expressions are evaluated in the calling dynamic environment. Next, it creates and installs a new dynamic binding frame. It then gets the dynamic variables specified by the names in the ... argument and binds them in the new dynamic environment to the specified values. Then expr is evaluated and its result is returned. On return the new dynamic environment goes out of scope and thus the previous environment is restored. The mechanism for creating new bindings for dynamic variables is explained below.

<public functions>= (<-U) [D->]
dynamic.bind <- function(expr, ...) {
    values <- list(...)
    "__DYNVAR_ENV__" <- denv <- new.dynamic.env()
    penv <- parent.frame()
    names <- names(values)
    for (i in seq(along = names))
        get(names[[i]], env = penv)(values[[i]], dynamic.environment = denv)
    expr
}

Defines dynamic.bind (links are to index).

dynamic.bind.list differs from dynamic.bind only in the way the variables and values are supplied.

<public functions>+= (<-U) [<-D->]
dynamic.bind.list <- function(expr, variables, values) {
    variables <- as.list(variables) # forces evaluation in the
    values <- as.list(values)       # caller's dynamic context
    "__DYNVAR_ENV__" <- denv <- new.dynamic.env()
    for (i in seq(along = variables))
        variables[[i]](values[[i]], dynamic.environment = denv)
    expr
}

Defines dynamic.bind.list (links are to index).

dynamic.bind can be defined in terms of dynamic.bind.list but this requires allocating a list of variables. An internal implementation should avoid this.

<alternate definition of dynamic.bind>=
dynamic.bind <- function(expr, ...) {
    values <- list(...)
    variables <- lapply(names(values), get, env = parent.frame())
    dynamic.bind.list(expr, variables, values)
}

Defines dynamic.bind (links are to index).

Dynamic variables store their values under a name. Unless a name is supplied, a name is chosen that is constructed to be unique. For save/load to work uniqueness should be guaranteed across processes and machines. This is of course not perfectly achievable in any reasonable way, but the accepted way of getting close enough is to use a DCE universally unique identifier (UUID). Most systems have a way of generating these; most current UNIX/Linux system seems to have uuidgen and the libuuid library (FreeBSD and Mac OS X seem to be exceptions, but presumably we could get a libuuid for those). MS Windows may have UUID's directly as well (in fact Cygwin and the MinGW toolkit we use both seem to contain libuuid). Windows does have globally unique identifiers (GUID) which serve the same purpose and may actually be exactly the same, i.e. it may be that a UUID and a GUID are as likely to clash as two UUID's or two GUID's but I'm not sure. So the right way to do this would be something along the lines of

<UUID version of make.dynvar.name>=
make.dynvar.name <- function()
    paste("__DYNVAR__UUID__", system("uuidgen", TRUE), sep = "")

Defines make.dynvar.name (links are to index).

but using a call into libuuid instead of the system call. But all this requires some configuration adjustments and the like, so in the interim I'll use something less reliable but easier to implement:

<internal functions>+= (<-U) [<-D->]
make.dynvar.name <- function() {
    ur <- function() floor(runif(1,max=2^32-1))
    repeat {
        name <- paste("__DYNVAR__",ur(), ur(), ur(), ur(), sep = "")
        if (! exists.dynvar.name(name))
            return(name)
    }
}

Defines make.dynvar.name (links are to index).

Either as part of this primitive implementation or as part of loading a variable from a saved workspace, we need to check if the name exists in the data base:

<internal functions>+= (<-U) [<-D->]
exists.dynvar.name <- function(name)
    exists(name, env = dynvars.database)

Defines exists.dynvar.name (links are to index).

Creating a dynamic variable involves finding a unique name for it (unless a name is supplied), initializing its global binding, and creating its function. A warning is given if a variable by the chosen name already exists. Users should only call the dynamic variable function with zero or one arguments. But dynamic.bind needs a way of getting the dynamic variable to set the value of its new binding, and this is done by allowing an environment to be passed with a named argument. The named argument is placed after ... to insure it will only be matched if supplied explicitly with that name.

<public functions>+= (<-U) [<-D]
dynamic.variable <- function(init = NULL, name = make.dynvar.name()) {
    if (exists(name, env = dynvars.database))
        warning(paste("dynamic variable \"", name, "\" already exists",sep=""))
    assign(name, init, env = dynvars.database)
    f <- function(newval, ..., dynamic.environment)
        do.dynvar(name, newval, init, dynamic.environment)
    <hack to work around environment removal in library>
    f
}

Defines dynamic.variable (links are to index).

Ideally we could just rely on capturing the name and initial values in the environment. Unfortunately library currently eliminates environments in functions when loading a package, so this will not work if a dynamic variable is created in a package. To work around this for now we can replace the body of the function by one where the appropriate values have been inserted with substitute.

<hack to work around environment removal in library>= (<-U)
body(f) <- 
    substitute(do.dynvar(name, newval, quote(init), dynamic.environment),
               list(name = name, init = init))

The code portion of a dynamic variable is contained in do.dynvar. If a dynamic variable is restored from a saved workspace then its name will not be registered in the global dynamic environment. Ideally we should deal with this by running some de-serialization code at load time, but we do not yet have a mechanism for this. Instead, every use checks to make sure that there is a global definition available. If there is, then either the variable has already been initialized or there is a clash of the names. I don't think there is currently a sensible way to distinguish these two cases, so we could get a silent error here. Using names based on UUID's or GUID's would essentially eliminate this possibility. If no global definition is available then the variable has been loaded from a saved workspace but not yet initialized, so it is initialized with the initial value supplied to the dynamic.variable call that created the variable. The remainder of the code corresponds to the three types of calls that can be made to the variable:

A user call to get the value of the current dynamic binding.
A user call to change the value of the current dynamic binding.
An internal call from dynamic.bind to initialize the value of a new dynamic binding.

<internal functions>+= (<-U) [<-D]
do.dynvar <- function(name, value, init, dynamic.environment) {
    if (! exists.dynvar.name(name))
        assign(name, init, env = dynvars.database)
    if (missing(value))
        get(name, env = get.dynamic.env())
    else if (missing(dynamic.environment))
        assign(name, value, env = get.dynamic.env(), inherits = TRUE)
    else
        assign(name, value, env = dynamic.environment, inherits = FALSE)
}

Defines do.dynvar (links are to index).

<alternate definition of dynamic.bind>: D1
<dynvars.R>: D1
<global variables>: U1, D2
<hack to work around environment removal in library>: U1, D2
<internal functions>: U1, D2, D3, D4, D5, D6
<NAMESPACE>: D1
<possible .Options definition>: D1
<public functions>: U1, D2, D3, D4
<UUID version of make.dynvar.name>: D1
<with.options definition>: D1

do.dynvar: U1, U2, D3
dynamic.bind: U1, U2, D3, U4, D5
dynamic.bind.list: U1, U2, D3, U4
dynamic.variable: U1, D2
dynvars.database: D1, U2, U3, U4, U5
exists.dynvar.name: U1, D2, U3
get.dynamic.env: U1, D2, U3
make.dynvar.name: D1, D2, U3
new.dynamic.env: D1, U2, U3
.Options: D1
with.options: D1