A Very Simple Prototype of Exception Handling in R

Luke Tierney
School of Statistics
University of Minnesota

Introduction

After some discussions with Robert Gentleman and Duncan Temple Lang I realized that we should have enough basic building blocks to create a prototype of an exception handling mechanism (almost) entirely within R. A proper implementation would benefit greatly from some amount of integration at the C level, but it looks like we can get by without that for basic testing purposes. This note is a first pass at such a beast. It is available as a package. This is only intended as a starting point for discussion and for trying out ideas, not as a definitive approach.

There are two fairly gross hacks needed to make this go. One is a means of taking over errors signaled using stop or the internal error and errorcall functions and transferring them to the exception mechanism. This is done by hijacking the error and show.error.messages options (i.e. user settings of those options will be ignored when inside the exception handling mechanism). The second hack is needed to notify the internal error handling code that the exception handler mechanism has taken over and we are no longer in internal error handling code (i.e. the static variable inError in errors.c needs to be set to zero). This is accomplished by calling Rf_resetStack, which resets inError, and undoing everything else this routine does. (Which means this will break spectacularly when the internals of that routine change, but hopefully by then we may have a better way to do this.)

Interface

The mechanism implemented here is quite similar in many ways to Java's mechanism and also has some similarities to Common Lisp conditions. Exceptions are objects inheriting from the abstract class exception. The class simple.exception is the class currently used by stop and all internal error signals. The constructor by the same name takes a string describing the exception as argument and returns a simple.exception object.

Uncaught exceptions will be processed by the generic function default.exception.handler. The method for class exception calls stop with the result of applying as.character to the exception argument.

The function try.catch is used to catch exceptions. Its usage is

try.catch(expr, ..., finally = NULL)

It evaluates its expression argument in a context where the handlers provided in the ... argument are available. The finally expression is then evaluated in the context in which try.catch was called; that is, the handlers supplied to the current try.catch call are not active when the finally expression is evaluated.

Handlers provided in the ... argument to try.catch are established for the duration of the evaluation of expr. If no exception is raised when evaluating expr then try.catch returns the value of the expression.

If an exception is raised while evaluating expr then established handlers are checked, starting with the most recently established ones, for one matching the class of the exception. If a handler is found then control is transferred to the try.catch call that established the handler, the handlers in that call are dis-established, the handler is called with the exception as its argument, and the result returned by the handler is returned as the value of the try.catch call.
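
The class-based matching described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: find.handler and the io.exception class are hypothetical names that are not part of the package, and the sketch covers matching among the handlers of a single try.catch call, not the search through nested calls.

```r
# Constructor with the same shape as the one defined later in this note.
simple.exception <- function(string)
    structure(list(string = as.character(string)),
              class = c("simple.exception", "exception"))

# Hypothetical subclass, used only for this illustration.
io.exception <- function(string) {
    e <- simple.exception(string)
    class(e) <- c("io.exception", class(e))
    e
}

# Hypothetical helper: return the first handler whose name matches
# one of the classes of e, or NULL if no handler matches.
find.handler <- function(e, handlers) {
    for (name in names(handlers))
        if (inherits(e, name))
            return(handlers[[name]])
    NULL
}

handlers <- list(io.exception = function(e) "io handler",
                 exception = function(e) "generic handler")

e.io <- io.exception("disk full")
e.simple <- simple.exception("oops")

r.io <- find.handler(e.io, handlers)(e.io)
r.simple <- find.handler(e.simple, handlers)(e.simple)
r.none <- find.handler(e.simple, list(io.exception = function(e) "io handler"))
```

Because handlers are checked in the order given, a handler for a more specific class would need to be listed before one for a more general class.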

If an exception is raised and no handler is found, then the default handler is called. The context of the call is either the outer-most try.catch call or the raise.exception call if there is no surrounding try.catch call.

When a try.catch call is on the stack, calls to stop and errors signaled internally are converted into exceptions of type simple.exception and raised by raise.exception.

The only form of non-local transfer of control that try.catch can catch is raising of exceptions. It cannot capture the other three possibilities: calls to return, break, and next (there are actually one or two more, depending on how you count, that also are not caught). Whether these should be made, conceptually at least, into a kind of exception, or perhaps into something inheriting off a superclass of exception (in the spirit of Java's Throwable class) is not clear. try and restart currently also do not catch these three, but perhaps they should.

The printed representation of converted exceptions produced for stop calls and internal errors is less than ideal (as is the method of trapping and converting them) but this should do for getting a feel for how this might work.

Some Examples

When the expression evaluated by try.catch does not raise an exception, the value of the expression is returned. If a finally expression is provided, it is evaluated before return:

> try.catch(1, finally=print("Hello"))
[1] "Hello"
[1] 1

A simple exception can be constructed and raised by explicitly calling the constructor and passing the result to raise.exception:

> e<-simple.exception("test exception")
> raise.exception(e)
Error in default.exception.handler.exception(e) : 
        unhandled exception (simple.exception): test exception

Alternatively, raise.exception can be called with a string argument:

> raise.exception("test exception")
Error in default.exception.handler.exception(e) : 
        unhandled exception (simple.exception): test exception

If an exception is raised in an expression protected by a try.catch and the exception is not caught, then the finally expression is evaluated before passing the exception to the default handler:

> try.catch(raise.exception(e), finally=print("Hello"))
[1] "Hello"
Error in default.exception.handler.exception(e) : 
        unhandled exception (simple.exception): test exception

Internal errors and calls to stop are converted to simple exceptions:

> try.catch(stop("fred"), finally=print("Hello"))
[1] "Hello"
Error in default.exception.handler.exception(e) : 
        unhandled exception (simple.exception): Error in catch("__TRY_CATCH__", list(value = expr, throw = FALSE)) : 
        fred

The printed representation of converted exceptions leaves something to be desired.

If an exception is raised and a handler is provided, then the result of calling the handler function with the exception object as argument is returned:

> try.catch(raise.exception(e),
            exception = function(e) e,
            finally=print("Hello"))
[1] "Hello"
<simple.exception: test exception>
> try.catch(stop("fred"),
            exception = function(e) e,
            finally=print("Hello"))
[1] "Hello"
<simple.exception: Error in catch("__TRY_CATCH__", list(value = expr, throw = FALSE)) : 
        fred

Some Issues

More Than Exceptions?

In both Java and Common Lisp errors or exceptions are part of a larger hierarchy of things that might be raised or thrown. In Common Lisp they are called conditions; Java calls them throwables. In Java raising a throwable is the only means of initiating a non-local exit.

There are (at least) five other forms of non-local exit:

calls to return, break, and next
giving the Q command in the browser
calling quit or sending a user signal

Should we have a class throwable that is a superclass of exception and can be used to implement all non-local exits? Doing this could have the benefit of cleaning up some mechanisms we currently have. For example, we could define a class exit with a constructor of the same name that takes the same three arguments as q and define q as

q <- function(save = "default", status = 0, runLast = TRUE)
    raise.throwable(exit(save, status, runLast))

and have the R main loop defined something like

mainloop <- function()
    repeat
        try.catch(try.catch(repl(),
                            exception=default.exception.handler),
                  exception = function(e) cat("uncaught exception"),
                  exit = function(e) if (ask.about.exit()) break)

Here repl would be the actual read-eval-print loop. The outer exception handler is intended to be as fail-safe as possible since the next stop would have to be an exit from mainloop.

[This is too simplistic even for the current level of concurrency we have and certainly would not work with threads, but the point is we could get clearer semantics for the shutdown process if we do things this way. At the moment, if you cancel a quit things don't seem to go back quite to normal.]

One other reason for explicitly allowing us to manage all non-local exiting is embedding and callbacks. Suppose an R function foo does .C("bar", ...) and bar calls back into R to a function baz. If baz calls stop or executes a return from foo a longjmp will occur that skips over the bar C level call frames. A lot of C code is robust to this sort of thing but a lot isn't. In cases where the C code isn't it would be nice to have a means of trapping all transfers of control, something like calling baz with

try.catch(list(value=baz(...), throw=FALSE),
          throwable=function(e) list(value=e,throw=TRUE))

This would allow the C code to then be given an exit status and to do whatever cleanup it needs to do before returning to foo. foo can then either ignore the throwable it receives, or re-throw it.

Currently we only have try, which is based on restart, to do this sort of thing at the R level. restart and hence try only catch errors, not return, break, or next calls; leaving browser with Q also goes through a restart. Whether this should be changed to make restart more absorbent is not clear. At the C level we currently have another option, which is to establish a new toplevel context for the call. This is particularly appropriate for things like running finalizers or processing events where we are at least conceptually running in a concurrent thread (to support this there are other stack examinations that should stop at the first toplevel context they find as well). It may not be the right approach for nested callbacks where we are conceptually nested deep in the call stack.

Debugging Information

Calls to stop and errorcall contain the call that raised the error. This could be folded into the exception mechanism. All internal error calls currently also save the traceback information. It would probably make sense to provide this sort of information to exceptions as well. Java has a somewhat peculiar approach to this. The constructor of throwables calls a method called fillInStackTrace (or something like that) that fills in a stack trace based on where the constructor is called. This stack trace is then part of the exception and can be printed by a handler. This approach assumes that errors will always be signaled by creating an exception object on the spot. This is not always feasible; for out-of-memory errors it would be a good idea to allocate the exception object in advance.
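
One way to picture the Java-style approach is a constructor that records the call stack at the point where the exception object is created. The following is a minimal sketch, not part of the package: traced.exception and its calls component are hypothetical names, and the stack is captured with the base function sys.calls.

```r
# Hypothetical constructor: like simple.exception, but it also records
# the calls that were active when the exception object was created.
traced.exception <- function(string) {
    calls <- as.list(sys.calls())   # outermost call first, this call last
    structure(list(string = as.character(string), calls = calls),
              class = c("traced.exception", "simple.exception", "exception"))
}

f <- function() g()
g <- function() traced.exception("boom")

e <- f()

# The innermost three recorded calls, as text.
trace.tail <- vapply(tail(e$calls, 3), deparse, character(1))
```

As the text notes, this only works when the object can be created on the spot; an exception object allocated in advance would need its trace filled in at the time it is raised.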

We should probably do something along these lines, but exactly what is not clear.

Warnings

The mechanism provided here allows only for exiting handlers, handlers where control is transferred to the point where the handler was defined before the handler is called. Common Lisp and Dylan also allow for calling handlers. These are called in the context where the condition (which is what they call the parent class of exceptions or errors) is raised. This makes them similar to UNIX signal handlers. One advantage of having calling handlers is that warnings, which usually do not cause a transfer of control, can be implemented as part of the exception handling mechanism. Both Common Lisp and Dylan do this. We could do this as well.
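
The difference between the two kinds of handlers can be illustrated with the base R functions tryCatch and withCallingHandlers, which are available in current versions of R. They are not part of the prototype described in this note, so the sketch below is only an analogy for the distinction drawn above: an exiting handler abandons the computation, while a calling handler runs at the point where the warning is signaled and lets the computation continue.

```r
# Exiting handler: control leaves the expression, so the final value
# of the block is never produced.
exiting.result <- tryCatch({
    warning("first problem")
    "computation finished"
}, warning = function(w) "abandoned")

# Calling handler: the handler runs while the expression is still
# active, records the message, and resumes after the warning call.
seen <- character()
calling.result <- withCallingHandlers({
    warning("first problem")
    "computation finished"
}, warning = function(w) {
    seen <<- c(seen, conditionMessage(w))
    invokeRestart("muffleWarning")
})
```

Here exiting.result holds the handler's value, whereas calling.result holds the value of the block itself and seen holds the recorded warning message.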

Implementation

The implementation consists of R code and a single C function.

<simpexcept.R>=
<catch and throw>
<dynamically scoped variables>
<exceptions>
<raising and catching exceptions>
<converting internal exceptions>
.First.lib <- function(lib, pkg)
    library.dynam(pkg, pkg, lib)

<simpexcept.c>=
#include "Rinternals.h"
<ResetErrorHandling definition>

Common Lisp-Like Catch and Throw

Transfer of control is handled by the pair of functions catch and throw, which are patterned after the Common Lisp special operators with the same names.

Catch uses local variables with hopefully unique names to save its tag argument, provide a means for throw to transfer control to the catch call, and provide a place for throw to place a value. (It would be better to have a more structured mechanism for maintaining dynamic state, some form of dynamically scoped variables.) The means for transferring control is a closure that captures a promise containing a return expression. Calling the closure evaluates the return expression which then causes a return from the catch call.

<catch and throw>= (<-U) [D->]
catch<-function(tag, expr) {
    "__CATCH_TAG__" <- as.character(tag)
    "__CATCH_THROWER__"<- make.thrower(return(get("__CATCH_VALUE__")))
    "__CATCH_VALUE__" <- NULL
    expr
}

make.thrower <- function(expr) function() expr

Defines catch, make.thrower (links are to index).
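
The control-transfer trick can be seen in isolation in the following small sketch. The functions early.exit and helper are hypothetical names introduced only for this illustration; make.thrower is the same one-liner as above. The argument to make.thrower is a promise created in the frame of early.exit, so forcing it from anywhere deeper on the stack evaluates return in that frame.

```r
make.thrower <- function(expr) function() expr

# Hypothetical helper that is handed the thrower closure.
helper <- function(thrower) {
    thrower()        # forces the promise; control leaves early.exit here
    "not reached"
}

# Hypothetical function that sets up a thrower for its own frame.
early.exit <- function(x) {
    thrower <- make.thrower(return("exited early"))
    if (x < 0)
        helper(thrower)
    "ran to completion"
}

r.neg <- early.exit(-1)
r.pos <- early.exit(1)
```

When x is non-negative the promise is never forced, so the return expression is never evaluated and the function runs to its end.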

The throw function evaluates and saves the value of the expr argument and then searches for an active catch for the specified tag. The search begins with the current frame and will find the most recently established catch if there is more than one. An error is signaled if no matching catch is found. The value of the expression argument is placed in the value variable in the frame of the catch call, the throwing closure created by make.thrower in the catch call is obtained and called. This call evaluates the captured return expression and causes a return from the catch expression.

<catch and throw>+= (<-U) [<-D]
throw<-function(tag, expr, no.tag = tag.not.found) {
    value <- expr # forces evaluation of expr
    tag <- as.character(tag)
    env <- get.target.frame(tag, "__CATCH_TAG__")
    assign("__CATCH_VALUE__", value, env = env)
    fun <- get("__CATCH_THROWER__", env = env)
    fun()
}

##****parent.env with arg > 1 doesn't seem to work as advertised??
get.target.frame <- function(tag, name) {
    n <- sys.nframe()
    if (n > 1)
        for (i in (n-1):1) {
            env <- sys.frame(i)
            if (exists(name, env = env) &&
                get(name, env = env) == tag)
                return(env)
        }
    stop(paste("no catch for tag \"", tag, "\"", sep = ""))
}

Defines get.target.frame, throw (links are to index).

Matching of tags is done with ==, so tags can be any sort of object for which this makes sense.

Some simple tests:

<tests>= [D->]
catch("x", 1)
catch("x", { throw("x", 1); 2 })
catch("x", throw("y", 1))
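
For reference, here is a self-contained sketch of what these tests are expected to produce, together with one nested case. The definitions are lightly adapted copies of the chunks above: the envir argument is spelled out in full, inherits = FALSE is added so that only the frame itself is examined, and the unused no.tag argument of throw is omitted. These adaptations are assumptions made for the illustration, not changes to the package.

```r
make.thrower <- function(expr) function() expr

catch <- function(tag, expr) {
    "__CATCH_TAG__" <- as.character(tag)
    "__CATCH_THROWER__" <- make.thrower(return(get("__CATCH_VALUE__")))
    "__CATCH_VALUE__" <- NULL
    expr
}

get.target.frame <- function(tag, name) {
    n <- sys.nframe()
    if (n > 1)
        for (i in (n - 1):1) {
            env <- sys.frame(i)
            if (exists(name, envir = env, inherits = FALSE) &&
                get(name, envir = env) == tag)
                return(env)
        }
    stop(paste("no catch for tag \"", tag, "\"", sep = ""))
}

throw <- function(tag, expr) {
    value <- expr # forces evaluation of expr
    env <- get.target.frame(as.character(tag), "__CATCH_TAG__")
    assign("__CATCH_VALUE__", value, envir = env)
    get("__CATCH_THROWER__", envir = env)()
}

r1 <- catch("x", 1)                        # no throw: value of expr
r2 <- catch("x", { throw("x", 1); 2 })     # throw skips the rest of expr
r3 <- catch("outer", {                     # throw passes over the inner catch
    catch("inner", throw("outer", "skipped inner"))
    "not reached"
})
r4 <- tryCatch(catch("x", throw("y", 1)),  # no matching catch: an error
               error = function(e) conditionMessage(e))
```

The first two calls both yield 1, the nested call yields "skipped inner", and the last one signals the "no catch for tag" error, which is captured here as a string using base R's tryCatch purely for display.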

Dynamically Scoped Variables

We need to be able to check for the existence of a try.catch call on the stack without causing an error. This is done by having try.catch create a binding for a variable with a reasonably unique name and searching for the existence of such a binding in the frame stack. The function dynamic.exists does this search.

<dynamically scoped variables>= (<-U)
dynamic.exists <- function(name) {
    n <- sys.nframe()
    if (n > 1)
        for (i in (n-1):1)
            if (exists(name, env = sys.frame(i)))
                return(TRUE)
    FALSE
}

Defines dynamic.exists (links are to index).

This function could be viewed as part of a mechanism for managing dynamically scoped variables. The rest of such a mechanism would consist of dynamic.get for getting the value of the current dynamic binding, dynamic.assign for changing the value, and either a separate mechanism for creating new dynamic bindings or a variant of dynamic.assign that does this. It might also be useful to use a separate mechanism for holding the bindings, a separate environment for example, to avoid the possibility of name clashes. Proper interaction with name spaces would also need a look. It may well be that MzScheme's parameter mechanism would work well for us.
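
A dynamic.get in the same style as dynamic.exists might look roughly as follows. This is a sketch only: the function is not part of the package, the default argument and the use of inherits = FALSE are assumptions made here, and report, inner and outer are hypothetical test functions.

```r
# Hypothetical companion to dynamic.exists: return the value of the most
# recently established binding for name on the frame stack, or default
# if no frame holds such a binding.
dynamic.get <- function(name, default = NULL) {
    n <- sys.nframe()
    if (n > 1)
        for (i in (n - 1):1) {
            env <- sys.frame(i)
            if (exists(name, envir = env, inherits = FALSE))
                return(get(name, envir = env, inherits = FALSE))
        }
    default
}

report <- function() dynamic.get("__LEVEL__", default = "unbound")

inner <- function() {
    "__LEVEL__" <- "inner"    # shadows the binding made by outer
    report()
}

outer <- function() {
    "__LEVEL__" <- "outer"
    c(inner(), report())
}

r.unbound <- report()
r.nested <- outer()
```

The binding made in inner is visible only while inner is on the stack; once it returns, report again sees the binding made in outer.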

Exceptions

Exceptions are represented as objects that inherit from class exception. They are assumed to have a method for as.character that produces a printed representation describing the exception that occurred. The print method for exceptions uses as.character to produce a representation of the exception.

<exceptions>= (<-U) [D->]
print.exception <- function(e, ...)
    cat("<", class(e)[1], ": ", as.character(e), ">\n", sep="")

Defines print.exception (links are to index).

The exception class is a virtual class. It has no constructor, and its as.character method warns that this method should be overridden by non-virtual classes. I'm not sure this is really a good idea, but it seemed useful during development.

<exceptions>+= (<-U) [<-D->]
as.character.exception <- function(e, ...) {
    warning(paste("no as.character method for exception of class", class(e)))
    as.character(unclass(e))
}

Defines as.character.exception (links are to index).

One concrete exception class is provided: simple.exception. Simple exceptions contain a string slot that is used to provide the printed representation.

<exceptions>+= (<-U) [<-D->]
simple.exception <- function(string) {
    class <- c("simple.exception", "exception")
    structure(list(string=as.character(string)), class=class)
}

as.character.simple.exception <- function(e, ...)
    e$string

Defines as.character.simple.exception, simple.exception (links are to index).

When an exception occurs for which no handler is available, the generic function default.exception.handler is called. The method for the exception class just calls stop.

<exceptions>+= (<-U) [<-D]
default.exception.handler <- function(e)
    UseMethod("default.exception.handler", e)

default.exception.handler.exception <- function(e)
    stop(paste("unhandled exception (", class(e)[1],
               "): ", as.character(e), sep = ""))

Converting Internal Exceptions

To merge errors signaled with stop and with the internal error and errorcall functions into the exception mechanism I use a fairly gross device: I turn off error message printing with the show.error.messages option and set the error option to call a function that grabs the error string, creates a simple exception with this string, and raises the exception. Since the internal variable inError in errors.c is set before the error option expression is evaluated, I need to reset this if the error is handled; this is where the ResetErrorHandling C routine comes in. With this approach calls to stop and error or errorcall will only be converted if a try.catch call is on the stack.

Error conversion is done by

<converting internal exceptions>= (<-U) [D->]
error.converter <- function() {
    raise.exception(simple.exception(geterrmessage()))
}

Defines error.converter (links are to index).

The initial setting of the two options is handled by set.error.options. This function returns the previous settings of these two options in a list. The code is a little convoluted for two reasons:

When the error option is NULL it is not in the options list, and options()$error matches the error.messages option, which is a logical.
Before show.error.messages is set the first time options()$show.error.messages returns NULL, which is not a valid value to use when setting this option.

<converting internal exceptions>+= (<-U) [<-D->]
set.error.options <- function() {
    op <- options()
    options(show.error.messages = FALSE)
    options(error = quote(error.converter()))
    if (is.logical(op$show.error.messages)) show <- op$show.error.messages
    else show <- TRUE
    if (is.logical(op$error)) err <- NULL
    else err <- op$error
    list(show.error.messages = show, error = err)
    
}

Defines set.error.options (links are to index).

During the raising of an exception, the options are set back to default values to prevent infinite recursion. I'm not sure this is really necessary, but for now I'll just do it to be a little safer.

<converting internal exceptions>+= (<-U) [<-D->]
reset.error.options <- function() {
    options(show.error.messages = TRUE)
    options(error = NULL)
}

Defines reset.error.options (links are to index).

At the end of a try.catch the original values of the options are restored by the function

<converting internal exceptions>+= (<-U) [<-D]
restore.error.options <- function(op) {
    options(op)
}

Defines restore.error.options (links are to index).
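
The save, modify and restore pattern behind these three functions can be sketched with a harmless option. The example below assumes an option that is already set (digits), so it sidesteps the two complications noted above; with.option.digits is a hypothetical helper introduced only for this illustration.

```r
# Hypothetical helper: evaluate expr with a temporary setting of the
# digits option and restore the previous setting afterwards.
with.option.digits <- function(n, expr) {
    old <- options(digits = n)  # returns the previous value as a list
    on.exit(options(old))       # restore on any exit, normal or not
    expr                        # forced here, while the new setting is active
}

before <- getOption("digits")
formatted <- with.option.digits(3, format(pi))
after <- getOption("digits")

# The setting is also restored when the expression signals an error.
failed <- tryCatch(with.option.digits(3, stop("fail")),
                   error = function(e) "error seen")
after.error <- getOption("digits")
```

Registering the restore step with on.exit, as try.catch does, ensures the previous settings come back however the function is left.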

Raising and Catching Exceptions

The function raise.exception takes a single argument representing an exception and raises the exception. To avoid recursion it first calls reset.error.options to turn off the calling handler and turn on printing of error messages. This corresponds in the current internal error code to setting inError to a non-zero value. Next it converts its argument to a simple exception if it is not already an exception. This allows raise.exception to be called with a string argument, for example. Finally the exception is thrown to the innermost try.catch, or the default handler is called if no try.catch is established.

<raising and catching exceptions>= (<-U) [D->]
raise.exception <- function(e) {
    reset.error.options() # turn off to avoid recursion
    if (! inherits(e, "exception"))
        e <- simple.exception(as.character(e))
    if (exists.try.catch())
        throw("__TRY_CATCH__", list(value = e, throw = TRUE))
    else
        default.exception.handler(e)
}

exists.try.catch <- function() dynamic.exists("__TRY_CATCH__")

Defines exists.try.catch, raise.exception (links are to index).

The try.catch function uses catch to establish a target for raise.exception to throw an exception to. The catch expression is contained in a local call that also establishes a marker used by exists.try.catch to determine whether a try.catch form is on the stack. Leaving this local dis-establishes the try.catch; it will not be seen during the remaining processing.

The result returned by the catch call will be a list with two named components. If the throw component is false, then no exception was raised and the value component is the value of the expression. If the throw component is true, then an exception did occur, and the exception object is in the value component.

If an exception is caught, then the options used for converting internal errors need to be turned back on since they have been turned off by raise.exception. In addition, if the exception was converted from an internal error or a call to stop, then we need to turn off the internal setting that indicates that we are in error handling code. This is done by calling a little C routine with .Call.

Next, we search the ... argument for a handler that matches the exception. If one is found, it is called and the result is returned as the value of the try.catch call. Otherwise, we need to raise the exception again. Before raising the exception, we disable the on.exit, restore the error options and evaluate the finally expression. This needs to be done in case raise.exception calls the default handler, since that call will occur from within the try.catch call.

<raising and catching exceptions>+= (<-U) [<-D]
try.catch <- function(expr, ..., finally = NULL) {
    op <- set.error.options()
    on.exit({restore.error.options(op); finally })
    result <- local({
        "__TRY_CATCH__" <- NULL
        catch("__TRY_CATCH__", list(value = expr, throw = FALSE))
    })
    if (result$throw) {
        set.error.options() # turn back on because turned off in raise
        .Call("ResetErrorHandling") # is this the right place for this?
        handlers <- list(...)
        names <- names(handlers)
        e <- result$value
        for (i in seq(along = names))
            if (inherits(e, names[i]))
                return(handlers[[i]](e))
        on.exit()                  # to get
        restore.error.options(op)  # things reasonable
        finally                    # for stop calls going to default handler
        raise.exception(e)
    }
    else result$value
}

The C function used to reset inError is ResetErrorHandling. This approach is very brittle, since it depends on the way a particular function Rf_resetStack happens to be implemented. It would be much better to provide a hook within errors.c that is intended for this purpose, but I wanted to get something going that did not require changes to R, so this will have to do for now. It is likely to break very soon as changes are made to the internal error handling code, but I will try to make modifications as necessary to keep it working.

<ResetErrorHandling definition>= (U->)
SEXP ResetErrorHandling(void)
{
    extern SEXP R_Warnings;     
    extern int R_CollectWarnings;
    extern int R_PPStackTop;

    SEXP oldWarnings = R_Warnings;
    int oldCollectWarnings = R_CollectWarnings;
    int oldPPStackTop = R_PPStackTop;

    Rf_resetStack(FALSE);

    R_PPStackTop = oldPPStackTop;
    R_Warnings = oldWarnings;
    R_CollectWarnings = oldCollectWarnings;
    return R_NilValue;
}

Defines ResetErrorHandling (links are to index).

Some Tests

<tests>+= [<-D]
try.catch(1, finally=print("Hello"))
e<-simple.exception("test exception")
raise.exception(e)
try.catch(raise.exception(e), finally=print("Hello"))
try.catch(stop("fred"), finally=print("Hello"))
try.catch(raise.exception(e), exception = function(e) e, finally=print("Hello"))
try.catch(stop("fred"),  exception = function(e) e, finally=print("Hello"))

<catch and throw>: U1, D2, D3
<converting internal exceptions>: U1, D2, D3, D4, D5
<dynamically scoped variables>: U1, D2
<exceptions>: U1, D2, D3, D4, D5
<raising and catching exceptions>: U1, D2, D3
<ResetErrorHandling definition>: U1, D2
<simpexcept.c>: D1
<simpexcept.R>: D1
<tests>: D1, D2

as.character.exception: D1
as.character.simple.exception: D1
catch: D1, U2, U3, U4, U5, U6
dynamic.exists: D1, U2
error.converter: D1, U2
exists.try.catch: D1
get.target.frame: D1
make.thrower: D1
print.exception: D1
raise.exception: U1, D2, U3, U4
ResetErrorHandling: U1, D2
reset.error.options: D1, U2
restore.error.options: D1, U2
set.error.options: D1, U2
simple.exception: D1, U2, U3, U4
throw: D1, U2, U3, U4