% -*- mode: Noweb; noweb-code-mode: c-mode -*-
\documentclass[11pt]{article}
\usepackage[fullpage]{mynoweb}
\noweboptions{noidentxref,longchunks,smallcode}

\title{Simple References with Finalization}
\author{Luke Tierney\\School of Statistics\\University of Minnesota}

\begin{document}
\maketitle

\section{Introduction}
This is a proposal for a simple mechanism to add foreign references
and a finalization mechanism to R. With some caveats it should be
simple enough to add this to R before the 1.2.0 release.

It should be possible to extend this mechanism to make R reference
objects that should do for some of the things John and Duncan are
looking at, but there are a few technical and conceptual issues that
need to be ironed out first. I'll discuss these below in Section
\ref{Section:RRefs}. Because of these issues I think we should hold
off on this step until after 1.2.0.

\section{Interface}
\subsection{R Level Interface}
The interface for pointer objects is entirely a C level interface.
From the R level these objects are opaque. They have a printed
representation as
\begin{verbatim}
\end{verbatim}
Their type (the value returned by [[typeof]]) is [["externalptr"]].

Like environments and names, pointer reference objects are not copied
by [[duplicate]]. Like any R object, they do have an attribute field.
However, as with environments, modifying this field is destructive and
thus attributes are not very useful. If you want to create an R
object that corresponds to a pointer, then you should do something
like
\begin{verbatim}
p <- .Call(....)     # create and return pointer object
object <- list(p)
class(object) <- "myclass"
\end{verbatim}

\subsection{C Level Interface}
An external pointer reference is constructed by calling
[[R_MakeExternalPtr]] with two arguments, the pointer value and a tag
[[SEXP]]. The tag can be used, for example, to attach type
information to the pointer reference.
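To make the role of the tag concrete, here is a minimal, self-contained
C sketch of tag-based type checking. It does not use R's internals: the
struct [[tagged_ptr]] and the functions [[make_tagged]] and
[[checked_addr]] are hypothetical stand-ins for an external pointer
object, its constructor, and a checked reader, and the addresses of two
static strings play the role of unique tags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an external pointer object: an address
   plus a tag that identifies what the address points to. */
typedef struct {
    void *addr;
    const void *tag;
} tagged_ptr;

/* The addresses of these statics serve as unique type tags
   (an assumption of this sketch; the proposal uses a SEXP tag). */
static const char FILE_TAG[] = "FILE_TYPE_TAG";
static const char OTHER_TAG[] = "OTHER_TYPE_TAG";

/* Pair an address with a non-NULL tag. */
tagged_ptr make_tagged(void *addr, const void *tag)
{
    tagged_ptr p;
    assert(tag != NULL);
    p.addr = addr;
    p.tag = tag;
    return p;
}

/* Return the address only if the tag matches; NULL otherwise. */
void *checked_addr(tagged_ptr p, const void *expected_tag)
{
    return p.tag == expected_tag ? p.addr : NULL;
}
```

A reader that is handed the wrong tag gets [[NULL]] back instead of a
pointer of the wrong type. In the proposed interface the constructor is
declared as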
<>=
SEXP R_MakeExternalPtr(void *p, SEXP tag);
@ %def R_MakeExternalPtr
Reader functions are provided to allow the pointer and tag values to
be retrieved:
<>=
void *R_ExternalPtrAddr(SEXP s);
SEXP R_ExternalPtrTag(SEXP s);
@ %def R_ExternalPtrAddr R_ExternalPtrTag
In addition, we allow the pointer value to be cleared (its value is
set to [[NULL]]). (Perhaps allowing the value to be redefined
arbitrarily is OK too.) As part of finalization it is a good idea to
clear a pointer reference just in case it has managed to get itself
resurrected. Code that uses pointer references should check for
[[NULL]] values since these can occur as a result of clearing or
save/loads.
<>=
void R_ClearExternalPtr(SEXP s);
@ %def R_ClearExternalPtr

When a pointer object is saved in a workspace its pointer field is
saved as [[NULL]] since pointer values are not likely to be useful
across sessions. The tag object will be retained. Whether several
saved pointers that were created with the same tag object retain this
shared substructure within a session or across save/loads is
unspecified.

\subsection{Finalization}
A finalizer can be registered for a pointer reference (and maybe
eventually for a few other types, such as R reference objects). The
finalizer must be an R function taking a single argument, the object
to be finalized. Only one finalizer may be registered for an object.
<>=
void R_RegisterFinalizer(SEXP s, SEXP fun);
@ %def R_RegisterFinalizer
It would be possible to use an expression rather than a function here,
but then we would have to include some means of referencing the object
to be finalized. Using an environment would potentially, depending on
implementation details, lead to creating unintended strong links to
the object, resulting in it never being collected.

The finalization function will be called sometime after the garbage
collector detects that the object is no longer accessible from within
R. The exact timing is not predictable.
There is no guarantee that finalizers will be called before system
exit, even for objects that may already have been determined to be
eligible for finalization. [The exact wording of this needs
refinement, but the intention is to be in line with what Java does.
Other systems may try to provide stronger guarantees, or to ensure
that the order in which finalizers are called has some relation to the
order in which objects are created; I don't propose we do any of
that.]

\subsection{An Example}
[None of this is tested, so the chances that any of this would
actually work without modification are minimal--it should capture the
essence though.]

A simple interface to the [[fopen]] and [[fclose]] calls could be
implemented using external pointer objects to represent file streams
and finalization to ensure files are closed. The internal portions of
the interface might consist of a file [[file.c]] and the R portions
might be in [[file.R]].
<>=
#include <stdio.h>
#include "Rinternals.h"
<>
<>
@ %def
<>=
<>
<>
@ %def

To allow some type checking on the file pointer, we use a symbol with
a reasonably unique name as a type tag. This symbol is stored in a
local static variable; it is initialized by calling the C level
initialization function in the package [[.First.lib]] function.
<>=
static SEXP FILE_type_tag;
@ %def FILE_type_tag
<>=
SEXP FILE_init(void)
{
    FILE_type_tag = install("FILE_TYPE_TAG");
    return R_NilValue;  /* .Call requires a SEXP return value */
}
@ %def FILE_init
<>=
.First.lib <- function(lib, pkg) {
    library.dynam( "file", pkg, lib )
    .Call("FILE_init")
}
@ %def .First.lib
Checking of a file stream argument is done by the macro
[[CHECK_FILE_STREAM]]:
<>=
#define CHECK_FILE_STREAM(s) do { \
    if (TYPEOF(s) != EXTPTRSXP || \
        R_ExternalPtrTag(s) != FILE_type_tag) \
        error("bad file stream"); \
} while (0)
@ %def CHECK_FILE_STREAM
An alternative to using a symbol as the type identifier would be to
use an arbitrary allocated object, which would then have to be stored
in the precious list.
The advantage would be complete uniqueness within the session; the
drawback is somewhat unclear semantics across save/load.

The R function [[fopen]] passes its file name and mode arguments,
along with the R function [[fclose]] to be used as the finalization
function, to the C function [[FILE_fopen]].
<>=
fopen <- function(name, mode = "r") {
    .Call("FILE_fopen", as.character(name), as.character(mode), fclose)
}
@ %def fopen
<>=
SEXP FILE_fopen(SEXP name, SEXP mode, SEXP fun)
{
    FILE *f = fopen(CHAR(STRING_ELT(name, 0)), CHAR(STRING_ELT(mode, 0)));
    if (f == NULL)
        return R_NilValue;
    else {
        SEXP val;
        /* protect the new object while the finalizer is registered */
        PROTECT(val = R_MakeExternalPtr(f, FILE_type_tag));
        R_RegisterFinalizer(val, fun);
        UNPROTECT(1);
        return val;
    }
}
@ %def FILE_fopen

If we wanted to provide a function at the R level for registering
finalizers, then the [[FILE_fopen]] function would become
<>=
SEXP FILE_fopen(SEXP name, SEXP mode, SEXP fun)
{
    FILE *f = fopen(CHAR(STRING_ELT(name, 0)), CHAR(STRING_ELT(mode, 0)));
    if (f == NULL)
        return R_NilValue;
    else
        return R_MakeExternalPtr(f, FILE_type_tag);
}
@ %def FILE_fopen
and the R function [[fopen]] would be defined as
<>=
fopen <- function(name, mode = "r") {
    s <- .Call("FILE_fopen", as.character(name), as.character(mode), fclose)
    if (! is.null(s)) register.finalizer(s, fclose)
    s
}
@ %def fopen

The R function [[fclose]] just calls the C function [[FILE_fclose]]:
<>=
fclose <- function(stream) {
    .Call("FILE_fclose", stream)
}
@ %def fclose
The C function [[FILE_fclose]] closes the stream and clears the
pointer unless the pointer is already [[NULL]], which would indicate
that the file has already been closed.
<>=
SEXP FILE_fclose(SEXP s)
{
    FILE *f;
    CHECK_FILE_STREAM(s);
    f = R_ExternalPtrAddr(s);
    if (f != NULL) {
        fclose(f);
        R_ClearExternalPtr(s);
    }
    return R_NilValue;
}
@ %def FILE_fclose

If a file stream is closed by user code, then there is no longer any
need for finalization.
But providing a mechanism for removing finalizers is more trouble than
it is worth, so the finalizer will eventually call [[fclose]] anyway;
nothing much will happen, since the stream pointer will have been
cleared. This issue needs to be kept in mind when designing finalizer
functions.

Just to have something to do with these pointers, we can add a simple
[[fgets]] function that uses a fixed size buffer.
<>=
fgets <- function(s) .Call("FILE_fgets", s)
@ %def fgets
<>=
SEXP FILE_fgets(SEXP s)
{
    char buf[512];
    FILE *f;
    CHECK_FILE_STREAM(s);
    f = R_ExternalPtrAddr(s);
    if (f == NULL)  /* cleared by fclose or by a save/load */
        error("file stream is closed");
    if (fgets(buf, sizeof(buf), f) == NULL)
        return R_NilValue;
    else {
        SEXP val;
        /* a character vector of length one holds the line read */
        PROTECT(val = allocVector(STRSXP, 1));
        SET_STRING_ELT(val, 0, mkChar(buf));
        UNPROTECT(1);
        return val;
    }
}
@ %def

\subsection{Weak References}
It may also be possible to add a weak reference mechanism at this
point. Initially it would only be usable with pointer references, but
eventually it could be used with R reference objects too. But there
are some tricky issues; maybe it would be better to think just in
terms of a weak table mechanism, to associate an ordinary R object
with the lifetime of a reference object. An example of a setting
where you might want this is keeping track of the names of open files
without this making a file stay open after it is otherwise
unreachable.

\section{Implementation}
[These are just some sketchy notes--I'll flesh it out if this seems
like a reasonable way to go.]

Need to add a new [[SEXP]] type. Add support to printing,
save/restore, and subassign (I think)--just look at where byte code
stuff needed to go.

For finalization, do the same thing as in the xlispstat collector.
Need to figure out where to run finalizers and how to trap errors.

\section{R Reference Objects}
\label{Section:RRefs}
[This is still just a sketch.]

R reference objects would be entities that are passed by reference
(not copied) and contain a single R object (possibly another
reference).

Main issue: save/load needs to preserve sharing to some degree.
This is a bit messy to add; it is also not obvious what the right
thing to do is for multiple saves. Should look at Java and the like
for serialization.

\end{document}