This note describes a simple mechanism for managing foreign references that I have added to 1.2.
It should be possible to extend this mechanism to make R reference objects that should do for some of the things John and Duncan are looking at, but there are a few technical and conceptual issues that need to be ironed out first. I'll discuss these below in Section [->]. Because of these issues I think we should hold off on this step until after 1.2.0.
<pointer ...>Their type (the value returned by
typeof
) is "externalptr"
.
Like environments and names, pointer reference objects are not copied
by duplicate
. Like any R object, they do have an attribute field.
However, as with environments, modifying this field is destructive and
thus attributes are not very useful. If you want to create an R
object that corresponds to a pointer, then you should do something
like
p <- .Call(....) # create and return pointer object object <- list(p) class(object) <- "myclass"
R_MakeExternalPtr
with three arguments, the pointer value, a tag
SEXP
, and a value to be protected. The tag can be used, for
example, to attach type information to the pointer reference. The
protected value can be used for associating with the pointer an R
object that must remain alive as long as the pointer is alive, perhaps
because the pointer points into the object. An example of this in in
Section [->].
<external pointer reference constructor>= (U->) SEXP R_MakeExternalPtr(void *p, SEXP tag, SEXP prot);
DefinesR_MakeExternalPtr
(links are to index).
Reader functions are provided to allow the pointer, tag, and protected values to be retrieved:
<external pointer reference readers>= (U->) void *R_ExternalPtrAddr(SEXP s); SEXP R_ExternalPtrTag(SEXP s); SEXP R_ExternalPtrProtected(SEXP s);
DefinesR_ExternalPtrAddr
,R_ExternalPtrProtected
,R_ExternalPtrTag
(links are to index).
In addition, we allow the pointer value to be cleared (its value is
set to NULL
) or to be given a new value. As part of finalization
it is a good idea to clear a pointer reference just in case it has
managed to get itself resurrected. Code that uses pointer references
should check for NULL
values since these can occur as a result of
clearing or save/loads. It may also occasionally be useful to be able
to change the tag or protected values of a pointer object.
<external pointer reference writers>= (U->) void R_ClearExternalPtr(SEXP s); void R_SetExternalPtrAddr(SEXP s, void *p); void R_SetExternalPtrTag(SEXP s, SEXP tag); void R_SetExternalPtrProtected(SEXP s, SEXP p);
DefinesR_ClearExternalPtr
,R_SetExternalPtrAddr
,R_SetExternalPtrProtected
(links are to index).
When a pointer object is saved in a workspace its pointer field it
saved as NULL
since pointer values are not likely to be useful
across sessions. The tag object will be retained.
Whether several saved pointers that were created with the same tag object retain this shared structure within a session or across save/loads is unspecified (currently sharing is not preserved).
Currently the protected field is preserved across save/load but without preserving sharing.
<finalization interface>= (U->) [D->] typedef void (*R_CFinalizer_t)(SEXP);
DefinesR_CFinalizer_t
(links are to index).
Finalizers are registered with
<finalization interface>+= (U->) [<-D] void R_RegisterFinalizer(SEXP s, SEXP fun); void R_RegisterCFinalizer(SEXP s, R_CFinalizer_t fun);
DefinesR_RegisterFinalizer
(links are to index).
It is an error to register an object for finalization more than once. (Currently this is not checked, but it probably should be. At the moment, if an object is registered more than once all finalizers will be run.)
It would be possible to use an expression rather than a function as the R finalizer, but then we would have to include some means of referencing the object to be finalized. Using an environment would potentially, depending on implementation details, lead to creating unintended strong links to the object, resulting in it never being collected.
The finalization function will be called sometime after the garbage collector detects that the object is no longer accessible from within R. The exact timing is not predictable. There is no guarantee that finalizers will be called before system exit, even for objects that may already have been determined to be eligible for finalization.
[The exact wording of this needs refinement, but the intention is to be in line with what Java does. Other systems may try to provide stronger guarantees, or to insure that the order in which finalizers are called has some relation to the order in which objects are created; I don't propose we do any of that.]
<reference object C interface>= R_MakeReference(SEXP val); R_ReferenceValue(val);
<reference object R interface>= make.reference <- function(val) { ... } reference.value <- function(ref) { ... } "reference.value<-" <- function(ref, val) { ... }
with make.reference
and the assignment function installing a copy
(result of applying duplicate
) of val
.
This should in principle allow some of the ideas for objects with
mutable state John and Duncan have been thinking about to be
implemented. A potential variation that might be a reference array, a
single reference object with multiple fields. You could build such a
thing as a list
of single reference objects, so maybe there is no
need for this. Some more thought is needed to come up with the right
approach here.
Implementing a simple R reference object is quite straight forward, except for one thing: save/load. For save/load to make sense, the shared structure of references would need to be preserved within a single save (probably not much you can do across multiple saves). Reference objects would need to be treated like environments and symbols in the save/load code. This is tricky enough that I'd rather not get into it at this point.
It's probably worth looking at what Java and Python serialization (called pickling in Python I think) do about shared substructure.
I'm not sure at this point what form of weak reference mechanism is
best in R. The fact that non-reference objects are copied complicates
things a bit. I think we need to have a weak reference be a pair
consisting of a reference object guard
and an arbitrary (possibly
copied) R object value
that would typically refer to the reference
object. In a file context, guard
would be the file pointer
reference object and value
might be a higher-level R
representation that contains the file pointer along with the path name
of the file. The lifetime of the reference is determined by
guard
; once guard
is no longer reachable, value
is set to
NULL
.
One possible interface would just produce weak references from a
guard
/value
pair:
<weak reference interface>= SEXP R_MakeWeakReference(SEXP value, SEXP guard); SEXP R_WeakReferenceValue(SEXP wref);
The value returned by R_WeakReferenceValue
will be either (a copy
of) the value
given to R_MakeWeakReference
or NULL
if
guard
has been collected.
Another possible approach is a weak table
<weak table interface>= SEXP R_MakeWeakTable(void); SEXP R_SetWeakTableValue(SEXP table, SEXP guard, SEXP value); SEXP R_GetWeakTableValue(SEXP table, SEXP guard);
I think we need to get some concrete cases where this functionality is needed to see which approach would work best.
Again in order for these to be useful across save/loads we need to preserve the shared structure of references.
file
.
A simple interface to the fopen
and fclose
calls could be
implemented using external pointer objects to represent file streams
and finalization to insure files are closed. The internal portions of
the interface might consist of a file file.c
and the R portions
might be in file.R
.
<file.c>= #include <stdio.h> #include "Rinternals.h" <file.c globals and macros> <file.c functions>
<file.R>= <file.R public functions> <file.R initialization function>
To allow some type checking on the file pointer, we use a symbol with
a reasonably unique name as a type tag. This symbol is stored in a
local static variable; it is initialized by calling the C level
initialization function in the package .First.lib
function.
<file.c globals and macros>= (<-U) [D->] static SEXP FILE_type_tag;
DefinesFILE_type_tag
(links are to index).
<file.c functions>= (<-U) [D->] SEXP FILE_init(void) { FILE_type_tag = install("FILE_TYPE_TAG"); return R_NilValue; }
DefinesFILE_init
(links are to index).
<file.R initialization function>= (U->) .First.lib <- function(lib, pkg) { library.dynam( "file", pkg, lib ) .Call("FILE_init") }
Defines.First.lib
(links are to index).
Checking of a file stream argument is done by the macro CHECK_FILE_STREAM
:
<file.c globals and macros>+= (<-U) [<-D] #define CHECK_FILE_STREAM(s) do { \ if (TYPEOF(s) != EXTPTRSXP || \ R_ExternalPtrTag(s) != FILE_type_tag) \ error("bad file stream"); \ } while (0)
DefinesCHECK_FILE_STREAM
(links are to index).
An alternative to using a symbol as the type identifier would be to use an arbitrary allocated object, which would then have to be stored in the precious list. The advantage would be complete uniqueness within the session; the drawback is somewhat unclear semantics across save/load.
The R function fopen
passes its file name and mode arguments along
with the R function fclose
, to be used as the finalization
function, to the C function FILE_fclose
.
<file.R public functions>= (U->) [D->] fopen <- function(name, mode = "r") { .Call("FILE_fopen", as.character(name), as.character(mode), fclose) }
Definesfopen
(links are to index).
<file.c functions>+= (<-U) [<-D->] SEXP FILE_fopen(SEXP name, SEXP mode, SEXP fun) { FILE *f = fopen(CHAR(STRING_ELT(name, 0)), CHAR(STRING_ELT(mode, 0))); if (f == NULL) return R_NilValue; else { SEXP val = R_MakeExternalPtr(f, FILE_type_tag, R_NilValue); R_RegisterFinalizer(val, fun); return val; } }
DefinesFILE_fopen
(links are to index).
If we wanted to provide a function at the R level for registering
finalizers, then the FILE_fopen
function would become
<alternate version of FILE_fopen
>=
SEXP FILE_fopen(SEXP name, SEXP mode, SEXP fun)
{
FILE *f = fopen(CHAR(STRING_ELT(name, 0)), CHAR(STRING_ELT(mode, 0)));
if (f == NULL)
return R_NilValue;
else
return R_MakeExternalPtr(f, FILE_type_tag, R_NilValue);
}
DefinesFILE_fopen
(links are to index).
and the R function fopen
would be defined as
<alternate version of fopen
>=
fopen <- function(name, mode = "r") {
s <- .Call("FILE_fopen", as.character(name), as.character(mode), fclose)
if (! is.null(s)) register.finalizer(s, fclose)
s
}
Definesfopen
(links are to index).
The R function fclose
just calls the C function FILE_fclose
:
<file.R public functions>+= (U->) [<-D->] fclose <- function(stream) { .Call("FILE_fclose", stream); }
Definesfclose
(links are to index).
The C function FILE_fclose
closes the stream and clears the
pointer unless the pointer is already NULL
, which would indicate
that the file has already been closed.
<file.c functions>+= (<-U) [<-D->] SEXP FILE_fclose(SEXP s) { FILE *f; CHECK_FILE_STREAM(s); f = R_ExternalPtrAddr(s); if (f != NULL) { fclose(f); R_ClearExternalPtr(s); } return R_NilValue; }
DefinesFILE_fclose
(links are to index).
If a file stream is closed by user code, then there is no longer any
need for finalization. But providing a mechanism for removing
finalizers is more trouble than it is worth, so the finalization
mechanism will eventually call fclose
, but nothing much will
happen since the stream pointer will have been cleared. But this
issue needs to be kept in mind in designing finalizer functions.
Yet another option for handling finalization is to use a C finalizer.
The R version of fopen
, call this one fopen1
, would then be
<file.R public functions>+= (U->) [<-D->] fopen1 <- function(name, mode = "r") { .Call("FILE_fopen1", as.character(name), as.character(mode)) }
Definesfopen1
(links are to index).
and the new version of FILE_fopen
, call it FILE_fopen1
, becomes
<file.c functions>+= (<-U) [<-D->] SEXP FILE_fopen1(SEXP name, SEXP mode) { FILE *f = fopen(CHAR(STRING_ELT(name, 0)), CHAR(STRING_ELT(mode, 0))); if (f == NULL) return R_NilValue; else { SEXP val = R_MakeExternalPtr(f, FILE_type_tag, R_NilValue); R_RegisterCFinalizer(val, (R_CFinalizer_t) FILE_fclose); return val; } }
DefinesFILE_fopen1
(links are to index).
Just to have something to do with these file pointers, we can add a simple
fgets
function that uses a fixed size buffer.
<file.R public functions>+= (U->) [<-D] fgets <- function(s) .Call("FILE_fgets", s)
Definesfgets
(links are to index).
<file.c functions>+= (<-U) [<-D] SEXP FILE_fgets(SEXP s) { char buf[512]; FILE *f; CHECK_FILE_STREAM(s); f = R_ExternalPtrAddr(s); if (f == NULL) error("file pointer is NULL"); if (fgets(buf, sizeof(buf), f) == NULL) return R_NilValue; else { SEXP val; PROTECT(val = allocVector(STRSXP, 1)); SET_STRING_ELT(val, 0, mkChar(buf)); UNPROTECT(1); return val; } }
Some examples: Load the package and open a file:
> library("file",lib.loc="lib") > f<-fopen("simpleref.nw")To see the finalization, trace
fclose
, remove the file variable,
and force a collection:
> trace(fclose) > rm(f) > gc() trace: function (stream) { .Call("FILE_fclose", stream) }(<pointer: 0x88f2bb8> ) used (Mb) gc trigger (Mb) Ncells 153747 4.2 350000 9.4 Vcells 29016 0.3 786432 6.0
Use the alternate version of fopen
, read a few lines, and close
explicitly:
> f<-fopen1("simpleref.nw") > fgets(f) [1] "% -*- mode: Noweb; noweb-code-mode: c-mode -*-\n" > fgets(f) [1] "\n" > fgets(f) [1] "\\documentclass[11pt]{article}\n" > fclose(f) NULLWith this version we can't use trace to see the finalization, but a utility like
lsof
can be used to check that it is indeed working.
If we need to allocate data for use in a C function we can do it with
malloc
and use finalization to insure it is released. Since
finalization imposes some overhead, and malloc isn't integrated with
the heap management, an alternative that may often be better is to use
the R heap. The following function (which perhaps should be part of
the interface?) allocates data in a string off the heap, creates a
pointer object for the string's data, puts the string object in the
pointer object's protected field, and returns the pointer object. As
long as the pointer object is alive, the data will be also and the
pointer will remain valid. Once the pointer object becomes
unreachable, the data will be unreachable as well and both will be
collected.
<allocate C data on the R heap>= SEXP R_AllocatePtr(size_t nmemb, size_t size, SEXP tag) { SEXP data, val; int bytes; if (INT_MAX / size < nmemb) error("allocation request is too large"); bytes = nmemb * size; PROTECT(data = allocString(bytes)); memset(CHAR(data), 0, bytes); val = R_MakeExternalPtr(CHAR(data), tag, data); UNPROTECT(1); return val; }
DefinesR_AllocatePtr
(links are to index).
xlispstat
interface, whis is
in turn based on the Tcl interface. It is descriped in a
separate
document.
SEXP
type in the defines and in the enum
.
The value used is 22---the value 21 is reserved for BCODESXP
.
<new SEXP
type define>= (U->)
#define BCODESXP 21 /* byte code */
#define EXTPTRSXP 22 /* external pointer */
DefinesEXTPTRSXP
(links are to index).
<new SEXP
enum value>= (U->)
BCODESXP = 21, /* byte code */
EXTPTRSXP = 22, /* external pointer */
DefinesBCODESXP
,EXTPTRSXP
(links are to index).
We need some macros for accessing the fields of a pointer object. It
would be better to eventually define another union member in the node
structure, but for now we'll just borrow the list cell one and put the
pointer address in the CAR
cell. The memory manager will have to
be adjusted in any case to not follow this value.
<pointer field access macros>= /* External pointer access macros */ #define EXTPTR_PTR(x) CAR(x) #define EXTPTR_PROT(x) CDR(x) #define EXTPTR_TAG(x) TAG(x)
DefinesEXTPTR_PROT
,EXTPTR_PTR
,EXTPTR_TAG
(links are to index).
Finally, we need to add the pointer interface and the finalization interface.
<Rinternals.h additions>= <newSEXP
type define> <newSEXP
enum value> /* External pointer interface */ <external pointer reference constructor> <external pointer reference readers> <external pointer reference writers> /* Finalization interface */ <finalization interface>
is.recursive
case for do_is
add a clause to return
TRUE
for external pointers (because of the tag and protected
fields it contains; this seems consistent with some other choices
here, but could go the other way too).
<coerce.c additions>= case EXTPTRSXP:
deparse2buff
, add a case for printing pointers.
<deparse.c additions>= case EXTPTRSXP: sprintf(tpb, "<pointer: %p>\n", R_ExternalPtrAddr(s)); print2buff(tpb); break;
duplicate
add a case for EXTPTRSXP
to the group of non-copied types.
<duplicate.c additions>= case EXTPTRSXP:
eval
.
<eval.c additions>= case EXTPTRSXP:
DO_CHILDREN
we need an extra case,
<memory.c additions>= [D->] case EXTPTRSXP: \ dc__action__(EXTPTR_PROT(__n__), dc__extra__); \ dc__action__(EXTPTR_TAG(__n__), dc__extra__); \ break; \
do_memoryprofile
needs to change the size of the return value and
the loop limit from 21 to 23 and needs an extra case:
<memory.c additions>+= [<-D->] SET_STRING_ELT(nms, EXTPTRSXP, mkChar("EXTPTRSXP"));
The implementation of the pointer interface:
<memory.c additions>+= [<-D->] /* External Pointer Objects */ SEXP R_MakeExternalPtr(void *p, SEXP tag, SEXP prot) { SEXP s = allocSExp(EXTPTRSXP); EXTPTR_PTR(s) = p; EXTPTR_PROT(s) = prot; EXTPTR_TAG(s) = tag; return s; } void *R_ExternalPtrAddr(SEXP s) { return EXTPTR_PTR(s); } SEXP R_ExternalPtrTag(SEXP s) { return EXTPTR_TAG(s); } SEXP R_ExternalPtrProtected(SEXP s) { return EXTPTR_PROT(s); } void R_ClearExternalPtr(SEXP s) { EXTPTR_PTR(s) = NULL; } void R_SetExternalPtrAddr(SEXP s, void *p) { EXTPTR_PTR(s) = p; } void R_SetExternalPtrTag(SEXP s, SEXP tag) { CHECK_OLD_TO_NEW(s, tag); EXTPTR_TAG(s) = tag; } void R_SetExternalPtrProtected(SEXP s, SEXP p) { CHECK_OLD_TO_NEW(s, p); EXTPTR_PROT(s) = p; }
<memory.c additions>+= [<-D->] /* Finalization */ static SEXP R_fin_registered = NULL; static void CheckFinalizers(void) { SEXP s; for (s = R_fin_registered; s != R_NilValue; s = CDR(s)) if (! NODE_IS_MARKED(CAR(s)) && s->sxpinfo.gp == 0) s->sxpinfo.gp = 1; } static Rboolean RunFinalizers(void) { volatile SEXP s, last; volatile Rboolean finalizer_run = FALSE; for (s = R_fin_registered, last = R_NilValue; s != R_NilValue;) { SEXP next = CDR(s); if (s->sxpinfo.gp != 0) { RCNTXT thiscontext; RCNTXT * volatile saveToplevelContext; volatile int savestack; volatile SEXP topExp; finalizer_run = TRUE; /* A top level context is established for the finalizer to insure that any errors that might occur do not spill into the call that triggered the collection. */ begincontext(&thiscontext, CTXT_TOPLEVEL, R_NilValue, R_GlobalEnv, R_NilValue, R_NilValue); saveToplevelContext = R_ToplevelContext; PROTECT(topExp = R_CurrentExpr); savestack = R_PPStackTop; if (! SETJMP(thiscontext.cjmpbuf)) { SEXP val, fun, e; R_GlobalContext = R_ToplevelContext = &thiscontext; /* The entry in the finalization list is removed before running the finalizer. This insures that a finalizer is run only once, even if running it raises an error. */ if (last == R_NilValue) R_fin_registered = next; else SETCDR(last, next); PROTECT(s); val = CAR(s); fun = TAG(s); if (TYPEOF(fun) == EXTPTRSXP) { /* Must be a C finalizer. */ R_CFinalizer_t cfun = R_ExternalPtrAddr(fun); cfun(val); } else { /* An R finalizer. */ PROTECT(e = LCONS(fun, LCONS(val, R_NilValue))); eval(e, R_GlobalEnv); UNPROTECT(1); } UNPROTECT(1); } endcontext(&thiscontext); R_ToplevelContext = saveToplevelContext; R_PPStackTop = savestack; R_CurrentExpr = topExp; UNPROTECT(1); } else last = s; s = next; } return finalizer_run; } void R_RegisterFinalizer(SEXP s, SEXP fun) { switch (TYPEOF(s)) { case ENVSXP: case EXTPTRSXP: switch (TYPEOF(fun)) { case CLOSXP: case BUILTINSXP: case SPECIALSXP: break; default: error("finalizer function must be a closure"); } R_fin_registered = CONS(s, R_fin_registered); SET_TAG(R_fin_registered, fun); R_fin_registered->sxpinfo.gp = 0; break; default: error("can only finalize reference objects"); } } void R_RegisterCFinalizer(SEXP s, R_CFinalizer_t fun) { /* We need to protect s since otherwise when R_MakeExternalPtr is called, its only link visible to the garbage collector might be the one in the finalization chain, resulting in it being registered as elligible for finalization. */ PROTECT(s); R_fin_registered = CONS(s, R_fin_registered); SET_TAG(R_fin_registered, R_MakeExternalPtr(fun, R_NilValue, R_NilValue)); R_fin_registered->sxpinfo.gp = 0; UNPROTECT(1); }
In addition, extract the main collector loop into a macro (since it is now needed twice):
<memory.c additions>+= [<-D->] #define PROCESS_NODES() do { \ while (forwarded_nodes != NULL) { \ s = forwarded_nodes; \ forwarded_nodes = NEXT_NODE(forwarded_nodes); \ SNAP_NODE(s, R_GenHeap[NODE_CLASS(s)].Old[NODE_GENERATION(s)]); \ R_GenHeap[NODE_CLASS(s)].OldCount[NODE_GENERATION(s)]++; \ FORWARD_CHILDREN(s); \ } \ } while (0)
and replace the old main processing loop section by
<memory.c additions>+= [<-D->] /* main processing loop */ PROCESS_NODES(); /* mark nodes ready for finalizing */ CheckFinalizers(); /* process finalizers */ FORWARD_NODE(R_fin_registered); PROCESS_NODES();
Finally, modify R_gc_internal
to run finalizers.
<memory.c additions>+= [<-D] static void R_gc_internal(int size_needed) { int vcells; double vfrac; Rboolean first = TRUE; again: gc_count++; BEGIN_SUSPEND_INTERRUPTS { gc_start_timing(); RunGenCollect(size_needed); gc_end_timing(); } END_SUSPEND_INTERRUPTS; if (gc_reporting) { REprintf("\n%d cons cells free (%d%%)\n", R_Collected, (100 * R_Collected / R_NSize)); vcells = VHEAP_FREE(); vfrac = (100.0 * vcells) / R_VSize; /* arrange for percentage to be rounded down, or we get `100% free' ! */ REprintf("%.1f Mbytes of heap free (%d%%)\n", vcells * sizeof(VECREC) / Mega, (int)vfrac); } if (first) { first = FALSE; /* Run any eligible finalizers. The return result of RunFinalizers is TRUE if any finalizers are actually run. There is a small chance that running finalizers here may chew up enough memory to make another immediate collection necessary. If so, we jump back to the beginning and run the collection, but on this second pass we do not run finalizers. */ if (RunFinalizers() && (NO_FREE_NODES() || size_needed > VHEAP_FREE())) goto again; } }
PrintValueRec
add a case to print pointers.
<print.c additions>= case EXTPTRSXP: Rprintf("<pointer: %p>\n", R_ExternalPtrAddr(s)); break;
NewMakeLists
needs a new case to follow the two heap pointer fields:
<saveload.c additions>= [D->] case EXTPTRSXP: NewMakeLists(EXTPTR_PROT(obj), sym_list, env_list); NewMakeLists(EXTPTR_TAG(obj), sym_list, env_list); break;
NewWriteItem
needs a new case for writing out pointer nodes:
<saveload.c additions>+= [<-D->] case EXTPTRSXP: NewWriteItem(EXTPTR_PROT(s), sym_list, env_list, fp); NewWriteItem(EXTPTR_TAG(s), sym_list, env_list, fp); break;
NewReadItem
needs new case for reading in pointer nodes.
<saveload.c additions>+= [<-D] case EXTPTRSXP: PROTECT(s = allocSExp(type)); R_SetExternalPtrAddr(s, NULL); R_SetExternalPtrProtected(s, NewReadItem(sym_table, env_table, fp)); R_SetExternalPtrTag(s, NewReadItem(sym_table, env_table, fp)); /*UNPROTECT(1);*/ break;
SubassignTypeFix
add s case for assigning pointers into vectors.
<subassign.c additions>= [D->] case 1922: /* vector <- eternal pointer */
In do_subassign2_dflt
add a case for assigning pointers into vectors.
<subassign.c additions>+= [<-D] case 1922: /* vector <- external pointer */
We really ought to make a macro that allows us to get rid of this 1922 stuff; something like
#define TYPEPAIR(x, y) (100 * (x) + (y))would then allow
case TYPEPAIR(VECSXP,EXTPTRSXP):
TypeTable
.
<util.c additions>= { "externalptr", EXTPTRSXP },
FILE_fopen
>: D1
fopen
>: D1
SEXP
enum value>: D1, U2
SEXP
type define>: D1, U2