Simple References with Finalization

Luke Tierney
School of Statistics
University of Minnesota

Introduction
Interface

R Level Interface
C Level Interface
Finalization
R Reference Objects
Weak References

Examples

A Simple File Stream Interface
Allocating C Data On The R Heap
Regular Expression Interface

Implementation

Changes to Rinternals.h (1.43)
Changes to coerce.c (1.81)
Changes to deparse.c (1.41)
Changes to duplicate.c (1.19)
Changes to eval.c (1.98)
Changes to memory.c (1.85)

Pointer Object Changes
Finalization Changes

Changes to print.c (1.50)
Changes to saveload.c (1.58)
Changes to subassign.c (1.62)
Changes to util.c (1.47)

Introduction

This note describes a simple mechanism for managing foreign references that I have added to 1.2.

It should be possible to extend this mechanism to make R reference objects that should do for some of the things John and Duncan are looking at, but there are a few technical and conceptual issues that need to be ironed out first. I'll discuss these below in Section [->]. Because of these issues I think we should hold off on this step until after 1.2.0.

Interface

R Level Interface

The interface for pointer objects is entirely a C level interface. From the R level these objects are opaque. They have a printed representation as

<pointer ...>

Their type (the value returned by typeof) is "externalptr".

Like environments and names, pointer reference objects are not copied by duplicate. Like any R object, they do have an attribute field. However, as with environments, modifying this field is destructive and thus attributes are not very useful. If you want to create an R object that corresponds to a pointer, then you should do something like

p <- .Call(....) # create and return pointer object
object <- list(p)
class(object) <- "myclass"

C Level Interface

An external pointer reference is constructed by calling R_MakeExternalPtr with three arguments, the pointer value, a tag SEXP, and a value to be protected. The tag can be used, for example, to attach type information to the pointer reference. The protected value can be used for associating with the pointer an R object that must remain alive as long as the pointer is alive, perhaps because the pointer points into the object. An example of this in in Section [->].

<external pointer reference constructor>= (U->)
SEXP R_MakeExternalPtr(void *p, SEXP tag, SEXP prot);

Defines R_MakeExternalPtr (links are to index).

Reader functions are provided to allow the pointer, tag, and protected values to be retrieved:

<external pointer reference readers>= (U->)
void *R_ExternalPtrAddr(SEXP s);
SEXP R_ExternalPtrTag(SEXP s);
SEXP R_ExternalPtrProtected(SEXP s);

Defines R_ExternalPtrAddr, R_ExternalPtrProtected, R_ExternalPtrTag (links are to index).

In addition, we allow the pointer value to be cleared (its value is set to NULL) or to be given a new value. As part of finalization it is a good idea to clear a pointer reference just in case it has managed to get itself resurrected. Code that uses pointer references should check for NULL values since these can occur as a result of clearing or save/loads. It may also occasionally be useful to be able to change the tag or protected values of a pointer object.

<external pointer reference writers>= (U->)
void R_ClearExternalPtr(SEXP s);
void R_SetExternalPtrAddr(SEXP s, void *p);
void R_SetExternalPtrTag(SEXP s, SEXP tag);
void R_SetExternalPtrProtected(SEXP s, SEXP p);

Defines R_ClearExternalPtr, R_SetExternalPtrAddr, R_SetExternalPtrProtected (links are to index).

When a pointer object is saved in a workspace its pointer field it saved as NULL since pointer values are not likely to be useful across sessions. The tag object will be retained.

Whether several saved pointers that were created with the same tag object retain this shared structure within a session or across save/loads is unspecified (currently sharing is not preserved).

Currently the protected field is preserved across save/load but without preserving sharing.

Finalization

A finalizer can be registered for a pointer reference or an environment (and maybe eventually for a few other types, such as R reference objects). The finalizer can be an R function taking a single argument, the object to be finalized. It can also be a C function with the function prototype corresponding to

<finalization interface>= (U->) [D->]
typedef void (*R_CFinalizer_t)(SEXP);

Defines R_CFinalizer_t (links are to index).

Finalizers are registered with

<finalization interface>+= (U->) [<-D]
void R_RegisterFinalizer(SEXP s, SEXP fun);
void R_RegisterCFinalizer(SEXP s, R_CFinalizer_t fun);

Defines R_RegisterFinalizer (links are to index).

It is an error to register an object for finalization more than once. (Currently this is not checked, but it probably should be. At the moment, if an object is registered more than once all finalizers will be run.)

It would be possible to use an expression rather than a function as the R finalizer, but then we would have to include some means of referencing the object to be finalized. Using an environment would potentially, depending on implementation details, lead to creating unintended strong links to the object, resulting in it never being collected.

The finalization function will be called sometime after the garbage collector detects that the object is no longer accessible from within R. The exact timing is not predictable. There is no guarantee that finalizers will be called before system exit, even for objects that may already have been determined to be eligible for finalization.

[The exact wording of this needs refinement, but the intention is to be in line with what Java does. Other systems may try to provide stronger guarantees, or to insure that the order in which finalizers are called has some relation to the order in which objects are created; I don't propose we do any of that.]

R Reference Objects

[*] R reference objects would be entities that are passed by reference (not copied) and contain a single R object (possibly another reference). I'm not sure there is much need for a tag or a protected field here, so let's not bother for now (the space is there, but no point if they are not needed). The interface might be

<reference object C interface>=
R_MakeReference(SEXP val);
R_ReferenceValue(val);

The R interface might be

<reference object R interface>=
make.reference <- function(val) { ... }
reference.value <- function(ref) { ... }
"reference.value<-" <- function(ref, val) { ... }

with make.reference and the assignment function installing a copy (result of applying duplicate) of val.

This should in principle allow some of the ideas for objects with mutable state John and Duncan have been thinking about to be implemented. A potential variation that might be a reference array, a single reference object with multiple fields. You could build such a thing as a list of single reference objects, so maybe there is no need for this. Some more thought is needed to come up with the right approach here.

Implementing a simple R reference object is quite straight forward, except for one thing: save/load. For save/load to make sense, the shared structure of references would need to be preserved within a single save (probably not much you can do across multiple saves). Reference objects would need to be treated like environments and symbols in the save/load code. This is tricky enough that I'd rather not get into it at this point.

It's probably worth looking at what Java and Python serialization (called pickling in Python I think) do about shared substructure.

Weak References

There are some situations where you want to keep track of some aspect of live objects but without the fact that you are keeping track of them forcing them to remain live. One example is a file interface where you want to be able to produce a list of all open files but without preventing open files that have become unreachable from being collected and finalized. Another example is that some external interfaces are a lot happier if all connections to them are closed when the program shuts down. To do that we need to be able to get a list of the active connections, but we also want connections that have become unreachable to be reclaimed. Weak references are a way to handle this.

I'm not sure at this point what form of weak reference mechanism is best in R. The fact that non-reference objects are copied complicates things a bit. I think we need to have a weak reference be a pair consisting of a reference object guard and an arbitrary (possibly copied) R object value that would typically refer to the reference object. In a file context, guard would be the file pointer reference object and value might be a higher-level R representation that contains the file pointer along with the path name of the file. The lifetime of the reference is determined by guard; once guard is no longer reachable, value is set to NULL.

One possible interface would just produce weak references from a guard/value pair:

<weak reference interface>=
SEXP R_MakeWeakReference(SEXP value, SEXP guard);
SEXP R_WeakReferenceValue(SEXP wref);

The value returned by R_WeakReferenceValue will be either (a copy of) the value given to R_MakeWeakReference or NULL if guard has been collected.

Another possible approach is a weak table

<weak table interface>=
SEXP R_MakeWeakTable(void);
SEXP R_SetWeakTableValue(SEXP table, SEXP guard, SEXP value);
SEXP R_GetWeakTableValue(SEXP table, SEXP guard);

I think we need to get some concrete cases where this functionality is needed to see which approach would work best.

Again in order for these to be useful across save/loads we need to preserve the shared structure of references.

Examples

A Simple File Stream Interface

This example is available as a package file.

A simple interface to the fopen and fclose calls could be implemented using external pointer objects to represent file streams and finalization to insure files are closed. The internal portions of the interface might consist of a file file.c and the R portions might be in file.R.

<file.c>=
#include <stdio.h>
#include "Rinternals.h"
<file.c globals and macros>
<file.c functions>

<file.R>=
<file.R public functions>
<file.R initialization function>

To allow some type checking on the file pointer, we use a symbol with a reasonably unique name as a type tag. This symbol is stored in a local static variable; it is initialized by calling the C level initialization function in the package .First.lib function.

<file.c globals and macros>= (<-U) [D->]
static SEXP FILE_type_tag;

Defines FILE_type_tag (links are to index).

<file.c functions>= (<-U) [D->]
SEXP FILE_init(void)
{
    FILE_type_tag = install("FILE_TYPE_TAG");
    return R_NilValue;
}

Defines FILE_init (links are to index).

<file.R initialization function>= (U->)
.First.lib <- function(lib, pkg) {
    library.dynam( "file", pkg, lib )
    .Call("FILE_init")
}

Defines .First.lib (links are to index).

Checking of a file stream argument is done by the macro CHECK_FILE_STREAM:

<file.c globals and macros>+= (<-U) [<-D]
#define CHECK_FILE_STREAM(s) do { \
    if (TYPEOF(s) != EXTPTRSXP || \
        R_ExternalPtrTag(s) != FILE_type_tag) \
        error("bad file stream"); \
} while (0)

Defines CHECK_FILE_STREAM (links are to index).

An alternative to using a symbol as the type identifier would be to use an arbitrary allocated object, which would then have to be stored in the precious list. The advantage would be complete uniqueness within the session; the drawback is somewhat unclear semantics across save/load.

The R function fopen passes its file name and mode arguments along with the R function fclose, to be used as the finalization function, to the C function FILE_fclose.

<file.R public functions>= (U->) [D->]
fopen <- function(name, mode = "r") {
    .Call("FILE_fopen", as.character(name), as.character(mode), fclose)
}

Defines fopen (links are to index).

<file.c functions>+= (<-U) [<-D->]
SEXP FILE_fopen(SEXP name, SEXP mode, SEXP fun)
{
    FILE *f = fopen(CHAR(STRING_ELT(name, 0)), CHAR(STRING_ELT(mode, 0)));
    if (f == NULL)
        return R_NilValue;
    else {
        SEXP val = R_MakeExternalPtr(f, FILE_type_tag, R_NilValue);
        R_RegisterFinalizer(val, fun);
        return val;
    }
}

Defines FILE_fopen (links are to index).

If we wanted to provide a function at the R level for registering finalizers, then the FILE_fopen function would become

<alternate version of FILE_fopen>=
SEXP FILE_fopen(SEXP name, SEXP mode, SEXP fun)
{
    FILE *f = fopen(CHAR(STRING_ELT(name, 0)), CHAR(STRING_ELT(mode, 0)));
    if (f == NULL)
        return R_NilValue;
    else 
        return R_MakeExternalPtr(f, FILE_type_tag, R_NilValue);
}

Defines FILE_fopen (links are to index).

and the R function fopen would be defined as

<alternate version of fopen>=
fopen <- function(name, mode = "r") {
    s <- .Call("FILE_fopen", as.character(name), as.character(mode), fclose)
    if (! is.null(s)) register.finalizer(s, fclose)
    s
}

Defines fopen (links are to index).

The R function fclose just calls the C function FILE_fclose:

<file.R public functions>+= (U->) [<-D->]
fclose <- function(stream) {
    .Call("FILE_fclose", stream);
}

Defines fclose (links are to index).

The C function FILE_fclose closes the stream and clears the pointer unless the pointer is already NULL, which would indicate that the file has already been closed.

<file.c functions>+= (<-U) [<-D->]
SEXP FILE_fclose(SEXP s)
{
    FILE *f;
    CHECK_FILE_STREAM(s);
    f = R_ExternalPtrAddr(s);
    if (f != NULL) {
        fclose(f);
        R_ClearExternalPtr(s);
    }
    return R_NilValue;
}

Defines FILE_fclose (links are to index).

If a file stream is closed by user code, then there is no longer any need for finalization. But providing a mechanism for removing finalizers is more trouble than it is worth, so the finalization mechanism will eventually call fclose, but nothing much will happen since the stream pointer will have been cleared. But this issue needs to be kept in mind in designing finalizer functions.

Yet another option for handling finalization is to use a C finalizer. The R version of fopen, call this one fopen1, would then be

<file.R public functions>+= (U->) [<-D->]
fopen1 <- function(name, mode = "r") {
    .Call("FILE_fopen1", as.character(name), as.character(mode))
}

Defines fopen1 (links are to index).

and the new version of FILE_fopen, call it FILE_fopen1, becomes

<file.c functions>+= (<-U) [<-D->]
SEXP FILE_fopen1(SEXP name, SEXP mode)
{
    FILE *f = fopen(CHAR(STRING_ELT(name, 0)), CHAR(STRING_ELT(mode, 0)));
    if (f == NULL)
        return R_NilValue;
    else {
        SEXP val = R_MakeExternalPtr(f, FILE_type_tag, R_NilValue);
        R_RegisterCFinalizer(val, (R_CFinalizer_t) FILE_fclose);
        return val;
    }
}

Defines FILE_fopen1 (links are to index).

Just to have something to do with these file pointers, we can add a simple fgets function that uses a fixed size buffer.

<file.R public functions>+= (U->) [<-D]
fgets <- function(s) .Call("FILE_fgets", s)

Defines fgets (links are to index).

<file.c functions>+= (<-U) [<-D]
SEXP FILE_fgets(SEXP s)
{
    char buf[512];
    FILE *f;
    CHECK_FILE_STREAM(s);
    f = R_ExternalPtrAddr(s);
    if (f == NULL)
        error("file pointer is NULL");
    if (fgets(buf, sizeof(buf), f) == NULL)
        return R_NilValue;
    else {
        SEXP val;
        PROTECT(val = allocVector(STRSXP, 1));
        SET_STRING_ELT(val, 0, mkChar(buf));
        UNPROTECT(1);
        return val;
    }
}

Some examples: Load the package and open a file:

> library("file",lib.loc="lib")
> f<-fopen("simpleref.nw")

To see the finalization, trace fclose, remove the file variable, and force a collection:

> trace(fclose)
> rm(f)
> gc()
trace: function (stream) 
{
    .Call("FILE_fclose", stream)
}(<pointer: 0x88f2bb8>
)
         used (Mb) gc trigger (Mb)
Ncells 153747  4.2     350000  9.4
Vcells  29016  0.3     786432  6.0

Use the alternate version of fopen, read a few lines, and close explicitly:

> f<-fopen1("simpleref.nw")
> fgets(f)
[1] "% -*- mode: Noweb; noweb-code-mode: c-mode -*-\n"
> fgets(f)
[1] "\n"
> fgets(f)
[1] "\\documentclass[11pt]{article}\n"
> fclose(f)
NULL

With this version we can't use trace to see the finalization, but a utility like lsof can be used to check that it is indeed working.

Allocating C Data On The R Heap

[*] [This example hasn't been tested yet, so it may not work exactly as shown, but the idea is there.]

If we need to allocate data for use in a C function we can do it with malloc and use finalization to insure it is released. Since finalization imposes some overhead, and malloc isn't integrated with the heap management, an alternative that may often be better is to use the R heap. The following function (which perhaps should be part of the interface?) allocates data in a string off the heap, creates a pointer object for the string's data, puts the string object in the pointer object's protected field, and returns the pointer object. As long as the pointer object is alive, the data will be also and the pointer will remain valid. Once the pointer object becomes unreachable, the data will be unreachable as well and both will be collected.

<allocate C data on the R heap>=
SEXP R_AllocatePtr(size_t nmemb, size_t size, SEXP tag)
{
    SEXP data, val;
    int bytes;
    if (INT_MAX / size < nmemb)
        error("allocation request is too large");
    bytes = nmemb * size;
    PROTECT(data = allocString(bytes));
    memset(CHAR(data), 0, bytes);
    val = R_MakeExternalPtr(CHAR(data), tag, data);
    UNPROTECT(1);
    return val;
}

Defines R_AllocatePtr (links are to index).

Regular Expression Interface

A more extended example is provided by an interface to POSIX regular expressions. This is a port of the xlispstat interface, whis is in turn based on the Tcl interface. It is descriped in a separate document.

Implementation

Changes to Rinternals.h (1.43)

We need to add a new SEXP type in the defines and in the enum. The value used is 22---the value 21 is reserved for BCODESXP.

<new SEXP type define>= (U->)
#define BCODESXP    21    /* byte code */
#define EXTPTRSXP   22    /* external pointer */

Defines EXTPTRSXP (links are to index).

<new SEXP enum value>= (U->)
BCODESXP    = 21,   /* byte code */
EXTPTRSXP   = 22,   /* external pointer */

Defines BCODESXP, EXTPTRSXP (links are to index).

We need some macros for accessing the fields of a pointer object. It would be better to eventually define another union member in the node structure, but for now we'll just borrow the list cell one and put the pointer address in the CAR cell. The memory manager will have to be adjusted in any case to not follow this value.

<pointer field access macros>=
/* External pointer access macros */
#define EXTPTR_PTR(x)   CAR(x)
#define EXTPTR_PROT(x)  CDR(x)
#define EXTPTR_TAG(x)   TAG(x)

Defines EXTPTR_PROT, EXTPTR_PTR, EXTPTR_TAG (links are to index).

Finally, we need to add the pointer interface and the finalization interface.

<Rinternals.h additions>=
<new SEXP type define>
<new SEXP enum value>

/* External pointer interface */
<external pointer reference constructor>
<external pointer reference readers>
<external pointer reference writers>

/* Finalization interface */
<finalization interface>

Changes to coerce.c (1.81)

In the is.recursive case for do_is add a clause to return TRUE for external pointers (because of the tag and protected fields it contains; this seems consistent with some other choices here, but could go the other way too).

<coerce.c additions>=
case EXTPTRSXP:

Changes to deparse.c (1.41)

In deparse2buff, add a case for printing pointers.

<deparse.c additions>=
case EXTPTRSXP:
    sprintf(tpb, "<pointer: %p>\n", R_ExternalPtrAddr(s));
    print2buff(tpb);
    break;

Changes to duplicate.c (1.19)

In duplicate add a case for EXTPTRSXP to the group of non-copied types.

<duplicate.c additions>=
case EXTPTRSXP:

Changes to eval.c (1.98)

Need to add self-evaluating case in eval.

<eval.c additions>=
case EXTPTRSXP:

Changes to memory.c (1.85)

Pointer Object Changes

In DO_CHILDREN we need an extra case,

<memory.c additions>= [D->]
case EXTPTRSXP: \
  dc__action__(EXTPTR_PROT(__n__), dc__extra__); \
  dc__action__(EXTPTR_TAG(__n__), dc__extra__); \
  break; \

do_memoryprofile needs to change the size of the return value and the loop limit from 21 to 23 and needs an extra case:

<memory.c additions>+= [<-D->]
SET_STRING_ELT(nms, EXTPTRSXP, mkChar("EXTPTRSXP"));

The implementation of the pointer interface:

<memory.c additions>+= [<-D->]
/* External Pointer Objects */
SEXP R_MakeExternalPtr(void *p, SEXP tag, SEXP prot)
{
    SEXP s = allocSExp(EXTPTRSXP);
    EXTPTR_PTR(s) = p;
    EXTPTR_PROT(s) = prot;
    EXTPTR_TAG(s) = tag;
    return s;
}

void *R_ExternalPtrAddr(SEXP s)
{
    return EXTPTR_PTR(s);
}

SEXP R_ExternalPtrTag(SEXP s)
{
    return EXTPTR_TAG(s);
}

SEXP R_ExternalPtrProtected(SEXP s)
{
    return EXTPTR_PROT(s);
}

void R_ClearExternalPtr(SEXP s)
{
    EXTPTR_PTR(s) = NULL;
}

void R_SetExternalPtrAddr(SEXP s, void *p)
{
    EXTPTR_PTR(s) = p;
}

void R_SetExternalPtrTag(SEXP s, SEXP tag)
{
    CHECK_OLD_TO_NEW(s, tag);
    EXTPTR_TAG(s) = tag;
}

void R_SetExternalPtrProtected(SEXP s, SEXP p)
{
    CHECK_OLD_TO_NEW(s, p);
    EXTPTR_PROT(s) = p;
}

Finalization Changes

The finalization implementation:

<memory.c additions>+= [<-D->]
/* Finalization */

static SEXP R_fin_registered = NULL;

static void CheckFinalizers(void)
{
    SEXP s;
    for (s = R_fin_registered; s != R_NilValue; s = CDR(s))
        if (! NODE_IS_MARKED(CAR(s)) && s->sxpinfo.gp == 0)
            s->sxpinfo.gp = 1;
}

static Rboolean RunFinalizers(void)
{
    volatile SEXP s, last;
    volatile Rboolean finalizer_run = FALSE;

    for (s = R_fin_registered, last = R_NilValue; s != R_NilValue;) {
        SEXP next = CDR(s);
        if (s->sxpinfo.gp != 0) {
            RCNTXT thiscontext;
            RCNTXT * volatile saveToplevelContext;
            volatile int savestack;
            volatile SEXP topExp;

            finalizer_run = TRUE;

            /* A top level context is established for the finalizer to
               insure that any errors that might occur do not spill
               into the call that triggered the collection. */
            begincontext(&thiscontext, CTXT_TOPLEVEL, R_NilValue, R_GlobalEnv,
                         R_NilValue, R_NilValue);
            saveToplevelContext = R_ToplevelContext;
            PROTECT(topExp = R_CurrentExpr);
            savestack = R_PPStackTop;
            if (! SETJMP(thiscontext.cjmpbuf)) {
                SEXP val, fun, e;
                R_GlobalContext = R_ToplevelContext = &thiscontext;

                /* The entry in the finalization list is removed
                   before running the finalizer.  This insures that a
                   finalizer is run only once, even if running it
                   raises an error. */
                if (last == R_NilValue)
                    R_fin_registered = next;
                else
                    SETCDR(last, next);
                PROTECT(s);
                val = CAR(s);
                fun = TAG(s);
                if (TYPEOF(fun) == EXTPTRSXP) {
                    /* Must be a C finalizer. */
                    R_CFinalizer_t cfun = R_ExternalPtrAddr(fun);
                    cfun(val);
                }
                else {
                    /* An R finalizer. */
                    PROTECT(e = LCONS(fun, LCONS(val, R_NilValue)));
                    eval(e, R_GlobalEnv);
                    UNPROTECT(1);
                }
                UNPROTECT(1);
            }
            endcontext(&thiscontext);
            R_ToplevelContext = saveToplevelContext;
            R_PPStackTop = savestack;
            R_CurrentExpr = topExp;
            UNPROTECT(1);
        }
        else last = s;
        s = next;
    }
    return finalizer_run;
}

void R_RegisterFinalizer(SEXP s, SEXP fun)
{
    switch (TYPEOF(s)) {
    case ENVSXP:
    case EXTPTRSXP:
        switch (TYPEOF(fun)) {
        case CLOSXP:
        case BUILTINSXP:
        case SPECIALSXP:
            break;
        default:
            error("finalizer function must be a closure");
        }
        R_fin_registered = CONS(s, R_fin_registered);
        SET_TAG(R_fin_registered, fun);
        R_fin_registered->sxpinfo.gp = 0;
        break;
    default: error("can only finalize reference objects");
    }
}

void R_RegisterCFinalizer(SEXP s, R_CFinalizer_t fun)
{
    /* We need to protect s since otherwise when R_MakeExternalPtr is
       called, its only link visible to the garbage collector might be
       the one in the finalization chain, resulting in it being
       registered as elligible for finalization. */
    PROTECT(s);
    R_fin_registered = CONS(s, R_fin_registered);
    SET_TAG(R_fin_registered, R_MakeExternalPtr(fun, R_NilValue, R_NilValue));
    R_fin_registered->sxpinfo.gp = 0;
    UNPROTECT(1);
}

In addition, extract the main collector loop into a macro (since it is now needed twice):

<memory.c additions>+= [<-D->]
#define PROCESS_NODES() do { \
    while (forwarded_nodes != NULL) { \
        s = forwarded_nodes; \
        forwarded_nodes = NEXT_NODE(forwarded_nodes); \
        SNAP_NODE(s, R_GenHeap[NODE_CLASS(s)].Old[NODE_GENERATION(s)]); \
        R_GenHeap[NODE_CLASS(s)].OldCount[NODE_GENERATION(s)]++; \
        FORWARD_CHILDREN(s); \
    } \
} while (0)

and replace the old main processing loop section by

<memory.c additions>+= [<-D->]
/* main processing loop */
PROCESS_NODES();

/* mark nodes ready for finalizing */
CheckFinalizers();
    
/* process finalizers */
FORWARD_NODE(R_fin_registered);
PROCESS_NODES();

Finally, modify R_gc_internal to run finalizers.

<memory.c additions>+= [<-D]
static void R_gc_internal(int size_needed)
{
    int vcells;
    double vfrac;
    Rboolean first = TRUE;

 again:

    gc_count++;

    BEGIN_SUSPEND_INTERRUPTS {
      gc_start_timing();
      RunGenCollect(size_needed);
      gc_end_timing();
    } END_SUSPEND_INTERRUPTS;

    if (gc_reporting) {
        REprintf("\n%d cons cells free (%d%%)\n",
                 R_Collected, (100 * R_Collected / R_NSize));
        vcells = VHEAP_FREE();
        vfrac = (100.0 * vcells) / R_VSize;
        /* arrange for percentage to be rounded down, or we get
           `100% free' ! */
        REprintf("%.1f Mbytes of heap free (%d%%)\n",
                 vcells * sizeof(VECREC) / Mega, (int)vfrac);
    }

    if (first) {
        first = FALSE;
        /* Run any eligible finalizers.  The return result of
           RunFinalizers is TRUE if any finalizers are actually run.
           There is a small chance that running finalizers here may
           chew up enough memory to make another immediate collection
           necessary.  If so, we jump back to the beginning and run
           the collection, but on this second pass we do not run
           finalizers. */
        if (RunFinalizers() &&
            (NO_FREE_NODES() || size_needed > VHEAP_FREE()))
            goto again;
    }
}

Changes to print.c (1.50)

In PrintValueRec add a case to print pointers.

<print.c additions>=
case EXTPTRSXP:
    Rprintf("<pointer: %p>\n", R_ExternalPtrAddr(s));
    break;

Changes to saveload.c (1.58)

NewMakeLists needs a new case to follow the two heap pointer fields:

<saveload.c additions>= [D->]
case EXTPTRSXP:
    NewMakeLists(EXTPTR_PROT(obj), sym_list, env_list);
    NewMakeLists(EXTPTR_TAG(obj), sym_list, env_list);
    break;

NewWriteItem needs a new case for writing out pointer nodes:

<saveload.c additions>+= [<-D->]
case EXTPTRSXP:
    NewWriteItem(EXTPTR_PROT(s), sym_list, env_list, fp);
    NewWriteItem(EXTPTR_TAG(s), sym_list, env_list, fp);
    break;

NewReadItem needs new case for reading in pointer nodes.

<saveload.c additions>+= [<-D]
case EXTPTRSXP:
    PROTECT(s = allocSExp(type));
    R_SetExternalPtrAddr(s, NULL);
    R_SetExternalPtrProtected(s, NewReadItem(sym_table, env_table, fp));
    R_SetExternalPtrTag(s, NewReadItem(sym_table, env_table, fp));
    /*UNPROTECT(1);*/
    break;

Changes to subassign.c (1.62)

In SubassignTypeFix add s case for assigning pointers into vectors.

<subassign.c additions>= [D->]
case 1922:  /* vector     <- eternal pointer */

In do_subassign2_dflt add a case for assigning pointers into vectors.

<subassign.c additions>+= [<-D]
case 1922:  /* vector     <- external pointer */

We really ought to make a macro that allows us to get rid of this 1922 stuff; something like

#define TYPEPAIR(x, y) (100 * (x) + (y))

would then allow

case TYPEPAIR(VECSXP,EXTPTRSXP):

Changes to util.c (1.47)

Add external pointer entry to TypeTable.

<util.c additions>=
{ "externalptr",        EXTPTRSXP  },

<allocate C data on the R heap>: D1
<alternate version of FILE_fopen>: D1
<alternate version of fopen>: D1
<coerce.c additions>: D1
<deparse.c additions>: D1
<duplicate.c additions>: D1
<eval.c additions>: D1
<external pointer reference constructor>: D1, U2
<external pointer reference readers>: D1, U2
<external pointer reference writers>: D1, U2
<file.c>: D1
<file.c functions>: U1, D2, D3, D4, D5, D6
<file.c globals and macros>: U1, D2, D3
<file.R>: D1
<file.R initialization function>: U1, D2
<file.R public functions>: U1, D2, D3, D4, D5
<finalization interface>: D1, D2, U3
<memory.c additions>: D1, D2, D3, D4, D5, D6, D7
<new SEXP enum value>: D1, U2
<new SEXP type define>: D1, U2
<pointer field access macros>: D1
<print.c additions>: D1
<reference object C interface>: D1
<reference object R interface>: D1
<Rinternals.h additions>: D1
<saveload.c additions>: D1, D2, D3
<subassign.c additions>: D1, D2
<util.c additions>: D1
<weak reference interface>: D1
<weak table interface>: D1

.First.lib: D1
BCODESXP: U1, D2
CHECK_FILE_STREAM: D1, U2, U3
EXTPTR_PROT: D1, U2, U3, U4, U5
EXTPTR_PTR: D1, U2
EXTPTR_TAG: D1, U2, U3, U4, U5
EXTPTRSXP: U1, D2, D3, U4, U5, U6, U7, U8, U9, U10, U11, U12, U13, U14, U15, U16
fclose: U1, U2, D3, U4
fgets: D1, U2
FILE_fclose: U1, D2, U3
FILE_fopen: U1, D2, D3, U4
FILE_fopen1: U1, D2
FILE_init: D1, U2
FILE_type_tag: D1, U2, U3, U4, U5, U6
fopen: D1, U2, U3, D4, U5
fopen1: D1
R_AllocatePtr: D1
R_CFinalizer_t: D1, U2, U3, U4
R_ClearExternalPtr: D1, U2, U3
R_ExternalPtrAddr: D1, U2, U3, U4, U5, U6, U7
R_ExternalPtrProtected: D1, U2
R_ExternalPtrTag: D1, U2, U3
R_MakeExternalPtr: D1, U2, U3, U4, U5, U6, U7
R_RegisterFinalizer: D1, U2, U3
R_SetExternalPtrAddr: D1, U2, U3
R_SetExternalPtrProtected: D1, U2, U3