Defining USE_GENERATIONAL_GC enables a
non-moving generational collector with two or three generations. The
generational collector divides allocated nodes into generations based
on some notion of age. Younger generations are collected more
frequently than older ones.
Memory allocated by R_alloc is maintained in a stack. Code that
allocates memory with R_alloc must use vmaxget and vmaxset to obtain
and reset the stack pointer.
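As a minimal sketch of the usual pattern (assuming the declarations in
R_ext/Memory.h; the helper itself is hypothetical):

    #include <R.h>    /* declares R_alloc, vmaxget, vmaxset */

    /* Hypothetical helper needing transient storage.  Saving the
       R_alloc stack pointer on entry and restoring it on exit
       releases everything allocated here. */
    static void scratch_work(int n)
    {
        const void *vmax = vmaxget();   /* record current stack position */
        double *buf = (double *) R_alloc(n, sizeof(double));
        /* ... use buf ... */
        vmaxset(vmax);                  /* reset the stack pointer */
    }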
After every LEVEL_0_FREQ level 0 collections a level 1 collection is
done; after every LEVEL_1_FREQ level 1 collections a level 2
collection occurs. Thus, roughly, every LEVEL_0_FREQ-th collection is
a level 1 collection and every (LEVEL_0_FREQ * LEVEL_1_FREQ)-th
collection is a level 2 collection.
When a level N
collection fails to produce at least
MinFreeFrac * R_NSize
free nodes and MinFreeFrac * R_VSize
free vector space, the next collection will be a level N + 1
collection.
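In rough outline, the scheduling rule might be sketched as follows
(the frequency values are illustrative and this is not the actual
collector code; the bump to level N + 1 after a failed level N
collection is handled separately):

    #define LEVEL_0_FREQ 20   /* illustrative values */
    #define LEVEL_1_FREQ 5

    static unsigned int num_collections = 0;

    /* Pick the collection level by the frequency rule described above. */
    static int collection_level(void)
    {
        num_collections++;
        if (num_collections % (LEVEL_0_FREQ * LEVEL_1_FREQ) == 0)
            return 2;                   /* full (level 2) collection */
        else if (num_collections % LEVEL_0_FREQ == 0)
            return 1;
        else
            return 0;
    }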
Node memory is obtained from malloc in chunks called
pages. Pages are grouped into classes
according to the size of the vector component they contain.
Occasionally an attempt is made to release unused pages back to the
operating system. When pages are released, a number of free nodes
equal to R_MaxKeepFrac
times the number of allocated nodes for
each class is retained. Pages not needed to meet this requirement are
released. An attempt to release pages is made every
R_PageReleaseFreq
level 1 or level 2 collections.
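The retention rule might be sketched like this (the per-class
bookkeeping arrays and the helper are hypothetical):

    #include <stddef.h>

    #define NUM_CLASSES 8                         /* illustrative */
    static double R_MaxKeepFrac = 0.5;            /* illustrative value */
    static size_t allocated_nodes[NUM_CLASSES];   /* nodes in use, per class */
    static size_t free_nodes[NUM_CLASSES];        /* free nodes, per class */
    static size_t nodes_per_page[NUM_CLASSES];    /* nodes that fit in one page */

    /* How many whole pages of class c could be released while still
       retaining at least R_MaxKeepFrac * allocated free nodes? */
    static size_t releasable_pages(int c)
    {
        size_t keep = (size_t)(R_MaxKeepFrac * allocated_nodes[c]);
        if (free_nodes[c] <= keep) return 0;
        return (free_nodes[c] - keep) / nodes_per_page[c];
    }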
R_NSize and R_VSize are used for triggering collections. The initial
values set by defaults or command line arguments are used as minimal
values. After full collections these levels are adjusted up or down,
though not below the minimal values or above the maximum values,
toward maintaining heap occupancy within a specified range. When the
number of nodes in use reaches R_NGrowFrac * R_NSize, the value of
R_NSize is incremented by R_NGrowIncrMin + R_NGrowIncrFrac * R_NSize.
When the number of nodes in use falls below R_NShrinkFrac * R_NSize,
R_NSize is decremented by R_NShrinkIncrMin + R_NShrinkIncrFrac * R_NSize.
Analogous adjustments are made to R_VSize.
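In code, the node-count adjustment might look roughly like this (the
fractions shown are illustrative defaults and the clamping details
are a guess; the same rule applies to R_VSize):

    #include <stddef.h>

    static double R_NGrowFrac = 0.70, R_NShrinkFrac = 0.30;       /* illustrative */
    static double R_NGrowIncrMin = 40000, R_NGrowIncrFrac = 0.05;
    static double R_NShrinkIncrMin = 0, R_NShrinkIncrFrac = 0.2;
    static size_t R_NSize, R_MinNSize, R_MaxNSize;

    /* Adjust R_NSize after a full collection, given current occupancy. */
    static void adjust_nsize(size_t nodes_in_use)
    {
        if (nodes_in_use >= R_NGrowFrac * R_NSize)
            R_NSize += (size_t)(R_NGrowIncrMin + R_NGrowIncrFrac * R_NSize);
        else if (nodes_in_use < R_NShrinkFrac * R_NSize)
            R_NSize -= (size_t)(R_NShrinkIncrMin + R_NShrinkIncrFrac * R_NSize);
        if (R_NSize < R_MinNSize) R_NSize = R_MinNSize;   /* not below the minimum */
        if (R_NSize > R_MaxNSize) R_NSize = R_MaxNSize;   /* not above the maximum */
    }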
This mechanism for adjusting the heap size constants is very primitive but hopefully adequate for now. Some modeling and experimentation would be useful. We want the heap sizes to get set at levels adequate for the current computations. The present mechanism uses only the size of the current live heap to provide information about the current needs; since the current live heap size can be very volatile, the adjustment mechanism only makes gradual adjustments. A more sophisticated strategy would use more of the live heap history.
The timing tests below use the nuke example from the help page for the
boot library:
    library(boot)
    data(nuclear)
    nuke <- nuclear[, c(1, 2, 5, 7, 8, 10, 11)]
    nuke.lm <- glm(log(cost) ~ date + log(cap) + ne + ct + log(cum.n) + pt,
                   data = nuke)
    nuke.diag <- glm.diag(nuke.lm)
    nuke.res <- nuke.diag$res * nuke.diag$sd
    nuke.res <- nuke.res - mean(nuke.res)
    nuke.data <- data.frame(nuke, resid = nuke.res, fit = fitted(nuke.lm))
    new.data <- data.frame(cost = 1, date = 73.00, cap = 886, ne = 0,
                           ct = 0, cum.n = 11, pt = 1)
    new.fit <- predict(nuke.lm, new.data)
    nuke.fun <- function(dat, inds, i.pred, fit.pred, x.pred) {
        assign(".inds", inds, envir = .GlobalEnv)
        lm.b <- glm(fit + resid[.inds] ~ date + log(cap) + ne + ct +
                    log(cum.n) + pt, data = dat)
        pred.b <- predict(lm.b, x.pred)
        remove(".inds", envir = .GlobalEnv)
        c(coef(lm.b), pred.b - (fit.pred + dat$resid[i.pred]))
    }
    for (i in 1:20)
        print(system.time(boot(nuke.data, nuke.fun, R = 999, m = 1,
                               fit.pred = new.fit, x.pred = new.data)))

The results for the first, tenth and twentieth runs on several
architectures are given in the tables below.
We can improve locality of reference by a simple trick of sorting the free lists either on each older generation collection or on each full collection. Sorting on each full collection seems sufficient, so I have adopted that approach.
               PPC                   PPC, bytecode         AMD-K6
               1st    10th   20th    1st    10th   20th    1st    10th   20th
    no sort    50.91  51.67  51.82   40.86  48.84  49.88   64.10  70.36  72.87
    all old    51.62  50.96  50.95   39.96  39.82  39.94   62.72  61.95  61.51
    full only  51.47  50.94  51.13   40.16  40.41  40.77   62.59  62.27  62.47

               PPC                   PPC, bytecode         AMD-K6
               1st    10th   20th    1st    10th   20th    1st     10th    20th
    no sort    38.58  41.75  42.86   72.05  79.03  82.98   152.63  164.84  177.63
    all old    37.21  36.86  36.97   72.16  72.23  72.48   151.28  150.63  151.02
    full only  37.66  37.43  37.57   72.95  73.16  74.13   151.25  150.37  151.30
R_VSize and R_NSize are restricted to remain below the values of
R_MaxVSize and R_MaxNSize. Initially these maximal values are both set
to INT_MAX to ensure that the size counters will not wrap on systems
that have that much memory. On
other systems this means the heap will continue to grow until a
malloc
call fails. In many cases this may be adequate, but there
are situations where paging can lead to serious system performance
degradation before malloc
can fail.
The functions R_SetMaxVSize
and R_SetMaxNSize
can be used to
set new maximal values; the values will be kept above the current use
values. A problem with the current arrangement is that before
initialization R_VSize is measured in bytes, while after
initialization it is measured in vector cells (typically 8 bytes).
Also, the values are reported to the user in bytes. These conversions
are messy and create the possibility of integer overflows. We should
change this so that the internal representation is always in vector
cells.
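For illustration, a conversion of the following form is where the
overflow risk lies if it is done in int arithmetic (VECREC_SIZE is a
stand-in for the vector cell size):

    #include <stddef.h>

    #define VECREC_SIZE 8   /* stand-in: vector cell size in bytes */

    /* Convert a size in bytes to vector cells, rounding up; a
       64-bit-capable type such as size_t avoids the int overflow. */
    static size_t bytes_to_cells(size_t bytes)
    {
        return (bytes + VECREC_SIZE - 1) / VECREC_SIZE;
    }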
Hitting a memory limit will raise an error that aborts the current computation. With a more sophisticated error handling system it would be possible to allow the limit to be soft in the sense of raising a recoverable error that would allow a user-defined error handler to specify an increase in the limit and allow the calculation to continue.
It might be a good idea to bump the max limit a bit when it is hit, to ensure there is enough headroom to do any necessary cleanup. We haven't done this yet since it is a pain to do, and so far it looks like enough room to delete big variables is usually available even in artificial test cases. Here is one that triggers a node cell overflow:
    e <- NULL
    while (TRUE) {
        ee <- new.env()
        assign("x", e, envir = ee)
        e <- ee
    }
R_NSize
and R_VSize
, as well as
several other size-related variables in the collector, are defined as
int
. This is not ideal for 64-bit architectures where pointers
and long
s are 64 bits but int
is often 32 bits. These
definitions should be changed to long
(or perhaps something like
size_t
); I believe there are gcc
warning options that can
help track down incompatibilities.
There are a few places where heap adjustments use occupancy fractions
expressed as doubles. I think these will work fine even for
64-bit integers, but this should be reviewed.
The current upper limits of INT_MAX
on R_VSize
and
R_NSize
are intended to prevent wrapping on systems with huge
amounts of memory. They limit the number of nodes and the number of
vector cells to about 2,000,000,000 each.
We should review all uses of integer variable types for appropriate size and for possible overflows.
There are three measures of memory use to consider. First, there is
the (R_NSize, R_VSize) pair. These values
represent limits up to which allocation can occur without a GC.
Second, there is the total amount of memory allocated by malloc
.
This will fluctuate more than the (R_NSize
, R_VSize
) pair
because of efforts to release pages back to the OS. It will usually
be well below the limit implied by (R_NSize
, R_VSize
) right
after a GC. Right before a GC it could be above due to
granularity in the page allocation for small nodes but will usually be
below because the GC will typically be triggered by only one of the
thresholds being reached. Currently the total malloc
allocation
is not recorded but could be computed easily if we wanted to (and we
may if we want to control total malloc
allocation directly).
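If we did want to record it, one possibility would be to route the
collector's allocations through small wrappers (a sketch only; R does
not currently do this):

    #include <stdlib.h>

    static size_t total_malloced = 0;   /* running total of live allocation */

    static void *gc_malloc(size_t bytes)
    {
        void *p = malloc(bytes);
        if (p != NULL) total_malloced += bytes;
        return p;
    }

    static void gc_free(void *p, size_t bytes)
    {
        free(p);
        total_malloced -= bytes;
    }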
Third, there is the actual memory footprint of R in the OS. This is
the number that top or the Windows task manager should report if they
are accurate (the task manager may be; top, I think, needs to be taken
with a grain of salt). What happens here depends a lot on the quality
of the malloc implementation, in particular on whether calls to free
are able to reduce the memory footprint.
For limiting memory footprint we would ideally like to control the
third measure, but this is quite difficult on Unix and Windows, or any
system with real VM. Controlling malloc
is a surrogate that may
be adequate.
For preventing a computation from going out of control, using
R_MaxNSize
and R_MaxVSize
should be adequate. It might even
be sufficient for controlling memory footprint.
The current approach assumes that a malloc ought to succeed if it is
permitted to occur by the current R_NSize and R_VSize settings. If it
fails, an error is signaled immediately. With a malloc limit (and
maybe even without one) it would make sense to go through a few
intermediate steps after a malloc fails and before signaling an error,
as sketched after this list:

- Run a garbage collection and retry the malloc.
- If the malloc still fails, try to release all possible pages and try
  again.
- If the malloc still fails, then signal an error.
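A sketch of that staged strategy (run_gc and release_all_pages are
hypothetical stand-ins for the corresponding collector operations):

    #include <stdlib.h>

    static void run_gc(void)            { /* trigger a full collection */ }
    static void release_all_pages(void) { /* return free pages to the OS */ }

    /* Retry a failed malloc after a GC, then after releasing pages;
       only if it still fails does the caller signal an R error. */
    static void *alloc_with_retry(size_t bytes)
    {
        void *p = malloc(bytes);
        if (p == NULL) { run_gc(); p = malloc(bytes); }
        if (p == NULL) { release_all_pages(); p = malloc(bytes); }
        return p;
    }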
As always with resource exhaustion errors, an issue is that after signaling the error the problem may still exist and it might therefore not be possible to do anything about it. A scheme of allowing a small reserve for processing the error and cleaning up may be needed.
The Memory help page needs to be adjusted once the interface for
tuning the GC is established.