Background

In computing an expression like

<expr1> + <expr2>

the value of <expr1> is computed first and then the value of <expr2>. It is possible, though unusual, that the expression for computing <expr2> might contain complex assignments. These assignments must not mutate the computed value of <expr1>. In R 3.4.4 this was not handled properly and this example produced the wrong result:

x <- 1 + 0
x + { x[1] <- 2; 1}
## [1] 3  # should be 2

This issue was present in all calls to multi-argument BUILTIN functions from the AST interpreter. A similar issue existed in subset operations, implemented by a SPECIAL:

x <- matrix(1:4, 2)
i <- 1 + 0
x[i, { i[1]<-2; 1}]
## [1] 2  # this is m[2, 1]; should be 1

Similar issues also existed in byte compiled code where a complex assignment could mutate values on the stack from earlier computations.

A variant of this issue was reported to the R-devel list by Lukas Stadler in August 2017.

Fixes in R 3.5.0 and R 3.6.0

These issues were partially addressed in R 3.5.0 and more completely in R 3.6.0. The issue for BUILTIN calls is handled in eval.c and needs no further action or maintenance elsewhere.

Implementation changes to SPECIAL functions that evaluate more than one argument need to pay attention to this issue and make sure values computed earlier cannot be mutated while evaluating later arguments. Ths should be done using INCREMENT_LINKS and DECREMENT_LINKS, e.g.ย for two arguments as

val1 = eval(arg1, rho);
INCREMENT_LINKS(val1);
val2 = eval(arg2, rho);
DECREMENT_LINKS(val1);

Changes in R 4.0.0

The fix for protecting the byte code stack used in R 3.6.0 required the compiler to emit additional instructions protecting and unprotecting values on the stack. This note describes an alternative approach with less overhead that will be incorporated in R 4.0.0. (Committed to R_devel in r77275.)

Legitimate mutations through R code can only occur in a few places:

The byte code interpreter also mutates values on the stack in a few situations where this is known to be safe.

To make sure that these operations do not mutate any boxed value on the BC stack all values on the stack need to have their link counts (NAMED or REFCNT) incremented and then decrement around these possible mutation points. This is done by calling

These functions use a stack pointer R_BCProtStack to keep track of how much of the stack has been protected. The pointer is stored in contexts and used to decrement link counts on a LONGJMP.

A key assumption is that none of the values on the protected stack will be changed between matching INCLNK_stack and DECLNK_stack operations. The binding cache and the variable value for a for loop may need to change, but these also do not need to be protected. Their stack fields are marked with the NLNKSXP tag so they will not have their links incremented and decremented.

The stack is protected around every call to applydefine. This ensures that values on the stack are protected during complex assignment operations during calls into the AST interpreter.

To reduce link count adjustments for objects lower on the stack, the stack protection pointer adjustment during byte compiled complex assignments has several components:

Together these ensure that the stack is protected during all byte compiled complex assignments.

To further reduce unnecessary link count adjustments:

A final efficiency improvement is to defer actually committing the link count increments until they are needed. This means that code written in a functional style that does not use complex assignments does not need to actually perform the link count adjustments. The points where count increments are committed are

For now, counts are also committed in SETVAR before deciding to reuse a boxed scalar, but this will go away with unboxed bindings.

Raising the minimal BC level will allow redoing the binding cache to be much smaller, and allowing the non-small-cache option to be dropped. The hacks for supporting old byte code can also be dropped at that point.