Protecting the BC Stack from Mutation

Background

In computing an expression like

<expr1> + <expr2>

the value of <expr1> is computed first and then the value of <expr2>. It is possible, though unusual, that the expression for computing <expr2> might contain complex assignments. These assignments must not mutate the computed value of <expr1>. In R 3.4.4 this was not handled properly and this example produced the wrong result:

x <- 1 + 0
x + { x[1] <- 2; 1}
## [1] 3  # should be 2

This issue was present in all calls to multi-argument BUILTIN functions from the AST interpreter. A similar issue existed in subset operations, implemented by a SPECIAL:

x <- matrix(1:4, 2)
i <- 1 + 0
x[i, { i[1]<-2; 1}]
## [1] 2  # this is m[2, 1]; should be 1

Similar issues also existed in byte compiled code where a complex assignment could mutate values on the stack from earlier computations.

A variant of this issue was reported to the R-devel list by Lukas Stadler in August 2017.

Fixes in R 3.5.0 and R 3.6.0

These issues were partially addressed in R 3.5.0 and more completely in R 3.6.0. The issue for BUILTIN calls is handled in eval.c and needs no further action or maintenance elsewhere.

Implementation changes to SPECIAL functions that evaluate more than one argument need to pay attention to this issue and make sure values computed earlier cannot be mutated while evaluating later arguments. Ths should be done using INCREMENT_LINKS and DECREMENT_LINKS, e.g. for two arguments as

val1 = eval(arg1, rho);
INCREMENT_LINKS(val1);
val2 = eval(arg2, rho);
DECREMENT_LINKS(val1);

Changes in R 4.0.0

The fix for protecting the byte code stack used in R 3.6.0 required the compiler to emit additional instructions protecting and unprotecting values on the stack. This note describes an alternative approach with less overhead that will be incorporated in R 4.0.0. (Committed to R_devel in r77275.)

Legitimate mutations through R code can only occur in a few places:

In byte code between STARTASSIGN/ENDASSIGN or STARTASSIGN2/ENDASSIGN2 instructions.
In calls to applyDefine from the AST interpreter.

The byte code interpreter also mutates values on the stack in a few situations where this is known to be safe.

In updating the value of the loop variable in for loops for some data types. This is safe for values that are not shared as they cannot also appear lower on the stack. Most of this will disappear with unboxed bindings, but not the STRSXP case.
In the current implementation of the SETVAR instruction, which tries to re-use a boxed variable allocation if possible, but only when the assignment is at top level. This is safe as the stack below the current bcEval call will be protectes as described below. This will disappear with unboxed bindings.]

To make sure that these operations do not mutate any boxed value on the BC stack all values on the stack need to have their link counts (NAMED or REFCNT) incremented and then decrement around these possible mutation points. This is done by calling

INCLNK_stack before entering possibly mutating code;
DECLNK_stack after leaving this code.

These functions use a stack pointer R_BCProtStack to keep track of how much of the stack has been protected. The pointer is stored in contexts and used to decrement link counts on a LONGJMP.

A key assumption is that none of the values on the protected stack will be changed between matching INCLNK_stack and DECLNK_stack operations. The binding cache and the variable value for a for loop may need to change, but these also do not need to be protected. Their stack fields are marked with the NLNKSXP tag so they will not have their links incremented and decremented.

The stack is protected around every call to applydefine. This ensures that values on the stack are protected during complex assignment operations during calls into the AST interpreter.

To reduce link count adjustments for objects lower on the stack, the stack protection pointer adjustment during byte compiled complex assignments has several components:

The stack protection pointer is raised at the beginning of a bcEval call and restored at the end. This means that top level complex assignments in byte compiled code need no further protection.
The protection pointer does need to be raised around non-top-level complex assignments. This is done by INCLNKSTK and DECLNKSTK instructions emitted by the compiler for non-top-level complex assignments.

Together these ensure that the stack is protected during all byte compiled complex assignments.

To further reduce unnecessary link count adjustments:

For for loops the stack protection pointer is adjusted around the loop body to avoid processing the loop data for every stack protection pointer adjustment needed while executing the body.
Similarly, the stack protection pointer is adjusted for loops needing a jump context to avoid repeated processing of the context data.

A final efficiency improvement is to defer actually committing the link count increments until they are needed. This means that code written in a functional style that does not use complex assignments does not need to actually perform the link count adjustments. The points where count increments are committed are

in applydefine for the AST interpreter;
in the STARTASSIGN and STARTASSIGN instructions.

For now, counts are also committed in SETVAR before deciding to reuse a boxed scalar, but this will go away with unboxed bindings.

Raising the minimal BC level will allow redoing the binding cache to be much smaller, and allowing the non-small-cache option to be dropped. The hacks for supporting old byte code can also be dropped at that point.

Protecting the BC Stack from Mutation

Luke Tierney

Background

Fixes in R 3.5.0 and R 3.6.0

Changes in R 4.0.0