Lecture 36, Case Statements

Part of the notes for CS:4980:1
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Case-select Jump Tables

Consider the following case statement:

select i in
case 1:
  -- some code
case 2:
  -- more code
end

A two-alternative case statement is hardly worth writing, since it could be replaced with one or the other of the following. If the variable i is confined to the range 1..2, a single if statement suffices:

if i = 1 then
  -- some code
else
  -- more code
end

If the variable i is not confined to the range 1..2, an additional if statement is required:

if i = 1 then
  -- some code
else if i = 2 then
  -- more code
end end

Cascades of if statements can always be used to replace case statements, but when there are more alternatives, there are far better ways to generate the machine code. An assembly language programmer intent on writing machine code for the case statement in a way that generalizes to any number of cases might write code like this:

        ldr     r3,[fp,#i]       @ get operand i (a local var)

        @ bounds checking code, required if i can take on other values
        cmp     r3,#lower        @ check lower bound
        blt     endcase
        cmp     r3,#upper        @ check upper bound
        bgt     endcase          @ better bounds-checking is possible

        @ jump to the correct case
        adr     r4,table
        ldr     pc,[r4,r3,lsl#2] @ indirect indexed jump through table

        @ the jump table and its description
lower   =       1
upper   =       2
table   =       .-(lower<<2)     @ address of element zero of table
        .word   case1            @ table[1]
        .word   case2            @ table[2]

        @ the cases
case1:
        -- some code
        b       endcase
case2:
        -- more code
        b       endcase          @ this isn't really needed
endcase:

This approach to case statements easily expands to any number of cases, and unlike the cascade of if statements, the run-time is independent of the number of cases. If the list of cases includes gaps, the unused table entries need to point to the else clause, if there is one, or to the endcase label if not.
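
For example, if the statement above also had a case labeled 4 but none labeled 3, the table might look like the following sketch (the case4 label is illustrative; with no else clause, the unused entry points to endcase):

lower   =       1
upper   =       4
table   =       .-(lower<<2)     @ address of element zero of table
        .word   case1            @ table[1]
        .word   case2            @ table[2]
        .word   endcase          @ table[3], no such case
        .word   case4            @ table[4]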

There are alternatives to the jump table approach used above. If the set of case labels is sparse, a table with one entry per integer in the range from the minimum case label to the maximum will make very inefficient use of memory. A cascade of n compare instructions, each followed by a conditional branch, takes 2n words on the ARM if all of the constants are small, and 4n words in the worst case where the constants are arbitrary 32-bit values. Thus, a cascade of compares can make better use of memory when fewer than half of the table entries are used (if the constants are small), and it is guaranteed to make better use of memory when fewer than a quarter of the entries are used.
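
To make this concrete, consider 4 case labels spread over the range 1 to 8, as in the sparse example below (this arithmetic ignores the fixed overhead of the bounds check and the indexed jump): the jump table needs 8 words, one per value in the range, while a cascade of 4 compare-and-branch pairs needs 8 words when the constants are small and up to 16 words in the worst case. Spread the same 4 labels over a range of 100 values and the table grows to 100 words, while the cascade still needs at most 16.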

Consider the following sparse case statement:

select i in
case 1:
  -- code for case 1
case 2:
  -- code for case 2
case 4:
  -- code for case 4
case 8:
  -- code for case 8
end

A simple cascade of if statements gives a linear search, hardly an attractive alternative as the number of cases grows:

if i = 1 then
  -- code for case 1
else if i = 2 then
  -- code for case 2
else if i = 4 then
  -- code for case 4
else if i = 8 then
  -- code for case 8
else
  -- code for all others
end end end end

But, we can reorganize the search by rewriting the if statements:

if (i >= 1) & (i <= 8) then
  if i < 4 then
    if i < 2 then
      -- code for case 1
    else
      if i = 2 then
        -- code for case 2
      end
    end
  else
    if i = 4 then
      -- code for case 4
    else
      if i = 8 then
        -- code for case 8
      end
    end
  end
end

Generalizing this code as the number of alternatives grows gives the efficiency of a binary search. Hybrid code is also possible, with jump tables for tightly packed subranges of case labels and conditional branching to deal with sparse areas of the table. For code patterned on the above, in the worst case where jump tables are not used, we can select between any n consecutive alternatives with about log₂(n) + 2 comparisons.
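
As a sketch of the code a compiler might emit for the reorganized search above (the register assignments and label names here are illustrative, and the case bodies are assumed to end with a branch to endcase, as in the earlier example):

        ldr     r3,[fp,#i]       @ get operand i (a local var)

        cmp     r3,#1            @ bounds check: below the smallest label?
        blt     endcase
        cmp     r3,#8            @ bounds check: above the largest label?
        bgt     endcase

        cmp     r3,#4            @ binary search: low or high half?
        bge     highhalf
        cmp     r3,#2            @ low half holds cases 1 and 2
        blt     case1            @ i = 1
        beq     case2            @ i = 2
        b       endcase          @ i = 3, no such case
highhalf:
        cmp     r3,#4            @ high half holds cases 4 and 8
        beq     case4
        cmp     r3,#8
        beq     case8
        b       endcase          @ i = 5, 6 or 7, no such case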

An optimizing compiler that tries to select between a jump table, a linear cascade of comparisons, or a binary-search cascade of comparisons must accumulate the entire list of cases before it decides how to select between them. This requires complex code that is typical of a mature optimizing compiler but unlikely in a prototype compiler.

When Niklaus Wirth designed Pascal, his solution to this problem was simple: he advertised, in the reference manual for the language, that case statements were inefficient when the set of cases was sparse. Programmers were advised to write cascades of if-else statements to handle selection between sparse sets of cases. This worked well, and it allowed the compiler to always use an indexed indirect jump through a jump table.

Things are different in languages such as Python where expressions are allowed everywhere, including case labels. There, the cascade of comparisons is the general solution, and only a mature optimizing compiler will be able to decide to generate a jump table after it finds that all the case labels in some case statement are constants and that the range of values is compact.

Compiling in One Pass

Compiling a case-select construct in one pass is surprisingly easy, so long as the assembler that processes the output uses two passes and the compiler can accumulate the entire case table in memory, saving it until the end of the case statement. Consider the following reorganization of the original example above:

        ldr     r3,[fp,#i]      @ get operand i (a local var)

        cmp     r3,#lower       @ check lower bound
        blt     endcase
        cmp     r3,#upper       @ check upper bound
        bgt     endcase

        adr     r4,table
        ldr     pc,[r4,r3,lsl#2]@ indirect indexed jump through table

case1:
        -- some code
        b       endcase
case2:
        -- more code
        b       endcase

        @ code emitted at the end of the case statement
lower   =       1
upper   =       2
table   =       .-(lower<<2)    @ address of element zero of table
        .word   case1            @ table[1]
        .word   case2            @ table[2]
endcase:

Note that, after the compiler has processed all of the cases, it has accumulated the values of all of the case labels, along with the assembly language labels to which control should jump for each case. While accumulating this information, it can also track the maximum and minimum case labels, which become the values of lower and upper.

If the compiler accumulates the cases in sorted form (for example, using a binary search tree), then all it has to do at the end is traverse the set of case labels, emitting the words in ascending order, with extra words inserted as needed when the case labels are not consecutive. If there is an else clause in the case statement, the extra words hold the address of the else clause; otherwise they hold the address of the endcase label.

Nested Function Definitions

When function definitions are nested arbitrarily within other functions, as permitted by Kestrel, there is another problem. Consider this example:

a: procedure
    -- some code

    b: procedure
       -- body
    end;

    -- more code
end;

It would be nice, at compile time, to untangle the procedures a and b, so that the compiler output would look like the following:

A:      -- receiving sequence for a
        -- translation of some code
        -- translation of more code
        -- return sequence for a

B:      -- receiving sequence for b
        -- translation of body
        -- return sequence for b

A compiler that generates the complete parse tree and then generates code from that parse tree can untangle the code easily, but a one-pass compiler that generates code on the fly must resort to alternatives. One approach to this is to simply compile into successive memory locations in the order of the text of the program, with added branches to connect the parts of each subroutine. The result might look like the following:

A:      -- receiving sequence for a
        -- translation of some code
        b       ACONTINUE        @ branch around the nested procedure

B:      -- receiving sequence for b
        -- translation of body
        -- return sequence for b

ACONTINUE:
        -- translation of more code
        -- return sequence for a

This imposes a small run-time penalty for the nesting structure of the program, so it is worth looking for alternatives.

Perhaps the simplest solution is to defer generating the receiving sequence of a subroutine until the first component of that subroutine that requires code generation is encountered. This means that, so long as programmers adopt a coding style where all declarations come before any executable statements, and so long as local variables are not subject to implicit initialization, the receiving sequence for each block can be deferred until after all of the code for nested subroutines has been generated. If programmers mix declarations and code in an undisciplined way, the branches suggested above remain mixed into the code.
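
If, for example, "some code" in the earlier example consists only of declarations, so that it generates no instructions before b is reached, deferring the receiving sequence of a gives output like this, with no extra branch:

B:      -- receiving sequence for b
        -- translation of body
        -- return sequence for b

A:      -- receiving sequence for a (deferred until more code is reached)
        -- translation of more code
        -- return sequence for a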

The GNU as assembler offers a useful alternative because it supports object code sections and subsections, each with its own location counter. By default, code is assembled into the text section, but you can create other sections, and each section can have subsections. Since the text section is the one that normally contains all of the executable code of a program, it is natural to put the code there, and it is reasonable to use subsections of the text section to organize that code. For the full details, see the documentation for the .section assembler directive and for the .text directive, which is roughly equivalent to .section .text.

We could, for example, put each subroutine into a separate subsection of the text section by replacing the branch around each nested subroutine with a .text directive specifying a new subsection:

        .text   1
A:      -- receiving sequence for a
        -- translation of some code
    
        .text   2
B:      -- receiving sequence for b
        -- translation of body
        -- return sequence for b

        .text   1
        -- translation of more code
        -- return sequence for a

When code is generated this way, the body of each routine ends up contiguous within its own subsection of the text section, and the assembler concatenates these subsections, in order, to form the entire program.