22C:18, Lecture 20, Fall 1996

Douglas W. Jones
University of Iowa Department of Computer Science

Arrays

Formally, to compute the address of an element of an array, given the address of the array itself and an integer subscript, the following computation is involved:

	a - address of first element of array A
	i - subscript
	s - size of an element in bytes
	f - subscript of first array element
	l - subscript of last array element
	e - address of A[i]

	error if i > l
	error if i < f
	e = a + s*(i - f)

Some programming languages have simplified models of arrays. For example, in C, f is always 0, and the compiler does not check for errors in the subscript range. In Pascal, Ada and many other languages, arrays may be declared as running between two arbitrary integers, and subscripts outside that range are illegal.

Consider the following Pascal declaration:

	var a: array [1 .. 10] of integer;

This declares a as an array with legal subscripts from 1 to 10, where each element is an integer, typically a 32 bit word. Given that the array is a local variable of a procedure, allocated in the activation record of that procedure at offset A from the base of the activation record, and given that R3 is to be used as the array index, the indexed word of the array, a[R3], can be computed as follows:

	; code to address a[i], where R3 = i
	CMPI	R3,10	; test upper bound
	BGT	ERROR
	ADDSI	R3,-1	; adjust by lower bound
	BLT	ERROR
	LEA	R4,R2,A	; R4 is address of array a
	ADDSI	R3,R4,2	; R3 is now the address of A[i]

Note in the above code that the comparison with the lower bound was combined with the adjustment of the subscript to account for the fact that the lower bound was nonzero.

This code can be simplified if we know that the array subscript is within bounds. In languages like Pascal and Ada, the type of each variable states the upper and lower bound of that variable, and declarations can be adjusted to specifically state the actual range of values a variable takes on. In such languages, it is frequently possible for the compiler to eliminate the tests of the array bounds. This can allow the above code to be reduced to:

	; code to address a[i], where R3 = i
	; assumes that i is known to be within bounds
	LEA	R4,R2,A-4 ; R4 is address of A[0]
	ADDSI	R3,R4,2   ; R3 is now the address of A[i]

This code uses assembly time subtraction to adjust for the nonzero lower array bound instead of run-time computation used in the previous example. Thus, R4 is loaded with the address of the illegal array element A[0] even though A[0] does not exist -- note that the address exists, in the sense that you can add to it to get the address of legal array elements, even though the array element A[0] does not.

Here is another example, using a 2 dimensional array of characters, declared in C as:

	static char screen[24][80];

This array consists of 24 sub arrays, each made of 80 characters. The keyword static declares screen to be allocated at a fixed address in RAM instead of in the activation record of a procedure, so the array might be declared in an assembly language program as:

	COMMON  screen, 1920	; 24 * 80 characters
Pscreen:W	screen

Because of linker restrictions on references to external symbols, references to the array screen would begin by loading Pscreen, the pointer to screen. To load screen[x][y] (character y on row x), the following code would be used, assuming that x and y are local variables in the current activation record:

	LOAD	R3,Pscreen
; r3 points to screen, the whole array
	LOAD	R4,R2,x	; get local variable x
	ADDSL	R4,R4,2	;  times 5
	ADDSL	R4,R3,4	;  times 16 plus Pscreen
; r4 now points to screen[x], one row of the array
	LOAD	R3,R2,y ; get local variable y
	ADD	R3,R4,R3
; r3 now points to screen[x][y], one element of the array
	LOADS	R4,R3
	EXTB	R4,R4,R3
; r4 now holds a copy of screen[x][y]

Note that no test of the array bounds was needed because C does not perform such tests, and that no subtraction to compensate for nonzero lower bounds on each dimension was needed because all C arrays are indexed starting at zero.

The Case Statement

Compilation of case statements (switch-case in C, case in Pascal, select in Ada) is complex. This form of control structure was poorly understood in early programming languages and only came into its own in languages developed after 1970. In general, there are many alternative ways of compiling such a statement. Consider the following bit of Pascal code:

	case x of
	  0: item(0);
	  1: item(1);
	  2: item(2);
	end;

The most obvious suggestion most programmers would have for implementing this is to translate it to the following equivalent Pascal code:

	if x = 0
	  then item(0)
	else if x = 1
	  then item(1)
	else if x = 2
	  then item(2)
	else error;

This code is easy to translate to assembly code, but the translation is not particularly efficient if the number of alternatives is large.

The problem is, the straightforward chain of if statements amounts to a linear search. A good compiler will always find a more efficient way to get to the various cases. One of the fastest ways of dispatching control to the various cases is the indirect indexed jump. Consider the following C example:

	switch (i) {
	case 1:	option(1); break;
	case 2: option(2); break;
	case 4: option(4);
	}

It would be possible to compile this as:

	if (i == 1)
		option(1);
	else if (i == 2)
		option(2);
	else if (i == 4)
		option(4);

But again, this is an inefficient linear search. Note, above, that we have illustrated a difference between C and Pascal! In Pascal, the case statement ends each case with an implicit branch to the end of the statement, and if the value of the index variable is not listed in the case list, it is an error. In C, cases in the switch statement must be followed with an explicit branch to the end of the statement -- done with a break statement, and if the index has a value that is not in the case list, it is not an error.

The efficient translation of this example cannot be directly expressed in either Pascal or C, but, but we can approximate it with a C-like pseudocode:

	if ((i >= 1) !! (i <= 4) {
		static code * jumptab(4) = { L1, L2, Lend, L4 };
		goto jumptab[i-1];
	L1:	option(1);
		goto Lend;
	L2:	option(2);
		goto Lend;
	L4:	option(4);
	Lend:	;
	}

This can be translated directly to assembly code as follows, assuming that the index variable i is a local variable and that option is a local procedure:

	LOAD	R3,R2,i	; get the case index variable
	CMPI	R3,4	; test for i within bounds
	BGT	Lend
	ADDI	R3,-1	; combine test of lower bound
	BLT	Lend	;   with adjustment of index
	LEA	R4,jumptab
	ADDSL	R3,R4,2	; multiply index by 4 as it is used
	LOADS	R3,R3	; get jumptable entry
	JUMPS	R3	; use it!

	ALIGN	4
jumptab:
	W	L1
	W	L2
	W	Lend
	W	L4

L1:	LIS	R3,1
	JSR	R1,option
	BR	Lend

L2:	LIS	R3,2
	JSR	R1,option
	BR	Lend

L4:	LIS	R3,4
	JSR	R1,option
	BR	Lend

Lend:

In the above code, note that bounds checking on the array reference is essential, and note how the test of the lower bound has been combined with adjusting the index variable so that the actual array indexing is done with a lower bound of zero.

More importantly, note that the array is stored in-line with the code, not somewhere else, and that it must be aligned.

In some contexts, the following variation on the above code is used:

	LOAD	R3,R2,i	; get the case index variable
	CMPI	R3,4	; test for i within bounds
	BGT	Lend
	ADDI	R3,-1	; combine test of lower bound
	BLT	Lend	;   with adjustment of index
	LEA	R4,jumptab
	ADDSL	R3,R4,2	; multiply index by 4 as it is used
	JUMPS	R3	; jump to array entry

jumptab:
	JUMP	L1
	JUMP	L2
	JUMP	Lend
	JUMP	L4

L1:	LIS	R3,1
	JSR	R1,option
	BR	Lend

L2:	LIS	R3,2
	JSR	R1,option
	BR	Lend

L4:	LIS	R3,4
	JSR	R1,option
	BR	Lend

Lend:

In this version, the jump table is an array of JUMP instructions instead of an array of addresses! Because the JUMP instructions are position independent, this means this code can be moved anywhere in memory without requiring changes to accomodate the relocation. On the other hand, jumping to a JUMP instruction is not recommended on modern machines that use pipelined execution because it causes a pipeline flush.

Pipelined execution is done on most modern processors. It involves fetching one instruction while the previous instruction is being executed while the results of the instruction prior to that are being stored. All forms of control transfers disrupt the pipelined execution, and jumps to jump instructions do so doubly!

Note that, when case or switch statements are compiled to use a jump table, sparse lists of case labels result in inefficient use of memory. If the only case labels are 1 and 1000, the jump table will contain 1000 entries, all but two of which point to the end of the case statement (in C) or the error handler (in Pascal). Some compilers first accumulate the list of case labels and then inspect it, selecting between alternative approaches to compiling it depending on the size of the table and the fraction of the table containing unused entries. Alternative approaches might include hashing (for very sparse tables), binary search and the indixed indirect jump described above.