22C:18, Lecture 22, Fall 1996

Douglas W. Jones
University of Iowa Department of Computer Science

Lexicographic Trees (continued)

Given the assembly language declaration of a node in a lexicographic tree, how can we write programs that do something with it? Recall the SMAL Hawk declarations:

	; lexicographic tree of strings.
	KEYSIZE = 16		; size of key field
	; fields of each node in the tree:
	LEFT	= 0		; pointer to left subtree
	RIGHT	= 4		; pointer to right subtree
	KEY	= 8		; an array of characters
	; size of each node in the tree:
	NODESIZE= 8 + KEYSIZE
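For comparison, the Pascal declarations that these definitions correspond to would look something like the following sketch; the exact type of the key field is a guess (a packed array of KEYSIZE characters), since the assembly code above fixes only the field offsets and the node size:

	type
	    noderef = ^node;
	    node = record
		left:  noderef;                       { offset LEFT = 0 }
		right: noderef;                       { offset RIGHT = 4 }
		key:   packed array [1..16] of char;  { offset KEY = 8, KEYSIZE bytes }
	    end;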
One thing we frequently do with trees is traverse them. Consider the problem of doing an in-order traversal, printing the keys of each node. In Pascal, this might involve the following code:
	procedure traverse(ptr: noderef);
	begin
	    if ptr <> nil then begin
		traverse( ptr^.left );
		write( ptr^.key );
		traverse( ptr^.right );
	    end;
	end;
In SMAL Hawk assembly language, we can duplicate this logic exactly! The following code is written by cookbook methods, clearly separating the receiving sequence from the body, and using the full calling sequence for every call:
	;-------------------------

	; activation record structure
	;	 0	; return address        (4 bytes)
	TRAPTR = 4	; save pointer to node  (4 bytes)
	TRAAR =  8	; size of activation record

	TRAVERSE:		; traverse tree and print keys
				; expects R3 = pointer to node
				; uses R---
		STORES	R1,R2	; save return address and parameter
		STORE	R3,R2,TRAPTR
		; if pointer is nil, quit
		TESTR	R3		; 3
		BZS	TRAQT		; 3
		; call traverse(ptr^.left)
		LOAD	R3,R3,LEFT
		ADDI	R2,TRAAR
		JSR	R1,TRAVERSE
		ADDI	R2,-TRAAR	; 2
		; call dspstr(ptr^.key)
		LOAD	R3,R2,TRAPTR
		LEA	R3,R3,KEY
		ADDI	R2,TRAAR	; 1
		LOAD	R1,PDSPSTR
		JSRS	R1,R1
		ADDI	R2,-TRAAR	; 1
		; call traverse(ptr^.right)
		LOAD	R3,R2,TRAPTR
		LOAD	R3,R3,RIGHT
		ADDI	R2,TRAAR	; 2
		JSR	R1,TRAVERSE
		ADDI	R2,-TRAAR
		; windup and return
	TRAQT:				; 3
		LOADS	R1,R2	; recover return address
		JUMPS	R1	; return
This code can be optimized. The numbered comments in the code above mark the instructions affected by each optimization; these are discussed below:
  1. The DSPSTR procedure in the monitor does not use R2, so adjustments to R2 around the call are not needed.

  2. So long as the references to the activation record made between the first and second recursive calls to TRAVERSE have their displacements adjusted (using TRAPTR-TRAAR in place of TRAPTR), there is no need to decrement and re-increment R2 between those calls.

  3. In a binary tree with n nodes, there are n+1 nil pointers! As a result, anything that can be done to speed the return when the pointer is nil will have a big effect on performance. The obvious solution to this is to move the test for nil right up to the entry point, before any registers are saved, and move the TRAQT label down to the return.
Taking these into account gives the following code:
	;-------------------------

	; activation record structure
	;	 0	; return address
	TRAPTR = 4	; save pointer to node
	TRAAR  = 8	; size of activation record

	TRAVERSE:		; traverse tree and print keys
				; expects R3 = pointer to node
		TESTR	R3	; see if we have a valid pointer
		BZS	TRAQT	;   quit if null
		STORES	R1,R2	; save return address
		STORE	R3,R2,TRAPTR
		ADDI	R2,TRAAR
		LOAD	R3,R3,LEFT
		JSR	R1,TRAVERSE
		LOAD	R3,R2,TRAPTR-TRAAR
		LEA	R3,R3,KEY
		LOAD	R1,PDSPSTR
		JSRS	R1,R1
		LOAD	R3,R2,TRAPTR-TRAAR
		LOAD	R3,R3,RIGHT
		JSR	R1,TRAVERSE
		ADDI	R2,-TRAAR
		LOADS	R1,R2	; recover return address
	TRAQT:
		JUMPS	R1	; return
This code is only 17 instructions long, while the original is 21 instructions; this is a savings of about 1/5. In fact, the savings are greater than that. Since a tree with n nodes contains n+1 nil pointers, just over half of all the calls are made with a nil pointer; each such call executes 6 instructions in the old version, but only 3 in the optimized version.

A fair guess at the actual improvement comes from adding the time taken for a non-nil call to the procedure to the time taken for a nil call, under the optimized and unoptimized versions. These are 21+6 or 27 versus 17+3 or 20. We can conclude from this that the time taken for tree traversal under the optimized version would be about 75% (20/27) of that taken by the original.
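The same kind of estimate can be made for an entire traversal. A tree with n nodes gives rise to n calls with non-nil pointers and, as noted above, n+1 calls with nil pointers, so a rough count of the instructions executed by TRAVERSE itself (ignoring the instructions executed inside DSPSTR, which are the same under both versions) is:

	original:   21n + 6(n+1)  =  27n + 6
	optimized:  17n + 3(n+1)  =  20n + 3

For large trees, the ratio of these counts approaches the same 20/27 figure.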

To achieve even better performance, we would have to abandon the framework established by the original code. The optimizations we did above adhered to this framework; such optimizations are sometimes referred to as peephole optimizations, since they can be made by looking at small parts of a program without attention to the global program structure. Global optimization of this program begins by rewriting the original procedure to move the tests for nil pointers from inside the procedure to the points of call:

	procedure traverse(ptr: noderef);
	{ called when it is suspected that ptr may be nil }
	begin
	    if ptr <> nil
	      then traverse1( ptr );
	end;

	procedure traverse1(ptr: noderef);
	{ called only when ptr known not to be nil }
	begin
	    if ptr^.left <> nil
	      then traverse1( ptr^.left );
	    write( ptr^.key );
	    if ptr^.right <> nil
	      then traverse1( ptr^.right );
	end;	
Here, we save the cost of a procedure call for each nil pointer by putting the test outside each call instead of inside it. To do this, we had to introduce a second procedure, keeping the original traverse only to take care of testing the root of the tree for nil. For each nil pointer in the tree, this saves a call and a return instruction, and in addition, it eliminates the need for a TESTR instruction, because the condition codes can be set by the LOADCC instruction used to get the left or right link.
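In assembly language, the rewritten pair of procedures might look something like the following sketch. This code is not from the original handout; it assumes that LOADCC takes the same operand forms as LOAD, and the labels TRAVERSE1, TRANLF and TRANRT are invented here. The activation record structure is unchanged.

	;-------------------------

	TRAVERSE:		; traverse tree and print keys
				; expects R3 = pointer to node (may be nil)
		TESTR	R3	; test the root pointer
		BZS	TRAQT	;   return at once if the tree is empty
				; otherwise fall through into TRAVERSE1

	TRAVERSE1:		; traverse tree and print keys
				; expects R3 = pointer known to be non-nil
		STORES	R1,R2	; save return address
		STORE	R3,R2,TRAPTR
		ADDI	R2,TRAAR
		LOADCC	R3,R3,LEFT	; get left link, setting condition codes
		BZS	TRANLF		;   skip the recursive call if it is nil
		JSR	R1,TRAVERSE1
	TRANLF:
		LOAD	R3,R2,TRAPTR-TRAAR
		LEA	R3,R3,KEY
		LOAD	R1,PDSPSTR
		JSRS	R1,R1
		LOAD	R3,R2,TRAPTR-TRAAR
		LOADCC	R3,R3,RIGHT	; get right link, setting condition codes
		BZS	TRANRT		;   skip the recursive call if it is nil
		JSR	R1,TRAVERSE1
	TRANRT:
		ADDI	R2,-TRAAR
		LOADS	R1,R2	; recover return address
	TRAQT:
		JUMPS	R1	; return
Note that when the root pointer is non-nil, TRAVERSE simply falls through into TRAVERSE1, so handling the one pointer that is still allowed to be nil costs no extra call and return.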