22C:18, Lecture 21, Fall 1996

Douglas W. Jones
University of Iowa Department of Computer Science

An Example

It is time to work an extended example. One of the standard example data structures is the lexicographic tree. That is, a binary tree where each node in the tree contains a left and right pointer plus a key, typically a string, where all strings in the left subtree of a node are alphabetically less than or equal to the string in the node itself, which is strictly less than the stirngs in the right subtree.

In C, such a tree might be declared using something like the following:

/* nodes in a lexicographic tree of 16 character strings */
#define KEYSIZE 16
struct node {
	struct node * left;
	struct node * right;
	char key[KEYSIZE];
}
For comparison, this would be declare in Pascal as:
type
	{ lexicographic tree of 16 character strings }
	keyrange = 0..15; { range of indices to chars in a key }
	noderef = ^ node; { pointer to node }
	node = record     { the node itself }
		left: noderef;
		right: noderef;
		key: packed array [keyrange] of char;
	end;
In either case, the declaractions understood by the compiler say nothing about the tree structure! They just say that a node contains two pointers to other nodes, one named left and one named right, plus a key that is an array of characters. These nodes could be arranged into arbitrary graphs, doubly linked lists or many other data structures. Thus, in either language, comments are needed to indicate that these nodes are intended for making lexicographic trees!

I assembly language, we cannot declare a structure or record as such, but we can give symbolic names to all of the relevant parts of the record, and we can use commentary to tie the whole thing together. For example:

	; lexicographic tree of strings.
	KEYSIZE = 16		; size of key field
	; fields of each node in the tree:
	LEFT	= 0		; pointer to left subtree
	RIGHT	= 4		; pointer to right subtree
	KEY	= 8		; an array of characters
	; size of each node in the tree:
	NODESIZE= 8 + KEYSIZE

Allocating Records

These declarations are enough to allow us to write code to use tree records, but they leave one big question unanswered. Where are tree records allocated?

It is occasionally useful to statically allocate a record. In C, one might write:

	struct node n;
Or, in Pascal
	var n: node;
This can be done in SMAL Hawk assembly language:
	COMMON n, NODESIZE
But, none of these three examples are commonplace. Far more typically, we would expect to allocate tree nodes dynamically. Typically, such dynamic allocation is described as being on the heap. In Pascal, given that variable np was declared as a pointer to a node, we'd write:
	new(np);
to dynamically allocate a new node at run-time. In C, we would write something like:
	np = (struct node *)malloc(sizeof(struct node));
In C++, we would call the constructor for the node class, and this would, implicitly, obtain storage from the heap. In fact, all constructors do so by calling malloc()!

In assembly language, assuming that np is a local variable and that MALLOC is a library routine, the natural translation for the above C and Pascal operations would be:

	LIS	R3,NODESIZE
	LOAD	R1,PMALLOC
	JSRS	R1,R1
	STORES	R3,R2,np
This assumes that the MALLOC procedure takes the size of the node to be allocated in R3 and returns a pointer to a block of that many bytes in R3.

The C version of malloc and the MALLOC code called above are expected to behave identically, allocating the indicated number of bytes from the heap. The question is, where is the heap? The answer lies in the behavior of the linker. We create the heap explicitly, for example, with the following code, written as an externally linkable procedure:

	TITLE	Simple Heap Manager

HEAPSIZE = #1000
	COMMON	HEAP,HEAPSIZE
PHEAP:	W	HEAP

; the first word of the heap is used to point to the
; first byte that has not yet been allocated.
LCSAVE	=	.
.	=	HEAP
	W	HEAP+4	; let the loader initialize this word
.	=	LCSAVE

;------------------------
	INT	MALLOC
MALLOC:			; allocate memory on the heap
			; given size of record in R3
			; returns pointer to record in R3
			; wipes out R4-6
			; does not use R2
	LOAD	R4,PHEAP
	LOADS	R5,R4	; get heap allocation pointer
	ADD	R6,R5,R3
	STORES	R6,R4	; update allocation pointer
	MOVE	R5,R3	; return the pointer
	JUMPS	R1
This version explicitly asks the liker to allocate 1000 (hex) bytes for the heap. The first word of the heap is used to keep track of the next available location, and whenever a request for memory arrives, MALLOC increments the first word by the size of the requested block, then returns the original value of the first word as the pointer to the block itself.

This version of the heap manager has a serious drawback! It does not guarantee that the pointers it returns are aligned to a word boundary! This can be assured by rounding the requested size up to the nearest multiple of 4. The following sequence of instructions does this:

	ADDSI	R3,3
	LIS	R4,-4
	AND	R3,R4	; truncate result to next lower multiple of 4
Another problem with this code is that it requires that the user anticipate the size of the heap region needed! What we really want is to have the stack and the heap share all of the remaining memory. To do this, we need to know more about how the linker operates. The linker arranges programs and data in memory as follows:
	 ________
#000000	|        | Monitor
	|        |
	|________|
#001000	|        | Relocatable Code A
	|________|
	|________| Relocatable Code B
	|        |
	|        | Relocatable code C
	|________|
        |        |

	 ________
#010000	|        | Common TRAPBUF (used by monitor)
	|________|
	|________| Common DSPPTR (used by monitor)
        |        |
        |        | Common A (user)
	|________|
        |        | Common B (user)
	|________|
        |________| Common C (user)
	|        |
	|________| Common D (user)
UNUSED	|    ____| \
	|   /  __   |
	|__/  /  |  > free RAM
	 ____/   |  |
	|________| /
UNAVAIL
Both code and commons are allocated in the order the linker encounters them! The monitor has its own commons, and these always come first. Putting the heap in a common is not necessarily a good idea. Therefore, the SMAL linker gives an alternative. The linker defines the external symbol UNUSED to point to the lowest unused location in RAM, just above the end of the highest common, and we usually use this as the base of the stack for procedure and function calls. The linker also defines UNAVAIL as the address of the byte beyond the very last byte in RAM.

This allows the stack to grow upward from UNUSED and the heap to grow downward from UNAVAIL. The following external code for the heap manager utilizes this idea.

	TITLE	A Better Heap Manager

; the common HEAP holds one word pointing to the
; end of available memory.  Storage is allocated
; from that part of RAM not used by the linker.
	COMMON	HEAP,4
PHEAP:	W	HEAP

LCSAVE	=	.
.	=	HEAP	; setup so the loader initializes
	EXT	UNAVAIL ;   the initial word heap pointer
	W	UNAVAIL
.	=	LCSAVE

;------------------------
	INT	MALLOC
MALLOC:			; allocate memory on the heap
			; given size of record in R3
			; returns pointer to record in R3
			; wipes out R4-6
			; does not use R2
	ADDSI	R3,3	; round R3 up to allocate in full words
	LIS	R4,-4
	AND	R3,R4	;   done with rounding
	LOAD	R4,PHEAP
	LOADS	R5,R4	; get heap allocation pointer
	SUB	R5,R5,R3
	STORES	R5,R4	; update allocation pointer
	MOVE	R5,R3	; return the pointer
	JUMPS	R1
This version still has one shortcoming! It does not check for heap overflow. If the stack begins at UNUSED and runs upward, this involves a comparison of returned pointer in R3 with R2, the register used as a stack pointer. If R2>R3, it would be appropriate to halt with an error message!