11. The Burroughs Stack Architectures

Part of the 22C:122/55:132 Lecture Notes for Spring 2004
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Introduction

An obvious problem facing the designer of a digital computer, from 1948 to today, is that the logic we use to implement processors is far faster than the logic we use to implement main memory. Therefore, a well balanced design must minimize the number of memory cycles consumed in executing a program. One reason that Burks, Goldstine and von Neumann proposed packing multiple instructions in a word, besides the fact that words were big and they couldn't justify large instructions, was that this reduced the number of memory cycles needed to execute a program from two per instruction to 1.5 per instruction (a single fetch followed by two execute cycles, one per halfword).

One path to follow in improving performance is to try to pack as many instructions into a word as possible, and one way to do this is to eliminate as many operand fields as possible by having the central processor use contextual information to infer the operands. This sounds difficult, but there is one general case where it is easy: the stack or push-down architecture. This was first exploited by Burroughs Corporation in the early 1960's with a brilliantly designed computer, the B5000.

The B5000 was introduced by an article in Datamation, vol. 7, no. 5, pp. 28-32 (May, 1961); this is reprinted as chapter 22 of Bell and Newell's text, which has been transcribed to the web at

http://research.microsoft.com/~gbell/Computer_Structures__Readings_and_Examples/

This paper was written for popular consumption by the common programmers of the day, but much has changed since then.

First, note that the only high level language of note in 1961 was IBM's FORTRAN language, which had been around for almost half a decade. COBOL and Algol were considered new and exciting. The excitement of Algol is still with us, since languages like Pascal, C and Ada are all descendants of Algol.

Burroughs took a radical tack in deciding that they would not develop an assembly language for their machine, but rather, that they would do all of their system programming in Algol. They did compromise, developing an Algol dialect called Algol S that was used only for system work, but with this, they wrote a larger fraction of their system in a high level language than you find in today's Unix systems.

The reason they were successful is that they designed their compiler and their machine language in tandem. This view of computer architecture, where the architecture plus the compiler are seen as an implementation of a programming language, is still a bit radical, since it does everything possible to avoid letting component technology or register-transfer ideas influence the development of the architecture at the ISP level.

The list of major innovations in the B 5000 is long, reaching from the processor itself up through the operating system and programming language levels.

In this discussion, we will focus on the innovations Burroughs made within the CPU and not at these higher levels, and specifically on questions of instruction coding and register-transfer level design.

Tagged Architecture

Unlike almost any other family of commercially successful computers (the exception is the IBM AS/400), Burroughs used a tagged architecture. The B 5000 and all of the later Burroughs stack machines descended from it used a 48-bit word, and the data type of an operand was encoded in the operand itself and not in the instruction that referenced that operand:

 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
|_|_|_ _ _ _ _ _|_|_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _|
|f|s|  6 bits   |s|     39 bits                                         |
| | exponent    | operand                                               |
Integers and floating point numbers both used the same format - so, effectively, programmers could imagine that they were always using floating point. In effect, the flag bit f is used to distinguish between data words and words that have other uses. Today, we would refer to this as a tag bit.
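
As a concrete sketch of this layout (not Burroughs' hardware, just an illustration in C, with all field and function names invented), the fields of a 48-bit data word held right-justified in a 64-bit host integer could be unpacked like this:

    #include <stdint.h>

    /* Hypothetical unpacking of a 48-bit B5000 data word:
       bit 47 = flag, bit 46 = exponent sign, bits 45..40 = exponent,
       bit 39 = operand sign, bits 38..0 = operand. */
    struct b5000_word {
        unsigned flag;       /* 0 = ordinary data word        */
        unsigned exp_sign;   /* sign of the exponent          */
        unsigned exponent;   /* 6-bit exponent                */
        unsigned op_sign;    /* sign of the operand           */
        uint64_t operand;    /* 39-bit operand (the mantissa) */
    };

    static struct b5000_word unpack(uint64_t w)
    {
        struct b5000_word d;
        d.flag     = (w >> 47) & 1;
        d.exp_sign = (w >> 46) & 1;
        d.exponent = (w >> 40) & 0x3F;          /* 6 bits  */
        d.op_sign  = (w >> 39) & 1;
        d.operand  = w & ((1ULL << 39) - 1);    /* 39 bits */
        return d;
    }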

In character mode, text was packed as 6-bit characters, 8 per word. In most cases, there was no way to accidentally refer to a word as a character string, but loopholes existed, and as a result, security on the B5000 was not as good as it is, for example, in Java (later models in the Burroughs product line fixed this problem).

Programs on the B5000 were composed of streams of instruction syllables, where each syllable was 12 bits. Thus, the fetch-execute cycle for straight-line code could fetch one word of instructions and then execute 4 instructions before the next fetch. Many instructions were one syllable, but longer instructions were required for references to variables in the program.
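
A minimal sketch of pulling the four 12-bit syllables out of a fetched word (the left-to-right execution order is an assumption here):

    /* Extract the four 12-bit instruction syllables from one 48-bit
       word, most significant syllable first (ordering assumed). */
    static void get_syllables(uint64_t word, uint16_t syl[4])
    {
        for (int i = 0; i < 4; i++)
            syl[i] = (word >> (36 - 12 * i)) & 0xFFF;
    }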

The Stack

All arithmetic on the B5000 used operands on the stack. Thus, for example, the add instruction could be defined, in the abstract, as

   push(pop() + pop())

To allow efficient memory use, two registers, A and B, were used to hold the top elements of the stack. These acted as what we would now call a stack-top cache. The stack therefore existed in four states:

   i -- both registers empty; the entire stack is in memory.
   j -- the top item of the stack is in A.
   k -- the top item of the stack is in B.
   l -- the top item is in A, the item below it is in B.

If the initial stack state was i (all in memory), a push changed the stack state to j (top item in A) without requiring any memory cycles to do the push operation. A second push would change from state j to state l, and if this was followed by a pop, the state would change to state k; a second pop would change the state to i.

Arithmetic operations such as add and subtract were implemented by first normalizing the stack to state l, bringing the top two items into the A and B registers using zero, one or two memory cycles to do so, and then doing the arithmetic and leaving the result in register B, with the stack in state k.

As a result, assuming that the stack was initially in state i (all in memory), the common instruction sequence push push add push add push add (used to sum four numbers) would leave the stack in state k, with the sum in B, and no memory cycles would be used by the push and pop operations implied by a naive understanding of how such an instruction sequence would be implemented.
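
The following is a minimal simulation of this four-state cache, counting only the memory cycles spent spilling to or filling from the in-memory stack; the coding is a sketch, not Burroughs' logic, and all names are invented:

    #include <stdio.h>

    /* States of the stack-top cache:
       I: A and B empty    J: top item in A
       K: top item in B    L: top in A, next in B */
    enum state { S_I, S_J, S_K, S_L };
    static enum state st = S_I;
    static long A, B;            /* the two stack-top registers     */
    static long mem[1024];       /* the in-memory part of the stack */
    static int  sp, cycles;      /* cycles counts memory traffic    */

    static void push(long v)
    {
        switch (st) {
        case S_I: A = v; st = S_J;        break;
        case S_J: B = A; A = v; st = S_L; break;
        case S_K: A = v; st = S_L;        break;
        case S_L: mem[sp++] = B; cycles++;      /* spill B */
                  B = A; A = v;           break;
        }
    }

    static void add(void)
    {
        /* Normalize to state l, filling A and B from memory as
           needed (zero, one or two memory cycles). */
        switch (st) {
        case S_I: A = mem[--sp]; cycles++;      /* fill A, fall through */
        case S_J: B = mem[--sp]; cycles++;        break;
        case S_K: A = B; B = mem[--sp]; cycles++; break;
        case S_L:                                 break;
        }
        B = A + B;               /* result left in B */
        st = S_K;
    }

    int main(void)
    {
        push(1); push(2); add();      /* push push add */
        push(3); add();               /* push add      */
        push(4); add();               /* push add      */
        printf("sum in B = %ld, memory cycles = %d\n", B, cycles);
        return 0;
    }

Running this traces the sequence above and reports a memory cycle count of zero, confirming the claim.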

Later stack machines frequently extended this scheme to use 3 or 4 stack-top registers. The Collins Advanced Processing System (CAPS), which grew into the Rockwell Advanced Architecture Microprocessor (AAMP), is an excellent example of an architecture that is directly descended from the Burroughs 5000 architecture, at least with regard to this element of the instruction set design and implementation. The AAMP I (from the mid 1980s) had 4 high-speed registers as its stack-top cache; current models may have more. (The AAMP family is the only 32-bit production microprocessor family designed in Cedar Rapids, Iowa.)

The Instruction Set

The first 2 bits of each instruction syllable were used to broadly classify the instruction:

Operator syllable
 _ _ _ _ _ _ _ _ _ _ _ _
|_ _|_ _ _ _ _ _ _ _ _ _|
|0 0|  op     -- select one of about 60 word-mode operations

Literal syllable
 _ _ _ _ _ _ _ _ _ _ _ _
|_ _|_ _ _ _ _ _ _ _ _ _|
|0 1|  const  -- push the 10-bit integer constant on the stack

Operand call syllable
 _ _ _ _ _ _ _ _ _ _ _ _
|_ _|_ _ _ _ _ _ _ _ _ _|
|1 0|  displacement -- push the operand from memory on the stack
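
Decoding these classes might look like the following sketch (all handler names are invented; the fourth class, the descriptor call, is described below):

    #include <stdint.h>

    void do_operator(unsigned op);          /* 00: word-mode operation */
    void push_literal(unsigned k);          /* 01: 10-bit constant     */
    void operand_call(unsigned disp);       /* 10: push operand        */
    void descriptor_call(unsigned disp);    /* 11: push descriptor     */

    void dispatch(uint16_t syl)
    {
        unsigned arg = syl & 0x3FF;          /* low 10 bits */
        switch ((syl >> 10) & 3) {           /* top 2 bits  */
        case 0: do_operator(arg);     break;
        case 1: push_literal(arg);    break;
        case 2: operand_call(arg);    break;
        case 3: descriptor_call(arg); break;
        }
    }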

The operand call uses the displacement as a word address in the current program reference table; this table (we'd call it a segment today) contains global variables and descriptors of other segments. The tag bit on the entry in this table distinguishes between the two possibilities. If a simple variable is referenced, the variable is pushed on the stack.

If the tag says it's a descriptor, the next most significant bits distinguish between simple pointers, pointers to segments, and pointers to functions. When a pointer to a simple variable is encountered, that variable is pushed on the stack. This can actually go into a loop, if the variable is itself a pointer! If the tag indicates the descriptor for a segment, the top of the stack is popped and used as an index into the indicated segment.

Finally, the tag can say that the referenced item is a function! In that case, the function is called, any parameters passed to the function are popped, and the result returned by that function is pushed onto the stack!

Absolute chaos can result if the program finds a segment reference where the programmer expected a simple variable, but so long as the programmer is careful to use data structures in a manner that matches their creation, all will go well.
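
In outline, the operand call behaves something like this sketch; the tag encodings, field accessors, and helper functions are all invented scaffolding, not the B5000's actual formats:

    #include <stdint.h>

    enum kind { SIMPLE_PTR, SEGMENT, FUNCTION };

    /* Hypothetical scaffolding: */
    extern uint64_t  prt[1024];                  /* program reference table  */
    extern int       is_descriptor(uint64_t w);  /* the tag bit              */
    extern enum kind kind_of(uint64_t w);
    extern uint64_t  fetch(uint64_t address);
    extern uint64_t  address_of(uint64_t w), base_of(uint64_t w);
    extern void      push_word(uint64_t w);
    extern uint64_t  pop_word(void);
    extern void      call_function(uint64_t w);  /* pops args, pushes result */

    void operand_call(unsigned disp)
    {
        uint64_t w = prt[disp];
        for (;;) {
            if (!is_descriptor(w)) {       /* a simple variable: push it */
                push_word(w);
                return;
            }
            switch (kind_of(w)) {
            case SIMPLE_PTR:               /* chase the pointer; this is
                                              the loop noted in the text */
                w = fetch(address_of(w));
                break;
            case SEGMENT: {                /* index popped off the stack */
                uint64_t i = pop_word();
                push_word(fetch(base_of(w) + i));
                return;
            }
            case FUNCTION:                 /* call it; result is pushed  */
                call_function(w);
                return;
            }
        }
    }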

Descriptor call syllable
 _ _ _ _ _ _ _ _ _ _ _ _
|_ _|_ _ _ _ _ _ _ _ _ _|
|1 1|  displacement -- push a descriptor (an address) on the stack

The final class of instruction pushes a descriptor on the stack instead of pushing the item referenced by the descriptor. This is valuable for pushing pointers to variables, for example, to pass parameters to a function, and it is valuable because assignment statements are encoded by pushing the destination address first. Consider

    A = B + C

This is compiled as:

    push descriptor A
    push operand B
    push operand C
    add
    pop

The final pop is a binary operator that stores the top item of the stack into the memory location addressed by the descriptor beneath it, and then pops both off of the stack.
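
A minimal model of this compiled sequence, with a flat memory standing in for the program reference table, plain addresses standing in for descriptors, and all names invented:

    #include <stdio.h>

    static long mem[1024];       /* hypothetical addressable data */
    static long stk[64];         /* hypothetical operand stack    */
    static int  sp;

    static void push(long v) { stk[sp++] = v; }
    static long pop(void)    { return stk[--sp]; }
    static void add(void)    { long b = pop(), a = pop(); push(a + b); }

    static void store_pop(void)          /* the final "pop" */
    {
        long value = pop();              /* top: the computed value  */
        long dest  = pop();              /* below it: the descriptor */
        mem[dest] = value;
    }

    int main(void)
    {
        enum { addrA = 0, addrB = 1, addrC = 2 };
        mem[addrB] = 2; mem[addrC] = 3;

        push(addrA);          /* push descriptor A */
        push(mem[addrB]);     /* push operand B    */
        push(mem[addrC]);     /* push operand C    */
        add();                /* add               */
        store_pop();          /* pop               */

        printf("A = %ld\n", mem[addrA]);   /* prints A = 5 */
        return 0;
    }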

It is worth noting that the J-code used to represent Java programs in today's Internet world is identical in spirit to this code and can be considered a lineal descendant of the work done at Burroughs in the late 1950's and early 1960's!

Note that the address space seems to be only 10 bits, with no apparent support for local variables! This would be awful if only direct addressing were allowed, but it is survivable because single words of the program reference table (the data segment) may themselves refer to other segments. In effect, a program may have up to 1024 globally defined variables, each of which is either a simple variable, an array, or a structure of some other type, where each entry in such a structure may itself refer to further secondary segments.

In practice, some of these global segments are actually references to activation records in the program, so entry to a subroutine involves pushing the old descriptor for this static nesting level in the source code and then allocating a new segment for this level to hold the current local variables. Later Burroughs architectures would refine this idea.

Implementation

Nobody would have imagined implementing such a complex architecture in the vacuum tube era! Only with transistors could such complexity be miniaturized enough to fit in a room! Not only that, but the control structure of the fetch-execute cycle is extremely irregular, requiring an extremely complex control unit to control the register transfers of the processor. We solve this control unit problem by what is called microprogramming, the subject of the next lecture. We won't give the microprogram (or even the complete specification) of the B5000. If you are interested in such machines, look at the specifications for the J-machine for a far more modern (and quite useful) example.

Microprogramming is actually an old idea. If you look at photos of the analytical engine built by Charles Babbage's son Henry in 1904 (now in the Science Museum in London), you will see a drum just over the hand-crank on the lower right-hand side. This drum is essentially a music-box mechanism, that is, a read-only memory, and it holds what is, effectively, the microprogram for the control unit of the engine. Given the nature of a music-box mechanism, it is fair to say that it is expressed in binary!

Unfortunately, this idea was unknown to Maurice Wilkes, the director of the Cambridge Computing Laboratory, so he ended up reinventing microprogramming; later, in his autobiography, he noted that, had he known of Babbage's work, it could have advanced his own work by several years.

There are nice photos of Babbage's engines and pieces of them on the web. See the London Science Museum web site