9. The IAS Architecture

Part of the 22C:122/55:132 Lecture Notes for Spring 2004
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Introduction

In 1946, Arthur W. Burks, Herman H. Goldstine and John von Neumann submitted a report to the U. S. Army Ordnance Department entitled Preliminary discussion of the logical design of an electronic computing instrument. This report was a product of the Princeton Summer School that von Neumann had convened that year, bringing together most of those who were interested in the possibility of electronic computing. This paper has been republished in several places, among them Chapter 4 of Bell and Newell's text, which has been transcribed to the web at

http://research.microsoft.com/~gbell/Computer_Structures__Readings_and_Examples/

This paper is not easy to read because, at the time, few of the ideas in computer science that we now take for granted were well established. The very idea of storing the program in the same memory as the operands saw its first appearance in print in this paper:

1.3. Conceptually we have discussed above two different forms of memory: storage of numbers and storage of orders. If, however, the orders to the machine are reduced to a numerical code and if the machine can in some fashion distinguish a number from an order, the memory organ can be used to store both numbers and orders. The coding of orders into numeric form is discussed in 6.3 below.

Note the use of the term orders for what we now call machine instructions, and note that the authors didn't take for granted that encoding of instructions in numerical form was obvious!

The paper puts quite a bit of thought into estimating how much memory might be required, looking at problems from numerical analysis for inspiration. The conclusion of this discussion is the following:

We consequently plan on a fully automatic electronic storage facility of about 4,000 numbers of 40 binary digits each. This corresponds to a precision of 2^-40 ~ 0.9 x 10^-12, i.e. of about 12 decimals. We believe that this memory capacity exceeds the capacities required for most problems that one deals with at present by a factor of about 10. The precision is also safely higher than what is required for the great majority of present day problems. In addition, we propose that we have a subsidiary memory of much larger capacity, which is also fully automatic, on some medium such as magnetic wire or tape.

Here, we see a realistic guess at the precision needed for numerical problem solving -- today, numerical analysts frown at 32-bit floating point representations, preferring to use 64 bits of precision; the 32-bit representations typically have only a 24-bit mantissa, and this is insufficient! On the other hand, Burks, Goldstine and von Neumann clearly underestimated the amount of memory people would want.

Memory technology posed obvious problems. It is clear that an automatic computer requires some kind of RAM, and the possibility of RAM technology was entirely speculative at the time. The idea of secondary memory is also limited by the technologies of the day: Magnetic tape recording was then a real novelty, while its predecessor, magnetic wire recording, was established but not as easily adapted to information storage (multitrack tape is fairly easy to make, while multitrack recording on a hair-thin iron wire is impossible).

Burks, Goldstine and von Neumann speculated about two different memory technologies: the RCA Selectron, a cathode ray tube using electron beam deflection to address spots on the face of the tube and capacitive storage of data on the face -- today we would refer to this as an early DRAM technology -- and feedback delay lines, essentially fast sequential memories. They reached the following conclusion:

4.2. We shall accordingly assume throughout the balance of this report that the Selectron is the modus for storage of words at electronic speeds. As now planned, this tube will have a capacity of 2^12 = 4,096 > 4,000 binary digits. To achieve a total electronic storage of about 4,000 words we propose to use 40 Selectrons, thereby achieving a memory of 2^12 words of 40 binary digits each. (Cf. again 2.3.)

It is worth noting that, while several successful first-generation machines used vacuum-tube based dynamic random access memory technology, including the Illiac I clones and the IBM 701, several other successful computers from that generation used mercury delay lines; of these, the Cambridge EDSAC was the most important, and it is frequently cited as the first stored-program digital computer (although the Manchester Mark I can also claim this title).

The basic ideas of sequential execution were not obvious, so they felt compelled to state them very explicitly:

3.4. It is clear that one must be able to get numbers from any part of the memory at any time. The treatment in the case of orders can, however, be more methodical since one can at least partially arrange the control instructions in a linear sequence. Consequently the control will be so constructed that it will normally proceed from place n in the memory to place (n + 1) for its next instruction.

From this, they draw the conclusion that each instruction will contain a memory address, usually the address of an operand, but sometimes a branch address, and furthermore, they conclude that there must be a class of conditional branches, based on testing the sign of a number.

Later, they used the name control counter to refer to the register holding the address of the next instruction to fetch. Today, we would call this a program counter.

Number Systems

The question of binary versus decimal number representations is given some thought, with the conclusion that the system should be constructed using binary:

5.2 In a discussion of the arithmetical organs of a computing machine one is naturally led to a consideration of the number system to be adopted. In spite of the longstanding tradition of building digital machines in the decimal system, we feel strongly in favor of the binary system for our device. Our fundamental unit of memory is naturally adapted to the binary system since we do not attempt to measure gradations of charge at a particular point in the Selectron but are content to distinguish two states. The flip-flop again is truly a binary device.

... An additional point that deserves emphasis is this: An important part of the machine is not arithmetical, but logical in nature. Now logics, being a yes-no system, is fundamentally binary. Therefore a binary arrangement of the arithmetical organs contributes very significantly towards producing a more homogeneous machine, which can be better integrated and is more efficient.

The one disadvantage of the binary system from the human point of view is the conversion problem. Since, however, it is completely known how to convert numbers from one base to another and since this conversion can be effected solely by the use of the usual arithmetic processes there is no reason why the computer itself cannot carry out this conversion.
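
The point about radix conversion is easily illustrated today. Here is a quick sketch (in modern Python, purely for illustration -- nothing like it appears in the report) that extracts decimal digits from a 39-bit binary fraction using only the "usual arithmetic processes," in this case repeated multiplication by ten:

    # Convert a 39-bit binary fraction to decimal digits using only
    # multiplication and masking.
    def fraction_to_decimal(bits, width=39, digits=12):
        out = []
        for _ in range(digits):
            bits *= 10                 # shift the value one decimal place left
            out.append(bits >> width)  # the integer part is the next digit
            bits &= (1 << width) - 1   # keep only the fractional part
        return ''.join(str(d) for d in out)

    print('0.' + fraction_to_decimal(3 << 36))  # 3/8 -> 0.375000000000

Note that 12 digits is exactly the "about 12 decimals" of precision estimated above.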

Floating point data representations were considered and rejected in the design, and they pointed out that software implementations of high-precision multiword arithmetic are far easier if done using integers than if done using floating point numbers for each component.

The ENIAC, the first fully electronic programmable calculator, was a decimal machine, and even after the retrofit of this machine to execute as a stored program computer, it remained a decimal machine. Several other first-generation computers were based on decimal arithmetic, including the IBM 702.

Arithmetic Unit

They argued for hardware implementations of add, subtract and multiply, and then noted that, given a fast multiply, the reciprocal of a could be computed by successive approximation using only 3 iterations of

x_{i+1} = 2x_i - a x_i^2

after starting with a lookup in a 16-entry table. Therefore division hardware was (and is!) probably unnecessary. Nonetheless, they proposed to incorporate divide hardware in their machine. Similar arguments led them to rule out the inclusion of a hardware square-root implementation.
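
Their claim is easy to verify today. The following sketch (mine, not the report's; the particular 16-entry table and the normalization of a to the interval [0.5, 1) are my assumptions) shows the iteration reaching roughly 40-bit accuracy in 3 steps, since each iteration roughly doubles the number of correct bits:

    # Reciprocal by successive approximation: x = 2x - a*x^2.
    def reciprocal(a, iterations=3):
        # assumed seed: midpoint reciprocals for 16 subintervals of [0.5, 1)
        table = [1.0 / (0.5 + (k + 0.5) / 32.0) for k in range(16)]
        x = table[int((a - 0.5) * 32)]   # top 4 fraction bits pick the seed
        for _ in range(iterations):
            x = 2.0 * x - a * x * x      # quadratic convergence
        return x

    print(abs(reciprocal(0.7) - 1 / 0.7))  # well below 2^-40 after 3 steps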

In thinking about the construction of the processor, they quickly came to the conclusion that they needed a register inside the processor to hold intermediate results:

5.5. The first part of our arithmetic organ requires little discussion at this point. It should be a parallel storage organ which can receive a number and add it to the one already in it, which is also able to clear its contents and which can transmit what it contains. We will call such an organ an Accumulator. It is quite conventional in principle in past and present computing machines of the most varied types, e.g. desk multipliers, standard IBM counters, more modern relay machines, the ENIAC. There are, of course, numerous ways to build such a binary accumulator.

The need for some way to accelerate carry propagation was recognized, and what they proposed was a probabilistic scheme, based on the observation that the average longest carry propagation in adding two n-bit numbers is log2(n). Therefore, instead of waiting O(n) time after every add, they could wait until some auxiliary circuitry indicates that carry propagation is complete, bringing the average add time down to O(log n) while leaving the worst case O(n). This scheme is wonderful but no longer of interest, since we now know how to bring the worst case down to O(log n).
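
The log2(n) figure is easy to check empirically. Here is a small experiment (my illustration, not from the paper) that measures the longest carry chain when adding random 40-bit numbers:

    # Measure the longest carry-propagation chain in a random 40-bit add.
    import random

    def longest_carry_chain(a, b, n=40):
        longest = run = 0
        for i in range(n):
            x, y = (a >> i) & 1, (b >> i) & 1
            if x & y:               # generate: a new carry starts here
                run = 1
            elif (x ^ y) and run:   # propagate: the carry ripples onward
                run += 1
            else:                   # kill: any rippling carry dies out
                run = 0
            longest = max(longest, run)
        return longest

    trials = 10000
    avg = sum(longest_carry_chain(random.getrandbits(40), random.getrandbits(40))
              for _ in range(trials)) / trials
    print(avg)   # averages a bit under log2(40) ~ 5.3, consistent with their bound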

The questions of overflow and signed numbers was difficult:

Our numbers are 40 digit aggregates, the left-most digit being the sign digit, and the other digits genuine binary digits, with positional values 2^-1, 2^-2, ..., 2^-39 (going from left to right). Our accumulator will, however, treat the sign digit, too, as a binary digit with the positional value 2^0 -- at least when it functions as an adder. For numbers between 0 and 1 this is clearly all right: The left-most digit will then be 0, and if 0 at this place is taken to represent a + sign, then the number is correctly expressed with its sign and 39 binary digits.

... It should be noted that our convention of placing the binary point immediately to the right of the left-most digit has nothing to do with the structure of the adder.

... Summing up:

The accumulator may be taken to represent all real numbers modulo 2, and it adds them modulo 2. If x lies between -1 and 1 (precisely: -1 ≤ x < 1) - as it will in almost all of our uses of the machine - then the left-most digit represents the sign: 0 is + and 1 is -.

... A subtraction x - y is therefore performed by the accumulator, Ac, as follows: Form x + y', where y' has a digit 0 or 1 where y has a digit 1 or 0, respectively, and then add 1 in the right-most position. The last operation can be performed by injecting a carry into the right-most stage of Ac, since this stage can never receive a carry from any other source (there being no further positions to the right).

Here, we see, after some difficulty, they have arrived at a 2's complement arithmetic system, although that term was not yet in use. (They did understand and use the term complement itself.)
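
In modern terms, the quoted rule is exactly two's complement subtraction: complement every bit of y and inject a carry of 1. A minimal sketch (using Python integers to stand in for the 40-bit accumulator, with the fraction scaling from the quote):

    # Subtraction as x + y' + 1 modulo 2^40, where y' is the bitwise
    # complement of y -- the rule quoted above.
    WIDTH = 40
    MASK = (1 << WIDTH) - 1

    def subtract(x, y):
        return (x + (y ^ MASK) + 1) & MASK   # the +1 is the injected carry

    def value(bits):
        # sign bit has weight -2^0; the rest are fraction bits 2^-1 .. 2^-39
        return bits / 2**39 - 2 if bits >> 39 else bits / 2**39

    x = int(0.25 * 2**39)
    y = int(0.75 * 2**39)
    print(value(subtract(x, y)))  # -0.5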

After working out the basic multiplication and division algorithms, they concluded that they needed a second register in the processor to hold the multiplier and quotient, and they also understood the need for a shift operation that treated this register and the accumulator as a double word.
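
A simplified (unsigned) sketch of why multiplication pairs the accumulator with this second register: the 80-bit product accumulates across both registers while the low bits of R meter out the multiplier, so the pair must shift right as one double word. The real signed algorithm differs in detail; this is only the shape of it:

    # Shift-and-add multiplication using an Ac|R double register pair.
    WIDTH = 40

    def multiply(multiplicand, multiplier):
        ac, r = 0, multiplier
        for _ in range(WIDTH):
            if r & 1:                  # low bit of R selects an add step
                ac += multiplicand
            # shift the Ac|R pair right one place, as a double word
            r = ((ac & 1) << (WIDTH - 1)) | (r >> 1)
            ac >>= 1
        return ac, r                   # high and low halves of the product

    hi, lo = multiply(3, 5)
    print((hi << WIDTH) | lo)          # 15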

Instruction Format

The discussion of the instruction interpretation cycle began as follows:

6.1. It has already been stated that the computer will contain an organ, called the control, which can automatically execute the orders stored in the Selectrons. Actually, for a reason stated in 6.3, the orders for this computer are less than half as long as a forty binary digit number, and hence the orders are stored in the Selectron memory in pairs.

This idea of packing pairs of instructions into one word came to dominate the design of large-scale computers over the next 30 years, appearing on machines with word sizes of (chronologically) 40, 36, 48 and 32 bits.

The instruction format of the proposed machine is finally refined as follows, after drawing the conclusion that they needed at least 6 bits to select between the proposed machine operations:

As we pointed out in 2.3, and used in 4.2 et seq. and 5.7 et seq., our numbers will actually have 40 binary digits each. This allows 20 binary digits for each order, i.e. the 12 digits that specify a memory location, and 8 more digits specifying the nature of the operation (instead of the minimum of 6 referred to above).

In an aside, they noted that the use of an 8-bit order code (what we now call an opcode) in a 40 bit word suggests the use of a hexadecimal representation:

It is convenient, as will be seen in 6.8.2. and Chapter 9, Part II to group these binary digits into tetrads, groups of 4 binary digits. Hence a whole word consists of 10 tetrads, a half word or order of 5 tetrads, and of these 3 specify a memory location and the remaining 2 specify the nature of the operation. Outside the machine each tetrad can be expressed by a base 16 digit.

In summary, they concluded that the machine's registers and instruction format would be arranged as follows:

40-bit instruction register:
 ___________________________________________
|____________|________|____________|________|
| address x0 | order0 | address x1 | order1 |
|    instruction 0    |    instruction 1    |

13-bit control counter Cc
 ____________ _
|____________|_|
|  address   | halfword

40-bit accumulator Ac
 ___________________________________________
|___________________________________________|

40-bit register R (the multiplier-quotient)
 ___________________________________________
|___________________________________________|
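
To make the layout concrete, here is a sketch of decoding one 40-bit word into its two orders, following the field order in the diagram above (the exact bit numbering is my assumption, not the paper's):

    # Unpack a 40-bit word into two orders of 5 tetrads each:
    # 3 tetrads (12 bits) of address, then 2 tetrads (8 bits) of opcode.
    def unpack(word):
        def order(half):                            # half is one 20-bit order
            return (half >> 8) & 0xFFF, half & 0xFF # (address, opcode)
        return order(word >> 20), order(word & 0xFFFFF)

    (a0, op0), (a1, op1) = unpack(0x123456789A)
    print(hex(a0), hex(op0), hex(a1), hex(op1))  # 0x123 0x45 0x678 0x9a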

With only modest modifications to the notation used in the original paper, here is the list of proposed operations. It is clear that the numeric encodings had not been formally assigned, but rather that the numbering is purely for enumeration.

        function                abbreviated  name
    1   Ac = M[x]               x            load
    2   Ac = -M[x]              x-           load negated
    3   Ac = abs(M[x])          xM           load absolute value
    4   Ac = -abs(M[x])         x-M          load negated absolute value
    5   Ac = Ac + M[x]          xh           add
    6   Ac = Ac - M[x]          xh-          subtract
    7   Ac = Ac + abs(M[x])     xhM          add absolute
    8   Ac = Ac - abs(M[x])     x-hM         subtract absolute
    9   R = M[x]                xR           load multiplier-quotient
    10  Ac = R                  A            shift register into accumulator
    11  Ac|R = M[x] × R         xX           multiply
    12  Ac = Ac % M[x];         x%           divide
        R = Ac / M[x]
    13  Cc = x|0                xC           branch to left halfword
    14  Cc = x|1                xC'          branch to right halfword
    15  if Ac > 0, Cc = x|0     xCc          conditional branch left
    16  if Ac > 0, Cc = x|1     xCc'         conditional branch right
    17  M[x] = Ac               xS           store accumulator
    18  M[x].x0 = Ac            xSp          modify left address field
    19  M[x].x1 = Ac            xSp'         modify right address field
    20  Ac = Ac << 1            L            shift left
    21  Ac = Ac >> 1            R            shift right
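
To see how such an instruction set executes, here is a toy interpreter (my own sketch, much simplified: one order per cell rather than pairs, only a handful of operations, and a halt operation of my invention, since the proposal has none):

    # A toy fetch-execute loop in the spirit of the table above.
    def run(program, memory):
        cc, ac = 0, 0                  # control counter and accumulator
        while True:
            op, x = program[cc]
            cc += 1                    # normally proceed from place n to n+1
            if op == 'load':           # operation 1: Ac = M[x]
                ac = memory[x]
            elif op == 'add':          # operation 5: Ac = Ac + M[x]
                ac = ac + memory[x]
            elif op == 'store':        # operation 17: M[x] = Ac
                memory[x] = ac
            elif op == 'cond':         # operation 15: if Ac > 0, Cc = x
                if ac > 0:
                    cc = x
            elif op == 'halt':         # my addition, for the demonstration
                return

    mem = {0: 2, 1: 3, 2: 0}
    run([('load', 0), ('add', 1), ('store', 2), ('halt', 0)], mem)
    print(mem[2])  # 5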

It is a little surprising to find the opcode field to the right of the operand field in the instruction layout, but that appears to have been the intent of the design.

No real assembly code was given for the machine, since symbolic assemblers were not anticipated at that time; instead, translation to binary was relegated to clerical employees. The abbreviated operation column in the above table is extremely cryptic, but apparently it was a suggested notation for programmers to use in their instructions to the clerical staff, who were expected to translate it to binary and punch it on paper tape for loading into the machine.

Summary

There are a lot of features missing from this architecture! Among them, it has only one addressing mode, direct memory addressing. The provisions in the instruction set supporting self-modifying code clearly indicate that the need for address computation was understood, but it was something of an advanced feature. It also has no mechanism allowing convenient function calls; the need for function calls was not understood in 1946.
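
Indexing therefore had to be done by rewriting address fields in place, which is what operations 18 and 19 are for: the running program stores new address bits into the address field of a later instruction. A sketch of the left-halfword case, using the bit layout assumed earlier (the encoding details here are my illustration, not the paper's):

    # Operation 18 (xSp): replace the left order's 12-bit address field.
    # On the real machine the new address bits would come from Ac; here
    # they are passed in directly for simplicity.
    def modify_left_address(word, addr):
        # clear bits 28..39 (the left order's address field), insert addr
        return (word & ~(0xFFF << 28) & ((1 << 40) - 1)) | ((addr & 0xFFF) << 28)

    inst = 0x0004500000                   # hypothetical order pair
    inst = modify_left_address(inst, 100) # left order now addresses M[100]
    print(hex(inst >> 28))                # 0x64: the new address field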

While some things are missing, the broad support for absolute value computation provided by the architecture seems eccentric, and the ability to negate on load also seems generous. These are, perhaps, side effects of the arithmetic unit design that was anticipated.

It is important to remember that this was a paper design. This machine was not built, but this paper design inspired the construction of machines at universities and government offices around the United States and in England. Many of these machines were very similar to this proposal, but much of the documentation for these early machines has been lost (there are reams of dittoed technical reports at the University of Illinois, for example, where the blue ink has faded to the point that it is invisible).

Of course, this preliminary report does not describe the machine that was eventually built! The machine as built differed in a number of ways from the original proposal. The final report on the project, Final progress report on the physical realization of an electronic computing instrument by Goldstine, Pomerene, and Smith, was published in Jan. 1954 by the IAS Electronic Computer Project. This is available as:

http://www.stanford.edu/~lharcke/programming/IAS_Final_Report.pdf