1. What is Computer Architecture

Part of the 22C:122/55:132 Lecture Notes for Spring 2004
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Parsing the Course Title

The course title High Performance Computer Architecture says something important about this class.

High Performance -- This course is supposed to cover high performance computer architecture. This means that, in theory, everyone in this course should be familiar with ordinary or low performance computer architecture.
Computer Architecture -- This course is about computer architecture! The term computer architecture is frequently misunderstood, but in sum, it refers to the organization of the computer as seen by the user, with digressions below that level only as needed to understand the justification for the architecture.

Caveat

There is a serious problem with this course, and we will try to deal with it up front. As originally designed, this course assumed that incoming students had a solid grasp of ordinary or low performance computer architecture. Computer science students coming into the course had a 2-semester hardware sequence, first an assembly language programming course and then a digital systems course as prerequisites. Electrical engineering students also had an assembly language course, and their work on digital systems was expanded to two semesters, one on logic below the register transfer level and one on elementary computer architecture.

Unfortunately, since that time, the Computer Science curriculum has deemphasized hardware, collapsing that two semester course into one semester that tried to teach both assembly language programming and digital logic. Observation over the past few years shows that this course has generally failed. It either provided insufficient coverage of assembly language to motivate any understanding of computer architecture, or it provided insufficient digital systems background to understand the implementation constraints that limit computer architecture.

At the same time, the undergraduate curriculum in electrical engineering has shifted. Assembly language is no longer required for undergraduates in that field, although many learn low-end assembly languages in courses on embedded systems. The logic design courses have shifted more and more to the use of automated tools based on languages such as VHDL. These languages eliminate the need to understand the lower level details of the implementation, and this isolates the design engineer from an understanding of why one high-level specification of a design works well while another, apparently equivalent, one does not.

Therefore, this course will begin with a very fast-paced review of digital systems, followed by a fast-paced review of low-performance computer architecture. This background material will consume about 1/3 of the course, and by the end of that third, we will be talking about the implementation of several low-performance but general-purpose computer architectures. Only when we have completed that will we take off on an adventure studying high performance architectures.

What is Computer Architecture

Posing this question to a class typically generates the following kinds of responses:

It is the art of designing computers.
It is the study of the instruction sets.
It is the design of computer hardware.

All of these are true, to some extent, but it is useful to step back for a moment and discuss an older question: what is architecture, in the original sense of the word, and how does it differ from engineering, as that term applies to classical architecture?

What is architecture?

Architecture is the art of designing buildings that will be pleasing to the people who use or encounter them. Architects are concerned with such details as the layout of rooms, locations of doors and windows, and the relationship of a building to its surroundings. As such, architects are generally very interested in the intended uses of the buildings they design, and of course, they must have a firm grasp of the underlying technology, but one of the great maxims of Architecture places technology and art in their appropriate relationship: "form follows function" (Louis Henry Sullivan, 1856-1924, published in Lippincott's Magazine, 1896).

The point of this great quote is that function dominates; shape and decoration should follow the function, not lead it. Certainly, engineering underlies some of the functions. The roof of a picnic shelter must be supported, even though, in the top-level functional description, it is to be an open sided unobstructed shelter. Once the need for roof supports is established, their function can, in turn, dictate additional elements of the form. In sum, Sullivan was an advocate of top-down design, in the sense we use the term in the world of software engineering.

The four buildings surrounding the old state capitol building on the University of Iowa campus provide another illustration of the difference between architecture and design. These buildings are architecturally compatible in the sense that they were designed to complement the old capitol building and each other, forming a visually pleasing group of five buildings on the crest of the bluff overlooking the Iowa River (hence the term Pentacrest for this group of buildings).

These buildings are indeed architecturally compatible, but they are based on three distinct underlying technologies. The older buildings, Schaeffer Hall and Macbride Hall, are of classical masonry and timber construction, as was the Old Capitol itself prior to its remodelling in the 1920's. MacLean Hall, built around 1910, is of modern reinforced concrete construction, and Jessup Hall, the newest of the group, was built using steel beam construction (steel beams also replaced many of the wooden elements of the Old Capitol in the 1920's). Obviously, when we say that these buildings are architecturally compatible, we are referring to proportions, window layout and the thin veneer of limestone that hides their structure from outside viewers or the thin veneer of plaster that does the same for those inside.

When we use the term computer architecture, we use it in the same way! The architecture of a computer does not depend on whether it uses vacuum tubes, discrete transistors, integrated circuits or VLSI. In fact, many architecturally compatible computer families have been built spanning an immense range of implementation technologies. The IBM mainframe family of architectures was introduced in 1965 and continues in production today. The original machines were based on hybrid integrated circuits (ceramic substrates to which individual transistor and diode chips were bonded); later members of the family were based, progressively, on SSI, MSI, LSI and finally VLSI monolithic integrated circuits.

Of course, just as certain implementation technologies allow new structural options in the building trade, the same is true with computers. The introduction of steel beams in late 19th century buildings allowed the development of the skyscraper. Such structures could not have been made using classical stone and timber construction! In the same way, VLSI implementation allowed the massive complexity of the Pentium family and the beautiful pipeline designs found in modern RISC architectures. Such designs could not have been implemented using vacuum tubes, and only a few daring experiments with discrete transistors hinted at the kind of machine that dominates the marketplace today.

Three Levels of Abstraction

In Bell and Newell's seminal book, Computer Structures -- Readings and Examples, three distinct levels of abstraction were identified that are applicable to computer architecture. Every student of computer architecture should understand that any architecture can be described at any of these levels, and furthermore, that many architectures are extremely innovative at one level while being very conservative at another. These levels are:

The Processor-Memory-Switch (PMS) level: This is concerned with bus-level design and the switching mechanisms that interconnect processors, memory units, I/O devices and other top-level components of computer systems.
The Instruction-Set-Processor (ISP) level: This is concerned with the fetch-execute cycle of each processor, how instructions are encoded, and the details of operand addressing and register usage.
The Register-Transfer (RT) level: This is concerned with registers, multiplexors, arithmetic units and RAM arrays that underlie the ISP computations.

Of course, all of these levels sit on top of the logic level, and the logic level itself sits on top of some underlying hardware technology. These levels are discussed in more detail in the following sections, with examples.

The Processor-Memory-Switch (PMS) Level

This level, with its focus on switching mechanisms used to interconnect top-level system components, is of primary concern to two groups: First, it is extremely important to the designers of input-output mechanisms. Only the smallest computers today have direct connections between their central processors and external devices. Most have ornate hierarchies of busses and other switching mechanisms sitting between the processor and the outside world.

A diagram of a commonplace modern computer system at the PMS level might look like the following:

         _____   PRIMARY BUS  _____
        |     | ____________ |     |
        | CPU |<____   _____>| MEM |
        |_____|     | |      |_____|
                    | |
                   __V__    ISA or PCI BUS
                  | Bus | _________________
                  |Link |<___   _____   ___>
                  |_____|    | |     | |
                            __V__   __V__    SCSI BUS
                           | ISA | |SCSI | ____________
                           |disk | |Cntrl|<__   _______>
                           |_____| |_____|   | |
                                            __V__
                                           |SCSI |
                                           |disk |
                                           |_____|

The second group with a central interest in PMS level innovations consists of the designers of high performance multiprocessors. When a single computer consists of numerous central processing units interconnected with a shared memory, the design of that interconnection network becomes a central issue in determining the performance of the system. When a network of tightly coupled computers is used as a distributed computing platform, again, the topology and nature of the interconnections become a central issue.
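
The PMS-level diagram above can also be written down as data. The following is only a sketch, not part of Bell and Newell's PMS notation: the struct layouts, the variable names and the function buses_to_cpu are all hypothetical, chosen to mirror the boxes and buses in the diagram. It records which bus each component sits on and counts how many buses a transfer must cross to reach the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* A bus, and the bus one step closer to the CPU (NULL for the primary bus). */
struct bus {
    const char *name;
    const struct bus *upstream;
};

/* A top-level component and the bus it is attached to. */
struct component {
    const char *name;
    const struct bus *attached_to;
};

/* The three buses in the diagram. */
const struct bus primary_bus = { "primary bus", NULL };
const struct bus io_bus = { "ISA or PCI bus", &primary_bus };  /* via the bus link */
const struct bus scsi_bus = { "SCSI bus", &io_bus };    /* via the SCSI controller */

/* The components in the diagram, each placed on the bus nearest the CPU
   to which it connects. */
const struct component cpu = { "CPU", &primary_bus };
const struct component mem = { "MEM", &primary_bus };
const struct component isa_disk = { "ISA disk", &io_bus };
const struct component scsi_controller = { "SCSI controller", &io_bus };
const struct component scsi_disk = { "SCSI disk", &scsi_bus };

/* Number of buses a transfer crosses between component c and the CPU. */
int buses_to_cpu(const struct component *c) {
    int n = 0;
    const struct bus *b;
    assert(c != NULL);
    for (b = c->attached_to; b != NULL; b = b->upstream) n++;
    return n;
}
```

Under these assumptions, buses_to_cpu(&mem) is 1 while buses_to_cpu(&scsi_disk) is 3, which is one way to see why the bus hierarchy matters to the designers of input-output mechanisms.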

The Instruction-Set-Processor (ISP) Level

The focus here on instruction encoding and the fetch-execute cycle of a processor is of central importance to compiler writers, assembly language programmers and the designers of CPUs. Note that many different implementations of the same ISP design may exist. Even if we fix the underlying technology, some may be fast and expensive while others are slow and inexpensive. For example, the Intel 8086 and 8088 implemented the same instruction set (the base instruction set from which the 80x86/Pentium family is descended); the 8086 was a relatively fast and relatively expensive microprocessor with a 16-bit external data bus, while the 8088 was a relatively slow and relatively inexpensive microprocessor with an 8-bit external data bus.

Formal ISP descriptions are generally given in a programming language, and if compiled and executed, they serve as interpreters for the instruction set being described. The following example gives a partial ISP description of a very simple computer in C augmented with some comments:

	typedef unsigned short int word; /* assumed to be 16 bit */
	word memory[65536];
	word pc;	/* the program counter */
	word temp;	/* a temporary */
	word ac;	/* the accumulator */
	word mar;	/* the memory address register */

	word fetch( word addr );
	void store( word addr, word value );
	/* we'll assume these are defined somewhere else */

	void cpu( void ) {
		for (;;) { /* the execution cycle never terminates */
			mar = fetch(pc);	/* get the source address */
			pc = pc + 1;
			temp = fetch(mar);	/* get the source operand */
			mar = fetch(pc);	/* get the destination address */
			pc = pc + 1;
			store(mar, temp);	/* copy the operand to the destination */
		}
	}
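
To actually run the description above, fetch and store need bodies and the endless loop needs a bound. The following is a minimal sketch under assumptions that are not in the original: memory is a plain array with no memory-mapped devices, the body of the loop is split out into a hypothetical function step, and a hypothetical driver run_moves executes a fixed number of instructions instead of looping forever.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t word;          /* exactly 16 bits */

word memory[65536];             /* assumed: plain RAM, no I/O devices */
word pc;                        /* the program counter */

/* Assumed trivial definitions of the memory interface. */
word fetch(word addr) { return memory[addr]; }
void store(word addr, word value) { memory[addr] = value; }

/* Execute one instruction: two address words, source then destination. */
void step(void) {
    word mar, temp;
    mar = fetch(pc);            /* source address */
    pc = (word)(pc + 1);
    temp = fetch(mar);          /* source operand */
    mar = fetch(pc);            /* destination address */
    pc = (word)(pc + 1);
    store(mar, temp);           /* copy the operand to the destination */
}

/* Hypothetical bounded driver: run n instructions starting at address start. */
void run_moves(word start, unsigned n) {
    assert(n <= 32768u);        /* n two-word instructions must fit in memory */
    pc = start;
    while (n-- > 0) step();
}
```

For example, if memory[0] and memory[1] hold 100 and 200, then run_moves(0, 1) copies the contents of location 100 to location 200 and leaves pc at 2.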

The Register-Transfer (RT) Level

Given a PMS level system design and the ISP specification of an instruction set, you do not have a complete description of how a computer is built. You need to know which computations inside the machine are carried out in parallel and which are carried out in sequence, and you need to know how the data flows between the registers and combinational components of the system. This is the focus of the register transfer level, and it is the central issue with which the implementors of an architecture must deal.

A good engineer can implement any ISP specification, but if the specification is not written with a clear understanding of the limitations of the underlying RT level design, the result is unlikely to be fast or compact.

System descriptions at the RT level are frequently done in terms of block diagrams showing data flow between registers, combined with finite state automata descriptions of the control unit that evokes the required data transfers in the appropriate order. The following diagram illustrates this:

               _________________________
              |  ___________   ________<
              | |           | |   Data from memory
        CPC  __V__    CMAR __V__
         ---|> PC |    ---|>MAR |
            |_____|       |_____|
              | |____   ____| |
              |____  | |  ____|
                   | | | |     
          ADSEL  ___V___V___
            -----\ 0 MUX 1 /
                  \_______/   Address to Memory
                     | |____________
                     |______________>
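
The data flow in this diagram can be modelled in a few lines of C. This is only a sketch: the signal names CPC, CMAR and ADSEL come from the diagram, but the struct layouts and the functions address_to_memory and clock_cycle are hypothetical, and the model assumes that multiplexor input 0 is the PC and input 1 is the MAR, as the positions in the diagram suggest. The control unit is not modelled; the caller plays that role by choosing the control signals for each cycle.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t word;

/* The two registers shown in the diagram. */
struct datapath {
    word pc;
    word mar;
};

/* Control signals for one clock cycle, named as in the diagram. */
struct control {
    int adsel;  /* 0: PC supplies the memory address, 1: MAR supplies it */
    int cpc;    /* 1: clock the data from memory into PC */
    int cmar;   /* 1: clock the data from memory into MAR */
};

/* Combinational part: the multiplexor that picks the address to memory. */
word address_to_memory(const struct datapath *dp, int adsel) {
    assert(adsel == 0 || adsel == 1);
    return adsel ? dp->mar : dp->pc;
}

/* One clock cycle: read memory at the selected address, then load every
   register whose clock signal is asserted. Both registers see the same
   data, so asserting both signals loads them in parallel. */
void clock_cycle(struct datapath *dp, const word *memory, struct control c) {
    word data = memory[address_to_memory(dp, c.adsel)];
    if (c.cpc)  dp->pc = data;
    if (c.cmar) dp->mar = data;
}
```

A cycle with ADSEL = 0 and CMAR = 1 performs the register transfer MAR = M[PC]; a following cycle with ADSEL = 1 and CPC = 1 performs PC = M[MAR]. A finite state control unit would simply emit such a sequence of control settings, one per clock cycle.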

It is important to note that any computer system can be viewed at all of these levels and, furthermore, that any computer system can be completely specified at either the ISP or RT level. The PMS level abstracts away major details, so a PMS specification cannot completely specify a computer system.

The Logic Level

It is important to remember that the Register Transfer level itself rests on top of the logic level, the level where we are concerned with gates and flipflops. There are compilers that will produce Register Transfer level designs from sufficiently detailed Instruction Set Processor specifications, and there are compilers that will produce Logic level specifications from Register Transfer level specifications.

Logic level specifications can be given equationally or equivalently by logic diagram. Thus, the following two descriptions of an RS flipflop are equivalent:

                 _     _
        Q = not( R and Q )
        _        _
        Q = not( S and Q )

        _          ______
        R --------|      |
                  | nand |-o----- Q
                 -|______| |
                |          |
                 ----------|-
                           | |
                 ----------  |
                |  ______    |
                 -|      |   |    _
        _         | nand |---o--- Q
        S --------|______|
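
The equational form can be checked by simulation. The sketch below transcribes the two equations exactly as written, using the hypothetical names r_bar and s_bar for the barred inputs, and re-evaluates the two gates until their outputs stop changing. The function names nand2 and settle and the limit of 8 passes are assumptions made for this illustration; real gates settle in continuous time, not in discrete passes.

```c
#include <assert.h>

/* One two-input NAND gate on single-bit values (0 or 1). */
int nand2(int a, int b) {
    assert((a == 0 || a == 1) && (b == 0 || b == 1));
    return !(a && b);
}

/* The outputs of the two cross-coupled gates. */
struct latch {
    int q;
    int q_bar;
};

/* Re-evaluate both gates until the outputs stop changing:
     Q     = not( R_bar and Q_bar )
     Q_bar = not( S_bar and Q )
   Returns 1 if the outputs settled, 0 if they did not within 8 passes. */
int settle(struct latch *l, int r_bar, int s_bar) {
    int pass;
    for (pass = 0; pass < 8; pass++) {
        int q = nand2(r_bar, l->q_bar);
        int q_bar = nand2(s_bar, q);
        if (q == l->q && q_bar == l->q_bar) return 1;
        l->q = q;
        l->q_bar = q_bar;
    }
    return 0;
}
```

With the equations as written, holding both inputs at 1 preserves the stored value, a 0 on the R-bar input forces Q to 1, and a 0 on the S-bar input forces Q to 0.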

We will refer to the logic level quite frequently in this course, but this is not our primary concern! We will almost never refer to the underlying technology used to implement the logic level, except for purposes of example, or for purposes of exploring the economic impact of changes in technology. Thus, for most purposes, an understanding of relay logic (relays date from the early telegraph work of Joseph Henry and Samuel Morse, over 150 years ago) or any other logic family will provide a sufficient foundation for this course.