1. Introduction

Part of 22C:60, Computer Organization Notes
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

 

Assembly Language, the Gateway to Computer Organization

For most of their users, computers are very mysterious things, machines that can almost think, machines that can execute programs written in languages like Python, Java, C, Ada, Pascal, Basic, Algol, COBOL, or FORTRAN, and that can run applications that process what seems to be an infinite variety of data types.

Here, we assume that the reader understands how to program a computer, in at least one of these many and varied programming languages, and we ask, what is really going on inside the box? How does the computer execute that program? How can any physical mechanism carry out instructions expressed in a language as complex as even the simplest of the languages from the programming language zoo mentioned in the previous paragraph?

The answer to this question is best grasped in stages, just as the execution of a program is generally handled in steps. First, a compiler translates the high-level language program into a simpler language, machine language; second, the computer itself is built to execute that machine language directly.

Our goal here is to study an example of such a machine language, and in the process, to study two other things. First, we will informally study the translation process from high-level programming languages to machine language, and second, we will study the computer that executes this machine language.

What is Computer Architecture?

What is architecture? This word is most easily discussed in the context of its original meaning, the design of buildings. The architecture of a building includes all aspects of the building's design that its occupants are aware of. This includes such details as the arrangement of rooms, the location of doorways and windows, surface finishes, and the location of switches, lights, wall outlets and other conveniences.

Buildings, of course, are made using some technology, such as bricks, beams, plaster and stone, and these may be visible elements of the architecture. Brick exteriors and exposed beams are examples of engineering elements that the architecture exposes to the occupant. Note, however, that many engineering elements of a building's structure may be hidden from the occupants, and that some exposed aspects of the engineering may be highly misleading. For example, a brick exterior does not indicate that the load-bearing walls of a building are brick; brick exteriors are commonly applied over wood-frame walls! Similarly, not all the exposed beams in buildings are actually structural elements. Architects have been incorporating false beams into their structures since Roman times!

The buildings on the University of Iowa Pentacrest provide an excellent example of architecture and engineering. The oldest building on campus, the Old Capitol, is a brick building with a limestone exterior and wood porticos. Most people seeing the porticos think they are stone! Macbride and Schaeffer halls are also brick and stone construction, but with real stone pillars holding up their porticos. Maclean Hall is reinforced concrete construction, with a very thin stone facing, and Jessup Hall is a steel frame building, also with a thin stone facing. These buildings are "architecturally compatible" while being based on very different engineering.

Computers also have architecture and engineering. The architecture of a computer system includes all aspects of the computer visible to programmers. For programmers working at the machine language level, for example, compiler writers, the architecture and the machine language are so intimately intertwined that the machine language can be described as the primary manifestation of the architecture.

Every design must be built using some underlying technology. Architects may draw plans, and computer architects may design machine languages, but these are of little use unless they are actually built of brick and stone or of silicon and copper. Once something is built, some aspects of the underlying engineering may show through in the architecture while others are completely hidden.

The architect Louis Henri Sullivan said that "form follows function," and with many computer architectures, this has been true. On the other hand, just as builders may use modern materials to build structures that retain forms that were dictated by ancient technologies, so too, modern computers are frequently built to support architectures that were originally designed in terms of older technologies.

The DEC PDP-5, introduced in 1963: a 6-foot high rack-mounted computer with a teletype and chair.
Photo © 1963 Digital Equipment Corporation, used with permission.

For example, the DEC PDP-5 and PDP-8 architecture first appeared on the DEC PDP-5, sold in 1963; these machines were built using magnetic core memory and discrete transistors. The PDP-8, introduced in 1965, was an architecturally compatible reimplementation of this architecture using a newer generation of discrete transistor technology, with much better packaging. When these machines were introduced, they were among the smallest and least expensive computers on the market.

The PDP-8/I was a reimplementation of the PDP-8 using TTL integrated circuits, that is, transistor-transistor logic integrated circuits, typically with fewer than 10 boolean logic operations (logic gates) per chip. The PDP-8/F, introduced in 1970, was another reimplementation using some MSI (medium-scale integrated) chips, each containing fewer than 100 logic gates. At this level of integration, components such as adders emerge as single chips.

The PDP-8/A, introduced in 1974, used some LSI (large-scale integrated) chips, with fewer than 1000 logic gates per chip. This allowed large components of the processor, such as the entire arithmetic-logic unit, to fit on one chip. In addition, around this time, semiconductor memory began to replace core memory.

The VT/78 was a reimplementation based on an architecturally compatible VLSI microprocessor. Very-large-scale integrated circuits are those with over 1000 logic gates per chip; with the emergence of this technology, it became common to build entire processors on a chip, and core memory became ancient history.

The PDP-8 family ended with the end of the DECmate III+ production run in 1990. The range of physical size, price and, above all, technology represented by this family is immense, yet a programmer who learned to program the PDP-5 at the assembly-language level would notice very little change in programming a DECmate III+. The obvious difference is packaging: the PDP-5 processor and core memory occupied a full 6-foot high rack of electronics, while the DECmate III+ was packaged like any IBM PC clone of the early 1990's. The other difference is software: the programmer in 1963 would have had only a small amount of support software, while by 1990, a huge body of applications and supporting software was available.

Historical note: Digital Equipment Corporation was one of the most innovative developers of new computer architectures between 1960 and 1992. By the mid 1970's, it had grown to dominate the small computer market, and DEC's VAX series of 32-bit computers were the most widely used machines on the Internet in the mid 1980's. DEC was bought by Compaq, which was in turn bought by Hewlett-Packard.

There have been many families of computers that have undergone evolutionary developments similar to that of the PDP-8. IBM's 360 family of computers, introduced in 1965, is still around, in the form of machines that are usually called enterprise servers, and the Intel Pentium family that dominates the desktop and laptop computers of today remains compatible with the Intel 8088 processor of the late 1970's, a 16-bit processor with an 8-bit external data bus.

What Specific Computer Architecture Will We Study?

We will be studying the Hawk architecture. This fictional architecture combines elements of many modern RISC architectures with historical features that date back to the very first computers.
 

Why a fiction? After all, the Intel 80x86/Pentium family of computers dominates the marketplace, and it is used in many assembly language texts. Unfortunately, this architecture is in some ways comparable to a modern building built in the colonial style. Under the skin, it may be constructed of steel and concrete, but this is hidden under a brick and plastic skin that follows the forms of the Greek revival style that was popular in the Georgian era. That style, in turn, borrows elements from classical Greek temples, and those temples were stone structures that, in many cases, imitated the wooden post-and-beam structures of the bronze age.

The Intel architecture of today evolved from Intel's first microprocessors of the early 1970's. Form followed function very closely in these early designs, but since the 1970's, Intel has been faced with the demand to offer compatible upgrades to older designs. At each step, new technology has been carefully hidden behind a veneer that allowed programmers to ignore these technical changes. As such, the 80x86/Pentium family is saddled with immense accidental complexity, making it very poorly suited for teaching.

The Hawk architecture, while fictional, is designed within the RISC (Reduced Instruction Set Computer) framework that dominates much modern thinking about computer architecture. The Apple/Motorola/IBM Power architecture, formerly used in the PowerPC-based Apple Macintosh and also used in the IBM RS/6000, the Sony PlayStation 3 and the Microsoft Xbox 360, is in this class. It is, however, a complex, commercially viable architecture, while the fictional Hawk contains few features that are not motivated directly by the instructional context.

The Hawk architecture deliberately incorporates a few elements of older architectural styles, and as a result, those who have learned the Hawk architecture should not be surprised by elements of other architectures. Where there are strong contrasts or similarities between the Hawk architecture and others, these will be pointed out.

What Is Assembly Language?

Most high level languages do their best to protect programmers from having to learn anything about the computer architecture that actually runs their programs. Users of Python, Java, Ada or Pascal, for example, cannot even determine, from within the bounds of the standard language, whether the machine has bytes within its words! C++ and C programmers, in contrast, can explore the memory addressing model of the underlying machine, but this is usually a source of trouble and not a benefit of these languages.
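As an illustration of what C exposes, the following sketch copies a 32-bit word into an array of bytes and reports the order in which those bytes sit in memory. It assumes a C99 compiler on a machine with 8-bit bytes, so that `uint32_t` occupies exactly four bytes; the function names `word_bytes` and `is_little_endian` are hypothetical, chosen only for this example.

```c
#include <stdint.h>
#include <string.h>

/* Copy the bytes of a 32-bit word into out[0..3] in the order in which
   they are stored in memory, lowest address first.
   Assumes 8-bit bytes, so sizeof(uint32_t) == 4. */
void word_bytes(uint32_t word, unsigned char out[4]) {
    memcpy(out, &word, sizeof word);
}

/* Return 1 if the least significant byte of a word is stored at the
   lowest address (little-endian byte order), 0 otherwise. */
int is_little_endian(void) {
    unsigned char b[4];
    word_bytes(0x01020304u, b);
    return b[0] == 0x04;
}
```

On an x86 machine, `word_bytes(0x01020304u, b)` fills `b` with 04 03 02 01, while on a big-endian machine the same call yields 01 02 03 04. A program that depends on this answer behaves differently on different machines, which is why such exploration tends to be a source of trouble.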

Assembly language is almost universally used to teach elementary computer architecture, and many compilers produce their output in assembly language instead of machine language. Assembly languages completely expose the computer architecture to the programmer, providing a convenient textual way for expressing programs for the machine while doing nothing to hide the actual behavior of the hardware. Each assembly language statement typically corresponds to exactly one machine language instruction, and the only difference is that the assembly language statements are written in textual form with space for commentary intended for a human reader, while machine languages are expressed in binary codes that are very difficult for human readers to interpret.
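To make the contrast between textual and binary forms concrete, here is a C sketch of what an assembler does for a single statement. It uses a hypothetical 16-bit, three-register instruction format invented for this illustration: a 4-bit opcode followed by three 4-bit register fields. This is not the Hawk's actual instruction encoding; the opcode value `OP_ADD` and the function `encode_rrr` are assumptions made only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode for a toy format; NOT the real Hawk encoding. */
enum { OP_ADD = 0x1 };

/* Pack one three-register instruction into a 16-bit word:
   bits 15..12 opcode, bits 11..8 destination register,
   bits 7..4 first source register, bits 3..0 second source register. */
uint16_t encode_rrr(unsigned op, unsigned dst, unsigned s1, unsigned s2) {
    /* Every field must fit in 4 bits. */
    assert(op <= 0xF && dst <= 0xF && s1 <= 0xF && s2 <= 0xF);
    return (uint16_t)((op << 12) | (dst << 8) | (s1 << 4) | s2);
}
```

Under this toy format, a statement such as `ADD R5, R5, R4` would become `encode_rrr(OP_ADD, 5, 5, 4)`, the word 0x1554, or 0001 0101 0101 0100 in binary: easy for hardware to decode, hard for a person to read at a glance. Because each statement maps to exactly one word, an assembler can translate a program line by line.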

A fragment of assembly language code
        ADDSI   R3, 1        ; R3 = R3 + 1
        ADDI    R4, R3, 10   ; R4 = R3 + 10
        LIL     R5, 100000   ; R5 = 100000
        ADD     R5, R5, R4   ; R5 = R4 + 100000
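One way to read this fragment is to translate it, statement by statement, into C. The sketch below assumes a register file of sixteen 32-bit registers, modeled by a hypothetical `regs_t` structure whose element `r[n]` stands for register Rn. It uses unsigned 32-bit arithmetic so that overflow wraps around modulo 2^32, as a hardware adder would, instead of invoking C's undefined behavior for signed overflow.

```c
#include <stdint.h>

/* Hypothetical model of a register file: r[0] through r[15] stand for
   R0 through R15. Assumes 32-bit registers. */
typedef struct {
    uint32_t r[16];
} regs_t;

/* Each C statement mirrors one line of the assembly fragment. */
void run_fragment(regs_t *m) {
    m->r[3] = m->r[3] + 1;         /* ADDSI R3, 1       */
    m->r[4] = m->r[3] + 10;        /* ADDI  R4, R3, 10  */
    m->r[5] = 100000;              /* LIL   R5, 100000  */
    m->r[5] = m->r[5] + m->r[4];   /* ADD   R5, R5, R4  */
}
```

Starting with R3 holding 5, the fragment leaves R3 = 6, R4 = 16 and R5 = 100016. Where a C compiler is free to choose registers for itself, the assembly language programmer must name every register explicitly.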

Assembly languages went through a great burst of creative development in the 1960's, but by the 1970's, it was clear that the majority of programmers would rarely need to know much assembly language. Today, aside from elementary assembly language and computer architecture classes, assembly languages are primarily used as the target languages for compilers. Thus, while most systems include assemblers, the code they assemble is usually written by other programs and not by humans. As a result, many modern assembly languages are not as well developed as the assembly languages of the mid 1970's that were designed to be read and written by human programmers.

It is worth noting that, while computer architectures are best studied at the assembly language level, assembly languages have only loose connections to the architectures they support. There are historically important examples of assemblers designed to support one machine being used to assemble code for a completely different computer architecture. For example, all of the early code development for the DEC PDP-11 computer, a machine with a 16-bit word, was done on DEC PDP-10 computers, machines with 36-bit words, using DEC's MACRO-10 assembler, running on the PDP-10, to do the work.

What Assembly Language Will We Study?

We will be using the SMAL assembly language. SMAL stands, creatively, for Symbolic Macro Assembly Language (a fact nobody needs to remember), and it is far richer than many of the assembly languages used today, particularly those used in introductory assembly language texts.

SMAL includes well-developed macro features and a syntax representative of some of the best assemblers of the 1970's. SMAL itself was developed in the early 1980's, predating the Hawk architecture by over a decade and even slightly predating the widespread recognition of RISC architectures. This history does not limit SMAL's usefulness for our purposes.

Where Next?

Before getting deeply involved in any specific machine language, we will focus on questions of data representation. In high level languages, we take for granted that the machine can represent data, whether it is in the form of numbers or text, but at the assembly language level, the programmer must take direct responsibility for all issues of representation. Conversion between number bases and questions of character coding will be at the center of this!