1. Introduction

Part of CS:2630, Computer Organization Notes
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

 

Assembly Language, the Gateway to Computer Organization

For most of their users, computers are very mysterious things, machines that can almost think, machines that can execute programs written in languages like Python, Java, C++, Ada, Pascal, Basic, Algol, COBOL, or FORTRAN. These languages have many built-in data types such as integers, floating-point numbers and strings, and they provide tools allowing applications to create and process an infinite variety of other data types.

Here, we assume that the reader understands how to program a computer, in at least one of these many and varied programming languages, and we ask, what is really going on inside the box? How does the computer execute that program? How can any physical mechanism carry out instructions expressed in a language as complex as any of the languages mentioned in the previous paragraph?

The answer to this question is best grasped in stages, just as the execution of a program is generally handled in steps. First, the high-level language program is translated to a simpler language, machine language, by a compiler, and then a computer executes the machine language.

Our goal here is to study an example of such a machine language, and in the process, to study several other things. First, we will study the C programming language because, unlike many of today's high level languages, it exposes significant features of the underlying machine. We can separately describe how features of languages like Java or C++ are translated to C and how features of C are translated to machine language. Second, we will study how the hardware of a computer executes machine language.

What is Computer Architecture?

What is architecture? This word is most easily discussed in the context of its original meaning with respect to the design of buildings. The architecture of a building includes all aspects of the building's design that its occupants are aware of. This includes such details as the arrangement of rooms, the location of doorways and windows, surface finishes, and the location of switches, lights, wall outlets and other conveniences.

Buildings, of course, are made using some technology, bricks, beams, plaster and stone, and these may be visible elements of the architecture. Brick exteriors and exposed beams are examples of engineering elements that are exposed to the occupant by the architecture. Note, however, that many engineering elements of the structure of a building may be hidden from the occupants, and that some exposed aspects of the engineering may be highly misleading. For example, a brick exterior does not indicate that the bearing walls of a building are brick; brick veneers over wood frame walls are common. Similarly, not all the exposed beams in buildings are actually structural elements. Architects began incorporating false beams into their structures in ancient times.

The buildings on the University of Iowa Pentacrest provide an excellent example of the difference between architecture and engineering. The oldest building on campus, the Old Capitol, is a brick building with a limestone exterior and wood porticos. Most people seeing the porticos think they are stone. Macbride and Schaeffer halls are also brick and stone construction, but with real stone pillars holding up their porticos. MacLean Hall is reinforced concrete construction, with a very thin stone facing, and Jessup Hall is a steel frame building, also with a thin stone facing. These buildings are "architecturally compatible" while being based on very different engineering.

The Pentacrest, with Schaeffer and MacLean Halls on the left, the Old Capitol in the center, and a bit of Jessup Hall on the right.
Photo © 2018 Tony Webster, Creative Commons Attribution 2.0 license, from Wikimedia Commons
[photo of 4 of the 5 buildings on the Pentacrest]

Computers also have architecture and engineering. The architecture of a computer system includes all aspects of the computer visible to programmers. For compiler writers and other programmers working at the machine language level, the architecture and the machine language are so intimately intertwined that the machine language can be described as the primary manifestation of the architecture.

Every design must be built using some underlying technology. Architects may draw plans, and computer architects may design machine languages, but these are of little use unless they are actually built of brick and stone or of silicon and copper. Once something is built, some aspects of the underlying engineering may show through in the architecture while others are completely hidden.

The architect Louis Henri Sullivan said that "form follows function," and with many computer architectures, this has been true. On the other hand, just as builders may use modern materials to build structures that retain forms that were dictated by ancient technologies, so too, modern computers are frequently built to support architectures that were originally designed in terms of older technologies.

For example, the DEC PDP-5 and PDP-8 architecture first appeared on the DEC PDP-5, sold in 1963. This was a second-generation digital computer, built using magnetic core memory and discrete germanium transistors. The PDP-8, introduced in 1965, was architecturally compatible with the PDP-5, but built using a newer generation of discrete silicon transistor technology with much better packaging. When these machines were introduced, they were among the smallest and least expensive computers on the market ($27,000 for the PDP-5, $18,000 for the first model of the PDP-8).

The PDP-8/I was a reimplementation of the PDP-8 using TTL integrated circuits, that is, transistor-transistor logic integrated circuits, typically with fewer than 10 boolean logic operations (logic gates) per chip. This brought the price down to $12,000 in 1968. The PDP-8/E, introduced in 1970, was another reimplementation using some MSI chips, medium-scale integrated circuits, with fewer than 100 boolean operators per chip. At this level of integration, components such as adders emerge as single chips, bringing the price down to $6,500 in 1970.

The DEC PDP-5, introduced in 1963
Photo © Copyright Digital Equipment Corporation, 1963. All rights reserved. Reprinted by permission.
[photo of 6-foot high rack-mounted computer with teletype and chair]

The PDP-8/A, introduced in late 1974 at $1,835, used some LSI chips, large-scale integrated circuits, with fewer than 1000 boolean operators per chip. This allowed large components of the processor, such as the entire arithmetic-logic unit, to fit on a single chip. In addition, around this time, semiconductor memory began to replace magnetic core memory.

The VT78 was a reimplementation based on an architecturally compatible VLSI microprocessor in 1978. Very large scale integrated circuits are those with over 1000 boolean operators per chip; with the emergence of this technology, it became common to build entire processors on a single chip, and core memory became ancient history.

The PDP-8 family ended with the end of the DECmate III+ production run in 1990. The range of physical size, price, and, above all, technology represented by this example is immense, yet a programmer who learned to program on the PDP-5 at the assembly-language level would notice only a few changes in programming a DECmate III+, even though the PDP-5 processor and core memory occupied a full 6-foot high rack of electronics, while the DECmate III+ was packaged like any IBM PC clone of the early 1990s. Of course, the programmer in 1963 would have had only a small amount of support software, while by 1990, a huge body of application and supporting software was available.

Historical note: Digital Equipment Corporation was one of the most innovative developers of new computer architectures between 1960 and 1992. By the mid 1970s, it had grown to dominate the small computer market, and DEC's VAX series of 32-bit computers were the most widely used machines on the Internet in the mid 1980s. DEC was bought out by Compaq, which was in turn bought out by Hewlett-Packard.

There have been many families of computers that have undergone evolutionary developments similar to that of the PDP-8. IBM's 360 family of computers, introduced in 1964, is still around in the form of machines that are called enterprise servers, and the Intel architecture family that dominates the desktop and laptop computers of today evolved from and retains features of the 8-bit Intel 8080 processor of the mid 1970s.
 

What Specific Computer Architecture Will We Study?

We will be studying the Hawk architecture. This fictional architecture combines elements of many modern RISC architectures with some historical features that date back to the very first computers.

Why a fiction? After all, the Intel 80x86/Pentium family of computers dominates the marketplace, and it is used in many assembly language texts. Unfortunately, this architecture is in some ways comparable to a modern building built in the colonial style. Under the skin, it may be modern, made of steel and concrete, but this is hidden under a brick and plaster skin in the Greek revival style that was popular in the Georgian era. This, in turn, incorporates architectural elements from classical Grecian temples, but these were stone structures that imitated designs originally developed in the bronze age for wooden post and beam structures.

The Intel architecture of today evolved from Intel's first microprocessors of the early 1970s. Form followed function very closely in these early designs, but since the 1970s, Intel has been faced with the demand to offer compatible upgrades to older designs. At each step, new technology has been carefully hidden behind a veneer that allowed programmers to ignore these changes. As such, the 80x86/Pentium family is saddled with immense accidental complexity, making it very poorly suited for teaching.

The Hawk architecture, while fictional, is designed within the RISC (Reduced Instruction Set Computer) framework that dominates much modern thinking about computer architecture. The Apple/Motorola/IBM Power architecture formerly used in the Apple Macintosh and IBM RS/6000, and later used in the Microsoft Xbox 360 and the Sony PlayStation 3, is in this class, as is the MIPS architecture found in many Windows CE devices as well as the Sony PlayStation 2 and PSP. The ARM architecture, used in many Android devices as well as the Raspberry Pi computer, is another example. These are complex, commercially viable architectures, while the fictional Hawk contains few features that are not motivated directly by the instructional context.

The Hawk architecture deliberately incorporates a few elements of older architectural styles, and as a result, those who have learned the Hawk architecture should not be surprised by elements of other architectures. Where there are strong contrasts or similarities between the Hawk architecture and others, these will be pointed out.

What Is Assembly Language, and Why C?

Most high level languages do their best to protect programmers from having to learn anything about the computer architecture that actually runs their programs. Users of Python, Java, Ada or Pascal, for example, cannot even determine, from within the bounds of the standard language, whether the machine words are divided into bytes. C++ and C programmers, in contrast, are free to explore the machine's memory addressing model, and this is frequently a source of trouble for new users of these languages.
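
As a small illustration, consider the following C fragment, a minimal sketch assuming nothing beyond a standard C compiler. It views an int through a character pointer in order to inspect the individual bytes of the word, an exploration that cannot be expressed within the bounds of standard Pascal or Java.

A fragment of C code that inspects the bytes of a word
        #include <stdio.h>

        int main(void) {
            int word = 0x01020304;
            /* view the same storage as a sequence of bytes */
            unsigned char *byte = (unsigned char *) &word;
            int i;

            for (i = 0; i < (int) sizeof word; i++) {
                printf("byte %d of the word holds 0x%02x\n", i, byte[i]);
            }
            return 0;
        }

On a machine that stores the least significant byte of a word at the lowest address, this prints the bytes in the order 04 03 02 01; on a machine that stores the most significant byte first, it prints 01 02 03 04.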

Nonetheless, C and its object-oriented extension C++ have a very important place in the computer industry. First, C has a historical place: The C language was developed around 1970 to support the development of the Unix operating system. The influence of C and Unix on later systems is extraordinary. Unix was not the first operating system developed using a high-level language; MULTICS was developed in PL/I, and Burroughs used a dialect of Algol for their MCP operating system.

Second, C remains widely used in low-level operating system programming today. Newer languages have been developed for system programming, but C remains important. A solid understanding of C is required for most work with current operating systems, microcontrollers, and Internet of Things devices.

C, however, is just a way-station on the road we are following. C exposes the programmer to the memory addressing model of the underlying machine, but it hides the instruction set. Our goal in this course is to explore the instruction set of a computer, that is, the actual machine language implemented by the hardware, and how physical hardware can be used to build a machine to implement that instruction set.

We could directly study the machine language, but it is easier to study if we have a symbolic notation. Such notations are called assembly languages. There are many assembly languages, differing in details but all broadly similar. Assembly language is almost universally used to teach elementary computer architecture, and although some compilers produce their output directly in machine language, others produce assembly language.

Assembly languages completely expose the computer architecture to the programmer, providing a convenient textual way for expressing programs for the machine while doing nothing to hide the actual behavior of the hardware. Each assembly language statement typically corresponds to exactly one machine language instruction, and the only difference is that the assembly language statements are written in textual form with space for commentary intended for a human reader, while machine languages are expressed in binary codes that are very difficult for human readers to interpret.

A fragment of assembly language code
        ADDSI   R3, 1        ; R3 = R3 + 1
        ADDI    R4, R3, 10   ; R4 = R3 + 10
        LIL     R5, 100000   ; R5 = 100000
        ADD     R5, R5, R4   ; R5 = R4 + 100000

Assembly languages went through a great burst of creative development in the 1960s, but by the 1970s, it was clear that the majority of programmers would rarely need to know much assembly language. Today, aside from elementary assembly language and computer architecture classes, assembly languages are primarily used as the target languages for compilers. Thus, while most systems include assemblers, the code they assemble is usually written by other programs and not by humans. As a result, many modern assembly languages are not as well developed as the assembly languages of the mid 1970s that were designed to be read and written by human programmers.

It is worth noting that, while computer architectures are best studied at the assembly language level, assembly languages have only loose connections to the architectures they support. There are historically important examples of assemblers designed to support one machine being used to assemble code for a completely different computer architecture. For example, all of the early code development for the DEC PDP-11 computer, a machine with a 16-bit word, was done on DEC PDP-10 computers, machines with 36-bit words, using DEC's MACRO-10 assembler to do the work.

What Assembly Language Will We Study?

We will be using the SMAL assembly language. SMAL stands, creatively, for Symbolic Macro Assembly Language (a fact nobody needs to remember) and it is far richer than many of the assembly languages used today, particularly those common in introductory assembly language texts. Unlike the Hawk computer, there is nothing fictional about SMAL; it is a real assembly language, and it has, at various times, been used to assemble code for real computers as well as fictional ones.

SMAL includes well-developed macro features and a syntax representative of some of the best assemblers of the 1970s. SMAL itself was developed in the early 1980s, predating the Hawk architecture by over a decade and even slightly predating the widespread recognition of RISC architectures. This has no impact on the utility of SMAL.

Where Next?

Before getting deeply involved in any specific machine language, we will focus on questions of data representation. In high level languages, we take for granted that the machine can represent data, whether it is in the form of numbers or text, but at the assembly language level, the programmer must take direct responsibility for all issues of representation. Conversion between number bases and questions of character coding will be at the center of this.
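
As a small preview, and assuming nothing more than a standard C compiler and the ASCII character set, the following sketch touches both issues: the same integer printed in three different number bases, and the numeric code hiding behind a character.

A fragment of C code previewing number bases and character codes
        #include <stdio.h>

        int main(void) {
            int n = 100;

            /* one value, three textual representations */
            printf("%d decimal = %o octal = %x hexadecimal\n", n, n, n);

            /* a character constant is just a small integer, its code */
            printf("the character 'A' has code %d\n", 'A');
            printf("the next code prints as '%c'\n", 'A' + 1);
            return 0;
        }

On an ASCII machine, this reports that 100 decimal is 144 octal and 64 hexadecimal, that 'A' has code 65, and that the next code prints as 'B'.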