32. Conditionals in Pipelined Systems

Part of the 22C:122/55:132 Lecture Notes for Spring 2004
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Skip Instructions

Prior to the introduction of the IBM System 360 in 1965, the most common machine level conditional control structures were based on one of two models:

Conditional skip instructions were extremely common. These only worked well if all the common instructions were the same size, but this was commonly the case on early machines where all instructions were one word or one half-word in size. If the fetch-execute cycle is:

	repeat
	    IR = M[PC];
	    PC = PC + 1;
	    execute
	forever

an unconditional skip instruction would do:

	    execute skip:
               PC = PC + 1;

Executing a skip instruction causes the CPU to skip over the immediately following instruction in the instruction stream. Typical skip instrucitons included:

skip if accumulator negative
skip if accumulator non-negative
skip if accumulator zero
skip if accumulator nonzero
skip if accumulator even
skip if accumulator odd

At the cost of only a small amount of extra logic, it was possible to add:

skip if accumulator positive (non-negative and nonzero)
skip if accumulator non-positive (negative or zero)

Particularly when these were absent, programmers became quite creative at combining skip instructions to achieve interesting effects, for example:

	skip if negative
	skip if nonzero   -- only executed if ac >= 0
	x                 -- only executed if ac <= 0

	skip if non-negative
	negate            -- takes the absolute value

        skip if negative
	branch            -- branches if ac >= 0

This approach to conditionals is natural on a single accumulator machine with a short word size, where the address field of the instruciton is used to encode the skip condition (and perhaps other non-memory-reference operations). When there are multiple accumulators, we typically add a register field to the skip instruction to indicate the register to be tested.

The skip instructions of the PDP-8, all encoded in the operate instruction (opcode = 111) provide an good example of this approach. See the discussion of the Group Two microcoded instructions on that machine.

When the word size is large, we have an address field that invites some use:

Use the address field for a branch destination address.: In this case, what would have been a skip instruciton becomes a conditional branch instruction.
Use the address field as an operand for a compare instruciton: In this case, the compare instruction can skip if the comparison meets the desired condition.

A very common variant on the second option above was a compare and skip instruction with the following semantics:

	    execute compare and skip:
	       if AC >= operand
		  PC = PC + 1;
	       if AC > operand
		  PC = PC + 1;

Typically, this was available in both an immediate form, where the operand is stored in the address field of the instruciton, and in a memory addressed form, where the operand was stored in memory at the indicated operand address. In either case, we get the following basic programming models:

	compare and skip, operand
	x                 -- only executed if AC < operand
	y                 -- only executed if AC <= operand
	z                 -- always executed

	compare and skip, operand
	branch x          -- only executed if AC < operand
	branch y          -- only executed if AC = operand
	z                 -- only executed if AC > operand

Clever programmers, of course, found other useful approaches to using these constructs, for example, for bounds checking:

	compare and skip, upper
	compare and skip, lower
	branch x                  -- AC < lower or AC = upper
	branch x                  -- AC = lower or AC > upper
	y                         -- lower < AC < upper

On some machines with a small word size, the compare and skip operation always compared the accumulator with zero; on others, it was possible to compare any of several accumulators with arbitrary constants.

This compare and skip family of instructions was present on the IBM 704 computer on which the FORTRAN language was developed in the mid 1950's. This explains the arithmetic IF statement that was the primary control structure in the original versions of the FORTRAN language. The statement IF(E)10,20,30 would goto 10 if the value of E was less than zero, 20 if E was zero, and 30 if E was greater than zero.
Because FORTRAN was one of the two dominant programming languages of the 1960's, computer architects in that era felt obligated to include 3-way conditional branch constructs in their machines. Today, this feature of FORTRAN is depreciated, never taught to most students learning FORTRAN, but still supported by most compilers.

Condition Codes

The designers of the IBM System 360 family in the mid 1960's adopted a new approach to handling conditional branches. These machines had a 2-bit status indicator in the CPU, and all instructions that could produce a result worth testing for a conditional branch set those status bits to indicate the results. These bits were called the condition code bits, and they were part of the machine's program status doubleword.

The IBM 360 condition codes were difficult to implement and difficult to interpret because the meaning of each condition code combination depended on the instruction that produced it in a somewhat unpredictable way.

The DEC PDP-11 family of computers introduced in 1970 introduced the view of condition codes that we now take for granted. On this machine, essentially every operation that routed its operands through the ALU set the condition codes in a uniform way; there were 4 condition code bits:

N: set when the ALU output is negative.
Z: set when the ALU output is zero.
V: set when there is a 2's complement overflow.
C: set when there is a carry out of the ALU.

This condition code model was adopted with only a few changes on the Motorola 68000, the Intel 8080 and later on the Intel 80x86. The most significant changes included:

Renaming the V condition code as the O condition code. DEC was concerned about the potential confusion of O and 0 (oh and zero), so they refused to use the letter O here in technical documentation.
Different processing for the C bit in subtract instructions. DEC openly allowed the C bit to reflect the realities of 2's complement arithmetic: C means not borrow, not C means borrow. Some programmers found that confusing, so some machine architects designed machines that inverted the sense of the carry bit after subtraction.

Given this PDP-11 set of condition codes, there are 14 useful conditional branch instructions, and if we add always and never to the list, we get 16 alternatives, suggesting that a 4-bit field of the branch instruction can be reserved to select the condition to be tested.

branch if N reset
branch if Z reset (branch if not equal)
branch if V reset
branch if C reset (branch if greater or equal, unsigned)
branch if greater or equal, signed (~N xor V)
branch if greater than, signed (~(N or Z) xor V)
branch if greater than, unsigned (C and ~Z)
branch never (false)
branch if N set
branch if Z set (branch if equal)
branch if V set
branch if C set (branch if less or equal, unsigned)
branch if less than, signed
branch if less or equal, signed
branch if less or equal, unsigned
branch always

The branch conditons listed in the second half of this list are all the simple inverses of those listed in the first half. Note that, after using a subtract instruction to compare two numbers, testing the Z condition code tests for equality. If the two numbers are both signed 2's complement numbers, the greater-than and less-than relationships are conveyed by the sign of the result, but if there is an overflow, the sign is wrong. If the two numbers are both simple unsigned numbers, the carry bit must be tested to determine which number was greater.

On the PDP-11 and essentially all of its successors that copied this condition code model, there were groups of 16 branch instructions that allowed testing of these condition codes. In the case of the PDP-11, these instrucitons were simple 16 bit instrucitons with a very short (8-bit) relative branch displacement field. This meant that conditonal branches could only be taken to target instructions that were nearby. On some later machines, the conditonal branch instruction was generalized to take an arbitrary operand address to specify the destination of the branch, but the truth is, most branches are to nearby instructions, so this may have been a waste of bits in the program representation.

Condition Codes in a Pipelined Setting

Consider a pipelined machine such as our running example:

	          ______       _____
	      IF |______|-----|     |
	          |>___|   _  |     |
	      AC |______|-| | |     |
	          |>___|  | | |     |
	      OF |______|=| |-|  M  |
	          |>___|  |R| |     |
	      OP |______| | | |     |
	          |>___|  | | |     |
	      OS |______|=|_|-|_____|

Whether this uses the now standard PDP-11 model of condition codes, or the older IBM System 360 model, the condition codes cannot be set until the operate stage in the pipe.

If all condition code tests are done in the same pipeline stage as the stage that sets the condition codes, then the tests will see the correct condition codes and conditional branches will be executed correctly. This holds for a simple pipeline, but it does not generalize to a superscalar pipeline, and it makes out-of-order execution difficult.

The chief designer of the DEC PDP-11 (C. Gordon Bell, well known for the textbook on computer architecture he co-authored) later concluded that one of the serious shortcomings of the PDP-11 design was that it overused the condition codes. Empirical studies of compiler output showed that the condition code values set by many instructions were never used, so the machine would have worked just as well if those instructions had not set the condition codes. Had this been the case, it would have made it far easier to apply ideas such as out-of-order execution to this machine.

In the general case, if out-of-order execution and superscalar architectures are to be supported, the condition codes must be treated as just another operand register. Instructions that depend on the condition codes must fetch the value of this register as they gather their operands, and instructions that set the condition codes must set this register as they set other destinations.

Therefore, we need to decode two new predicates from the opcode field, one to report that this instruction sets the condition codes and one to report that this instruction depends on the condition codes, and we need to add new interlock logic to ensure strict ordering of instructions that have condition code dependencies. While this model adds significantly to the number of pipeline interlocks we must check, at least the logic for detecting these interlock conditions is trivial -- a single and gate is all it takes to ask if one stage holds an instruction that needs the condition codes while the other stage holds an instruction that will set them.

Nonetheless, the complexities of dealing with condition codes are annoying enough that many more recent designers of pipelined machines have decided to abandon the whole idea of condition codes. The designers of the DEC Alpha, for example, eliminated condition codes in favor of instructions that tested the values in registers. So, you could branch if register positive, zero, or negative. Exceptional conditions such as overflow that fit so neatly into the condition code model were relegated to exception handlers -- on the assumption the use of 64 bit general registers would eliminate almost all of the overflows that traditionally plague programmers on machines with short word sizes.

Another alternative is to return to skip instructions. If there is no instruction reordering or if the interlock logic prevents reordering from applying to skip instructions, it becomes straightforward to implement skip instructions in a pipelined machine. The skip instruction simply invalidates its successor (the instruction in the previous pipeline stage) in order to cause the successor to be skipped. This is the model Hewlett Packard used in their PA RISC family, where each operate instruction may optionally test the ALU output and skip if it meets any of several conditions. The set of conditions tested by such a conditional skip will typically be selected by a 4-bit condition specification not too different from the set of conditions that the PDP-11 allowed to be tested.