8. Arithmetic and Logic.
Part of
22C:60, Computer Organization Notes

19th century logicians, most notably George Boole, identified the basic operators of what we now know as Boolean algebra, the and, or and not operations. These operators are, of course, the familiar operators used in Boolean expressions in common programming languages as well as being among the basic operators of the logic that philosophers have studied for millenia. What logicians like Boole did was bring these into the realm of mathematics by demonstrating that these operators formed an algebra in just the way that addition and subtraction on the integers form an algebra.
Our goal is to examine enough Boolean algebra to understand how basic arithmetic operations can be performed using only Boolean operations. This allows us to study how the arithmetic unit of a computer works. We will not discuss how Boolean operations are implemented in hardware, but will accept, on faith, that this can be done. Thus, we will not have to deal with transistors or other devices.
Boolean logic operates on the values true and false, and it was George Boole who recognized that if true is represented by the value one and false is represented by the value zero, we can think of or as being similar to addition, where adding zero does nothing, and we can think of and as being similar to multiplication, where multiplying by zero always gives zero, and multiplying by one does nothing.
Based on these observations, Boole proposed that we use the symbols for arithmetic operators to stand for logic operators. So, in Boolean notation, a+b means a or b, and ab means a and b. This notation is still widely used, with only one minor change. Where Boole used the minus sign to mean not, modern logicians tend to use an overbar, so modern logicians write a instead of –a to mean not a. One other symbol has come into common use in Boolean algebra, the use of a⊕b to mean a exclusiveor b. This is true if a is true or b is true, but not both.
The field of Switching theory emerged before World War II, largely motivated by the use of complex systems of electromechanical relays in telephone exchanges. This field applies Boolean logic to practical engineering problems, and with the development of electronic calculators and then computers, switching theorists quickly moved into the center of certain aspects of computer design.
Electrical engineers in specific have a long tradition of using graphical notations to do their design work, and by the late 1960's, a standard graphical notation emerged for Boolean logic that is well suited to the problem of expressing the uses made of this logic formalism in the context of computer design. The basic elements of this notation are given on the next page. In this notation, inputs are shown on the left and outputs are shown on the right. Lines connecting components are used to show the flow of data from outputs to inputs.
inputs  the output c  
a  b  and
a b 
or
a + b 
nand
a b 
nor
a + b 
not
a 
xor
a ⊕ b 
0  0  0  0  1  1  1  0 
0  1  0  1  1  0  1  
1  0  0  1  1  0  0  1 
1  1  1  1  0  0  0 
In addition to the conventional Boolean operators and, or and not, engineers quickly introduced two new operators, nand and nor. These correspond to and and or with their results inverted by a not operator, as indicated in the algebraic notation for Boolean logic by an overbar over the expression. These were introduced largely because the simplest amplifiers used in digital logic perform the not function, and a typical hardware implementation of the and or or function produces an output that must be amplified before it is used as an input to another logic operator.
The triangle symbol is an old symbol for an amplifier in electronic schematics, and it is still used in this way in some schematic diagrams for logic circuits, standing for a gate with inputs and outputs that are equal. The small circle on the output of the nand, nor and not gates indicates logical negation. Conventionally, this is placed on the output of the gate, but occasionally, you will see this little circle placed on an input in order to indicate the presence of an inverter or not gate built into that input.
The exclusiveor operator is also given a special symbol in this schematic notation, but not because it is a primitive operator that is easy to build in hardware. In fact, the exclusiveor operator is difficult to build, requiring hardware equivalent to 4 nand operators, but it is very common in arithmetic.
Electrical engineers refer to hardware devices that perform the functions of Boolean operators as logic gates. Each physical logic gate has connections for a power supply as well as connections for inputs and outputs. A schematic diagram shows the interconnection of inputs and outputs as lines between the symbols for the individual gates; these are frequently referred to as wires, even if the actual hardware involves conductive traces on silicon or on a printed circuit board. The following collection of simple gates, for example, are shown connected to perform the exclusiveor function:
c = a ⊕ b = (a b) + (a b) 
When explaining the exclusiveor function, there are several reasonable explanations. For example: The exclusive or is true if one of the inputs is true and the other is false. The formulation given in the figure above, in both schematic and equational forms, corresponds to the this English explanation.
Electrical enginners use some standard notational conventions in drawing schematic diagrams. First, inputs are usually on the top or left, and outputs are usually on the bottom or right. Thus, information within a logic circuit usually flows from top to bottom and from left to right. Second, where wires cross, this may be indicated by a schematic indication that one wire hops over the other, while where wires are connected, a dot (historically representing a screw) is added to emphasize the connection. To avoid ambiguity, connections are usually shown with lines meeting to form a T, while crossings form an X.
Historically, engineering students would buy templates to help draw the symbols for the logic gates, since freehand drawings can be quite sloppy. If you do draw them freehand, note that the key features that distinguish an or symbol from an and symbol are that the or has a curved input side and a point on the output side, while the input side to the and is straight and the output side is a perfect halfcircle.
Exercises
a) Consider the logic diagram given above for constructing an exclusiveor gate from and, or and not gates. What function does this compute if the 2 not gates are replaced by a single nand gate that combines the a and b inputs and passes its result to both of the inputs originally served by separate not gates.
b) Consider the problem of comparing two 1bit binary numbers for equality. This can be done with an exclusiveor gate and one other gate to give an output of 1 if the numbers are equal and an output of zero if they are different. Draw the schematic diagram for a a solution to this problem and give an equation in Boolean logic notation.
An algebra is characterized by a set of values and a set of functions over those values. In common algebra, we start with a set of numbers with the arithmetic operators. There are many numerical algebras, depending on whether we start with real numbers, complex numbers or something else. In Boolean algebra, we start with a set of just two values, true and false, operated on by the Boolean operators. The basic laws of this algebra define the behavior of these operators. Many of these laws look very much like the laws of numerical algebra, but a few are different:
zero  a + 1 = 1
a 0 = 0 
identity  a + 0 = a
a 1 = a 
commutative  a + b = b + a
a b = b a 
associative  (a + b) + c = b + (a + c)
(a b) c = b (a c) 
distributive  a (b + c) = a b + a c
a + b c = (a + b) (a + c) 
The first and last laws above do not hold for arithmetic. Numerical algebras have only one zero, but Boolean algebra has two values that can be described as zeros. Not only does anding anything with 0 give 0, but oring anything with 1 gives 1. In numerical algebra, multiplication distributes over addition, but not visa versa. In Boolean algebra, and distributes over or and or distributes over and. We can prove this using a truth table:
a  b  c  bc  a+bc  a+b  a+c  (a+b)(a+c) 
0  0  0  0  0  0  0  0 
0  0  1  0  0  0  1  0 
0  1  0  0  0  1  0  0 
0  1  1  1  1  1  1  1 
1  0  0  0  1  1  1  1 
1  0  1  0  1  1  1  1 
1  1  0  0  1  1  1  1 
1  1  1  1  1  1  1  1 
This is a proof by exhaustion. All combinations of a, b and c are shown. The column headed a+bc gives one side of the distributive law, while the column headed (a+b)(a+c) gives the other. The fact that the values in these columns are equal proves the law. The other columns show terms that were used to compute the key columns.
The exclusiveor operator is not covered by the basic laws because it is considered to be a derived operator. It is possible to prove from these laws that the following identities hold for it:
identity  a ⊕ 0 = a 
inverse  a ⊕ 1 = a 
commutative  a ⊕ b = b ⊕ a 
associative  (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c) 
We can prove the above laws using algebra, starting with the algebraic definition of exclusiveor in terms of and, or and not. Students of algebra know that algebraic proofs can be hard. To prove the equivalence of two expressions, you have to find a path through a maze of algebraic substitutions. With only two values in the Boolean system, it is frequently easier to simply use a truth table.
There are two laws of Boolean algebra that have no precedent in numerical algebra, DeMorgan's laws. These complete the story of the symmetry of and and or that is apparent in the basic laws given above.
a + b  =  a b  
a b  =  a + b 
We can use DeMorgan's laws to create new symbols for some of the basic logic gates:
Remember, bubbles on the inputs and outputs of gate symbols mean not. Under DeMorgan's laws, an and gate with inverters on its inputs and outputs acts like an or gate, while an and gate with inverters on its inputs only acts like a nor gate. This allows us to redraw the schematic diagram already given for an exclusiveor gate, using three nand gates to replace the cascade of and and or gates in the original:
Exercises
c) Construct the truth table to prove that and distributes over or.
d) Construct the truth table to prove both versions of DeMorgan's laws.
e) Write out the algebraic formula for a ⊕ b using only the nand and not operators.
f) Here is a fragment of C code. Rewrite it using DeMorgan's Laws to replace the or operator with an and operator. Express the result as simply and compactly as possible:
if ((a < 0)  (a > 9)) error( "out of bounds" );
Each bit of a binary number can obviously be associated with a Boolean value, and George Boole's convention that 1 is used to represent true and 0 is used to represent false apply naturally when doing this. Thus, we can think of the 32bit binary number a as being composed of a vector of Boolean values a_{0} to a_{31}, where the subscripts incicate the power of two in the placevalue system.
Given numbers represented as arrays of bits, how can we build hardware to do arithmetic operations on them? To start with, how can we increment a number? It is useful to start with a review how to increment a number by hand. We add one to the least significant digit, and if the result is greater than the radix, we carry by adding one to the next digit. doing this may cause yet another carry. We have a local operation that we perform at each digit; this computes the output value for that digit and also the carry that is to be added to the next digit. In schematic terms, we have the following:
In the above diagram, the 4bit addend a is incremented by
c_{0} to
produce the 4bit sum s. The carry bits
c_{i} violate our
standard convention for the direction of dataflow, left to right and
top to bottom, because we have allowed our convention for writing
numbers to force the least significant bit to appear to the right.
The carry input to the entire incrementer is c_{0}. If this
is zero, the sum s will equal the addend a, while if
it is one, s will usually equal a+1. The intermediate carry
bits c_{1} to c_{3} hold the carry out of one
stage of the incrementer and into the next. The final carry bit,
c_{4} holds the carry out of the adder; if this is one,
the input a
must have been 1111 and the output s
is 0000, an unsigned overflow.
The logical operation performed at each bit position in the above incrementer,
within the box marked with a plus sign, is fairly simple. There are two
inputs, a_{i} and c_{i}, and two outputs,
s_{i} and c_{i+1}, so the truth table
for this function will have 4 rows and two output columns. Here it is:
inputs  outputs  
a_{i}  c_{i}  c_{i+1}  s_{i} 
0  0  0  0 
0  1  0  1 
1  0  0  1 
1  1  1  0 
The above truth table was filled in by summing the bits
in the input row and then recording this sum, as a binary
number, in the output row. The least significant bit of the output is the
sum output for that bit position, while the most significant bit of the
output is the carry into the next bit position. Having built the table,
it is easy to compare each output column with the outputs of the various
standard Boolean functions. On doing this, we see that
c_{i+1} is just the logical and of
a_{i} and c_{i}, and that
s_{i} is the exclusive or of
a_{i} and c_{i}. This leads to the
following gatelevel design for each digit of the incrememter:
c_{i+1}  =  a_{i} c_{i}  
s_{i}  =  a_{i} ⊕ c_{i} 
Exercises
g) Work out the truth table for decrementing a binary number, with inputs a_{i} (a bit of the input) and b_{i} (the borrow input to that bit) and outputs b_{i+1} and d_{i} (the decremented output). Follow through and give a gatelevel schematic for one bit of a decrementor. Note: Assume b_{i} is true when one is to be taken from a_{i} and false when no change is to be made.
h) From algebra, a1 is ((a)+1), and in the two's complement system, x is x+1. Combining these allows us to build a decrementor from an incrementer and some added inverters. Show the logic diagram for this.
We can attempt to design an adder using the same approach we used for an
incrementer. As with the incrementer, addition involves
propagating carries from one digit to the next, but instead of a
single input, we have two inputs, the addend,
which we will call a and the augend,
which we will call b. Addition is commutitive,
so we could call them both addends; the terms addend and
augend distinguish the number that is
augmented from the number added to it. At the top level, a 4bit
adder will look like the following:
The truth table for one bit of an adder is very similar in form to that for
one bit of the incrementer, but with 3 inputs instead of 2. As with the
truth table for the incrementer, we can read each twobit output row as a
binary number giving the number of one bits in the corresponding
inputs, but now, our sums run from zero to three instead of from zero to two.
Here is the table for the adder:

The first 4 rows of the above table output are the same as the table for the increment function, but the whole table is messy enough that it is hard to see how to get the outputs from the inputs.
Focusing only on the sum output s_{i}, this output is one only when an odd number of ones appear in the corresponding input row; furthermore, so long as any input is zero, this output is the exclusive or of the other two inputs. In fact, s_{i}=a_{i}⊕b_{i}⊕c_{i} We do not need to parenthesize here because exclusiveor is associative, so every order of evaluation gives the same result.
The carry output c_{i+1} is more complex, but it is not difficult to determine that c_{i+1}=a_{i}b_{i}+a_{i}c_{i}+b_{i}c_{i}.
Another adhoc reasoning path leads to the fact that an adder can be made from two incrementers plus an additional or to combine the carry outputs. Because of this, the increment circuit is sometimes called a half adder and the threeinput adder is distinguished from the incrementer by calling it a full adder.
Adhoc design can be slow and labor intensive unless you are lucky enough to have the right flash of inspiration. If we want to build interesting computers without waiting for inspiration, we need a systematic approach to designing logic circuits that implement arbitrary Boolean functions. Switching theory, a field that began to develop in the 1920's motivated by the development of the dial telephone system, is a field that combines this practical interest in engineering with the abstract mathematics of Boolean logic.
Switching theorists have come up with the following general methodology for converting any truth table to a logic circuit to compute the Boolean function described by that truth table:
 Arrange a column of and gates, one gate per row of the truth table but only for those rows where the output section of the truth table is not all zeros. Number these and gates using the rownumbers in the truth table, where the row number is obtained by reading the input values for that row as a binary number. The and gates have their inputs on the left and their outputs on the right.
 Arrange a row of or gates below and to the left of the and gates, using one or gate per output of the function. The inputs to these gates are on top, and their outputs are on the bottom. Label the outputs in the same order they are in the truth table.
 Arrange the inputs in a row above and to the left of the column of and gates, and for each input, add an inverter so that both the input and its inverse are available.
 For each of the and gates, wire its inputs according to the ones and zeros in the corresponding row of the input half of the truth table. The input half of the truth table is sometimes known as the and array because of this. If there is a one for some input in that row, that input should be connected directly as an input to that and gate. If there is a zero, the inverse of that input should be connected as an input to that gate. Each of these and gates will have an output of one only when the inputs are set to the binary code for that row.
 For each gate in the row of or gates, wire it according to the ones in the corresponding column of the output half of the truth table. The output half of the truth table is sometimes known as the or array because of this. For each one at the intersection of a row and column in the output half of the truth table, wire the output of the and gate for that row to the input of the or gate at the base of that column. This completes the circuit.
Except in trivial cases, this methodology produces results that are optimal in speed, assuming that the delay through any logic gate is constant, but it does nothing to minimize the number of logic gates. It is worth noting, however, that the designs that emerge from this methodoloty are excellent starting points for optimization exercises. The circuit below illustrates the use of the above methodology to implement a fulladder:
A common notational abbreviation is used on the left above. Instead of cluttering the diagram with inverters, bubbles on the inputs to the and gates show which inputs are inverted. You can read the binary representation of the row number from the inputs of each and gate, reading bubbles as zeros and uninverted inputs as ones, starting at the top. The and array is redrawn on the right without this abbreviation.
Once a circuit has been designed using this bruteforce method, we can optimize it by combining rows whenever those rows have identical outputs but inputs that differ in exactly one bit position. Where this is true, one and gate can be substituted for two, where the new gate has one less input than the old one. If we combine all combinable rows that differ in the first bit position, and then combine all combinable rows that differ in the second bit position and so on, this produces an optimal circuit. We will not emphasize optimization here.
Exercises
i) Show the truth table for a subtractor that computes d=ab, with borrow signals connecting the digit positions. The toplevel schematic view of this subtractor should be identical to the toplevel schematic view of the adder. Also, answer this auxiliary question: What function does it compute when the least significant borrow input is true?
j) Draw out the gatelevel schematic view of an adder given above, using explicit not gates instead of bubbles on the inputs to the and gates. Note: This can be done with exactly 3 inverters, since no input needs inversion more than once.
It is time to ask how the condition codes are derived from the outputs of an adder built as discussed above. The N condition code is the simplest. Because the Hawk is a two's complement machine, we can treat the most significant bit of the sum as the sign and copy it into the N condition code.
The Z condition code should be set if every bit in the sum is zero. Equivalently, Z should be reset if any bit in the sum is nonzero. The equivalence of these two follows from DeMorgans laws. As a result, we can use a multiinput nor gate to combine all of the bits of the sum to produce the Z condition code.
The C bit is simply the carry out of the most significant bit of the adder. V, the overflow bit, is more difficult. There are several equivalent ways to compute V. There is an overflow if the signs of the addends are the same but the sign of the result is different. This require two sign comparisons. Alternatively, there is an overflow if the carry into the sign position differs from the carry out of the sign bit. This takes just one exclusiveor gate:
The above schematic violates our standard drawing convention that data should flows from left to right and top to bottom. Here, we have the carry input c_{0} on the right and all 4 condition code outputs on the left. We do this because we always write numbers with the least significant bit on the right and all but one condition code reports on things that concern the most significant bit, while the carry in is an input to the least significant bit.
Exercises
k) The text mentions that there is an overflow when the signs of the operands are the same but the sign of the result is different. Express this as a truth table, assuming a 4bit adder, so the inputs are a_{3}, b_{3} and s_{3}. Reduce this truth table to logic gates using the systematic bruteforce method.
l) Draw the part of the schematic diagram for the derivation of the condition codes that pertains to the Z condition code, but using an and gate and, if necessary, some inverters.
We can use a simple adder for the arithmetic involved in indexed addressing, but what we really want at the heart of our central processing unit is a general purpose block of logic gates that can both add subtract, as well as performing other operations. This is called an arithmetic logic unit or ALU because, usually, the operations of Boolean logic are included among the operations this unit can perform.
How do we take our adder and extend it so that it can perform other operations? First, note that an adder can subtract if we negate one operand, and negation is easy in two's complement arithmetic. To negate a two's complement number, we invert each of the bits, taking the one's complement, and then add one to the result. Furthermore, the carry in to the adder can be used to add that extra one. We do our subtraction by adding the one's complement with a carry in of one.
Next, recall that a ⊕ 0 = a while a ⊕ 1 = a; this means that exclusive or can be used to select between a logical value and its inverse. This leads to the following design for an addersubtractor:
Here, if the sub input is zero, the b inputs are not changed
by the exclusiveor gates and c_{0}, the carry in
to the adder, is zero. As a result, the circuit computes the sum
s=a+b+0 in this first case.
On the other hand, if the sub input is one, the b inputs are inverted by the exclusiveor gates and c_{0} is one, so the adder computes s=a+b+1. Given that we are using two's complement representations, so that b+1=–b, we can conclude that the circuit computes the difference s=a–b in this second case.
We need more operations. Looking at the set of operations supported by the C programming language, for example, we find not just addition and subtraction, but a much longer list that includes and, or and exclusive or as well a shift operators and more advanced arithmetic operators such as multiply and divide. We will ignore multiply, divide and floating point for now, and concentrate on the simpler of these operators.
Modern hardware designers build highly optimal arithmetic logic units to perform these, but we are not interested in optimal design here. We simply need a simple circuit that allows us to select between functions depending on a control input. In this case, we need to select between addition and subtraction, on the one hand, and the and, or and exclusiveor logical operators on the other hand.
A logic circuit that has its output equal to exactly one of several data inputs depending on a control input is called a multiplexer, abbreviated MUX. In general, a multiplexer with n data inputs inputs must have log_{2}n control inputs. For our example arithmeticlogic unit, we need a multiplexer with 4 data inputs and 2 control inputs. The Hawk CPU, with 15 registers, must contain several 16input multiplexers to select among these, each with 4 control inputs. We will not give full truth tables for these larger multiplexers, the tables are too big, but the truth table for a multiplexer with 2 data inputs d_{0} and d_{1} and one control input c is as follows:
inputs  output  
c  d_{0}  d_{1}  f 
0  0  0  0 
0  0  1  0 
0  1  0  1 
0  1  1  1 
1  0  0  0 
1  0  1  1 
1  1  0  0 
1  1  1  1 
The above truth table reduces to a simple logic circuit involving only 4 gates:
f = (d_{0} c) + (d_{1} c) 
The schematic symbol shown at the right is
sometimes used for other functions that combine
multiple inputs into one output, but if there is no label indicating another
function, you can assume that it is a multiplexer.
Here is the gatelevel design for a 4input multiplexer:
f = (d_{0} c_{1} c_{0} ) + (d_{1} c_{1} c_{0} ) + (d_{2} c_{1} c_{0} ) + (d_{2} c_{1} c_{0} ) 
Here, we have used an abbreviated notation for
inverters. While there are 4 inversion bubbles in the figure, we would only
use 2 inverters to build this circuit, one per control input.
In the abbreviated notation to the right, the reader can infer
that the labels 0, 1, 2 and 3 inside the multiplexer symbol correspond to
00, 01, 10, and 11 on the control lines, where c_{1}
is the most significant bit and c_{0} is the least significant.
We now have the parts we need to build a simple arithmetic logic unit that combines an addersubtractor with three simple logic gates using a multiplexer. Here, we have drawin just one one bit of this unit, with data inputs a_{i}, b_{i} and c_{i}, data outputs s_{i} and c_{i+1}, an f_{2} input to control the addsubtract function of the adder, and two more control inputs f_{1} and f_{0} to control the multiplexer.
We call this a bit slice of the arithmetic logic unit. The full arithmetic logic unit for a machine like the Hawk would require a total of 32 such bit slices.
Highperformance arithmetic logic units are rarely constructed as shown here. There are two major categories of optimization that are typically pursued. First, the carry path through the arithmetic logic unit is almost always carefully optimized, so that carry propagation from the carry in to the least significant bit to the carry out from the most significant bit is far faster than is suggested by the designs given here.
The second common optimization is to make the adder itself compute the and, or and exclusiveor functions so that there is no need for a multiplexer for function selection. This is done by adding auxiliary inputs to the adder that control which terms are active. The net result is that the arithmetic logic unit is reduced to one exclusiveor gate per bit for selecting between addition and subtraction, plus a slightly augmented adder to perform all arithmetic and logic. A typical modified adder is shown on the next page.
Auxiliary input f_{1} has been added to control and gates 2 and 4 of the adder, and input f_{2} has been added to control and gate number 6. A single gate has been added to the and array in the modified adder, labeled x in the figure. This gate is enabled by the auxiliary input f_{3}. If f_{1} and f_{2} are both set true and f_{3} is set false, the modified adder behaves exactly like the original.
If f_{2} is false and there is no carry in to the least significant bit of the adder, all of the carry bits will be zero, and as a result, so long as f_{1} is true, the adder will compute the exclusive or of the a and b inputs. Setting f_{3} true converts exclusive or to or, and setting f_{1} false while holding f_{3} true converts or to and.
Exercises
m) In the addersubtrtactor, the carry output means carry, exactly, when the circuit is used for addition. When the circuit is used for subtraction, however, the situation is a bit muddy. What is the value of the carry output when the high bit of the subtractor outputs a borrow request and when the high bit of the subtractor does not output a borrow request?
n) Naive students frequently assume that the borrow signal should flow from more significant places in the subtractor to less significant places because when you borrow one, you take one from a more significant place and give it to a less significant place. Explain why this is wrong.
o) Redraw the 2input multiplexer given above with only nand gates (this is an application of DeMorgan's laws).
p) Another way to build a 4input multiplexer is with a binary tree of 2input multiplexers. Draw the schematic diagram for this tree, and then estimate the number of and, or and not gates required to build this tree and compare this with the 7 gates required to build the 4input multiplexer directly. Can you give corresponding figures for an 8input multiplexer built using this binary tree approach compared to an 8input multiplexer built directly?
q) The bitslice of an arithmetic logic unit given above has 3 control inputs, f_{2}, f_{1} and f_{0}. Therefore, it can compute 8 distinct fuctions of its a and b inputs. List them, and then go throught the Hawk instruction set and figure out which Hawk registertoregister instructions can be implemented with each. Ignore memoryreference instructions; indexing on Hawk uses a separate adder.
r) Show how to build an exclusiveor gate from a 2input multiplexer plus an inverter. Use the trapezoid symbol to expres your result.
s) Show how an adder can be built from two 8input multiplexers, one to compute the sum and one to compute the carry. Hint: All of the data inputs should be constants, either 0 or 1.
t) In the adder modified to perform logic, there are 3 control inputs, 4 control inputs, f_{1}, f_{2} f_{3} plus c_{0}, the carry into the least significant bit. Make a table of the 16 combinations, and for each table entry, identify whether that combination is potentially useful and if so, what function the modified adder computes.
In addition to arithmetic and logic operations, most computers and many programming languages support shift operators. In languages descended from C, such as C++ and Java, these are >> meaning shift right and << meaning shift left. The effects of these operators are illustrated below, starting with the value 8:
expression  value  

binary  decimal  
i  00001000  8 
i >> 1  00000100  4 
i >> 2  00000010  2 
i >> 3  00000001  1 
i << 1  00010000  16 
i << 2  00100000  32 
i << 3  01000000  64 
Shifting is included in the instruction set for two reasons: First, it allows moving bits around, for example, to extract successive bits from a word for transmission over a serial data line, or to pack multiple 4bit data items into a word. The second reason is apparent when you examine the decimal equivalent of the binary values in the example above. Shifting left c places multiplies by 2^{c} and shifting right c places divides by 2^{c}.
A hardware device that takes an nbit number as input and shifts that number c places is called a shifter. A shifter for an nbit word can be made from n multiplexers, where the number of inputs per multiplexer determines how many different values of the shiftcount c the shifter can handle. Here is a 4bit right shifter:
We have made a notational compromise above. In order to save space, we have drawn the control inputs as if they passed under the multiplexers. This is commonly done in schematic diagrams, but only when the input signals in question are distributed identically to all of the parts to which they connect. That is the case here.
The shifter shown above computes s = d >> c; that is, it shifts its 4bit input left from 0 to 3 places. This poses an interesting question: What value should be "shifted into" the output for bits that were not part of the input number? In the above, this value is provided by an auxiliary input, d_{4}. For leftshift operations, the usual answer is to shift zeros in from the left, so that the result is the result you would expect for multiplication by 2^{c}.
For rightshift, there are two useful answers. First, we could shift in zeros, as with leftshift; in this case, the result will be the correct result for division by a power of two, assuming that the input number was treated as an unsigned integer. This is called a logical right shift or an unsigned right shift. The other alternative is to shift in copies of the sign bit. For the 4bit shifter shown above, this would involve connecting the d_{4} input to d_{3}. In this case, if the input is a two's complement binary number, the output will be the correct two's complement result for division by a power of two, with one very important exception: If the quotient is not exact, that is, if there is a nonzero remainder after division, the result will be rounded to the next lower integer instead of being rounded toward zero. As a result, 5>>1 is 3, not 2. This is called an arithmetic right shift or a signed right shift.
Note that In C and C++, when you declare a variable to be of type int and then apply the >> operator to it, this implies a signed right shift, while if you declare the variable to be of type unsigned int and then apply the same operator, this implies an unsigned shift. In contrast, in Java, where the builtin integer type is signed and there is no way to declare unsigned integers, the there are two distinct rightshift operators, >> and >>> for the signed and unsigned shifts.
The Hawk machine has both left and right shift operators, and right shifts are available in both signed and unsigned form. One feature of the Hawk machine that separates it from many other machines is that the shift operators are combined with a variant of the add instruction, so that one instruction can both shift a number and add to it. The Hawk instructions for this are ADDSL, ADDSR and ADDSRU (the latter being the unsighed right shift). To accomodate these shifted versions of the add operation, there is a rightshifter on the output of the Hawk arithmetic logic unit and a leftshifter on one of the inputs of the Hawk arithmetic logic unit.
As with the discussion of arithmetic logic units, the design of shifters used in real computers is frequently more complex than that given here. The primary problem with the approach given here is that a 32bit shifter with a 4bit shiftcount would take 32 multiplexers with 16 inputs each! In designing computers, it is common to avoid using extravagant amounts of silicon for rarely used functions, and in the case of shifters, the solution is to use a shift tree, also called a barrel shifter, where one stage of the shifter shifts only a short distance, say 0 to 3 bits using a 2bit shift count, while the next stage shifts in multiples of 4, again using a 2bit shift count. As a result, we can achieve exactly the same shift function with 64 4input multiplexers instead of 32 16input multiplexers.
Exercises
u) Draw the diagram for a 4bit left shifter able to shift from zero to three places.
v) Draw the diagram for a 4bit right shifter composed of a cascade of two levels using 2bit multiplexers at each level, where the top level shifts either 0 places or 2 places, while the second level shifts either 0 places or 1 place. Make sure to label the inputs and outputs so that they are exactly compatable with the 4bit shifter illustrated above.
w) Explain how the Hawk EXTB instruction is implemented using eight 4input multiplexers. What bits of what register(s) are used to control these multiplexers? What bits of what register(s) are used as inputs to these multiplexers? What bits of what register(s) take the outputs of these multiplexers.
We have shown that addition and subtraction operators can be built in hardware using a number of logic gates proportional to the word size. One bit of an adder can be built using about 12 and, or and not gates. Adding an exclusiveor gate to one input of this adder allows it to be used both to add and to subtract, at a cost of 4 more primitive gates to build the exclusive or. Since we also need to be able to do logic operations, each bit position also needs a second exclusive or, an and and an or. Of course, we also need a multiplexer in each bit position to select between the different functions. This costs us 7 more gates per bit position. Adding these up gives us the cost of one bit slice of an arithmetic logic unit, and multiplying by the 32 bits per word gives us an estimate for the number of gates in the Hawk arithmeticlogic unit.
Modified Adder  13  
exclusive or to make addersubtractor  + 4  
Total gates per bit  17  
word size  × 32  
Estimated gate count for the Hawk ALU  544 
This estimate, 928 gates, supports only the simple logic operations, add and subtract. To obtain the left and right shift operations, we must add two shift networks, one for left shift and one for right shift. The Hawk shift instructions take shift counts between 1 and 16, so each shift network can be made of either 32 16input multiplexers or 64 4input multiplexers. We will assume the latter.
4input multiplexer  7  
word size  × 32  
gates per 32bit 4way shifter  224  
2 layers of shifters  × 2  
gates per 32bit 16way shifter  448  
need both left and right shifts  × 2  
Total gates for the Hawk shift network  896 
Adding the 896 gates for the two shift networks to the 544 gates in the arithmetic logic unit gives us an estimate for the total complexity of the Hawk's arithmetic and shifting hardware: 1,440 and, or and not gates. This estimate ignores the cost of optimizing for fast carry propagation.
Exercises
x) Estimate the cost, in gates, of the ALUshifter for a 16bit digital signal processor chip (a DSP), assuming it supports only the following shift operations: byteswap, left shift one place, right shift one place, or don't shift at all. These options apply to the output of the ALU.
y) The accounting above assumed the use of two layers of 4way multiplexers to construct a shift network that could shift from 0 to 15 places. How does the result differ if you assume 4 layers of 2way multiplexers to do this job.