# Final Study Questions

Note: These study questions are not exam questions. There is no correct answer. No more than half of the exam will be based on this material, so you must broaden your study for the exam beyond this material.

## Another Hypothetical Architecture

Imagine that some chip foundry owner came across the Ulitmate RISC paper http://homepage.cs.uiowa.edu/~dwjones/arch/risc/ and was so inspired that he decided to invest his fortune in building a 32-bit pipelined Ultimate RISC on a chip. He's nuts, of course, but consider:

This machine will have a 32-bit address, so each move instruction is 64 bits, composed of nothing but a source address and a destination address, and all operands are 32-bit words.

The Instruction Execution Unit on this machine is to be pipelined, so that, in each clock cycle, it stores the destination operand of the previous move instruction, fetches the source operand of the next move instruction, and fetches the source and destination addresses of the instruction after that.

Of course, it would be stupid to build a single-accumulator arithmetic unit on a machine such as this! The arithmetic unit will be addressed whenever addresses of the form FFFFFxxx16 are encountered, where the final 12 bits specify what register contains the operand and how it is to be used. In binary, the the address is broken down as follows:

``` _______________________________________ _______ ___ _______ ___
|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
|1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0|x x x x|z z|r r r r|b b|
|                  ALU                  |   X   | Z |   R   | B |
```
r r r r -- the register
used to select one of 16 32-bit registers.

z z -- the operand size
operands may be byte, halfword or word sized:
00 - ???
01 - byte
10 - halfword
11 - word

On load from a register, the operand is extended at the indicated length. On store to a register, only the indicated number of bits of the register are changed.

b b -- the operand rotation
operands may be rotated in increments of 8 bits on load or store:
00 - no rotation
01 - rotate 8 bits
10 - rotate 16 bits
11 - rotate 24 bits

Rotation is to the left on store, to the right on load. The net result is that the 16 registers may be addressed as 32 halfwords or 64 bytes in additional to allowing the normal 32-bit interpretation.

x x x x -- the operation
on loading from a register, the operation specifies a unary transformation of the value of that register:
0 - - 0 fetch unsigned value (zero-padded to 32 bits)
0 - - 1 fetch signed value (sign-extended to 32 bits)
0 - 0 - fetch value without complementing it
0 - 1 - one's complement value before extending to 32 bits
0 0 - - fetch value without incrementing it
0 1 - - increment value after extension to 32 bits
1 s s s circular-shift right sss places before zero padding

on storing to a register, the operation specifies a binary operator on the destination register:

0 s s s R = operand << sss (circular left shift)
1 0 0 0 R = R + operand (add)
1 0 0 1 R = -R + operand (reverse subtract)
1 0 1 0 R = R and operand
1 0 1 1 R = R or operand
1 1 0 0 R = R xor operand
1 1 0 1 R = (not R) xor operand
1 1 1 0 R = ?
1 1 1 1 R = ?

Of course, the machine also needs the ability to perform a conditional branch. We base conditional branches on skip instructions, or more specifically, we set up a range of memory addresses where a store to one of these addresses will cause the next instruction in sequence to be skipped if the designated condition is true of the value just stored. The addresses for this take the following form, useful only as a destination address:

``` _________________________________________________________ _____
|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
|1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0|c c c|
|                            ALU                          |COND |
```

The skip conditions are given by the 3 least significant bits of the address:

0 0 0 skip if zero
0 0 1 skip if less than zero
0 1 0 skip if greater than zero
0 1 1 skip if even
1 0 0 skip if nonzero
1 0 1 skip if greater than or equal to zero
1 1 0 skip if less than or equal to zero
1 1 1 skip if odd

For one, can you recognize how to program it? You've heard this before: The first step in evaluating a computer architecture is to evaluate whether it is, in fact, a general purpose machine. For a Von Neumann architecture, this means verifying that its memory addressing is sufficiently general to allow all possible data and control structures to be exploited.

How do you manipulate data structures? It would be nice to be able to use something like indexed addressing, but our millionaire patron seems to have disregarded this in the above proposal. How could you add anything like indexing to this kind of architecture? Hint: Who needs the full generality of 32 bit addresses. Consider using the high 4 bits of the address to select the index register. In that case, the indexed addressing mechanism resembles a memory management unit, adding 28-bit displacements to the contents of one of 16 base registers.

A second broad class of questions revolves around implementation of this instruction set.

Work out the data paths in the ALU/register system, particularly, the alignment network and other logic involved in getting a source operand and the alignment network and ALU data paths involved in storing a destination operand to this unit.

Thinking about the ALU, in what ways does this machine resemble a VLIW machine, and in what ways is it definitely not one.

Work out the pipeline structure of this machine, with a focus on interstage registers and data paths in the instruction execution unit. Then think about the memory interface of this system: How many ports are there to the memory arbitration unit? What are their relative priorities. Are all ports the same width, or are there wide ports and narrow ports, and if so, what width. Does your answer change if you require all instructions to begin with an even-numbered 32-bit word?

How do you add indexing to the pipelined structure? How does this involve the ALU with the internal details of the CPU, assuming that we go for general purpose addressing?

What problems does this machine pose if you want to add pipeline interlocks and block instructions that depend on others that are not yet complete? Also, are there any barriers to result forwarding on this machine?

What alternative models for interrupt processing might work with this machine.