14. Inside a Modern CPU
Part of 22C:60, Computer Organization Notes
Up to this point, we have described the central processing unit as executing a sequential fetch-execute cycle, in which each instruction is executed completely before the next instruction is fetched. Starting in the mid 1960s, new approaches to executing instructions were developed that allowed for much higher performance by overlapping the fetch of one instruction with the execution of its predecessors.
While there were several approaches to doing this in the computers of the late 1960s, one approach came to dominate all others in the 1970s. This is called pipelined execution. Different pipelined machines have had different numbers of pipeline stages: short pipelines with only two stages have been built, while others have had five or more stages. The four-stage model illustrated below is typical:
    IF      Instruction Fetch
    OF/AC   Operand Fetch / Address Computation
    ALU/MA  Arithmetic Logic Unit / Memory Access
    RS      Result Store
The basic idea of a pipelined processor is that each instruction is processed, in turn, by each of the pipeline stages. In the 4-stage pipeline illustrated above, the instruction fetch stage begins the execution of each instruction by fetching it and then passing it to the operand-fetch, address-computation stage, which gathers operands from registers and computes the effective address. After this is done, the arithmetic-logic-unit, memory-access stage does whatever arithmetic is required for register-to-register instructions, or goes to memory for memory reference instructions, and then passes the value to be stored in the destination register to the result-store stage.
As a result, for the processor illustrated, during each execution cycle, there are four instructions in various stages of execution, one being executed by each stage. It takes four execution cycles to complete each instruction, but one instruction is completed during each cycle. The following figure, called a pipeline diagram, illustrates the execution of a short sequence of instructions on a pipelined processor:
    LOADS R3,R4  | IF     | OF/AC  | ALU/MA | RS     |        |        |
    ADDSI R4,1   |        | IF     | OF/AC  | ALU/MA | RS     |        |
    STORES R3,R4 |        |        | IF     | OF/AC  | ALU/MA | RS     |
    time             1        2        3        4        5        6
This diagram shows the instruction LOADS R3,R4 being fetched at time 1. At time 2, the contents of R4 are taken from the registers as the effective address of this instruction, while at time 3, this memory address is used to load one word from memory. Finally, at time 4, this value is stored in R3, completing the execution of this instruction.
Similarly, the instruction ADDSI R4,1 is fetched at time 2. At time 3, the contents of R4 are taken from the registers as the operand of this instruction, while at time 4, this operand is incremented. Finally, at time 5, the incremented value is stored back in R4, completing the execution of the second instruction.
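The stagger in the diagram follows a simple rule: an instruction fetched at time t occupies pipeline stage s (counting IF as stage 1) at time t + s - 1. As a rough illustration, the following C sketch uses this rule to print a diagram of the same form for a short instruction sequence; the function and variable names here are invented for the illustration and are not part of any real Hawk tool.

    #include <stdio.h>

    #define STAGES 4
    static const char *stage_name[STAGES] = { "IF", "OF/AC", "ALU/MA", "RS" };

    /* print a pipeline diagram for a sequence of instructions, assuming
       one instruction enters the pipeline per cycle and no stage stalls */
    void print_pipeline_diagram( const char *instr[], int count ) {
        int last_time = count + STAGES - 1; /* time at which the last RS finishes */
        int i, t;

        for (i = 0; i < count; i++) {
            printf( "%-14s", instr[i] );
            for (t = 1; t <= last_time; t++) {
                int s = t - i - 1;          /* stage occupied at time t, 0-indexed */
                if (s >= 0 && s < STAGES) {
                    printf( "| %-6s ", stage_name[s] );
                } else {
                    printf( "|        " );
                }
            }
            printf( "|\n" );
        }

        printf( "%-14s", "time" );
        for (t = 1; t <= last_time; t++) printf( "    %-5d", t );
        printf( "\n" );
    }

    int main() {
        const char *prog[] = { "LOADS R3,R4", "ADDSI R4,1", "STORES R3,R4" };
        print_pipeline_diagram( prog, 3 );
        return 0;
    }

Running this reproduces the three-instruction diagram above, and makes it easy to see how the pattern continues for longer instruction sequences.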
a) With reference to the pipeline diagram given above, during what time step does the STORES instruction compute its effective address? Given this, what anomalous behavior would you expect with regard to the effects of the immediately preceding ADDSI instruction?
The Hawk architecture was designed to be pipelined using the 4-stage model illustrated above. In order to understand how this is done, we must examine how the different pipeline stages communicate. The output of each pipeline stage is stored in an interstage register that serves as input to the next pipeline stage. So, for example, the instruction register is an interstage register loaded by the instruction fetch stage and used as an input to the operand fetch and address computation stage, and the effective address register is an output of the address computation stage and an input to the memory access stage. The first step in designing a pipelined processor, after laying out the basic pipeline stages, is to figure out what all of the necessary interstage registers are. For the Hawk, the code fragments below use the interstage registers if_pc and if_ir, written by the instruction fetch stage, and of_ir, of_ea and of_op1 (among others), written by the operand fetch and address computation stage.
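As a rough sketch, these interstage registers could be declared as ordinary global variables in the C-style pseudocode used below. Only the registers actually named in the fragments that follow are shown; a complete Hawk pipeline would need several more, for example registers carrying the result and its destination forward to the result store stage.

    /* interstage registers written by the instruction fetch (IF) stage and
       read by the operand fetch / address computation (OF/AC) stage */
    unsigned int if_pc;   /* program counter, incremented as instructions are fetched */
    unsigned int if_ir;   /* the instruction most recently fetched */

    /* interstage registers written by the OF/AC stage and read by the
       ALU / memory access (ALU/MA) stage */
    unsigned int of_ir;   /* a copy of the instruction, passed along the pipe */
    unsigned int of_ea;   /* the effective address, for memory reference instructions */
    unsigned int of_op1;  /* the first operand, taken from the destination register */

    /* the general purpose register file, read by the OF/AC stage */
    unsigned int registers[16];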
Given these registers, we can now begin to work out, in algorithmic form, the behavior of each pipeline stage. Consider, for example, the instruction fetch stage. If we ignore the problem of implementing branch instructions, and if we ignore all long instructions so that we can assume all Hawk instructions are 16 bits, this stage becomes very simple:
    for (;;) { /* iterate once per clock cycle */
        if_ir = * (halfword_pointer) if_pc;  /* fetch one 16-bit instruction */
        if_pc = if_pc + 2;                   /* advance to the next halfword */
    }
The operand fetch and address computation stage is a bit more complex, even if we ignore the long instructions of the Hawk, since the operation it performs depends on the contents of the instruction register:
    for (;;) { /* iterate once per clock cycle */
        of_ir = if_ir;  /* pass the instruction along to the next stage */
        if (is_memory_reference_instruction( if_ir )) {
            r = extract_rx_field( if_ir );
            if (r == 0) {               /* index register zero means use the PC */
                of_ea = if_pc;
            } else {
                of_ea = registers[ r ];
            }
        }
        if (needs_value_of_rd( if_ir )) {
            r = extract_rd_field( if_ir );
            if (r == 0) {               /* register zero reads as zero */
                of_op1 = 0;
            } else {
                of_op1 = registers[ r ];
            }
        }
        ... similar code for other operand registers ...
    }
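The for (;;) loops above are written as if each stage ran on its own, but in the hardware all of the stages operate at once, each doing one step per clock cycle. If we wanted to simulate this behavior in software, one simple approach (only a sketch, with invented function names) is a single clock loop that evaluates the stages from last to first, so that each stage reads the interstage registers written by its predecessor on the previous cycle before they are overwritten:

    /* hypothetical functions, one per pipeline stage; each reads the
       interstage registers written by the stage before it and writes
       the interstage registers read by the stage after it */
    void instruction_fetch_stage( void );
    void operand_fetch_stage( void );
    void alu_memory_access_stage( void );
    void result_store_stage( void );

    void run_pipeline( void ) {
        for (;;) { /* one iteration per clock cycle */
            result_store_stage();       /* RS     */
            alu_memory_access_stage();  /* ALU/MA */
            operand_fetch_stage();      /* OF/AC  */
            instruction_fetch_stage();  /* IF     */
        }
    }

An alternative with the same effect is to have each stage compute new values for its output registers and then copy all of the new values into the interstage registers at the end of the cycle, which is closer to what the clocked hardware registers actually do.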
More to be added