Homework 11

22C:122, Fall 1999

Due Monday Dec 6, 1999, in class

Douglas W. Jones

Background: Consider the following memory interface:
```
     CPU           cache
   or cache       or RAM
                  
   =====> address  =====>
   <====> data     <====>
   -----> need-mem ----->
   -----> write    ----->
   <----  ready    <-----
   -----> clock    ----->
```
Assume a fully synchronous design, where all register transfers are finalized on the falling edge of the clock signal. During any clock pulse, if need-mem is high, the cache or memory may respond. During any clock pulse, if write and need-mem are high, the CPU is doing a write operation. The server (cache or RAM) may force the client (CPU or cache) to wait by failing to raise ready by the time the clock pulse ends. If made to wait, the client should repeat exactly the same request until it is satisfied.
Assume that the ultimate client is a CPU that, under normal circumstances, would issue a memory read or write request on just about every clock cycle.
Assume that you have an associative memory with the following characteristics:
```
            |       On clear, all stored values
        ____|____   are invalidated.
       |  value  |  
       |         |  Given an address, if there
   --->| address |  is an associated value, the
       |         |  hit output will be true and
   ----|>strobe  |  the value will be output.
       |         |  
   --->| clear   |  On strobe, the input value
       |         |  will be associated with the
       |hit_value|  given address and some other.
         |    |     address/value association may
         |    |     be forgotten.
```
The Problem: Give a state machine specification (using state diagrams) for the control unit of a write-through cache that uses this interface to the outside world! (indeed, all of the inputs and outputs of the state machine are defined above, if very briefly!)
Background: A simple branch lookahead unit associates only one next address with each address. If there is a miss, the fetch stage of the pipeline fetches the next address in sequence. If there is a hit, the fetch stage fetches the address given by the associative memory. Whenever there is a branch, the address of the branch instruction and the address of its successor are stored in the lookahead unit.
If, during execution, a later pipeline stage finds that the next instruction in sequence was not the correct instruction, it cancels the next instruction and all instructions following it and requests a fetch from the correct location.
This basic branch lookahead unit stores address-successor pairs, but there may be more useful information. Specifically, consider function calls. The successful prediction of a return address depends on the likelyhood that most calls will come from the same place. If many calls come from other places, successful prediction requires that the associative memory use more context.
Part A: Consider a CPU that, each time a branch is executed, stores the address of that branch instruction in a previous-branch-address register. Propose a way to use this previous branch address to improve predictions.
Part B: Would it perhaps be better to distinguish between instruction types, so that, instead of previous branch instruction, we save the address of the previous call, the previous unconditional jump, and so on. In answering this question, explain your reasons one way or the other.
Part C: Why might it be a mistake to incorporate too much history into the branch prediction cache?