Assignment 12, due May 2

Part of the homework for 22C:122/55:132, Spring 2003
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Always, on every assignment, please write your name legibly as it appears on your University ID and on the class list!

  1. Background: In building a large-scale multiprocessor, crossbar switches have a cost that grows as O(n2) for a system with n CPUs and n memory modules. An alternative is to use a banyan tree, also known as a butterfly network. In the most popular form, this uses 4x4 crossbar switches as components, but for the sake of simplicity, we'll consider a system using 2x2 crossbars as components. In this case, we can build a 4-processor system (at the processor-memory-switch level) as follows:
             |    |    |    |
            ========  ======== crossbars
             |    |    |    |
             |     \  /     |
             |      \/      |
             |      /\      |
             |     /  \     |
             |    |    |    |
            ========  ======== crossbars
             |    |    |    |

    Here, each crossbar switch has been drawn squashed with CPU-side connections on the top and memory connections on the bottom. We have a 4-way interleaved memory in this system! The path from any one CPU to any particular memory module is a binary tree, using the crossbar switch as a branching node in this tree.

    Part A: Draw the corresponding diagram for an 8 by 8 system. In doing this, you should begin to see why it's called a butterfly switch.

    Part B: In general, for n CPUs and n memory modules, how does the cost grow as a function of n. For crossbar switches, it was O(n2). Also, how does the switching delay between CPU and memory grow as a function of n. For a crossbar switch, this was O(1) (a constant independent of n).

  2. What is wrong with the following picture?
                __________              |  
            IF |__________|------o------x-- read only memory port
                _|>_____|_    ___|___   |
            AC |__________|  |I cache|  |
                _|>_____|_   |_______|  |
            OF |__________|------o------x-- read only memory port
                _|>_____|_    ___|___   |
            OP |__________|  |O cache|  |
                _|>_____|_   |_______|  |
            OS |__________|-------------x-- read-write memory port
  3. Background: A write-back cache has a complex controller that runs a cache replacement policy. Consider the following simple alternative. We put a one-word write cache between the CPU and memory. The write cache holds the most recent word written to memory and the memory address. This allows the CPU to clock the data into the cache very quickly and then go on with other computations while the slow memory eventually completes the write cycle. If I combine this idea with a conventional write-through cache, read and write cycles will both appear very fast.

    Problem: Under a conventional read-only cache, the only thing that caused a memory cycle to be delayed was a cache miss. Under what circumstances would a write using our modified cache cause additional delays? (There are two specific circumstances!)

  4. Background: Consider the following two options: The bus request signal is blocked by the cache until the cache has determined whether this memory cycle involves a hit or a miss. Only then does the cache allow the bus request to be passed on to the memory arbitration system. Alternatively, the cache does not block the request, but rather, in satisfying the request early, the cache causes the CPU to withdraw the request on the next clock cycle.

    Problem: Each approach has advantages and disadvantages. Consider both in the context of the I-caches of two CPU's that share access to memory through a crossbar switch, and explain these advantages and disadvantages. Your job here is most definitely not to say which is better -- that depends on experiments you can only design after you outline the potential positive and negative impacts of each design.