Homework 12

22C:122/55:132, Spring 2001

Due Friday Apr 27, 2001, in class

Douglas W. Jones

Background: Assume you have a 2x2 crossbar switch as a standard component, where the memory bus connections to this switch are matched to your CPU and main memory modules.
Consider using these switch modules to build a system with 4 CPUs and 4 memory modules. There are two ways to do this:
```
                            -/\-       -/\-
        |_|   |_|          -/  \-     -/  \-
CPU ---|   |-|   |-   CPU --\  /-------\  /--- M
CPU ---|___|-|___|-   CPU ---\/--     --\/---- M
        |_|   |_|                 \ /
CPU ---|   |-|   |-         -/\-   X   -/\-
CPU ---|___|-|___|-        -/  \- / \ -/  \-
        | |   | |     CPU --\  /-     -\  /--- M
        M M   M M     CPU ---\/---------\/---- M
```
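As an aside on the background, the behavior of a single 2x2 crossbar switch can be sketched in a few lines. This is only an illustrative model, not part of the assignment: the function name `route_2x2`, the port numbering (0 and 1), and the fixed-priority arbitration rule (the lower-numbered input wins a conflict) are all assumptions made for this sketch. The handout does not specify how the switch arbitrates.

```python
def route_2x2(requests):
    """Hypothetical model of one 2x2 crossbar switch.

    requests maps an input port (0 or 1) to the output port (0 or 1)
    that it wants to reach. Returns (granted, blocked), where granted
    maps input -> output for connections that were made, and blocked
    lists inputs that lost arbitration.

    Assumption: fixed priority, so the lower-numbered input wins
    when both inputs request the same output.
    """
    granted = {}
    blocked = []
    taken = set()  # outputs already claimed in this cycle
    for inp in sorted(requests):
        out = requests[inp]
        if out in taken:
            blocked.append(inp)   # output busy: this request must wait
        else:
            granted[inp] = out    # connect this input to its output
            taken.add(out)
    return granted, blocked


# Both inputs can be served at once when they want different outputs.
print(route_2x2({0: 1, 1: 0}))
# A conflict arises when both inputs want the same output.
print(route_2x2({0: 1, 1: 1}))
```

The point of the sketch is only that a 2x2 crossbar serves both inputs at once unless they contend for the same output. How such switches are composed into the two 4-CPU systems above is left to the problems that follow.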
Problem A: What is the difference in switching delay between CPU and M for these two approaches?
Problem B: How do these two approaches scale? Address both switching delay and number of components in your answer, as functions of N, the number of CPUs (equal to the number of memory modules).
Problem C: Both systems require that there be some specialization in each crossbar switch to customize it for its setting in the interconnection system. Clearly identify the aspects of this specialization that differ in the two interconnection schemes!
A Problem: The logic diagram in section 2 of Lecture 36 for a simple set-associative associative memory includes no provisions to determine whether an entry in the memory is valid or invalid. The final paragraph of the notes for Lecture 35 asserts that we must have some way to invalidate all entries in a cache. Describe how this can be done (it is up to you to determine how best to describe this!)
A Problem: Given a system with 4 CPUs, 4 memories and a crossbar switch, there are two natural places to add cache memories to the system in order to improve its performance. These are indicated in the following figure:
```
      ?   |_|   |_|  
CPU --c--|   |-|   |-
CPU --a--|___|-|___|-
      c   |_|   |_|  
CPU --h--|   |-|   |-
CPU --e--|___|-|___|-
      ?   | |   | |
         ???cache???
          | |   | |  
          M M   M M
```
In either case, we add 4 caches: in one case, one per CPU, and in the other case, one per memory. Contrast these! What problems does each pose for the cache designer, and what are the potential benefits of each approach?