Homework 6 -- Final Exam

22C:122, Spring 1996

Douglas W. Jones

DRAFT

Today, with the scale of integration that is now routine, it is easy to build a moderately large-scale multiprocessor or a superscalar pipelined uniprocessor on a single chip. We know that, for the degree of parallelism typically exploited by application programs, adding CPUs to a system seems to be subject to the law of diminishing returns. Likewise, for typical instruction streams, widening the superscalar execution path by adding more pipes in parallel also seems to be subject to diminishing returns.

The Big Question: Given the option to make a chip bigger than a Pentium, would we be better off building several small CPUs on that chip, or a single CPU like the Pentium that applies even more parallelism to the execution of a single instruction stream?

Don't try to answer this question directly! Instead, answer the following:

1) What limits the performance of a superscalar processor within a single basic block (that is, a sequence of instructions that is entered only at its start and contains no branches, conditional or otherwise)?
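To experiment with this question, here is a minimal sketch of a greedy scheduler for one basic block. The instruction format `(dest_reg, [src_regs])`, the function name `schedule_block`, and the fixed one-cycle latency are all hypothetical simplifications: only true (read-after-write) dependences are honored, on the assumption that register renaming removes the false ones, and the issue window is unlimited.

```python
from collections import defaultdict


def schedule_block(instrs, width, latency=1):
    """Greedily schedule one basic block on a width-wide machine.

    instrs:  list of (dest_reg, [src_regs]) in program order (hypothetical format).
    width:   maximum number of instructions issued per cycle.
    latency: cycles from issue until the result can be used.
    Instructions are placed in program order, each into the earliest cycle
    in which its operands are ready and an issue slot is free, so a later
    instruction may issue before an earlier one (unlimited window).
    Returns the cycle at which the last result becomes available.
    """
    ready = {}                # register -> cycle its value becomes available
    used = defaultdict(int)   # cycle -> instructions already issued in it
    finish = 0
    for dest, srcs in instrs:
        cycle = max((ready.get(r, 0) for r in srcs), default=0)
        while used[cycle] >= width:   # slide forward to a free issue slot
            cycle += 1
        used[cycle] += 1
        ready[dest] = cycle + latency
        finish = max(finish, cycle + latency)
    return finish


if __name__ == "__main__":
    chain = [("r1", ["r0"]), ("r2", ["r1"]), ("r3", ["r2"]), ("r4", ["r3"])]
    indep = [(f"r{i}", ["r0"]) for i in range(1, 5)]
    for name, block in (("dependent chain", chain), ("independent", indep)):
        for w in (1, 2, 4):
            cycles = schedule_block(block, w)
            print(f"{name}, width {w}: {cycles} cycles, IPC {len(block) / cycles:.2f}")
```

In this model a fully dependent chain gains nothing from extra width, while independent instructions are limited only by the issue width.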

2) Suppose a superscalar processor uses anticipatory execution along both the taken and not-taken paths following a branch instruction (up to but not beyond the next branch). How would you expect the effective throughput of this processor to vary as a function of the average length of a basic block?
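One way to start reasoning about this is a deliberately crude model. The sketch below assumes (hypothetically) that every basic block has `block_len` independent instructions and ends in a branch, that both successor paths share the issue slots equally until the branch resolves `resolve_delay` cycles later, that neither path may run past its own next branch, and that blocks are treated one at a time with no overlap between them. None of these parameters come from a real machine.

```python
def dual_path_ipc(block_len, width, resolve_delay):
    """Useful instructions per cycle under a toy eager-execution model.

    Hypothetical assumptions: block_len independent instructions per block;
    while the branch is unresolved, each of the two paths gets width / 2
    issue slots per cycle; after resolve_delay cycles the correct path gets
    all width slots; no path may pass its next branch; blocks do not overlap.
    """
    half = width / 2.0
    if block_len <= half * resolve_delay:
        # Both paths reach their next branch early and wait for resolution.
        cycles = resolve_delay
    else:
        done_speculatively = half * resolve_delay
        cycles = resolve_delay + (block_len - done_speculatively) / width
    return block_len / cycles


if __name__ == "__main__":
    for n in (1, 3, 6, 12, 24, 100):
        print(f"block length {n:3d}: IPC {dual_path_ipc(n, 4, 3):.2f}")
```

Under these assumptions, short blocks are dominated by the wait for branch resolution, while long blocks let throughput approach the full issue width. A real answer should also consider the dependences from question 1.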

3) What limits the performance of a shared-memory multiprocessor, given that there are sufficient processes available to keep all CPUs busy?
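As one concrete handle on this question, the sketch below gives a simple bottleneck bound for CPUs sharing a single memory bus. It assumes (hypothetically) that each CPU would execute one instruction per cycle except when it misses in its cache, and that each miss stalls the CPU and occupies the shared bus for the same number of cycles. Queueing delay is ignored, so the result is an upper bound rather than a prediction.

```python
def bus_limited_throughput(n_cpus, miss_rate, bus_cycles):
    """Upper bound on aggregate instructions per cycle with one shared bus.

    Hypothetical assumptions: miss_rate misses per instruction (> 0), each
    stalling its CPU for bus_cycles and holding the bus for bus_cycles;
    no queueing delay is modeled.
    """
    per_cpu = 1.0 / (1.0 + miss_rate * bus_cycles)  # one CPU, idle bus
    bus_cap = 1.0 / (miss_rate * bus_cycles)         # rate that saturates the bus
    return min(n_cpus * per_cpu, bus_cap)


if __name__ == "__main__":
    # 2% misses and 10 bus cycles per miss are illustrative values only.
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} CPUs: at most {bus_limited_throughput(n, 0.02, 10):.2f} IPC")
```

Beyond the point where the bus saturates, additional CPUs add nothing in this model, however many processes are ready to run.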

4) Consider two single-chip machines that support the same instruction set. The first uses a pipelined superscalar CPU with 4 parallel pipes and a large on-chip cache; the second has 4 simple pipelined CPUs in parallel, each with its own snooping cache, interconnected by an on-chip bus that also connects them to external memory.

Some parts of these two chips would be very similar: each has 4 pipelines for instruction execution, and where one has a single cache, the other has 4 caches, each one-quarter the size. What major parts of these chips would be different?

5) What characteristics of the application program mix would you measure from a typical modern computer system in order to estimate whether you would be better off using a monolithic superscalar system or a monolithic multiprocessor system occupying the same total area of silicon?
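As an example of how such measurements might be combined, the sketch below compares the two designs from two hypothetical inputs: a histogram of how often k processes are ready to run, and measured single-stream instructions per cycle on each design. It assumes the multiprocessor keeps min(k, n_cpus) CPUs busy, that the superscalar machine runs one process whenever any is ready, and it ignores memory and bus contention entirely.

```python
def expected_throughput(runnable_hist, n_cpus, ipc_per_cpu, ipc_superscalar):
    """Compare the two designs from a workload profile (toy model).

    runnable_hist:   {k: fraction of time k processes are ready}; the
                     fractions should sum to 1 (hypothetical measurement).
    n_cpus:          CPUs on the multiprocessor chip.
    ipc_per_cpu:     single-stream IPC of one simple CPU (assumed measured).
    ipc_superscalar: single-stream IPC of the wide CPU (assumed measured).
    Returns (multiprocessor_ipc, superscalar_ipc); contention is ignored.
    """
    busy_cpus = sum(frac * min(k, n_cpus) for k, frac in runnable_hist.items())
    idle = runnable_hist.get(0, 0.0)
    multiprocessor = busy_cpus * ipc_per_cpu
    superscalar = (1.0 - idle) * ipc_superscalar
    return multiprocessor, superscalar


if __name__ == "__main__":
    # Every number below is made up for illustration.
    hist = {0: 0.1, 1: 0.5, 2: 0.2, 8: 0.2}
    mp, ss = expected_throughput(hist, n_cpus=4, ipc_per_cpu=0.9, ipc_superscalar=2.0)
    print(f"multiprocessor: {mp:.2f} IPC, superscalar: {ss:.2f} IPC")
```

The point of the sketch is only which quantities enter the comparison; the outcome depends entirely on the measured values.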

6) Discuss at least two alternatives to the snooping-cache, shared-bus interconnection structure we have assumed in our discussion of a single-chip multiprocessor. What would be the possible advantages and disadvantages of each, and in light of this discussion, how do these alternatives compare with the interconnection system we have assumed?

7) Discuss the problems that must be solved if a large-scale multiprocessor is to be built out of many single-chip multiprocessors.