
Metrics and Limitations

In this section, some basic notions that dictate various characteristics of architectures are examined. The section opens with a list of key metrics and some brief explanations, and closes with a statement of three important laws.

The key metrics used to characterize high-performance computers are

Basic Laws

  1. Concurrency = latency × bandwidth.

    Assume that a processor is fed operands at a bandwidth of B operations per cycle, each of which requires an elapsed time of L cycles. That is, each cycle the processor accepts B sets of input operands and produces B outputs, and the time elapsed between a specific input and its corresponding output is L cycles. Then the internal concurrency of the processor is L × B. In other words, at any given cycle, there are L × B distinct computations proceeding concurrently within the processor. This formula specifies how much concurrency must exist within a processor for a specified bandwidth and latency.

    The formula also can be applied to the latency and bandwidth of a memory system. To sustain operations at the maximum rate from a memory system whose bandwidth is B and whose latency is L requires L × B concurrent access streams from the memory system to the processors, and thus the memory must be able to support L × B concurrent operations internally.

    Corollary: If operation latency is on the order of 1 ns (10^-9 s), the concurrency required to achieve a processing bandwidth of one PetaFLOPS (10^15 operations per second) must be on the order of 10^6. To reduce concurrency below 10,000, the operation latency must be less than 10 ps (10^-11 s), and to reduce concurrency below 10, the operation latency must be less than 10 femtoseconds (10^-14 s).

  2. In a machine limited by the speed of light, i.e., when a signal needs much more than one cycle to cross the machine diameter, if bisection bandwidth is nearly the maximum possible, the average dependence latency grows at least as fast as N^(1/3), where N is the number of processors. Physical constraints on packaging and layout may raise this bound to N^(1/2).

    In a machine not limited by the speed of light, that is, in a machine whose diameter is a small number of cycles close to unity, or in a machine with a very small bisection bandwidth, the dependence latency can grow at a rate proportional to log N.

  3. In a system that makes use of multiprogramming of independent processes on a single processor to improve performance, memory requirements grow linearly with the degree of multiprogramming, and thus scale proportionally with performance. Parallel applications can save memory relative to multiprogramming because memory is shared. Also, as bandwidth increases in shared-memory systems, memory requirements diminish because sharing of data can replace making copies of data.
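
The arithmetic behind the corollary to the first law can be checked with a minimal Python sketch. The function names and the PETAFLOPS constant are illustrative assumptions, not part of the original text; latency is taken in seconds and bandwidth in operations per second, so their product is the number of operations in flight.

```python
# Illustrative sketch of Law 1: concurrency = latency x bandwidth.
# All names here are hypothetical helpers, not from the original report.

PETAFLOPS = 1e15  # target processing bandwidth, operations per second


def required_concurrency(latency_s: float, bandwidth_ops_per_s: float) -> float:
    """Operations that must be in flight to sustain the given bandwidth."""
    return latency_s * bandwidth_ops_per_s


def max_latency(concurrency: float, bandwidth_ops_per_s: float) -> float:
    """Largest operation latency (seconds) that keeps concurrency at or below the given level."""
    return concurrency / bandwidth_ops_per_s


# The three latencies quoted in the corollary.
for label, latency in [("1 ns", 1e-9), ("10 ps", 1e-11), ("10 fs", 1e-14)]:
    concurrency = required_concurrency(latency, PETAFLOPS)
    print(f"latency {label:>5}: concurrency on the order of {concurrency:.0e}")
```

Inverting the same relation with max_latency(10_000, PETAFLOPS) gives the latency bound for a chosen concurrency level, which is how the 10 ps and 10 fs figures in the corollary follow.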





gcf@npac.syr.edu