Architecture and Systems
Participants:

Summary
The PetaFLOPS computer is achievable at reasonable cost with
technology available in about 20 years. No paradigm shift is required
to make this computer: it can be made using the paradigms that exist
today. This projection is based on a number of assumptions, and it
brings with it a number of challenges and directions for future
activity. The key underlying issues are
- Silicon technology can satisfy the majority of the requirements
if it continues at the same rate of improvement over the next 20 years.
However, the Semiconductor Industry Association (SIA) technology road
map projects forward only through the year 2007 because at that point
feature sizes are projected to be reduced to 0.1 µm; below this
size, tunneling effects alter the behavior of active devices. To
sustain an additional seven years of improvement in device technology
will require advances currently not projected by the SIA. However, as
discussed in the section on semiconductor device options, it appears
that since 1992, enough has been learned to allow a rational projection
beyond 2007 to at least 0.05 µm. This appears sufficient for the
PetaFLOPS machines discussed here. Consequently, device manufacturing
technology and semiconductor science are important areas of investment
to sustain technological advances when feature sizes fall below 0.1 µm.
- The PetaFLOPS machine will rely heavily on technology developed
for the larger market of machines that are much less powerful than the
PetaFLOPS machine. This is a consequence of the very large parts count
for the PetaFLOPS machine, even for the projected technology of 20
years hence. To keep the price per part as small as possible, the
parts must be produced in volume for a mass market. Technology and
parts developed exclusively for the PetaFLOPS market may be very
expensive relative to those for the general market, and the leverage
they provide must be very high to justify the premium paid for them.
- The panel did not reach consensus on whether or not to recommend
investment in manufacturing technology for the niche markets that cover
PetaFLOPS technology. The panel, however, encourages continued
research in these areas to seek advances that can provide very high
leverage on performance. Niche technology might turn out to be useful,
even if expensive, and may be attractive to mainstream computing also.
In the latter case, high volume production of such technology for
mainstream computers could reduce costs significantly for the use of
such technology in PetaFLOPS computers.
- Memory latency and memory bandwidth are the most critical factors
that constrain performance and narrow the choices of computer
structures. Latency across the longest paths in a petacomputer, when
measured in machine cycles, will grow in the coming years rather than
decrease. Hence, machine structures will tend to incorporate various
techniques that remove or hide latency. Local memory and cache memory
tend to remove or reduce latency. Pipelining and multithreading tend
to hide latency without reducing it. The latency problem will spawn
highly perfected forms of the techniques mentioned here as well as new
techniques better fitted to future applications and device
technologies.
- The bandwidth per memory part, if it evolves at its present rate,
will in 20 years be somewhere between 10 and 1000 times too low to
support PetaFLOPS computing. However, the internal bandwidth of memory
chips, that is, the bandwidth between the on-chip memory array and a
multiplexor to the output pins, is much larger than the bandwidth
available at the pins. Therefore, existing internal bandwidth may be
within the limits required. Future directions for memory technology
will seek ways to make high bandwidth available externally and to
develop architectures that make effective use of the internal memory
bandwidth by placing computational logic within the memory.
- Memory requirements for a PetaFLOPS machine are based on a basic
assumption that a balanced system requires memory bandwidth of
a small, fixed number of bytes per cycle per FLOPS. This
assumption forces total memory bandwidth to scale linearly with the
GigaFLOPS performance of a machine. Independently, and for other
reasons, a second basic assumption is that memory size in bytes scales
linearly with problem size. To the extent that these assumptions are
valid, they place extraordinary demands on future memory technology.
Consequently, if algorithm developers for PetaFLOPS applications
successfully develop means to conserve the use of memory per GigaFLOPS,
the demands on memory bandwidth and memory size for PetaFLOPS machines
may be decreased significantly. This could lead to earlier deployment
and lower cost than our estimates indicate.
- The panel speculates that I/O requirements grow less than
proportionally with increases in GigaFLOPS of performance. If so, the
I/O requirements for a PetaFLOPS machine may be significantly less than
predicted by simple scaling formulas.
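
The scaling arguments in the memory items above can be made concrete with
a short calculation. The sketch below is illustrative only: the bytes-per-
operation constant, the per-part pin bandwidth, the 100 ns latency, and the
clock rates are hypothetical values chosen for round numbers, not figures
from the panel. It shows (a) how total memory bandwidth and memory parts
count follow from the linear-scaling assumption, and (b) why a fixed
physical latency costs more machine cycles as clock rates rise.

```python
import math

PETAFLOPS = 10**15  # floating-point operations per second


def required_bandwidth(flops, bytes_per_flop):
    """Total memory bandwidth in bytes/s, assuming it scales linearly with FLOPS."""
    return flops * bytes_per_flop


def parts_needed(total_bandwidth, bandwidth_per_part):
    """Smallest number of memory parts whose combined bandwidth meets the total."""
    return math.ceil(total_bandwidth / bandwidth_per_part)


def latency_in_cycles(latency_ns, clock_mhz):
    """Express a fixed physical latency (ns) as a count of machine cycles."""
    return latency_ns * clock_mhz / 1000


if __name__ == "__main__":
    # Hypothetical balance constant: 1 byte of memory traffic per operation.
    total = required_bandwidth(PETAFLOPS, 1)

    # Hypothetical pin bandwidth of 1 GB/s per memory part.
    print("memory parts needed:", parts_needed(total, 10**9))

    # The same hypothetical 100 ns path, seen from two clock rates.
    for clock_mhz in (100, 1000):
        cycles = latency_in_cycles(100, clock_mhz)
        print(f"{clock_mhz} MHz clock: {cycles:.0f} cycles of latency")
```

Under these assumed values the machine would need a million memory parts
to meet the bandwidth requirement through the pins alone, and a tenfold
increase in clock rate turns the same physical delay into ten times as
many cycles. Both results change directly with the constants chosen, which
is why reductions in memory traffic per operation have such leverage.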