
Architecture Working Group

Architecture and Systems

Participants:

Summary

The PetaFLOPS computer is achievable at reasonable cost with technology available in about 20 years. No paradigm shift is required to make this computer: it can be made using the paradigms that exist today. This projection is based on a number of assumptions, and it brings with it a number of challenges and directions for future activity. The key underlying issues are

  1. Silicon technology can satisfy the majority of the requirements if it continues to improve at the same rate over the next 20 years. However, the Semiconductor Industry Association (SIA) technology road map projects forward only through the year 2007 because at that point feature sizes are projected to be reduced to 0.1 µm; below this size, tunneling effects alter the behavior of active devices. Sustaining an additional seven years of improvement in device technology will require advances not currently projected by the SIA. However, as discussed in the section on semiconductor device options, it appears that since 1992, enough has been learned to allow a rational projection beyond 2007 to at least 0.05 µm. This appears sufficient for the PetaFLOPS machines discussed here. Consequently, device manufacturing technology and semiconductor science are important areas of investment to sustain technological advances when feature sizes fall below 0.1 µm.

  2. The PetaFLOPS machine will rely heavily on technology developed for the larger market of machines that are much less powerful than the PetaFLOPS machine. This is a consequence of the very large parts count for the PetaFLOPS machine, even for the projected technology of 20 years hence. To keep the price per part as small as possible, the parts must be produced in volume for a mass market. Technology and parts developed exclusively for the PetaFLOPS market may be very expensive relative to those for the general market, and the leverage they provide must be very high to justify the premium paid for them.

  3. The panel did not reach consensus on whether or not to recommend investment in manufacturing technology for the niche markets that cover PetaFLOPS technology. The panel, however, encourages continued research in these areas to seek advances that can provide very high leverage on performance. Niche technology might turn out to be useful, even if expensive, and may be attractive to mainstream computing also. In the latter case, high volume production of such technology for mainstream computers could reduce costs significantly for the use of such technology in PetaFLOPS computers.

  4. Memory latency and memory bandwidth are the most critical factors that constrain performance and narrow the choices of computer structures. Latency across the longest paths in a petacomputer, when measured in machine cycles, will grow in the coming years rather than decrease. Hence, machine structures will tend to incorporate various techniques that remove or hide latency. Local memory and cache memory tend to remove or reduce latency. Pipelining and multithreading tend to hide latency without reducing it. The latency problem will spawn highly perfected forms of the techniques mentioned here as well as new techniques better fitted to future applications and device technologies.

  5. The bandwidth per memory part, if it evolves at its present rate, will in 20 years be somewhere between 10 and 1000 times too low to support PetaFLOPS computing. However, the internal bandwidth of memory chips, that is, the bandwidth between the on-chip memory array and a multiplexor to the output pins, is much larger than the bandwidth available at the pins. Therefore, existing internal bandwidth may be within the limits required. Future directions for memory technology will seek ways to make high bandwidth available externally and to develop architectures that make effective use of the internal memory bandwidth by placing computational logic within the memory.

  6. Memory requirements for a PetaFLOPS machine rest on the basic assumption that a balanced system requires memory bandwidth of c bytes per cycle per FLOPS for a small fixed constant c. This assumption forces total memory bandwidth to scale linearly with the GigaFLOPS performance of a machine. Independently, and for other reasons, a second basic assumption is that memory size in bytes scales linearly with problem size. To the extent that these assumptions are valid, they place extraordinary demands on future memory technology. Consequently, if algorithm developers for PetaFLOPS applications successfully develop means to conserve the use of memory per GigaFLOPS, the demands on memory bandwidth and memory size for PetaFLOPS machines may be decreased significantly. This could lead to earlier deployment and lower cost than our estimates indicate.

  7. The panel speculates that I/O requirements grow less than proportionally with increases in GigaFLOPS of performance. If so, the I/O requirements for a PetaFLOPS machine may be significantly less than predicted by simple scaling formulas.
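
The linear balance assumption in item 6 and the bandwidth shortfall in item 5 can be illustrated with a short calculation. The sketch below is not taken from the panel's analysis: the function names, the parameter `bytes_per_flop` (standing in for the small fixed constant of item 6, simplified here to bytes per floating-point operation), and every numeric value are hypothetical, chosen only to show the arithmetic.

```python
import math

PETAFLOPS = 10**15  # floating-point operations per second


def required_bandwidth(flops, bytes_per_flop):
    """Total memory bandwidth in bytes/s under the linear balance assumption."""
    return flops * bytes_per_flop


def shortfall_factor(flops, bytes_per_flop, parts, bandwidth_per_part):
    """Ratio of required to available bandwidth; above 1 means too little."""
    available = parts * bandwidth_per_part
    return required_bandwidth(flops, bytes_per_flop) / available


def parts_for_bandwidth(flops, bytes_per_flop, bandwidth_per_part):
    """Smallest number of memory parts whose combined bandwidth suffices."""
    needed = required_bandwidth(flops, bytes_per_flop)
    return math.ceil(needed / bandwidth_per_part)


if __name__ == "__main__":
    # Hypothetical inputs, NOT figures from the report.
    bytes_per_flop = 1.0        # assumed balance constant
    parts = 10**5               # assumed number of memory parts
    pin_bandwidth = 10**9       # assumed bytes/s available at the pins of one part

    print("required bytes/s:", required_bandwidth(PETAFLOPS, bytes_per_flop))
    print("shortfall factor:",
          shortfall_factor(PETAFLOPS, bytes_per_flop, parts, pin_bandwidth))
    print("parts needed:",
          parts_for_bandwidth(PETAFLOPS, bytes_per_flop, pin_bandwidth))
```

Under this model, halving `bytes_per_flop` halves the required bandwidth, which is the sense in which algorithms that conserve memory traffic per GigaFLOPS would relax the demands on memory technology.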






gcf@npac.syr.edu