
Enabling Technologies for Peta(FL)OPS Computing


Thomas Sterling
Universities Space Research Association

Paul Messina
California Institute of Technology

Paul H. Smith
National Aeronautics and Space Administration HQ

Abstract

The Workshop on Enabling Technologies for Peta(FL)OPS Computing was held February 22 through 24, 1994, at the DoubleTree Hotel in Pasadena, California. More than 60 experts in all aspects of high-performance computing technology met to establish the basis for considering future research initiatives that will lead to the development, production, and application of PetaFLOPS-scale computing systems. The objectives of the workshop were to (1) identify applications that require PetaFLOPS performance and determine their resource demands, (2) determine the scope of the technical challenge to achieving effective PetaFLOPS computing, (3) identify critical enabling technologies that lead to PetaFLOPS computing capability, (4) establish key research issues, and (5) recommend elements of a near-term research agenda.

The workshop focused on four major, interrelated topic areas: Applications and Algorithms, Device Technology, Architecture and Systems, and Software Technology. Participants worked in focused small-group sessions and met in plenary sessions for cross-cutting discussions. The findings reflect both the potential opportunities and the daunting challenges that confront designers and users of future PetaFLOPS computing systems. A PetaFLOPS computing system will be feasible in two decades and will be important, perhaps even critical, to key applications at that time. This prediction rests in part on the key assumption that the semiconductor industry will sustain its current advances in both speed and cost reduction, through improved fabrication processes, throughout the twenty-year period. While no paradigm shift is required in systems architecture, active latency management will be essential, requiring a very high degree of fine-grain parallelism and the mechanisms to exploit it. A mix of technologies will be required, including semiconductor for main memory, optics for inter-processor (and perhaps inter-chip) communications and secondary storage, and possibly cryogenics (e.g., Josephson Junction) for very high clock rate and very low power processor logic. Effectiveness and applicability will rest on dramatic reductions in per-device cost and on innovative approaches to system software and programming methodologies.

Near-term studies are required to refine these findings through more detailed examination of system requirements and technology extrapolation. This report documents the issues and findings of the 1994 Pasadena PetaFLOPS workshop and makes specific recommendations for near-term research initiatives.

Acknowledgments

The editors of this publication wish to thank all those who participated in the workshop for making it a historic event in the evolution of high-performance computing. In addition, the editors wish to acknowledge the important contributions made by several associates who were responsible for the excellent workshop arrangements and the high professional quality of this publication. Michael MacDonald provided technical editing, reviewing all aspects of this report and contributing substantively to a number of its sections. Terri Canzian provided exhaustive and detailed editing of the entire text and is responsible for the document's professional format and typesetting. Tina Pauna's painstaking editing weeded out countless awkward phrases and glitches. Tim Brice is credited for the success of the local arrangements and excellent logistical support throughout the workshop. Michele O'Connell provided important assistance to the workshop organizers before, during, and after the workshop and was responsible for coordination among the organizing committee, program committee, and local arrangements. Mary Goroff, Erla Solomon, and Chip Chapman assisted with registration, computers and copying equipment, and the many details that arise in the course of a dynamic workshop.

Executive Summary

A PetaFLOPS is a measure of computer performance equal to a million billion operations (or floating point operations) per second. It is more than ten times all the networked computing capability in America and ten thousand times faster than the world's most powerful massively parallel computer. A PetaFLOPS computer is so far beyond anything within contemporary experience that its architecture, technology, and programming methods may require entirely new paradigms to achieve effective use of computing systems at this scale. For the U.S. to retain leadership in high-performance computing development and application in the future, planning and even early research into PetaFLOPS system design and methodologies may be essential now. To start these processes, several Federal agencies jointly sponsored the first major conference in this emerging area.
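The scale ratios in the preceding paragraph can be checked with a few lines of arithmetic. The Python sketch below takes the PetaFLOPS definition and the "ten thousand times" ratio from the text and the two-decade horizon from the report's findings. It then assumes, purely as a simplification and not as the report's method, that performance grows by a constant factor every year; the variable names are illustrative.

```python
import math

# Scale arithmetic behind the opening paragraph of the Executive Summary.
PETAFLOPS = 10**15            # operations per second ("a million billion")
SPEEDUP_OVER_TOP_MPP = 10**4  # "ten thousand times faster" (from the text)
YEARS = 20                    # two-decade feasibility horizon (from the report)

# Peak rate of today's most powerful MPP implied by the stated ratio.
implied_top_mpp = PETAFLOPS // SPEEDUP_OVER_TOP_MPP  # 10**11, i.e. 100 GFLOPS

# ASSUMPTION: constant annual growth factor g with g**YEARS == the speedup.
annual_growth = SPEEDUP_OVER_TOP_MPP ** (1 / YEARS)

# Equivalent doubling time in years under the same constant-growth assumption.
doubling_years = YEARS * math.log(2) / math.log(SPEEDUP_OVER_TOP_MPP)

print(f"implied top MPP: {implied_top_mpp:.0e} ops/s")
print(f"required annual growth: {annual_growth:.3f}x")
print(f"doubling time: {doubling_years:.2f} years")
```

Under this constant-growth assumption, closing a factor of 10,000 in 20 years requires roughly 58% improvement per year, or a doubling about every 1.5 years, which is why the findings tie feasibility so closely to continued semiconductor progress.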

The Workshop on Enabling Technologies for Peta(FL)OPS Computing was hosted by the Jet Propulsion Laboratory in Pasadena, California from February 22 through 24, 1994 and included over 60 invited contributors from industry, academia, and government. They met to establish the basis for considering future research initiatives that will lead to U.S. preeminence in developing, producing, and applying PetaFLOPS-scale computing systems.

The broad goal of the Workshop on Enabling Technologies for Peta(FL)OPS Computing was to conduct and produce the first comprehensive assessment of the field of PetaFLOPS computing systems and to establish a baseline of understanding of its opportunities, challenges, and critical elements with the intent of setting near-term research directions to reduce uncertainty and enhance our knowledge of this field. The major objectives of the workshop were to

Identify Applications
of economic, scientific, and societal importance requiring PetaFLOPS scale computing.

Determine Challenge
in terms of technical barriers to achieving effective PetaFLOPS computing systems.

Reveal Enabling Technologies
that may be critical to the implementation of PetaFLOPS computers and determine their respective roles in contributing to this objective.

Derive Research Issues
that define the boundary between today's state-of-the-art understanding and the advanced concepts critical to tomorrow's PetaFLOPS computing systems.

Set Research Agenda
for initial near-term work focused on immediate questions contributing to the uncertainty of our understanding and imposing the greatest risk to launching a major long-term research initiative.

The workshop was sponsored jointly by the National Aeronautics and Space Administration, the Department of Energy, the National Science Foundation, the Advanced Research Projects Agency, the National Security Agency, and the Ballistic Missile Defense Organization. Invited participants were selected to ensure the highest quality and coverage of the driving technical areas, as well as representation from all elements of the high-performance computing community. Opening talks by Seymour Cray and Konstantin Likharev set the direction and nature of the workshop. The workshop was organized into four working groups reflecting the pace-setting disciplines that both enable and limit progress toward practical PetaFLOPS computing systems. The charters of these working groups were as follows.

The Applications Working Group considered the classes of applications and algorithms that were both important to national needs and capable of exploiting this scale of processing. Through these discussions, the group derived an initial understanding of the resource requirements of such applications.

The Device Technology Working Group explored the three technologies most likely to contribute to achieving PetaFLOPS performance: semiconductor, optics, and cryogenic superconducting. This group projected the capabilities of each technology family and distinguished them by their strengths and weaknesses in supporting PetaFLOPS computing.

The Architecture Working Group examined three alternative structures, each comprising processor, communication, and memory subunits enabled by future technologies and scaled to PetaFLOPS performance. The group investigated the most likely organizations and mixes of functional elements at different levels of technology capability to reveal a spectrum of possible systems.

The Software Technology Working Group took on the challenging task of delineating the principal obstacles that current software environments pose to the effective application of future PetaFLOPS computing systems. The group also examined the implications of alternative environments and functionality that might substantively enhance the usefulness of such systems.

This first comprehensive review of the emerging field of PetaFLOPS computing systems produced a number of important findings that broadly define the challenge, opportunities, and approach to realizing this ambitious goal. These findings emerged as much from interactions among the working groups as from deliberations within any single group. The major findings of the workshop, combining key contributions from all four working groups, are as follows:

  1. Construction of an effective PetaFLOPS computing system will be feasible in approximately 20 years, based on current technology trend projections.

  2. There is, and will continue to be, a wide range of applications in science, engineering, economics, and societal information infrastructure and management that will demand PetaFLOPS capability in the near future.

  3. Cost, more than any other single aspect of a PetaFLOPS initiative, will dominate the ultimate viability and the time frame in which such systems will come into practical use.

  4. Reliability of PetaFLOPS computer systems will be manageable, but only because cost considerations will preclude systems with many more components than current massively parallel processing systems.

  5. No fundamental paradigm shift in system architecture is required to achieve PetaFLOPS capable systems. Advanced variations on the NUMA MIMD (and possibly SIMD) architecture model should suffice, although specific details may vary significantly from today's implementations.

  6. A PetaFLOPS computer is likely to exhibit a wide diameter, where diameter is the propagation delay across the system measured in system clock cycles. Latency management techniques and very high concurrency, on the order of a million-fold, will be key facets of systems of this scale.

  7. The PetaFLOPS computer will be dominated by its memory. However, at least for science and engineering applications, memory capacity will scale less than linearly with performance. A system capable of PetaFLOPS performance will require on the order of 30 terabytes of main memory.

  8. To achieve PetaFLOPS performance, such computers will comprise a mix of technologies offering a better performance-to-cost ratio than any single technology could provide. Semiconductor technology will dominate memory and provide some logic, and progress toward this goal will be tied to advances in the semiconductor industry. Optics will provide high-bandwidth inter-module communication at all levels and mass storage, but little or no logic. Superconducting Josephson Junction technology may yield very high-performance logic and exceptionally low power consumption.

  9. Major advances in software methodologies for programming and resource management will be necessary if such systems are to be practical for end-user applications.
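Findings 4, 6, and 7 rest on simple scaling arguments that can be made concrete. The Python sketch below is illustrative only: apart from the PetaFLOPS rate, every input (part count, part MTBF, a 100 m signal path, signals at about 2 x 10^8 m/s, a 1 GHz clock, and a baseline of 1 GB of memory per GFLOPS) is an assumption, not a figure from the report. The 3/4 memory-scaling exponent is likewise an assumed model based on a standard argument for three-dimensional grid codes, not an exponent stated in the findings. All function names are hypothetical.

```python
"""Back-of-envelope sketches for findings 4, 6, and 7.

Every numeric input other than the PetaFLOPS rate is an ASSUMED,
illustrative value; none is taken from the workshop report.
"""

PETAFLOPS = 1e15  # operations per second


def system_mtbf_hours(part_mtbf_hours: float, n_parts: int) -> float:
    """MTBF of a system that fails when any one of n identical parts fails.

    Assumes independent parts with exponentially distributed lifetimes, so
    failure rates add and system MTBF = part MTBF / n.
    """
    return part_mtbf_hours / n_parts


def diameter_cycles(path_m: float, signal_m_per_s: float, clock_hz: float) -> float:
    """One-way signal propagation delay across the machine, in clock cycles."""
    return path_m / signal_m_per_s * clock_hz


def parallel_units(target_ops_per_s: float, unit_ops_per_s: float) -> float:
    """Number of functional units that must be busy on every cycle."""
    return target_ops_per_s / unit_ops_per_s


def scaled_memory_bytes(target_flops: float,
                        ref_flops: float = 1e9,   # ASSUMED baseline: 1 GFLOPS...
                        ref_bytes: float = 1e9,   # ...paired with 1 GB of memory
                        exponent: float = 0.75) -> float:
    """Memory that grows as performance**exponent from a reference machine.

    exponent = 3/4 models a 3-D grid code: N**3 cells advanced over ~N time
    steps gives work ~ N**4 but memory ~ N**3. Holding run time fixed makes
    performance ~ work, so memory ~ performance**(3/4).
    """
    return ref_bytes * (target_flops / ref_flops) ** exponent


if __name__ == "__main__":
    # Finding 4: ASSUMED 10,000 parts, each with a 1,000,000-hour MTBF.
    print(f"system MTBF: {system_mtbf_hours(1e6, 10_000):.0f} h")
    # Finding 6: ASSUMED 100 m path, 2e8 m/s signals, 1 GHz clock.
    print(f"diameter: {diameter_cycles(100.0, 2e8, 1e9):.0f} cycles")
    # Finding 6: ASSUMED units that each complete one operation per 1 GHz cycle.
    print(f"parallelism: {parallel_units(PETAFLOPS, 1e9):.0e} units")
    # Finding 7: sublinear memory growth from the ASSUMED 1 GB/GFLOPS baseline.
    print(f"memory: {scaled_memory_bytes(PETAFLOPS) / 1e12:.1f} TB")
```

With these assumed inputs, system MTBF falls in inverse proportion to part count (so holding the count near today's levels keeps reliability manageable), a signal needs about 500 clock cycles to cross the machine, a million units must be busy every cycle, and memory comes to roughly 31.6 TB, the same order of magnitude as the 30 terabytes in finding 7.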

During the workshop deliberations, many issues were brought to light, clarifying the space of opportunities and obstacles but leaving many questions unanswered. For example, assumptions about semiconductor technology in 20 years were derived from Semiconductor Industry Association (SIA) projections to the year 2007 and required extrapolation beyond that point. The economics of specialty hardware was questioned, leaving unresolved the degree to which any future PetaFLOPS computer design must rely on commodity parts developed for more general commercial application. The nature of the user base for PetaFLOPS computers was highly contested; the possibilities included classical science/engineering problems, total immersion virtual reality human interfacing, and massive information management and retrieval. The difficulty of programming even today's massively parallel processing systems left open the possibility that significant resources would be committed to achieving ease of use at the cost of sustained performance, and how such systems would ultimately be programmed remains uncertain. Although only a narrow range of architectures was examined, they still posed a very broad range of technology issues. Latency was seen as an issue driving system architecture decisions for each of the three architectures, but the space of alternatives was too wide to permit recommending any specific approach over all others. Beyond the approaches explicitly examined, there remains the possibility of completely untried architectures that might greatly accelerate the pace toward PetaFLOPS computing. These and other issues, while revealed as important at this workshop, remained unresolved at its close.

Finally, the workshop concluded with key recommendations for near-term initiatives to reduce uncertainty and advance U.S. capability toward the achievement of PetaFLOPS computing. In the area of device technology, it was considered imperative that better projections for semiconductor evolution be developed and that the true potential of superconducting technology be better understood. With regard to applications, specific examples identified as candidates for PetaFLOPS execution should be studied in depth to determine the balance of resources required at that scale and thereby validate the appropriateness of the primary candidate architectures. Such a study should include at least one example of an application that sees little use today but is potentially important in the future. The Architecture Working Group covered many facets of PetaFLOPS architecture and produced a meaningful overview of a tenable PetaFLOPS computer structure, but many details had to be left unspecified. A near-term study is recommended to fill in these gaps by determining the requirements of the constituent elements of such a future machine. These specifications are essential for validating the approach and determining requirements for all of the technologies used in its implementation.

In conclusion, the Workshop on Enabling Technologies for Peta(FL)OPS Computing was a historic meeting that brought together a remarkable set of experts in high-performance computing and focused their talents on a question of great future importance to our Nation's strength in science and engineering, as well as to its economic leadership in the world of the next century. Ideas both conservative and controversial were explored, and the workshop produced an initial set of findings that will set the course toward the ultimate achievement of a PetaFLOPS computer. Beyond the greater understanding of PetaFLOPS computing systems it achieved, an important immediate consequence of this workshop was the extraordinary synergy and cross-fertilization of ideas among some of this Nation's major contributors to computer science.






gcf@npac.syr.edu