In the areas of device technology and computer architecture it is possible to make reasonable predictions based on known laws of physics and industry trends. From these predictions we can say with fair certainty that a PetaFLOPS machine can be built within the next 20 years. Unfortunately, there is no technology road map for software innovation. However, some disturbing trends are clear. First, the software for the current generation of 100 GigaFLOPS machines is not adequate to be scaled to a TeraFLOPS system, and it will certainly fail on a PetaFLOPS system. While there are bright spots, such as the proposed High Performance Fortran (HPF), the rest of the supercomputing software environment is seriously underfunded and underdeveloped. More specifically, multi-user operating systems, truly scalable I/O, and portable programming and debugging tools are, with a few notable exceptions, missing from scalable parallel systems being built today.
Part of the problem rests with our vision of the way a PetaFLOPS machine is used. Traditionally we have built supercomputers that had little software because there was a large market in the national laboratories for machines with very high performance and very specialized use. These machines were more like large laboratory instruments than general-purpose computers. This trend has accelerated as large-scale parallel systems have proved to be the only way to achieve the speed levels envisioned for future applications.
Another important factor has emerged. Parallel machines suffer from greater performance instability than sequential systems. Consequently, if a machine is not designed with the problems of software in mind, only a few highly tuned, data-parallel programs will achieve high efficiency. This is due to a variety of factors including the inability of many systems to help software hide memory latency or to exploit the nested and dynamic forms of parallelism that some applications require. In addition, the computer architects at this workshop have warned that even in a best-case scenario, memory-system considerations will require a PetaFLOPS machine to have a million operations in parallel execution at all times in order to achieve the system's full speed. Realistic memory-design assumptions will very likely push this parallelism requirement several orders of magnitude higher. Thus, a PetaFLOPS system, unless properly designed, will be even more difficult to program than current systems.
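The million-operation figure above is consistent with a simple application of Little's law (operations in flight = throughput x latency). The sketch below is only an illustration under stated assumptions: the function name and the latency values are hypothetical and do not come from the workshop architects. With an assumed effective memory latency of about 1 ns, a machine sustaining 10^15 operations per second needs roughly 10^6 operations in flight; longer assumed latencies raise the requirement in proportion.

```python
def required_concurrency(ops_per_second: float, latency_seconds: float) -> float:
    """Little's law: operations in flight = throughput x latency."""
    return ops_per_second * latency_seconds


PETAFLOPS = 1e15  # sustained operations per second

# Hypothetical effective memory latencies, chosen for illustration only;
# the report does not state the values the architects assumed.
ASSUMED_LATENCIES = [("1 ns", 1e-9), ("100 ns", 1e-7), ("1 us", 1e-6)]

for label, latency in ASSUMED_LATENCIES:
    in_flight = required_concurrency(PETAFLOPS, latency)
    print(f"{label:>6}: about {in_flight:.0e} operations in flight")
```

Under these assumed latencies the requirement grows by two to three orders of magnitude, which matches the direction of the architects' warning without reproducing their actual design assumptions.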
The impact of this situation on our plans to build a PetaFLOPS system will be enormous. An overemphasis on specialized designs without a software model that rewards development investment will have serious consequences. More specifically, independent software vendors (ISVs) cannot afford to port their products to large-scale parallel systems if a heroic effort is required for each new machine. Unless we can find new ways to design software that is portable across a wide variety of computer architectures, many of our current supercomputer vendors will not survive long enough to build a PetaFLOPS system.
While this may seem a bleak scenario, we must remember that the computing revolution of the 1980s was about software as much as silicon. The microcomputer industry discovered that by focusing on only two architectures, it was possible to build a vast portable infrastructure of software. The result was a creative revolution. Unix and the X Window System have had a similar unifying impact on the workstation and high-end computing market. The National Information Infrastructure (NII) is about to create a new wave of change for the entire computing marketplace, including the HPC world, that may dwarf what we have seen so far. The driving force at the heart of this coming software revolution is a new view of computing in which applications are, in reality, networks of concurrently interacting objects that are distributed across the data highway. In this model, computing resources extend from the desktop to the TeraFLOPS as a seamless environment.
The applications section of this report outlines two types of computations envisioned for PetaFLOPS. One family of applications is the class of large-scale scientific and Grand Challenge applications that we are currently investigating. However, over the next 20 years these Grand Challenge applications will evolve to resemble multidisciplinary scientific virtual laboratories demanding a complex mix of simulation, human interaction, and multimedia scaled far beyond what is possible today. The other class of applications can be characterized as the compute-intensive component of the NII. These NII applications reflect the possible use of PetaFLOPS systems as vast million-user information and analysis resources. These applications will not be easy to design with a pure data-parallel model. However, techniques that integrate the efficiency of data parallelism with the flexibility of distributed object systems are starting to emerge and they relate directly to the software directions being taken for the NII at large.
Finally, we must consider the economic foundations for sustainable development of HPC technology over the long term. There have been many examples illustrating the power of the marketplace to focus resources on commercially valuable and viable technology. To the extent that the development of software technology (or any technology) for HPC can ride this wave, progress can be greatly accelerated. To the extent that it cannot, HPC software technology will be paddling upstream, doomed to progress more slowly than mainstream software technology. As an example of the importance of this market leverage, consider the increasingly widespread use of commodity microprocessor technology in building parallel architectures. A key advantage of using commodity microprocessor technology is the substantial research and development budgets made possible by the multi-billion-dollar workstation and personal computer markets. In contrast, HPC software efforts have not been similarly cross-subsidized, and consequently they are underfunded. The resulting parallel software supplied by HPC vendors and others lacks the robustness and polish found in software aimed at the commercial world. Only an expanded market will give vendors the resources to improve matters.
Thus, the HPC industry is in peril if it is overly dependent upon government subsidy. One of the goals of government support for the HPC program is to fund pre-competitive research that will enable computer vendors to bring scalable parallelism into the mainstream of commerce and industry. To succeed at this, the hardware and software of such systems must (1) target applications that ultimately have the potential to be commercially self-sustaining and (2) implement a common set of programming models and methods to support portability of applications between different vendors' systems. This latter point is a precondition to attracting serious interest from independent software vendors.
In the following paragraphs we outline a more complete vision of the possible future roles of PetaFLOPS architectures. We look more closely at the problems of current HPC scalable architectures and then make a number of recommendations that should be implemented as soon as possible. Waiting for the PetaFLOPS initiative would be too late.