Device technologists and computer architects can easily measure metrics of progress such as device switching times, clock cycle times, and peak arithmetic operation rates. The ultimate importance of these metrics is realized, of course, only when the performance that they imply is actually delivered through the various software layers to end users. The relentlessly increasing difficulty of doing this is one of the main obstacles facing high-performance computing (HPC). This obstacle is already formidable in the domain of TeraFLOPS computing. If we simply extrapolate current trends, this obstacle looks nearly insurmountable at the PetaFLOPS level for all but an extremely specialized range of applications. Building PetaFLOPS systems that can be used successfully for a wide range of applications will require a design approach relying much more heavily on joint, holistic problem-solving by architects, software technologists, and application specialists, with each discipline contributing what it can toward the goal of maximizing the delivered value of the system to users, rather than internal figures of merit such as peak theoretical performance.
Unfortunately, one of the challenges of the software technology area is the difficulty of defining quantitative, meaningful, and measurable metrics of the value delivered to users. Such metrics need to capture two important determinants of value: cost and benefit. A closer look provides an instructive illustration of the difficulties of quantifying these metrics.
When all these dimensions of value are considered, the reasons for many frequently seen trade-offs become more evident. For example, there is almost always a trade-off between reducing computing time (capital investment) and reducing programming time (human investment). The human investment is often the more significant, especially for a program that may be run only a few times before being changed. Thus, it is logical that some application developers persist in using regular grids and straightforward algorithms even at the cost of much longer execution times: they use more computer time but need much less program development time. Similarly, hardware or software providing a global address space can be used, encouraging programmers not to worry about data layout. Again, the programming task is simplified at the expense of increased execution time or system cost. If the metrics used do not take all these factors into account, the logic behind such decisions will not be noticed.
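The trade-off between computing time and programming time can be made concrete with a rough cost model. The sketch below is illustrative only: the linear cost formula, the `Approach` type, the hourly rates, and every number in it are hypothetical assumptions, not measurements from any real project. It estimates how many runs a program needs before extra tuning effort pays for itself.

```python
from dataclasses import dataclass


@dataclass
class Approach:
    """One way of building an application (all fields are hypothetical inputs)."""
    name: str
    dev_hours: float              # programmer time (human investment)
    machine_hours_per_run: float  # computer time per run (capital investment)


def total_cost(a: Approach, runs: int, dev_rate: float, machine_rate: float) -> float:
    """Assumed linear model: one-time development cost plus per-run machine cost."""
    return a.dev_hours * dev_rate + runs * a.machine_hours_per_run * machine_rate


def break_even_runs(simple: Approach, tuned: Approach,
                    dev_rate: float, machine_rate: float) -> float:
    """Number of runs at which the tuned version's extra development cost is repaid."""
    extra_dev_cost = (tuned.dev_hours - simple.dev_hours) * dev_rate
    saved_per_run = (simple.machine_hours_per_run
                     - tuned.machine_hours_per_run) * machine_rate
    if saved_per_run <= 0:
        return float("inf")  # tuning saves no machine time, so it never pays off
    return extra_dev_cost / saved_per_run


# Hypothetical figures chosen only to illustrate the arithmetic.
SIMPLE = Approach("regular grid, straightforward algorithm", 100.0, 10.0)
TUNED = Approach("irregular grid, hand-tuned data layout", 500.0, 2.0)
DEV_RATE = 100.0     # assumed cost per programmer hour
MACHINE_RATE = 50.0  # assumed cost per machine hour

if __name__ == "__main__":
    for runs in (5, 100, 1000):
        print(runs,
              total_cost(SIMPLE, runs, DEV_RATE, MACHINE_RATE),
              total_cost(TUNED, runs, DEV_RATE, MACHINE_RATE))
    print("break-even runs:", break_even_runs(SIMPLE, TUNED, DEV_RATE, MACHINE_RATE))
```

Under these assumed figures, the simple version is cheaper for any program run fewer than 100 times before being changed, which is the situation described above. The model deliberately ignores the benefit side of the value metric, which is the part that is hardest to quantify.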
Unfortunately, these metrics are challenging to evaluate even in ``real-world'' application situations using mainstream computing technology; witness the current debate as to whether the use of computer technology has improved white-collar productivity in American business. The metrics are even more challenging to evaluate in a typical advanced computing scenario, where the applications are research problems whose economic value may not be known for many years, and where the human contributors are highly specialized and hard to value in quantitative economic terms. Greater still is the difficulty of using these metrics to guide the design of an advanced computing system that has not yet been built, but this is precisely the goal we must strive for. In the absence of quantifiable metrics of overall value that can be estimated reliably from a description of a system before it is built, we must base our system design approach on common sense derived from experience with past systems, taking into account both the easily quantifiable aspects of delivered value and the aspects that are not easily quantifiable. We must avoid the temptation to mindlessly optimize the quantifiable metrics while disregarding the others.
These temptations, unfortunately, are strong. Easily quantifiable metrics work well as ``sound bites'' for attracting attention in the media, commercial, and political arenas. Consistent, delivered value that is hard to quantify is much more challenging to sell. There has been a tendency in the development of the HPC field, particularly where massive parallelism is concerned, to focus attention on the development of raw hardware capabilities, which lend themselves more readily to quantification, and to relegate to software technology the job of converting these raw capabilities into delivered value for users. Software technologists have thus found themselves asked to bridge an ever-widening gap between hardware-supplied capabilities and user needs. Too little recognition has been given to the fact that bridging this gap successfully requires the software technologists who will build the bridge to work as equal partners with hardware architects and application specialists in defining the endpoints that the bridge should connect.
Thus, while quantifiable metrics such as peak arithmetic performance and lines of code per programmer-day still have some utility as milestones along the road, reaching PetaFLOPS capabilities in a meaningful way will require a more holistic approach, one in which advances in application, software, and hardware capabilities are all recognized, and financially supported, as equally vital, and in which specialists in all three fields work together to develop systems that can deliver real value to users.