Software technology is the enabling logical medium that matches the functional requirements of user application programs to the capabilities of the computing system's underlying hardware resources. Ideally, software technology presents a logical interface to the user that facilitates programming while achieving efficient execution. Unfortunately, contemporary practice in high-performance computing exhibits little of this ``virtual machine'' methodology. The current state of software technology in support of MPP architectures provides neither ease of programming nor effective execution. Rather, programmers have to work very close to the iron to achieve real efficiencies, and then only after much labor. The challenge of providing software technology support for systems ten thousand to a hundred thousand times more powerful than today's most aggressive massively parallel processing systems is daunting and calls its viability into question.
The impact of software technology on high-performance computing is difficult to quantify because the costs incurred and the benefits derived are often intangible. This compares unfavorably with the position of hardware platform developers, who can measure key factors of their systems, or of applications programmers, who can show the runtimes achieved. While the time to first run of an application program and the degree of execution efficiency achieved are both severely affected by the effectiveness and utility of the software tools available, the actual benefits are not easily measured. Yet it is apparent from recent experience that effective application of computing systems at the PetaFLOPS scale will be impossible without fundamental changes in the nature of the support provided. The added complexity of million-way parallelism and distributed wide-diameter systems will overwhelm conventional parallel programming methodologies. Even now, no single existing parallelism model adequately covers the spectrum required by today's applications.
Programmers are forced either to write applications in reasonable time that then run with mediocre performance, or to invest heavily in optimizing and fine-tuning their programs to realize high performance. Historically, the programmer has had to trade off reduced computing time against reduced programming time. The reasons for this are that system design has not been done with system software in mind, and that supercomputers have been treated more as high-cost, special-purpose laboratory instruments than as general-purpose, easily applied computing systems. Software technology, what there is of it, has been relegated to the job of converting these raw capabilities into delivered value for the users. But the widening gap between hardware-supplied capabilities and the needs of user application programs cannot be addressed by current software technology. PetaFLOPS computing systems will require a new paradigm for software technology, and this will come only with a holistic design philosophy that incorporates advances in algorithms, hardware, and system software designed to be mutually supportive.
The software for the current generation of 100 GigaFLOPS machines is not adequate to be scaled to a TeraFLOPS and will likely fail on a PetaFLOPS system. In part, this is due to the ``big laboratory instrument'' mindset, which assumes that users are dedicated experts entirely consumed for months with tweaking and fine-tuning codes to get them just right. Another reason is that supercomputing software environments are seriously underfunded and underdeveloped, a factor driven by the relatively small market. But the challenges confronting PetaFLOPS systems software go beyond inconvenience and are central to the actual feasibility of that scale of computing. This is because parallel machines suffer from ``performance instability'': small changes in the relationship between a user program and the underlying hardware resources can cause dramatic changes in delivered performance. One aspect of this relates to the increasingly ``High Q'' nature of the processors used. Near-peak performance is delivered by a given processor only if the data and control are set up just right. Otherwise, dramatic performance degradation may result. Another aspect relates to the drastically increased system diameter (the number of clock cycles it takes for a logical signal or packet to cross the system) that will be characteristic of the PetaFLOPS computer, resulting in very long access latencies. Finally, in order to hide this latency, many millions of transactions will have to be active simultaneously, requiring that diverse nested and dynamic forms of parallelism be exploited, something done very poorly at best with current methodologies.
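The link between long access latency and the number of simultaneously active transactions can be made concrete with Little's law (concurrency equals throughput times latency). The following is only a back-of-the-envelope sketch: the function name `required_in_flight` and the latency values are illustrative assumptions, not figures from this report, and the rate is taken to be the rate of long-latency transactions that must be sustained.

```python
def required_in_flight(transactions_per_second: int, latency_ns: int) -> int:
    """Little's law: transactions in flight = throughput x latency.

    Both the function name and the unit choices are illustrative
    assumptions. Integer arithmetic avoids floating-point rounding.
    """
    return transactions_per_second * latency_ns // 1_000_000_000


if __name__ == "__main__":
    peta = 10**15  # one long-latency transaction per operation (assumed worst case)
    # Hypothetical end-to-end latencies, chosen only to show the scaling.
    for latency_ns in (100, 1_000, 10_000):
        in_flight = required_in_flight(peta, latency_ns)
        print(f"latency {latency_ns:>6} ns -> {in_flight:,} transactions in flight")
```

If only a fraction of operations are long-latency accesses, the rate passed in shrinks accordingly. The required concurrency still grows linearly with system diameter, which is why latency hiding dominates at this scale.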
Attention is currently focused on message-passing and data-parallel programming models which, with some effort, have proven useful for a narrow class of scientific problems but which neither respond well to the challenges above nor generalize easily to broader irregular and dynamic computations. A fully general programming model is required to expose the diverse modes of parallelism, enable portability across platforms, and provide a common programming framework to which commercial software may be targeted by independent software vendors (ISVs). Economic viability will depend on the widest possible usage of the common programming methodology to leverage commercial investment. Such a model must extend beyond a specific platform and encompass heterogeneous ensembles of computing systems as encouraged and enabled by the emerging national information infrastructure (NII). For example, it can be envisioned that applications will become collections of subprograms logically connected as abstract networks of functionality, reminiscent of some object-oriented techniques but conducive to mapping across arrays of systems in a seamless environment. Such a methodology would enable, but not be limited to, large scientific programming. Other multidisciplinary interacting programs not currently available would be supported in this extended framework and made possible by PetaFLOPS computing systems. Without a tie-in to commercial investment and development driven by the mainstream computing market, a PetaFLOPS architecture initiative would depend entirely on government funding, which could not possibly match the resources being applied to general processing technologies.
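Software technology for a PetaFLOPS computing system, as well as for more conventional parallel computers, serves two principal purposes: supporting a programming methodology for developing applications, and managing system resources at runtime.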
There is an important overlap between these two considerations in that the runtime resource management establishes the virtual model that is the logical interface to the hardware for the programming system. The programming methodology supported by the software technology comprises the programming language(s) and libraries as well as the tools used for debugging and optimizing application code. The resource management elements of software technology provide services such as file management, virtual memory page swapping, and network interfacing, as well as runtime control such as task scheduling (including process-level and lightweight threads), resource allocation, process synchronization, and interprocessor communication.
Programming models in current use on high-performance computers can be categorized as data parallel, message passing, control flow, functional, and object oriented. Even here, languages and models intermix, so that one can program in a data-parallel style using distributed-memory message-passing languages or shared-memory control-flow languages. Resource management on big systems tends to be limited in the extreme. Either it lacks functionality, leaving little between the programmer and the iron, or it lacks efficiency, in which case most programmers reject it, again leaving little between the programmer and the iron. This is a consequence of the fact that sophisticated functionality found at the operating system level expects to manage very coarse-grained objects and spends hundreds of microseconds performing its services. Highly parallel systems use medium or fine granularity to provide sufficient concurrency to fully utilize all resources. For static, regular, and/or loosely coupled problems, hand-crafted codes can yield good results. But this simply confirms how narrow and difficult effective application programming is on today's highly parallel systems.
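The remark that a data-parallel style can be written on top of message passing can be illustrated with a small sketch. Everything here is an assumption made for illustration: threads and `queue.Queue` objects stand in for distributed-memory processors and their message channels, and the names `worker` and `sum_of_squares` are hypothetical. It is not a model of any particular MPP programming system.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    # Each worker sees only the block it receives as a message
    # (a stand-in for private, distributed memory).
    block = inbox.get()
    outbox.put((rank, sum(x * x for x in block)))


def sum_of_squares(data, n_workers=4):
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()

    # Scatter: block decomposition of the data, one message per worker.
    chunk = (len(data) + n_workers - 1) // n_workers
    for rank in range(n_workers):
        inboxes[rank].put(data[rank * chunk:(rank + 1) * chunk])

    # Gather and reduce: combine one partial result per worker.
    total = 0
    for _ in range(n_workers):
        _, partial = results.get()
        total += partial

    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(sum_of_squares(list(range(10))))
```

The same operation is applied to every block (the data-parallel style), yet all coordination is expressed as explicit sends and receives (the message-passing mechanism). Even in this tiny case the decomposition and the reduction are written by hand, which hints at the programming burden described above.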
Tools exist for helping programmers examine the timing and state of an application program during execution. But they are not yet widely used, although this is likely to change somewhat in the near future. The major reasons for their lack of impact are programmer intransigence, long learning curves, lack of commonality across platforms, lack of availability on some platforms, inaccuracies, and inadequate functionality. Perhaps more to the point is the gap between what these tools present and what the programmer needs to do to improve performance. It is often impossible to appreciate the subtle complexities involved in the relationship between program alteration and changes in system behavior. Resource management software is rudimentary in most cases. Often only one user can use a set of system resources at a time. Resource allocation is usually manual and static, with poor or nonexistent locality management. And there is no feedback from system behavior to the management mechanisms.
Software technology is crucial to the success of PetaFLOPS computing, and it requires substantial advances in the current state of the art and practice to achieve that success. Advances must be made in both machine efficiency and human resource effectiveness.