Programming languages and models are the notations and underlying
concepts used for writing programs. Although there is not a sharp
dividing line, we can distinguish two kinds of programming methods:
building software ``from scratch'' and assembling software by ``wiring
together'' pre-existing modules, with comparatively little
customization. This section focuses on technology supporting the
former method; technology for the latter is discussed in the next
section under ``Programming Technology.''
Generally, programmers have been able to use Fortran and message
passing to port scientific and engineering codes to multiprocessors,
provided the codes do not require particularly high-performance I/O.
Some classes of codes have also proved to be amenable to the
commercial predecessors of HPF, such as CM Fortran and MasPar Fortran.
These efforts have been the
most successful in the case of dense matrix codes, but programming
costs are still high and portability is low. We are also making
progress in developing tools for some limited domains of sparse matrix
computations, but this is still a struggle. Except for massively
parallel, coarse-grained, regular programs, execution speeds are often
slower than those of comparable programs that run on vector
supercomputers and were developed at lower cost.
Parallelizing compilers have not achieved the initially expected
success. There are still ongoing efforts to parallelize code and also
automatically distribute data for distributed memory machines. Even
though these efforts may achieve good performance for small kernels,
it is not yet clear how successful they can be for
full programs, which tend to be large and have complicated logic.
For more detailed discussion, it is best to subdivide the broad area
of programming languages and models into several (overlapping)
categories as follows:
- Data-parallel programming models and languages. Traditionally
this model has meant lock-step parallel processing of dense vector and
array data, without task parallelism or nested parallelism. Ideas such
as segmented scans have extended the domain of this model somewhat, and
perhaps a more general way to characterize it today is ``parallelism
with a low diversity of instruction mix.'' This model is exemplified by
C*, Fortran D, Vienna Fortran, HPF (High Performance Fortran) and pC++,
as well as lower-level programming tools for SIMD machines, such as
MasPar's MPL. Vector- and matrix-oriented, but not overtly parallel,
programming languages like Fortran 90 and APL also may be considered to
fall within this category. However, of the above languages, only HPF
is being implemented commercially as a portable programming language
for parallel machines. Since HPF implementations are only starting to
appear, its potential will not be known for some time.
Although the majority of successes in massively parallel
processing have come from the use of the data-parallel model,
there is concern that it cannot handle highly dynamic or irregular
computations gracefully, and therefore the range of applications
that it can handle may be limited. Nevertheless, it will probably
be possible to port a significant subset of scientific and
engineering codes effectively using data-parallel language
extensions such as HPF (and related extensions for irregular problems).
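The segmented scan mentioned above can be illustrated with a small sketch. It is written in Python only for readability and is not taken from any of the languages listed; the function name and the flag convention (a true flag marks the first element of a segment) are assumptions made for this example. The loop is sequential, but the combining operator is associative, which is the property that would let a data-parallel implementation evaluate the same scan as a tree.

```python
from itertools import accumulate


def segmented_sum_scan(values, flags):
    """Inclusive segmented sum scan (illustrative sketch).

    flags[i] is True where element i starts a new segment.
    """
    if len(values) != len(flags):
        raise ValueError("values and flags must have the same length")

    def combine(left, right):
        left_value, left_flag = left
        right_value, right_flag = right
        # A start flag on the right operand discards the running sum
        # carried in from the left; otherwise the sums are added.
        value = right_value if right_flag else left_value + right_value
        return (value, left_flag or right_flag)

    # accumulate applies combine left to right; because combine is
    # associative, a parallel machine could regroup the applications.
    pairs = accumulate(zip(values, flags), combine)
    return [value for value, _ in pairs]
```

For instance, with values 1 through 6 and segments starting at positions 0, 3, and 5, each segment is summed independently of the others, so an irregular collection of short vectors can be processed as one flat array.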
- Shared-memory programming models and languages. These are
traditionally models in which all data is equally accessible to all
computations, without regard to where in the machine the data may have
been placed or where a computation may be performed. Examples include
thread packages such as Presto, and the technologies that have been
used for programming quasi-shared-memory machines such as the Stanford
DASH, MIT Alewife, and Kendall Square Research KSR-1 machines.
Generally, the shared-memory model is viewed as the easiest model
for programmers to use, but it is often felt to be inappropriate
for massively parallel computing because it hides communication
operations. However, research into software-based distributed
shared memory techniques continues, with some promising results at
moderate scales of parallelism. Also, hardware architectures such
as the DASH, Alewife, KSR-1, and Tera computers suggest other ways
of scaling up the shared-memory model. It should be noted that
data-parallel languages like HPF (or Fortran 90) and functional
languages like Sisal may also be seen as instances of the
shared-memory computing model in that they hide communication
operations (or at least make them less explicit).
- Distributed-memory (``message passing'') programming models and
languages. In contrast to shared-memory models, distributed-memory
models make the location of data and computations explicit, and require
explicit, programmed communication actions to bring them together as
needed. Examples include message-passing systems like PVM, Linda, P4,
and Express. Unfortunately, developing large-scale parallel programs
using these packages is very difficult and time-consuming. Currently,
the MPI committee is trying to standardize a message-passing library,
but although this may become a more widespread standard, it has
essentially the same characteristics as the packages mentioned above.
Linda offers higher-level communication and synchronization primitives
but still requires the programmer to manage the mapping of data and
computations to processors. Other languages that are available include
Fortran-M (Argonne) and Merlin (ICASE).
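The style of programming described above can be sketched as follows. This is only a simulation under stated assumptions: the workers are Python threads, a standard-library queue stands in for the network, and the names `distributed_sum`, `worker`, and `inbox` are hypothetical. It does not use the API of PVM, P4, Express, Linda, or MPI. What it is meant to show is the discipline itself: each worker computes only on the block it owns, and partial results move between workers only through explicit send and receive operations.

```python
import queue
import threading


def distributed_sum(data, n_workers=4):
    """Sum a block-distributed list using explicit messages (simulation)."""
    # The programmer decides the data mapping: one contiguous block per worker.
    block = (len(data) + n_workers - 1) // n_workers
    blocks = [data[r * block:(r + 1) * block] for r in range(n_workers)]

    inbox = queue.Queue()  # mailbox of rank 0; stands in for a network channel
    result = []

    def worker(rank, local):
        partial = sum(local)  # touches only locally owned data
        if rank != 0:
            inbox.put((rank, partial))  # explicit "send" to rank 0
        else:
            total = partial
            for _ in range(n_workers - 1):
                _, value = inbox.get()  # explicit blocking "receive"
                total += value
            result.append(total)

    threads = [
        threading.Thread(target=worker, args=(rank, blocks[rank]))
        for rank in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return result[0]
```

Even in this small case the programmer has to choose the block size, the owner of each block, and the communication pattern, which hints at why large message-passing programs are costly to develop.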
- Functional programming models and languages. These models are
based on programming without side effects. The premise is that
eliminating side effects eliminates one of the major sources of
difficulty in parallel programming (read-write and write-write timing
races). The majority of functional languages guarantee deterministic
execution, and in fact make it impossible to write nondeterministic
programs. Sisal is one example of such a language. However, some
``functional'' languages include constructs such as non-deterministic
stream merges that allow nondeterministic programs to be written.
Although still viewed as experimental by much of the community, Sisal
has proven able to support both optimization and expressivity.
Higher-order functional languages add even more expressive power, but
good performance in production situations is not yet demonstrated.
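The premise about side effects can be illustrated with a short sketch. Python is not a functional language and does not enforce purity, so the absence of side effects in `square` is an assumption maintained by hand here, and the function names are hypothetical. Because each application of `square` reads only its argument and writes nothing shared, the applications can run on any number of threads in any order, and the result is the same as a sequential evaluation.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def square(x):
    """Pure function: the result depends only on the argument."""
    return x * x


def sum_of_squares(xs, workers=4):
    """Apply square in parallel, then combine the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, whatever order
        # the worker threads happen to finish in.
        squares = list(pool.map(square, xs))
    # Integer addition is associative, so the grouping of the
    # reduction does not affect the answer.
    return reduce(lambda a, b: a + b, squares, 0)
```

No locks are needed because there is no shared mutable state to protect, which is the sense in which read-write and write-write races are ruled out by construction.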
- Object-oriented programming models and languages. These are
models based on the object-oriented abstraction capabilities of
languages like Smalltalk and C++. Instead of simplifying parallel
programming the way functional languages do (by eliminating side
effects), object-oriented parallel programming models use the
information-hiding character of the object abstraction as a simplifying
principle: the limits on interaction between an object's users and its
implementation reduce the number of interactions that a parallel
programmer needs to worry about.
Since some object-oriented programming models are explained in
terms of sending ``messages'' to objects, one might think of
object-oriented programming as a natural match for
distributed-memory programming; however, a ``message'' sent to an
object is fundamentally an abstraction (or logical
communication) operation, whose purpose is quite different from
the physical communication performed by messages in a
distributed-memory computing model. In fact, object-oriented
ideas can be used with any of the models mentioned above.
Numerous interesting research and advanced development projects have
been launched to extend C++ with some support for parallelism. These
include COOL (Stanford), Mentat (Virginia), and Tera C++. However,
none of these languages have been widely used yet.
- Locality/latency management. In any scenario for PetaFLOPS
computing architectures, communications management emerges as a
key issue. Some programming models, such as Fortran D and HPF,
allow and/or require programmers to specify the location of data
and/or computations. Other models automate these decisions. In
the long run, with massively parallel machines and dynamic
applications, programmer-controlled locality management may prove
to be unworkable; on the other hand, automating this task with
acceptable results across a broad range of applications still
poses many research challenges. In any case, software technology
will have to help with:
  - Locality management: minimizing communication by suitably
    allocating, replicating, and scheduling data and computations.
  - Latency management: techniques such as prefetching and
    multithreading (fast context switching) to maintain
    computational speed even in the face of long communication or
    memory latencies.
Hardware support has often proven to be valuable here: examples
include the use of cache memory and virtual memory for locality
management. Latency management has a shorter history but is
likely to become increasingly important as we move toward
PetaFLOPS. We need to understand better how hardware and software
can help each other solve this problem.
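As a software-only illustration of latency management, the sketch below overlaps the fetch of the next data block with computation on the current one. It rests on several assumptions: `fetch` is a hypothetical stand-in for a remote or slow memory access and merely sleeps to imitate latency, the block contents are made up, and a single background thread plays the role that prefetch hardware or a second hardware context would play on a real machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(block_id, latency=0.01):
    """Hypothetical remote read; the sleep imitates communication latency."""
    time.sleep(latency)
    return [block_id * 10 + i for i in range(10)]


def process_with_prefetch(n_blocks, latency=0.01):
    """Sum all blocks, requesting block k+1 before computing on block k."""
    if n_blocks == 0:
        return 0
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, 0, latency)
        for block_id in range(n_blocks):
            data = pending.result()  # wait for the block requested earlier
            if block_id + 1 < n_blocks:
                # Issue the next request before starting to compute, so
                # the transfer proceeds while this block is processed.
                pending = pool.submit(fetch, block_id + 1, latency)
            total += sum(data)
    return total
```

When the computation on a block takes about as long as a fetch, most of the latency is hidden behind useful work; when computation is much shorter, the program is still bound by the fetch time, which is one reason latency management and locality management have to be considered together.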