Near the end of the CP work at Caltech, we did some important experiments using Fortran 90 which formed the basis of the aspects of the FortranD project overviewed in Section 13.1. These were partly motivated by Fox's change of architectural environment. At Caltech, he was surrounded by MIMD machines and the associated culture; at Syracuse's NPAC facility, the centerpiece in 1990 was a -node SIMD CM-2. In reading the CMFortran (Fortran 90) manual, Fox noted that the Fortran 90 run time support included all the important collective communication primitives (such as combine and broadcast) we had found important in CrOS and Express.
The first experiment involved a climate modelling code using spectral methods [Keppenne:89a;90a]. We had rashly promised a TRW group that we would be able to easily parallelize such a code. However, we had not realized that the code was written in C with extensive C++-like use of pointers. ParaSoft-responsible for the code conversion-was horrified and the task seemed daunting! However, Keppenne was interested in rewriting the code in Fortran 90, which was a ``neat'' language like C++. ParaSoft found that the resultant Fortran 90 code was straightforward to port to a variety of parallel machines, as shown in Tables 13.6 and 13.7. Note that the new version of the code had an order-of-magnitude-higher performance than the original one on a single CPU CRAY Y-MP. The discipline implied by Fortran 90 allowed both ``outside computational scientists'' and the Cray compiler to ``understand'' the code. We analyzed this process and believe that we could indeed replace our friends at ParaSoft for this problem by a compiler-initially Fortran 90D-which could generate good SIMD and MIMD code. This is, of course, the motivation of use of the array syntax feature in High Performance Fortran as it captures the parallelism in a transparent fashion.
Table 13.6: Logistics of Migration Experiment on Climate Code
Table 13.7: Performance of a Climate Modelling Computational Kernel. In each case, only minor (obviously needed) optimizations were performed.
This experiment motivated the Fortran 90D language [Fox:91f], [Wu:92a], and we followed up the climate experiments with some other simple examples, which are summarized in Table 13.8. This compares ``optimal hand-coded'' Fortran-plus message-passing code with what we expect a good Fortran 90D (HPF) compiler could produce from the (annotated) Fortran 90 source. The results are essentially perfect for the Gaussian elimination example and reasonable for the FFT. These estimates were borne out in practice [Bozkus:93a;93b] and the prototype Fortran 90D compiler developed at Syracuse produced code that was about 10% slower than the optimal node Fortran 77+ message-passing version.
Table 13.8: Effectiveness of Fortran 90 on Two Simple Kernels. The execution time is given as a function of the number of nodes used in the iPSC2 multicomputer.