Next: Device Technology Working Up: Applications Working Group Previous: Video Image Fusion

Algorithmic Issues for the PetaFLOPSComputer

Algorithms sit between the application and the machine and interact with both. Advances in the performance and architecture of the machine may have a serious impact on the choice of algorithms and vice versa.

The main algorithmic issue in going from a TeraFLOPS machine to a PetaFLOPS machine is scalability (because the problems to be solved on the latter will be necessarily much larger), in terms of both the algorithmic efficiency (i.e., the serial complexity of the algorithm) and the parallelization efficiency. Unfortunately, these are often two conflicting goals.

We list, in Table 4.5, three different types of algorithms viewed in these two terms:

We expect a PetaFLOPS machine to have two essential features that will influence critically the proper choice of algorithms: the memory will be hierarchical (in terms of access time) and the processor interconnect will be hierarchical (e.g., a MIMD network of SIMD machines). We expect that these architectures will favor algorithms that are also hierarchical in nature, because there will be a better match in terms of the algorithm's demand for synchronization/communication and the efficiency with which the machine can deliver them. For example, for elliptic PDEs, a hierarchical architecture should favor hierarchical algorithms such as multigrid and domain decomposition. These types of algorithms require both local and global data interaction but the amount of interaction decreases with the distance between the processors. Such an architecture may also favor a preconditioned iterative approach where the preconditioner can be chosen to better match the architecture.

On the other hand, a potential problem is that it is often difficult to implement hierarchical algorithms to run at perfect parallel efficiency. A PetaFLOPS machine will necessarily involve a large degree of parallelism and it may increase the percentage of time an algorithm spends on synchronization and communication (the Amdahl's Law effect). For example, on the coarser grids, there are fewer data to keep the machine busy. Much more research is needed, both in new algorithms that parallelize across grid levels and also in architectures that do not impose a heavy penalty for these kinds of multilevel communication patterns.

Another issue is problem scalability. For some problems, the scale-up in the size of the problem does not give rise to increased parallelism. For example, in certain molecular dynamics applications, one may want to fix the number of molecules but increase the number of time steps. Unfortunately, it is difficult to extract parallelism in the time direction. We may need to develop better algorithms to deal with these kinds of problems.

A final issue is precision. We may need to increase the precision as the problem size grows. For example, for many elliptic PDE problems, the conditioning grows like (2nd-order problems) or (4th-order problems), where is the number of grid points in each co-ordinate direction. Therefore, for , a 4th-order elliptic solver may not have any accurate significant digits on current 64-bit word machines. A PetaFLOPS machine may need to have increased precision. Alternatively, more stable and better conditioned discretizations can be employed. For example, Monte Carlo methods are less sensitive to precision than those based on solving differential equations. Further, such statistical methods are often straightforward to parallelize. Thus, we anticipate that further investigation of Monte Carlo algorithms may be important for PetaFLOPS scale applications.



Next: Device Technology Working Up: Applications Working Group Previous: Video Image Fusion


gcf@npac.syr.edu