Lisa, Here are the SCCS abstracts for the next issue of PCN. Please note the following: 1) SCCS-74, not SCCS-74c. If different, please send abstract to me. 2) SCCS-154, not SCCS-154b. If different, please send abstract to me. 3) For SCCS-101, I have "Investigation of the Two-dimensional O(3) Model using the Overrelaxation Algorithm" by J. Apostolakis, C. Baillie, and G. C. Fox. Your log has it as "The 2-d O(3) Model by Over- relaxation". Please find out what is correct and let me know. If I'm in error, please send the abstract to me. 4) Please send me abstracts for SCCS-137,139,141, and 169. I thought I had everything, but somehow these were never sent to me. 5) Is SCCS-163 and SCCS-189 the same document? If not, please send me the abstract for SCCS-189. 6) Please find out from Geoffrey what abstract should be typed for SCCS-283. There are also a few others that did not have abstracts, but introductions. I either typed the first paragraph or so, or typed the entire introduction. Geoffrey should look at these before sending them off to print. I'll get the next issue to you soon. Please let me know when it will be due so that I know how much time I have. You probably already told me, but I forgot to write it down. Thanks, Terri Parallel Genetic Algorithms and Application to Load Balancing for Parallel Computation Nashat Mansour and Geoffrey C. Fox SCCS-74 (Short Code: Mansour:91b) A new coarse grain parallel genetic algorithm (PGA) and a new implementation of a data-parallel GA are presented in this paper. They are based on models of natural evolution in which the population is formed of discontinuous or continuous sub-populations. In addition to simulating natural evolution, the intrinsic parallelism in the two PGA's minimizes the possibility of premature convergence that the implementation of GA's often encounters. The PGA's have been implemented on a hypercube and a Connection Machine, and their operation is demonstrated by applying them to the static load balancing problem, also referred to as domain decomposition, in parallel computation. The PGA's have found near-optimal solutions that are better than those found by a sequential GA. In addition to the speed-up due to the parallel implementation, the PGA's evolve fit genotypes in a smaller number of generations by virtue of the intrinsic parallelism of the underlying evolution models. On one hand, the PGA's accentuate the advantage of parallel computers for simulating natural evolution. On the other hand, they represent a new approach to domain decomposition. A Comparison of Load Balancing Methods for Parallel Computations Nashat Mansour and Geoffrey Fox SCCS-154 (Short Code: Mansour:91e) Three physical optimization methods are considered in this paper for load balancing parallel computations. These are simulated annealing, genetic algorithms, and neural networks. Some design choices and the inclusion of additional steps lead to new versions of the algorithms with different solution qualities and execution times. The performances of these versions are critically evaluated and compared for test cases with different topologies and sizes. Orthogonal recursive coordinate bisection is also included in the comparison as a typical simple deterministic method. Simulation results show that the algorithms have diverse properties. Hence, different algorithms can be applied to different problems and requirements. For example, the annealing and genetic algorithms produce better solutions and do not show a bias towards particular problem structures. But, they are slower than the neural network and recursive bisection. Preprocessing graph contraction is one of the additional steps suggested for the physical methods. It produces a significant reduction in execution time necessary for their applicability to large-scale problems. Parallel Programming as an Optimization Problem or Intelligent Compilers for Parallel Computers Geoffrey C. Fox and Vasanth Balasundaram SCCS-110 (Short Code: Fox:90d) For at least the next few years, FORTRAN will be a critical language for scientific computation. One can view it either as a primary user interface or as a portable ``machine-language'' for which excellent compilers exist and which is a target for more user-friendly systems. Mapping a scientific problem onto a high performance computer can be viewed as a hard optimization problem where one minimizes some combination of user program development time and production program execution time. We consider how expert systems, neural networks, simulated annealing, and related methods can be integrated into a FORTRAN compiler to both give better code and feedback to the user on the appropriateness of a particular problem to particular hardware. Physical Optimization Methods for Allocating Data to Multicomputer Nodes Nashat Mansour and Geoffrey C. Fox SCCS-122 (Short Code: Mansour:91d) Three optimization methods derived from natural sciences have been applied to the problem of allocating data to multicomputer nodes. These are a Simulated Annealing Algorithm, a Hybrid Genetic Algorithm, and a Bold Neural Network. In this paper, a review of the three methods is presented. Their performances are critically evaluated and compared for a number of examples with different topologies, sizes, and granularities. The performance criteria are solution quality, execution time, robustness, bias, and parallelizability. Simulation results show that the three methods have fairly diverse advantages and drawbacks. Hence, they can be suitable for different requirements. For example, the annealing and genetic algorithms yield comparable solutions and do not show a bias towards particular problem structures. The neural network produces solutions of a lower quality, but in a considerably shorter time. Implementing Spatial Environmental Models on Parallel Computing Systems Kim Mills, Geoffrey C. Fox, and Roy Heimbach SCCS-150 (Short Code: Mills:91d) Parallel computing will play an important role in integrating spatial environmental models for large-scale systems. Intervisibility analysis with error simulation in the digital elevation model is used to illustrate an approach to developing parallel models, and to demonstrate some benefits of high performance computing. The line of sight algorithms, and data distribution strategies for two intervisibility models run on the Connection Machine are described and partial program listings are appended. Collaboration between environmental and computational scientists will be required to develop more complex spatial environmental models. Distributed Memory Compiler Methods for Irregular Problems---Data Copy Reuse and Runtime Partitioning R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis SCCS-163 (Short Code: Das:91a) This paper outlines two methods which we believe will play an important role in any distributed memory compiler able to handle sparse and unstructured problems. We describe how to link runtime partitioners to distributed memory compilers. In our scheme, programmers can implicitly specify how data and loop iterations are to be distributed between processors. This insulates users from having to deal explicitly with potentially complex algorithms that carry out work and data partitioning. We also describe a viable mechanism for tracking and reusing copies of off-processor data. In many programs, several loops access the same off-processor memory locations. As long as it can be verified that the values assigned to off-processor memory locations remain unmodified, we show that we can effectively reuse stored off-processor data. We present experimental data from a 3-D unstructured Euler solver run on an iPSC/860 to demonstrate the usefulness of our methods. Parallel Processing for Computer Vision and Image Understanding Alok Choudhary and Sanjay Ranka SCCS-178 (Short Code: Choudhary:92b) What is vision? From one point of view, ``it is the process of recognizing objects of interest from images''. The word ``process'' refers to some form of processing performed on the input data, which may be one or more images. The phrase ``objects of interest'' implies the existence of a context under which this processing must take place and the existence of a representation used in the processing. For example, if we were asked, ``Is there a table in this room?'', we would have some representation of a table in our mind and look for something similar to it in the room. In that process, we would ignore objects that did not look like a table. That is, we started with a model and looked for some instance of the model in the room. On the other hand, if we were asked, ``Describe all the objects in this room'', we would scan the room, form some representation of the objects, and match them to some objects that are part of our ``knowledge base''. However, we may or may not know what to expect in the room. Both problems can be considered ``vision problems''. A general vision problem, therefore, is considered to be an ill-posed problem from a computational perspective. Vision has fascinated researchers from various disciplines such as psychology, neural science, computer science and engineering, etc. for a long time and a good discussion on various aspects of vision can be found. Hierarchical Tree-Structures as Adaptive Meshes David Edelsohn and Geoffrey C. Fox SCCS-193 (Short Code: Edelsohn:91b) New adaptive mesh refinement algorithms provide an opportunity to utilize the same hierarchical tree-structures developed for multipole-based particle simulations in grid-based simulations of both continuum and particle problems. Representing both a multipole method simulation and an adaptive mesh simulation with this same structure provides a natural formalism with which to unite these two classes of solvers. This paper discusses how both methods exploit the same basic principle of locality evident in many systems such as those governed by Poisson's Equation. The locality of the systems and the resulting algorithms provide important benefits for implementations on massively parallel computers. Evaluating the Performance of PARTI Ishfaq Ahmad and Min-You Wu SCCS-197 (Short Code: Ahmad:91e) In this report, we describe the evaluation of overhead associated with PARTI (Parallel Automated Run Time Toolkit at ICASE). PARTI is a set of primitives which are used in loops for irregular problems on distributed memory architectures. When irregular arrays are distributed over the processors of the system, the operations within a loop on some processor may require array elements from the memory of some other processors. Communication for accessing array elements, for multiple times, within a loop can result in significant overhead which in turn results in loss of performance. If the elements which need to be fetched within the loops are accessed before executing the loop, significant savings can be made in the time spent in communication otherwise. This leads to the concept to INSPECTOR and EXECUTOR for problems with irregular structures. An INSPECTOR is a program module which performs pre-processing by examining the loops and generates the set of communication patterns which are required within the loops to access off-processor data elements. The information generated by the INSPECTOR is then used to pre-fetch those data elements. The actual execution of the program is done by the EXECUTOR which uses the pre-fetched data elements within the loops without incurring the penalty for communication overhead during processing. Lessons from Massively Parallel Architectures on Message Passing Computers G. C. Fox SCCS-214 (Short Code: Fox:91n) We review a decade's work on message passing MIMD parallel computers in the areas of hardware, software and applications. We conclude that distributed memory parallel computing works, and describe the implications of this for future portable software systems. Complete Mode-Locking in Models of Charge-Density Waves A. Alan Middleton, Ofer Biham, P. B. Littlewood, and P. Sibani SCCS-217 (Short Code: Middleton:91b) Mode-locking in a.c. driven charge-density waves (CDW's) is studied numerically in two and three dimensions, using both continuous equations of motion and a cellular automaton model. As the system size is increased, a complete devil's staircase of steps is approached. The fractal dimension of the set of gaps in the staircase is found to be D_0 = 0.75 +/- 0.04, 0.88 +/- 0.05 in two and three dimensions, respectively, in the automaton model. The spectrum of singularities f(alpha) is calculated and is related to the dynamic critical exponent zeta for the CDW depinning transition. Benchmarking the CM-5 for Image Processing Applications Ravi Shankar, Ravi Ponnusamy, and Sanjary Ranka SCCS-233 (Short Code: Shankar:92a) This paper presents benchmarking results for image processing algorithms on the Connection Machine model CM-5 and compares them with the results from the CM-2 and the Sun-4. Image processing algorithms with varying communication and computational requirements were implemented, tested and timed. The performance and the scalability of the CM-5 were analyzed and compared with that of the CM-2. Parallelization of CHARMM Molecular Dynamics Code on Multicomputers S. Ranka, G. C. Fox, J. Saltz, and R. Das SCCS-236 (Short Code: Ranka:92a) This paper discusses different algorithms for implementing molecular dynamics in the program CHARMM on multicomputers. The molecular dynamics is the computationally intensive part of CHARMM and requires almost all of the time for a typical application. The molecular dynamics can be divided into four main categories of subroutines: the calculation of forces, the integration of equations of motion, the correction of atomic coordinates because of constraints and the generation and update of the non-bonded list. We examine each of them in detail and discuss their scalability on multicomputers. Simulated Tempering: A New Monte Carlo Scheme Enzo Marinari and Giorgio Parisi SCCS-241 (Short Code: Marinari:92a) We propose a new global optimization method (Simulated Tempering) for simulating effectively a system with a rough free energy landscape (i.e., many coexisting states) at finite non-zero temperature. This method is related to simulated annealing, but here the temperature becomes a dynamic variable, and the system is always kept at equilibrium. We analyze the method on the Random Field Ising Model, and we find a dramatic improvement over conventional Metropolis and cluster methods. We analyze and discuss the conditions under which the method has optimal performances. Visualization of High Energy Physics Data using AVS Tomasz Haupt SCCS-243 (Short Code: Haupt:92a) There is an increasing gap between the computational tools used in High Energy Physics research and achievements in modern hardware and software technologies. More and more often one can hear opinions that physicists should reduce their time spent on generating code and start making use of commercially available high-tech packages. Apparently, this seems to be the only way to fully utilize the available hardware, including a high resolution three-dimensional color graphics, capability of distributed or parallel computing, etc. To verify this, I used the commercially available Application Visualization System (AVS) package by Stardent/AVS, Inc., as a tool for visualization of experimental data. There are many places in the analysis of high energy experiment I found the visualization to be very useful. One example is an event by event analysis. The means provided by the AVS system make it possible to create a very realistic event display, with all geometric aspects preserved. The three-dimensional rotations and continuous zooming capability gives full access to any detail of the analyzed event. Another class of applications is the monitoring of a detector's performance. Superimposing of a detector characteristics on a high fidelity visual representation of the detector geometry provides a very clear picture of the detector's performance, an extremely useful tool for monitoring and trouble-shooting. Moreover, the AVS systems enables creation of simple animations to demonstrate changes of the detector parameters with time. Maybe, the most interesting application of an advanced visualization system is, however, the off-line analysis of the experimental data. Three dimensional histograms (possibly supplemented by color and animation as a tool to operate on a five-dimensional space of parameters) gives deeper insight into the nature of the investigated problem. In addition, an easy way of applying even very complex cuts by manipulating real or virtual dials and sliders, and making projections onto flat or curvilinear subspaces and finding iso surfaces makes an analysis fast and very efficient. Examples of each category mentioned above are presented in this report, to illustrate capability of the AVS system. However, this should be regarded as a pilot study to determine the feasibility of using the AVS system for visualization of high energy physics data rather than a fully comprehensive catalog of its possible applications. A Test Suite Approach for Fortran90D Compilers on MIMD Distributed Memory Parallel Computers Min-You Wu and Geoffrey C. Fox SCCS-244 (Short Code: Wu:92b) This paper describes a test suite approach for a Fortran90D compiler, a source-to-source parallel compiler for distributed memory systems. Different from Fortran77 parallelizing compilers, a Fortran90D compiler does not parallelize sequential constructs. Only parallelism expressed by Fortran90D parallel constructs is exploited. We discuss compiler directives and the methodology of parallelizing Fortran programs. An introductory example of Gaussian elimination is used, among other programs in our test suite, to explain the compilation techniques. Concurrent Interstate Conflict Simulations: Testing the Effects of the Serial Assumption Gavan Duffy SCCS-250 (Short Code: Duffy:91a) The interstate conflict simulation model of Bremer and Mihalka, as extended by Cusack and Stoll, is reimplemented in a massively parallel computational environment. By observing the impact of controlling parameters upon pluralism in simulated state systems, Cusack and Stoll investigate the consequences of competing propositions of the realist school of international relations theory. The simulation method avoids reliance on the highly unrealistic simplifying assumptions characteristic of deductive assessments of realisms. However, their simulation model relies on the serial assumption: that only one conflict can occur at one time. The massively parallel implementation removes this assumption, simulating multiple conflicts concurrently. While some minor differences across the serial and parallel implementations are observed at the system level, sharper differences appear in the survival probabilities of states adopting particular styles of decision making. Additionally, relaxation of the Cusack-Stoll assumption that states misperceive others' capabilities at a constant rate of error produces substantial effects at the level of system endurance. Compiling Fortran 77D and 90D for MIMD Distributed-Memory Machines A. Choudhary, G. Fox, S. Ranka, S. Hiranandani, K. Kennedy, C. Koelbel and C-W. Tseng SCCS-251 (Short Code: Choudhary:92c) Fortran D provides a set of data decomposition specifications for either Fortran~77 of Fortran~90. In this paper we present an integrated approach to compiling Fortran~77D and Fortran~90D programs into SPMD (Single Program Multiple Data) message-passing programs for efficient execution on MIMD distributed-memory machines. The integrated Fortran~D compiler relies on two key observations. First, array constructs may be scalarized by the Fortran~90 front end into forall loops without loss of information. Second, loop fusion, partitioning, and sectioning optimizations in the Fortran~D back end can be useful for both Fortran~D dialects. Implementation and Scalability of Fortran~90D Intrinsic Functions on Distributed Memory Machines I. Ahmad, A. Choudhary, G. Fox, K. Parasuram, R. Ponnusamy, S. Ranka, and R. Thakur SCCS-256 (Short Code: Ahmad:92a) We are developing a Fortran~90D compiler, which converts Fortran~90D code into Fortran~77 node programs for distributed memory machines. This paper presents the performance results of Fortran~90D intrinsic functions in Intel iPSC/860 and iPSC/2 hypercubes. We discuss the implementation of those intrinsic functions and show that our implementations are scalable. Benchmarking the CM-5 Multicomputer Zeki Bozkus, Sanjay Ranka, and Geoffrey Fox SCCS-257 (Short Code: Bozkus:92a) In this paper, we study the performance of the CM-5 multiprocessor. We provide a number of benchmarks for its communication and computation performance. Many of the operations, like scans and global reduction, can be performed using special hardware available on the CM-5. We have benchmarked these operations. We also describe how to embed a mesh and a hypercube on a CM-5 architecture and provide timings for some mesh and hypercube communication primitives on the CM-5. Prospects for Parallel Computing Geoffrey C. Fox SCCS-265 (Short Code: Fox:92g) The best enterprises have both a compelling need pulling them forward and an innovative technology solution pushing them on. In high performance computing, we have the need for increased computational power in many applications and the inevitable long term solution is massively parallelism. In the short term, the relation between pull and push may seem unclear as novel algorithms and software needed to support parallel computing. However, eventually parallelism will be present in all computers---including those in your children's video game, your personal computer or workstation and the central supercomputer. Scheduling Regular and Irregular Communication Patterns on the CM-5 Ravi Ponnusamy, Rajeev Thakur, Alok Choudhary, and Geoffrey Fox SCCS-274 (Short Code: Ponnusamy:92a) In this paper, we study the communication characteristics of the CM-5 and the performance effects of scheduling regular and irregular communication patterns on the CM-5. We first study the impact of path length and message size on communication time. Then we consider the scheduling of regular communication patterns such as complete exchange and broadcast. We have implemented four algorithms for complete exchange and two algorithms for broadcast. We have also implemented four algorithms for scheduling irregular communication patterns and studied their performance on the communication patterns of several synthetic as well as real problems such as the conjugate gradient solver and the Euler solver.