Comparison of Cluster Algorithm for Two-Dimensional {Potts} Models} Clive F. Baillie and Paul D. Coddington SCCS-12 (Short Code: Baillie:91b) We have measured the dynamical critical exponent z for the Swendsen-Wang and the Wolff cluster update algorithms, as well as a number of variants of these algorithms, for the q=2 and q=3 Potts models in two dimensions. We find that although the autocorrelation times differ considerably between algorithms, the critical exponents are the same. The results for q=2 favor a logarithmic increase in the autocorrelation time with lattice size, implying that z is zero. ***** Static Performance Estimation in a Data Partitioning Tool Vasanth Balasundaram, Geoffrey Fox, Ken Kennedy, and Ulrich Kremer SCCS-14b (Short Code: Balasundaram:91i) The choice of the data domain partitioning scheme is an important factor in determining the available parallelism and hence the performance of an application on a distributed memory multiprocessor. In this paper, we present a performance estimator for statically evaluating the relative efficiency of different data partitioning schemes for any given program on any given distributed memory multiprocessor. Our method is not based on a theoretical machine model, but instead uses a set of kernel routines to ``train'' the estimator for each target machine. We also describe a prototype implementation of this technique and discuss an experimental evaluation of its accuracy. ***** Statistical Mechanics and Phase Transitions in Clustering Kenneth Rose, Eitan Gurewitz, and Geoffrey Fox SCCS-20 (Short Code: Rose:90c) A new approach to clustering based on statistical physics is presented. The problem is formulated as fuzzy clustering and the association probability distribution is obtained by maximizing the entropy at a given average variance. The corresponding Lagrange multiplier is related to the ``temperature'' and motivates a deterministic annealing process where the free energy is minimized at each temperature. Critical temperatures are derived for phase transitions when existing clusters split. It is a hierarchical clustering estimating the most probable cluster parameters at various average variances. ***** Constrained Clustering as an Optimization Method Kenneth Rose, Eitan Gurewitz, and Geoffrey Fox SCCS-21 (Short Code: Rose:90e) Our deterministic annealing approach to clustering, derived on the basis of the principle of maximum entropy, is independent of the initial state, and produces natural hierarchical clustering solutions by going through a sequence of phase transitions. This approach is modified here for a larger class of optimization problems by adding constraints to the free energy. The concept of constrained clustering is explained, and three examples are given where it is used as means to introduce deterministic annealing. First, the previous clustering method is improved by adding cluster mass variables and a total mass constraint. Secondly, the Travelling Salesman Problem is reformulated as constrained clustering, yielding the Elastic Net approach to the problem. More insight is gained by identifying a second Lagrange multiplier which is related to the tour length, and can also be used to control the annealing process. Finally, the ``open path'' constraint formulation is shown to relate to self-organization and dimensionality reduction in unsupervised learning. A similar annealing procedure is applicable in this case as well. ***** A Hybrid Genetic Algorithm for Task Allocation in Multicomputers Nashaat Mansour and Geoffrey Fox SCCS-36 (Short Code: Mansour:91a) A hybrid genetic algorithm for the task allocation problem (HGATA) in multicomputers is proposed. It minimizes the possibility of premature convergence and finds good solutions in a reasonable time. HGATA includes elitist ranking selection, variable rates for the genetic operators, the inversion operator and hill-climbing of individuals. Hill-climbing is done by a simple heuristic procedure tailored to the task allocation problem. HGATA also makes use of problem-specific information to evade some computational costs and to reinforce favorable aspects of the genetic search at some appropriate points. The experimental results on realistic test cases support the HGATA approach for task allocation. ***** Numerical Study of Laminar Separation Over an Annular Backstep A. Gaber Mohamed, D. T. Valentine, and R. E. Hessel SCCS-46 (Short Code: Mohamed:91a) Numerical simulations of laminar flows with recirculation regions were performed. The vorticity-stream function representation of the Navier-Stokes equations for two-dimensional flow in Cartesian coordinates and for axisymmetric flow in cylindrical coordinates were solved numerically. The algorithm developed to solve these systems of coupled, nonlinear partial differential equations implements a variable step Crank-Nicolson scheme with Richardson's extrapolation to discretize the time integration. It implements a fourth-order collocation method to solve the linear elliptic problems at the core of the computations. An iterative method is applied to solve for the nonlinearities over each time step. This fourth-order finite element algorithm was verified by simulating two nonparallel flow exact solutions to the Navier-Stokes equations to within the accuracy specified by the user. Next, the laminar flow field over the planar backstep located below a free-surface was computed and compared with available experimental data. Subsequently, the method was applied to investigate the flow over an inner-radius annual backstep. Solutions for steady flows over the annular backstep for Reynolds numbers from 100 to 400 were computed; the Reynolds number is based on the maximum velocity upstream of the step and the outer radius of the annulus. ***** Express versus iPSC/2 Primitives: A Performance Comparison Ishfaq Ahmad and Min-You Wu SCCS-77 (Short Code: Ahmad:91a) Express, developed by Parasoft, is a software programming environment for writing parallel programs for homogeneous MIMD multiprocessors. It provides a communication system for communicating processes, mechanisms for data sharing, reading files, debugging tools, and performance analyzing tools. An important feature of EXPRESS is that these functionalities are carried out in a user transparent manner. The flexibility of EXPRESS makes it an attractive tool set for developing parallel programs. In addition, it has nice features which allow an application to use appropriate utilities. One of such features is automatic domain decomposition library which can map physical domain of the problem to the underlying topology of the parallel computing system. The performance evaluation tools, using text and graphics, can be effectively used to analyze the run time performance of the program. ***** Garbage Collection Algorithms for the Connection Machine William G. O'Farrell SCCS-95 (Short Code: O'Farrell:91b) In this dissertation we study garbage collection (GC) on the Connection Machine (CM). In traditional GC, the objective is to identify memory cells which are no longer being used by the running program, so that those cells may be used again. On the CM, what are being collected are processors. Furthermore, traditional methods are inappropriate, as they would run only on the host machine. Therefore we explore massively parallel GC, in which the CM processors themselves are actively used in discovering which processors are garbage and which are not. Two different classes of GC are studied in this dissertation. In the first, a separate GC phase is applied to what we call ``binary graphs'' or B-graphs, which are directed graphs in which the outdegree for nodes is less then or equal to two. If the indegree for nodes is allowed to be greater than one, then the B-graph is said to permit ``structure sharing''. We study B-graphs with and without structure sharing. Counted in ``number of parallel steps'', our algorithm has a running time of O(log(n)) in the worst case (the graph is actually a linear list) and O(log(log(n))) in the best case. Here n is the number of nodes in the graph, and a ``parallel step'' may be any operation available on the CM, including general communication. The second class of GC studied in this dissertation is GC for Association Graphs (A-graphs), which are a generalization of association lists. A-graphs are directed graphs in which the outdegree of nodes is no greater than one. Unlike conventional association lists, A-graphs may have sharing and cycles. GC for A-graphs does not happen in a separate computational phase --- we incorporate it as part of the basic A-graph operations. Its running time is constant (i.e., O(1)). Furthermore, it is constant in a very genuine sense, because A-list operations use only the CM's broadcast and global-or networks. Although the delay time across these networks would grow logarithmically if the CM were scaled to larger and larger sizes, constant time is a reasonable assumption for the bounded size of todays CMs. ***** Investigation of the Two-dimensional O(3) Model using the Overrelaxation Algorithm John Apostolakis, Clive F. Baillie, and Geoffrey C. Fox SCCS-101 (Short Code: Apostolakis:91a) We investigate the two-dimensional O(3) model with the standard action by Monte Carlo simulation at couplings beta up to 2.05. We measure the energy density, mass gap, and susceptibility of the model, and gather high statistics on lattices of size L <= 1024 using the floating point systems T-series vector hypercube. Asymptotic scaling does not appear to set in for this action, even at beta = 20.5, where the correlation length is 304. We observe a 20% difference between our estimate m/Lambda_{\bar{MS}} = 3.52(6) at this beta and the recent exact analytical result. We use the overrelaxation algorithm interleaved with Metropolis updates and show that the decorrelation time scales with the correlation length and the number of overrelaxation steps per sweep. We determine its effective dynamical critical exponent to be z prime = 1.079(10); thus critical slowing down is reduced significantly for this local algorithm that is vectorizable and parallelizable. ***** Parallel Programming as an Optimization Problem or Intelligent Compilers for Parallel Computers Geoffrey C. Fox and Vasanth Balasundaram SCCS-110 (Short Code: Fox:90d) For at least the next few years, FORTRAN will be a critical language for scientific computation. One can view it either as a primary user interface or as a portable ``machine-language'' for which excellent compilers exist and which is a target for more user-friendly systems. Mapping a scientific problem onto a high performance computer can be viewed as a hard optimization problem where one minimizes some combination of user program development time and production program execution time. We consider how expert systems, neural networks, simulated annealing, and related methods can be integrated into a FORTRAN compiler to both give better code and feedback to the user on the appropriateness of a particular problem to particular hardware. ***** The Architecture of Problems and Portable Parallel Software Systems Geoffrey C. Fox SCCS-134 (Short Code: Fox:91g) We show how the structure or architecture of applications suggest the nature of parallel software systems that will run portably on a variety of parallel machines---both those available now and those expected during the coming decade. The discussion is illustrated by lessons learned from real applications implemented on current MIMD and SIMD machines. These are mainly academic problems and the extrapolation to complex industrial and government applications is unproven but we believe our methodology will still be applicable. We suggest that problems consist of a heterogeneous mixture of at least three distinct classes for which the parallel software support could be different. Further, we need approaches to manage the integration of these different software paradigms.