\input /home/sushi/kea/vax/fonts.tex \overfullrule=0pt \hsize=5.5in \hoffset=0.5in \vsize=8.0in \voffset=0.5in \elevenpoint \footline={\ifnum\pageno=1 {\hfil} \else {\hss\tenrm\folio\hss}\fi} \def\starline{\centerline{****************************************}} \centerline{\bf Syracuse Center for Computation Science} \smallskip \centerline{\bf Technical Report Abstracts} \bigskip \bigskip \vbox{\par\noindent{\bf Physical Computation} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-2; SCCS-2b\ \ \ \ \ \ \ (Short Code: Fox:90i) \par Physical Computation embraces a variety of physical analogies used to tackle non-traditional problems. We describe Monte Carlo and deterministic methods, including simulated annealing and neural networks. Applications include economic change in Eastern Europe, the travelling salesman problem, vehicle navigation, track finding, and parallel computer load balancing.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a presentation to IBM by G. Fox, entitled ``NPAC and SCCS'', July 1990. SCCS-3 (Short Code: Fox:90q)} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by G. Fox, entitled ``Something to do with CRPC at Syracuse'', presented at the CRPC Annual Conference, Rice University, Houston, Texas, August 1990. SCCS-4 (Short Code: Fox:90r)} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by V. Balasundaram, entitled ``Static Performance Estimation in a Data Partitioning Tool'', presented at the CRPC Annual Conference, Rice University, Houston, Texas, August 1990. SCCS-5 (Short Code: Balasundaram:90g)} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by G. Fox, entitled ``Physical Computation and Software Systems'', presented at the KBSA Conference, Liverpool, New York, September 1990. SCCS-7 (Short Code: Fox:90t)} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by G. Fox, entitled ``Prospects for Parallel Computers'', presented at the Rome Air Development Center, Griffis AFB, New York (9/21/90) and Cornell University (9/25/90). SCCS-8 (Short Code: Fox:90u)} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Solving Problems in Navigation} \par\noindent {\it Amar Gandhi and Geoffrey Fox} \par\noindent SCCS-9\ \ \ \ \ \ \ (Short Code: Gandhi:90a) \par The problem of trying to find the minimal-time path from point $A$ to point $B$ through a field of obstacles is both an interesting and practical one. \par To solve the navigation problem, one can construct a graph representing all the possible ways of going from $A$ to $B$. In this way, the navigation problem is mapped onto a multidimensional combinatorial optimization problem. Simple methods due to `traditional' computer science (branch and prune) are too expensive [GUPT 90], [ZHU 90]. Often, one uses heuristics based on some physical insight to guide this optimization [GLAV 90], [BARR 89a], [CANN 90].} \par In general, most of the methods have limitations on the environments they can work in. They are often too specialized, and not robust or versatile enough to work in a real situation. \par In this work, we try to formulate a method using which we can handle time-dependent (i.e., moving) obstacles. We show how this makes it possible to also handle multiple robots (or vehicles) with little effort. \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by G. Fox, entitled ``Portable Parallel programs'', presented at North American Transputer User Group Meeting, Cornell University, October 1990. SCCS-11 (Short Code: Fox:90v)} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Comparison of Cluster Algorithm for Two-Dimensional {Potts} Models} \par\noindent {\it Clive F. Baillie and Paul D. Coddington} \par\noindent SCCS-12\ \ \ \ \ \ \ (Short Code: Baillie:91b) \par We have measured the dynamical critical exponent $z$ for the Swendsen-Wang and the Wolff cluster update algorithms, as well as a number of variants of these algorithms, for the $q=2$ and $q=3$ Potts models in two dimensions. We find that although the autocorrelation times differ considerably between algorithms, the critical exponents are the same. The results for $q=2$ favor a logarithmic increase in the autocorrelation time with lattice size, implying that $z$ is zero.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by G. Fox, entitled ``Parallel Computing Architecture and Software'', presented at the 63rd Semiannual Symposium at New York State Section of American Physics, IBM, Poughkeepsie, New York, October 1990. SCCS-13 (Short Code: Fox:90w)} \bigskip\starline\bigskip \vbox{\par\noindent{\bf A Static Performance Estimator to Guide Data Partitioning Decisions} \par\noindent {\it Vasanth Balasundaram, Geoffrey Fox, Ken Kennedy, and Ulrich Kremer} \par\noindent SCCS-13\ \ \ \ \ \ \ (Short Code: Balasundaram:90h) \par An approach to distributed memory parallel programming that has recently become popular is one where the programmer explicitly specifies the data decomposition using language extensions, and a compiler generates all the necessary communication. While this frees the programmer from the tedium of thinking about message-passing, no assistance is provided in determining the data decomposition scheme that gives the best performance on the target machine. In this paper, we discuss performance estimation as part of an interactive software tool that provides assistance for this very task. We describe a new approach to performance estimation that is based on experiments rather than on a fixed machine model. Preliminary experiments indicate that the proposed techniques will work well for a large class of scientific programs.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Static Performance Estimation in a Data Partitioning Tool} \par\noindent {\it Vasanth Balasundaram, Geoffrey Fox, Ken Kennedy, and Ulrich Kremer} \par\noindent SCCS-14b\ \ \ \ \ \ \ (Short Code: Balasundaram:91i) \par The choice of the data domain partitioning scheme is an important factor in determining the available parallelism and hence the performance of an application on a distributed memory multiprocessor. In this paper, we present a performance estimator for statically evaluating the relative efficiency of different data partitioning schemes for any given program on any given distributed memory multiprocessor. Our method is not based on a theoretical machine model, but instead uses a set of kernel routines to ``train'' the estimator for each target machine. We also describe a prototype implementation of this technique and discuss an experimental evaluation of its accuracy.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by G. Fox, entitled ``High Level Software Tools and the Astrophysical Nbody Problem'', presented at the ICASE Workshop on Unstructured Computation on Multiprocessors, Nags Head, North Carolina, October 1990. SCCS-17 (Short Code: Fox:90z)} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Computer Science Software Tools and Languages to Support Computational Physics} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-18,49\ \ \ \ \ \ \ (Short Code: Fox:90aa) \par Syracuse University is one of the largest private universities in the United States. It is located in the center of New York State in short distance to some of the state's most spectacular scenic areas. Its twelve colleges provide more than 200 different programs of study that lead to degrees in: arts and sciences, architecture, computer science, education, engineering, human development, information studies, management, nursing, public communications, social work and visual and performing arts.} \par Syracuse University has about 12,000 undergraduate studies, 2,130 full time and 2,340 part time graduate students, and 900 full time faculty. \par The activities at Syracuse University takes place in more than 324 buildings on its extended campus of approximately 900 acres. \par The Syracuse Center for Computational Science is a new program in computational science headed by Geoffrey Fox. The program builds on his experience at Caltech where he led the Caltech Concurrent Computation Program which pioneered the interdisciplinary approach to computing with close coupling between physics, computer science and other scientific application areas. \par As a part of the new program, the Physics Department is creating a new computational physics group with about three new faculty positions to be filled. Initially the group consists of G. C. Fox, O. Biham, W. Furmanski and three postdoctoral fellows: P. Coddington, A. Middleton and M. Vinson. \par The center and the faculty members involved have a strong commitment to development new educational programs in computational science. As a part of this initiative, new courses in computational physics will be added at the graduate and undergraduate levels. A workstation laboratory dedicated to this purpose already exists, while additional substantial investments in course related software development and maintenance will be made. \bigskip\starline\bigskip \vbox{\par\noindent{\bf Statistical Mechanics and Phase Transitions in Clustering} \par\noindent {\it Kenneth Rose, Eitan Gurewitz, and Geoffrey Fox} \par\noindent SCCS-20\ \ \ \ \ \ \ (Short Code: Rose:90c) \par A new approach to clustering based on statistical physics is presented. The problem is formulated as fuzzy clustering and the association probability distribution is obtained by maximizing the entropy at a given average variance. The corresponding Lagrange multiplier is related to the ``temperature'' and motivates a deterministic annealing process where the free energy is minimized at each temperature. Critical temperatures are derived for phase transitions when existing clusters split. It is a hierarchical clustering estimating the most probable cluster parameters at various average variances.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Constrained Clustering as an Optimization Method} \par\noindent {\it Kenneth Rose, Eitan Gurewitz, and Geoffrey Fox} \par\noindent SCCS-21\ \ \ \ \ \ \ (Short Code: Rose:90e) \par Our deterministic annealing approach to clustering, derived on the basis of the principle of maximum entropy, is independent of the initial state, and produces natural hierarchical clustering solutions by going through a sequence of phase transitions. This approach is modified here for a larger class of optimization problems by adding constraints to the free energy. The concept of constrained clustering is explained, and three examples are given where it is used as means to introduce deterministic annealing. First, the previous clustering method is improved by adding cluster mass variables and a total mass constraint. Secondly, the Travelling Salesman Problem is reformulated as constrained clustering, yielding the Elastic Net approach to the problem. More insight is gained by identifying a second Lagrange multiplier which is related to the tour length, and can also be used to control the annealing process. Finally, the ``open path'' constraint formulation is shown to relate to self-organization and dimensionality reduction in unsupervised learning. A similar annealing procedure is applicable in this case as well.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Parallel Processing Applied to Climate Modeling} \par\noindent {\it G. L. Keppenne, M. Ghil, G. C. Fox, J. W. Flower, A. Kolawa, P. N. Papaccio, J. J. Rosati, J. F. Shepanski, F. G. Spadaro, and J. O. Dickey} \par\noindent SCCS-22\ \ \ \ \ \ \ (Short Code: Keppenne:90a) \par A climate model of intermediate size and complexity is used to investigate the relative advantages and disadvantages of current and future computer architectures for Global Change studies. The model is based on the primitive-equation system used in atmospheric general circulation models (GCMs) and implements an efficient transform method to switch back and forth between the model's state spectral and grid-point representations.} \par The target architectures for the implementation of the model are a multiple instruction/multiple data (MIMD) NCUBE HYPERCUBE, a single instruction/multiple data (SIMD) Connection Machine~2 and a CRAY~Y-MP vector-supercomputer operating in single-processor mode. Details of the implementation on each architecture are presented, along with a discussion of programming techniques and human considerations, as well as parallelization and vectorization issues. FORTRAN~90 has proven an efficient programming language for all three architectures involved, either as is or after direct translation. \par The preliminary results show the advantages of the parallel architectures for problems of this type and comparable or larger sizes. These results can be generalized to other disciplines, given the similarities of the model with others used in fluid dynamics and engineering. \bigskip\starline\bigskip \vbox{\par\noindent{\bf Applications of Parallel Supercomputers: Scientific Results and Computer Science Lessons} \par\noindent {\it Geoffrey Fox} \par\noindent SCCS-23\ \ \ \ \ \ \ (Short Code: Fox:90o) \par Parallel computing has come of age. Both commercial and university-built parallel computers with supercomputer performance are now available.} \par In this chapter we describe a number of major computations which were carried out in 1988-1990 at Caltech on a variety of parallel machines: hypercubes, transputer arrays, the Connection Machine, and the AMT DAP. We compare their performance with that of several advanced-architecture serial machines, including the conventional CRAY and ETA-10 supercomputers, on the same problems. From our varied use of parallel machines we derived a number of lessons concerning hardware, software, and performance, including the idea of an appropriate match between problem category and type of parallel architecture. It is our hope that these lessons will both encourage scientific users to use parallel machines properly and help computer scientists to design better hardware and software. \par We also discuss the emergence of a new academic discipline---{\it computational science}---motivated by the larger and more important role played in science by computation in general, but especially by parallel computation. We can expect computation, and especially parallel computation, not only to revolutionize many existing scientific fields but also to open up new ones for which traditional approaches have failed. We can thus expect {\it computation\/} to join {\it theory\/} and {\it experiment\/} as a third fundamental part of the scientific method. It has been clear for some time that parallel computers are crucial to this revolution. Our town research has wandered between science, computational science, computer science and ..... \bigskip\starline\bigskip \vbox{\par\noindent{\bf An Evolutionary Approach to Load Balancing Parallel Computations} \par\noindent {\it G. C. Fox and N. Mansour} \par\noindent SCCS-25\ \ \ \ \ \ \ (Short Code: Fox:91l) \par We present a new approach to balancing the workload in a multicomputer when the problem is decomposed into subproblems mapped to the processors. It is based on a hybrid genetic algorithm. A number of design choices for genetic algorithms are combined in order to ameliorate the problem of premature convergence that is often encountered in the implementation of classical genetic algorithms. The algorithm is hybridized by including a hill climbing procedure which significantly improves the efficiency of the evolution. Moreover, it makes use of problem specific information to evade some computational costs and to reinforce favorable aspects of the genetic search at some appropriate points. The experimental results show that the hybrid genetic algorithm can find solutions within 3\% of the optimum in a reasonable time. They also suggest that this approach is not biased towards particular problem structures.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf An Evolutionary Approach to Load Balancing Parallel Computations} \par\noindent {\it Nashat Mansour and Geoffrey Fox} \par\noindent SCCS-25b\ \ \ \ \ \ \ (Short Code: Fox:91m) \par We present a new approach to balancing the workload in a multicomputer. It is based on a genetic algorithm that combines a number of design choices in order to ameliorate the problem of premature convergence. The genetic algorithm is further hybridized by including a hill climbing procedure which significantly improves the efficiency of the evolution. Moreover, it makes use of problem specific information to evade computational costs and to reinforce favorable aspects of the genetic search. The experimental results show that the hybrid genetic algorithm can find solutions that are very close to the optimum.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf The Northeast Distributed Center for Parallel Computing} \par\noindent {\it Geoffrey Fox} \par\noindent SCCS-26\ \ \ \ \ \ \ (Short Code: Fox:90dd) \par Progress in parallel computing has been rapid, and promises teraflop performance by the year 2000 or even earlier. Production parallel supercomputers can now support large-scale scientific calculations, and using these machines will provide knowledge critical to advancing high performance computing. This is a key enabling technology for academic research and for the nation's economic competitiveness and leadership. We propose the Northeast Distributed Center for Parallel Computing (NDCPC) as a collaboration between two existing facilities. These are the Northeast Parallel Architectures Center (NPAC) at Syracuse University and the Cornell Theory Center, representing more than eight years of proven performance in providing and supporting service for the national research community in parallel computing.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Developing Large Industrial Applications on NCUBE-2} \par\noindent {\it G. C. Fox, A. Choudhary, and S. Ranka} \par\noindent SCCS-27\ \ \ \ \ \ \ (Short Code: Fox:90ee) \par The thrust of our research at Syracuse University is to provide the potential benefits of the commercially available parallel machines to the industrial and academic (especially non computer science) community. We are currently building an environment for developing portable and scalable FORTRAN on parallel computers at Syracuse University. Our environment will provide support for aiding the user in porting old sequential FORTRAN programs to parallel computers and write new parallel programs. We will emphasize the issues coming from the large codes (many hundreds of thousands of lines) common in industry and the need to provide a high level portable interface for the average user. Our strategy is to link the application parallelization with the development of parallel environments so that we can test the computer science research described below, by the real problems for which the environments are designed. This requires an interdisciplinary approach which Fox successfully developed at Caltech and is extended by our new Computation Science academic program at Syracuse. NPAC will provide facilities and staff support for code conversion of our selected applications. We envision that our environment will be extremely useful in developing parallel software for large ``real'' applications.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by K. Kennedy, entitled ``Compiling Fortran for the Intel Touchstone'', sponsored by NPAC and the CIS Department of Syracuse University, December 1990. SCCS-28 (Short Code: Kennedy:90a)} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Achievements and Prospects for Parallel Computing} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-29\ \ \ \ \ \ \ (Short Code: Fox:91f) \par Parallel computing works for the majority of large scale computations. The development of parallel hardware designs has been largely transferred to industry, while universities continue major research efforts into better software environments. We describe a classification of problems and how different software models are needed for portable user friendly high performance implementations on parallel machines. The education of a new generation of computational scientists will be a major challenge to our universities.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Undergraduate Computational Science Summer Internships} \par\noindent {\it G. C. Fox and K. Kennedy} \par\noindent SCCS-30\ \ \ \ \ \ \ (Short Code: Fox:91o) \par We propose to enhance CRPC's mission in the development of parallel processing in the areas of education and industrial applications. Both these areas are critical bottlenecks to making parallelism a practical tool. This supplement will be based at Syracuse University and will fund a pilot program which will impact future projects at other CRPC sites and, more generally, universities, national laboratories and industries throughout the nation. The work at Syracuse will be coordinated by Geoffrey Fox who is P.I. for the CRPC computational science research at Syracuse University. Fox is Professor of Computer and Information Science and of Physics and Director of the Northeast Parallel Architectures Center at Syracuse.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Parallelization of Scientific Codes for IBM 3090} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-31\ \ \ \ \ \ \ (Short Code: Fox:90ff) \par We will take four scientific computations from the Perfect Club Collaboration and parallelize them for the six-way and 12-way IBM 3090 system. This will involve using Parallel and Clustered FORTRAN. We will produce the data in the form required by the Perfect Club and document the nature and amount of effort involved in the project. The results will include speed-up values as a function of problem size. We expect speed-ups to be good (corresponding to efficiencies of about 90\%) and we will optimize parallelization until this performance is obtained or clear reasons are discovered that it will be unrealistic to reach this goal.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Deterministic Annealing, Clustering, and Optimization} \par\noindent {\it Kenneth Rose} \par\noindent SCCS-32\ \ \ \ \ \ \ (Short Code: Rose:90f) \par This work introduces the concept of deterministic annealing (DA) as a useful approach to clustering and other related optimization problems. It is strongly motivated by analogies to statistical physics, but is formally derived within information theory and probability theory. This approach enables escaping local optima that plague traditional techniques, without the extremely slow schedules typically required by stochastic methods. The clustering solutions obtained by DA are totally independent of the choice of initial configuration.} \par A probabilistic framework is constructed, which is based on the principle of maximum entropy. The association probabilities at a given average distortion are Gibbs distributions parametrized by the corresponding Lagrange multiplier $\beta$, which is inversely proportional to the temperature in the physical analogy. By computing marginal probabilities within the framework, an effective cost is obtained, which is minimized to find the most probable set of cluster representatives at a given temperature. This effective cost is the free energy in statistical mechanics, which is indeed optimized at isothermal, stochastic equilibrium. \par Within the probabilistic framework, annealing is introduced by controlling the Lagrange multiplier $\beta$. This annealing is interpreted as gradually reducing the ``fuzziness'' of the associations. Phase transitions are identified in the process, which are, in fact, cluster splits. A sequence of phase transitions produces a hierarchy of fuzzy-clustering solutions. Critical $\beta$ are computed exactly for the first phase transition and approximately for the following ones. \par Specific algorithms are derivable from the general approach, to address different aspects of clustering in the large variety of application fields. Here, algorithms are derived, and simulation results are presented for the three major classes, namely, hard clustering, fuzzy clustering, and hierarchical clustering. From the experimental results it appears that DA is substantially superior to traditional techniques. \par The last part of the work extends the approach to deal with a larger family of optimization problems that can be reformulated as constrained clustering. A probabilistic framework for constrained clustering is derived based on the principle of maximum entropy. It is shown that for our annealing purpose, the constraint can be directly applied to the free energy. Three examples of constrained clustering are discussed. Mass-constrained clustering is formulated and yields an improvement of the clustering procedure. The process is now independent of the number of representatives and their multiplicity in the clusters. Secondly, the travelling salesman problem (TSP) is reformulated as constrained clustering, yielding the elastic net approach. A second Lagrange multiplier is identified, which is used to obtain a more powerful annealing method. Finally, self-organization of neural networks is shown to be closely related to TSP, and a similar annealing method is suggested. A fuzzy solution is sought to obtain the optimal net for a given training data set. \bigskip\starline\bigskip \vbox{\par\noindent{\bf A Hybrid Genetic Algorithm for Task Allocation in Multicomputers} \par\noindent {\it Nashat Mansour and Geoffrey Fox} \par\noindent SCCS-33\ \ \ \ \ \ \ (Short Code: Mansour:90a) \par A hybrid genetic algorithm for the task allocation problem (HGATA) in multicomputers is proposed. It minimizes the possibility of premature convergence and finds good solutions in a reasonable time. HGATA includes elitist ranking selection, variable rates for the genetic operators, the inversion operator and hill-climbing of individuals. Hill-climbing is done by a simple heuristic procedure tailored to the task allocation problem. HGATA also makes use of problem-specific information to evade some computational costs and to reinforce favorable aspects of the genetic search at some appropriate points. The experimental results on realistic test cases support the HGATA approach for task allocation.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Development of Software and Applications for Parallel Processors} \par\noindent {\it G. C. Fox, S. Ranka, and A. Choudhary} \par\noindent SCCS-34\ \ \ \ \ \ \ (Short Code: Fox:90gg) \par This is a proposed joint study with IBM Yorktown involving I:\ Initial Studies for Vulcan and II: Initial Studies for GF11.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf A Hybrid Genetic Algorithm for Task Allocation in Multicomputers} \par\noindent {\it Nashat Mansour and Geoffrey Fox} \par\noindent SCCS-36\ \ \ \ \ \ \ (Short Code: Mansour:91a) \par A hybrid genetic algorithm for the task allocation problem (HGATA) in multicomputers is proposed. It minimizes the possibility of premature convergence and finds good solutions in a reasonable time. HGATA includes elitist ranking selection, variable rates for the genetic operators, the inversion operator and hill-climbing of individuals. Hill-climbing is done by a simple heuristic procedure tailored to the task allocation problem. HGATA also makes use of problem-specific information to evade some computational costs and to reinforce favorable aspects of the genetic search at some appropriate points. The experimental results on realistic test cases support the HGATA approach for task allocation.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Nearterm Examples of the Possibilities of Parallel Computing for the Financial Industrial} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-37\ \ \ \ \ \ \ (Short Code: Fox:90ii) \par Three nearterm examples (Valuable and Scenario planning for Mortgage backed Securities; Dowquest; and Online Databases) of the possibilities of parallel computing for the financial industry. These are illustrative of the opportunities that will surely grow when we study them systematically as proposed under ACTION.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Gauge Theories on the Random-block Lattice} \par\noindent {\it T. W. Chiu} \par\noindent SCCS-38\ \ \ \ \ \ \ (Short Code: Chiu:90a) \par Gauge theories are formulated on the random-block lattice in which the sites are cartesian product of $D$ sets of random coordinates, one from each dimension, where $D$ is the dimension of the space-time. Some numerical results of the pure gauge theories are presented.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Parallel Software Engineering} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-39\ \ \ \ \ \ \ (Short Code: Fox:91p) \par ACTION proposes to develop parallel computing in New York State Industry. The funding from the State is devoted to developing applications and not the software engineering tools needed for these. I propose that we set up a project within CASE which will enhance ACTION but be clearly distinct from the basic project. I have chosen one difficult but clear (low risk) activity and one whose potential is obvious but whose implementation and success is uncertain. I suggest setting up the work to complement our projects with IBM so we maximize our interaction with them. I have a grant from Yorktown to develop software and algorithms for their new parallel machines GF11 and Vulcan. I also hope tos et up a project with Kingston and would like the ICASE work to both enhance these projects and increase the probability of getting IBM support by using CASE as matching funds. For both projects I would initially involve Alok Choudhary from ECE and Sanjay Ranka from CIS and students from both these departments. I also would like to link my work more strongly with that of Dan Pease.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Test Suite and Performance for Fortran 90 Compilers} \par\noindent {\it Min-You Wu and Geoffrey Fox} \par\noindent SCCS-40\ \ \ \ \ \ \ (Short Code: Wu:90b) \par We have three examples in our test suite now: Gaussian elimination, FFT, and N-body problem. For each of them, we have the following version of codes: \item{$\bullet$}sequential Fortran77 code \item{$\bullet$}Fortran77D code \item{$\bullet$}Fortran90 (CMFortran) code \item{$\bullet$}Fortran90D (CMFortranD) code \item{$\bullet$}hand-written Fortran77+MP code (initially run on iPSC/2) \item{$\bullet$}hand-compiled Fortran77+MP code from Fortran990D code (iPSC/2)} \par We use Gaussian elimination as an example for translating a Fortran90D program into a Fortran77+MP program. Then we list performance of all three examples. \bigskip\starline\bigskip \vbox{\par\noindent{\bf An Outline of Fortran90 Compiler for Distributed Memory Systems} \par\noindent {\it Min-You Wu and Geoffrey Fox} \par\noindent SCCS-41\ \ \ \ \ \ \ (Short Code: Wu:90c) \par This outline briefly describes the basic issues of a Fortran90 compiler for distributed memory systems. We discuss compiler directives, data partitioning and sequentialization, the techniques for the active area, communication insertion, and implementation of intrinsic functions. Some basic optimization techniques are also presented. This outline does not cover all aspects of the compiler, neither the details of implementation. Some primitive transformations are discussed without formal description.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Fortran D Language Specification} \par\noindent {\it G. C. Fox, S. Hiranadani, K. Kennedy, C. Koelbel, U. Kremer, C.-W. Tseng, and M.-Y. Wu} \par\noindent SCCS-42,42b,42c\ \ \ \ \ \ \ (Short Code: Fox:90n,91e,91h) \par This paper presents FORTRAN D, a version of FORTRAN 77 enhanced with data decomposition specifications. It is designed to support two fundamental stages of writing a data-parallel program: {\it problem mapping\/} using sophisticated array alignments, and {\it machine mapping\/} through a rich set of data distribution functions. We believe that FORTRAN D provides a simple machine independent programming model for most numerical computations. We intend to evaluate its usefulness for both programmers and advanced compilers for a variety of parallel architectures.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk given by Josef Stein, Visiting Researcher, at the SCCS/NPAC Weekly Meeting, entitled ``Coarse Grain Parallelization of Fortran-77 Programs'', January 22, 1991. SCCS-45 (Short Code: Stein:91a).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Numerical Study of Laminar Separation Over an Annular Backstep} \par\noindent {\it A. Gaber Mohamed, D. T. Valentine, and R. E. Hessel} \par\noindent SCCS-46\ \ \ \ \ \ \ (Short Code: Mohamed:91a) \par Numerical simulations of laminar flows with recirculation regions were performed. The vorticity-stream function representation of the Navier-Stokes equations for two-dimensional flow in Cartesian coordinates and for axisymmetric flow in cylindrical coordinates were solved numerically. The algorithm developed to solve these systems of coupled, nonlinear partial differential equations implements a variable step Crank-Nicolson scheme with Richardson's extrapolation to discretize the time integration. It implements a fourth-order collocation method to solve the linear elliptic problems at the core of the computations. An iterative method is applied to solve for the nonlinearities over each time step. This fourth-order finite element algorithm was verified by simulating two nonparallel flow exact solutions to the Navier-Stokes equations to within the accuracy specified by the user. Next, the laminar flow field over the planar backstep located below a free-surface was computed and compared with available experimental data. Subsequently, the method was applied to investigate the flow over an inner-radius annual backstep. Solutions for steady flows over the annular backstep for Reynolds numbers from 100 to 400 were computed; the Reynolds number is based on the maximum velocity upstream of the step and the outer radius of the annulus.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by W. Furmanski, entitled ``MOVIE'', 1991. SCCS-47 (Short Code: Furmanski:91b).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf A VLSI Neural Network for Color Constancy} \par\noindent {\it A. Moore, G. Fox, J. Allman, and R. Goodman} \par\noindent SCCS-48\ \ \ \ \ \ \ (Short Code: Moore:91a) \par A system for color correction has been designed, built, and tested successfully; the essential components are three custom chips built using sub-threshold analog CMOS VLSI. The system, based on Land's Retinex theory of color constancy, produces colors similar in many respects to those produced by the visual system. Resistive grids implemented in analog VLSI perform the smoothing operation central to the algorithm at video rates. With the electronic system, the strengths and weaknesses of the algorithm are explored.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by G. Fox, entitled ``ACTION Industrial Technology Transfer for NYS Industry for Parallel Computing'', presented at the Thursday Morning Roundtable, January 31, 1991. SCCS-50 (Short Code: Fox:91i).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf JTF Quarterly Report and Project Summary} \par\noindent {\it Geoffrey C. Fox and Wojtek Furmanski} \par\noindent SCCS-51\ \ \ \ \ \ \ (Short Code: Fox:91q) \par This report---together with the previous one---summarizes the status of the JTF Center of Excellence at Syracuse. There are four current projects within the JTF project. \item{A)}MOVIE Software \item{B)}Move Applications \item{C)}Physical Computation \item{D)}Outreach \par Section~5 summarizes futures and Section~6 lists technical reports.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by G. Fox, entitled ``JTF Center of Excellence'', presented at NPAC as part of the JTF Quarterly Report, February 21, 1991. SCCS-54 (Short Code: Fox:91s).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Syracuse University Joint Tactical Fusion Center of Excellence} \par\noindent {\it G. C. Fox and W. Furmanski} \par\noindent SCCS-57\ \ \ \ \ \ \ (Short Code: Fox:91u) \par We will establish a Joint Tactical Fusion (JTF) Center of Excellence studying advanced computational methods of relevance to JTF's mission. This research will include algorithms and software targeted at general purpose parallel computers as well as specialized neural network hardware. We will collaborate with both the TECHBASE program and federal laboratories including CSW (Center for Signals Warfare) and RADC (Rome Air Force Development Center). The research will be motivated by JTF problems and technology transfer to JTF will be a high priority. In the first year, we will concentrate on applications of the distributed software system, MOVIE, and its implementation on transputer arrays. We will also continue our research in complex systems and physically based optimization methods with applications to clustering and navigation.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Computational Science at Syracuse University} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-62\ \ \ \ \ \ \ (Short Code: Fox:91dd) \par Computer technology is advancing rapidly. By the year 2000, we can expect to see increases in both peak performance and cost-performance of a factor of 1000 compared to today's machines. At the lower end, personal computers running at one million operations per second today, will in ten years be capable of one billion calculations each second. At the high end, today's supercomputers peaking at about five billion scientific operations each second, will by the year 200 be ``teraflop'' machines capable of over a trillion such calculations each second. These advances will continue into the twenty-first century. This advance in computer technology is a wonderful opportunity but also a major challenge. At Syracuse University, we intend to be a national leader in both exploiting the opportunities and solving the challenges.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Computer Hardware, Advanced Mathematics, and Model Physics (CHAMMP) Program Research Grant} \par\noindent SCCS-63\ \ \ \ \ \ \ (Short Code: Fox:91w) \par TRW is strongly committed to the Global Change Initiative and to the CHAMMP Program in particular. We fully appreciate the central role that the next three years of research will play in the ability to perform reliable long-range interdecadal climate forecasts on a regional scale. Therefore, we have started a three-year Independent Research and Development Program of nearly one-million dollars, to evolve a parallel processing testbed where issues of advanced data management, networking, visualization, and software porting and validation for climate modeling will be investigated. This effort is contractually separate from the work to be performed under the proposed Grant, but its results will become available to the other CHAMMP Program researchers.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by G. Fox, entitled ``Parallel Computing: Software Applications and Their Linkage'', presentation slides B, February 28, 1991. SCCS-64 (Short Code: Fox:91w).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Investigation of the Two-dimensional O(3) Model using the Overrelaxation Algorithm} \par\noindent {\it John Apostolakis, Clive F. Baillie, and Geoffrey C. Fox} \par\noindent SCCS-65\ \ \ \ \ \ \ (Short Code: Apostolakis:91a) \par We investigate the two-dimensional O(3) model with the standard action by Monte Carlo simulation at couplings $\beta$ up to 2.05. We measure the energy density, mass gap, and susceptibility of the model, and gather high statistics on lattices of size $L\le 1024$ using the floating point systems T-series vector hypercube. Asymptotic scaling does not appear to set in for this action, even at $\beta =20.5$, where the correlation length is 304. We observe a 20\% difference between our estimate $m/\Lambda_{\bar{MS}} = 3.52(6)$ at this $\beta$ and the recent exact analytical result. We use the overrelaxation algorithm interleaved with Metropolis updates and show that the decorrelation time scales with the correlation length and the number of overrelaxation steps per sweep. We determine its effective dynamical critical exponent to be $z' = 1.079(10)$; thus critical slowing down is reduced significantly for this local algorithm that is vectorizable and parallelizable.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Computational Science Education} \par\noindent SCCS-67\ \ \ \ \ \ \ (Short Code: Fox:91y) \par The emergence of computation as a fundamental methodology implies major changes in our educational system which will challenge our universities, community colleges, and schools in the next decade. Computers are being used in schools, even at the kindergarten level. This is necessary, but not sufficient. We need not only to teach students how to program, but instill the understanding that computing is fundamental and will change the nature of both their careers and their day to day life. Computing like the traditional reading, writing and arithmetic is a key enabling skill whose use and {\it understanding\/} permeates all other activities. Looking at applicants for graduate school (in Physics at Caltech), I saw few students who viewed computers as anything but a rather tiresome tool. why is this when most of them come from prestigious undergraduate institutions where the very latest computers abounded? The reason may be that such students are typically not given any courses that treat computation as fundamental and exciting. They are taught by faculty whose vision of computing comes from a time when computers were a useful but unexciting tool. This will change.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Confidential Documentation of the Caltech Proposals in Computational Science} \par\noindent SCCS-69\ \ \ \ \ \ \ (Short Code: Fox:91z) \par This option studies practical and fundamental issues in computation as they relate to the solution of scientific and engineering problems. The field bridges the gap between computer science, applied mathematics and the numerical approach to science. We see theory, experiment and computer simulation as three distinct approaches to science. Computational science provides the theoretical and practical underpinning of such computer simulation.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Association Graphs and Their Implementation on the Connection Machine} \par\noindent SCCS-70\ \ \ \ \ \ \ (Short Code: O'Farrell:91a) \par In this paper, we study an interesting collection of list structures which we call {\it association graphs\/} (A-graphs), since they are a generalization of the well-known association lists. Association lists, which in their basic form are merely linear lists of association pairs, are widely used in implementations of Functional and Logic Programming languages, as well as in Artificial Intelligence applications. Efforts to achieve fast implementation of association lists have led to the construction of several special purpose architectures for managing the lists. Also, the idea of storing collections of association lists was explored by Potter for the Massively Parallel Processor (MPP). The association mechanism of Ribeiro et al's associative memory chip is very similar to Potter's work. We have extended association lists to more general graphs by allowing sharing and circularity. These extensions may be suitable for architectural designs in involving multi-hosts sharing an association graph server on a special purpose processor.} \par In our current work, we have developed the association graph algorithms specifically for the Connection Machine, which is a SIMD machine with a hypercube communication network. This accomplishes the purpose of a special association graph processor---that of achieving lookups of associations in very fast (constant) time. However, one of the main goals of our work is to exploit special architectural features of the CM, in this case, the broadcast network and the global-or network. The availability of these fast networks on the CM is important for algorithm design because those operations can be assumed to run in constant time, unlike other hypercube machines. \par The original feature of our development of these association graph algorithms is in the data structures and algorithms for the garbage collection of nodes. In the association graphs, garbage collection is not another ``phase'' which doesn't interact with the graph operations. It occurs as part of any operation in which a host decides to forget an association, possibly making it, and all the associations linked to it, garbage. Although a naive SIMD implementation of garbage collection would expect to be $O(\log n)$, where $n$ is the length of the garbage list, our algorithm is $O(1)$. It achieves this time at the expense of keeping four extra fields per node, including a unique id for each linear segment of an association graph. it also keeps reference counts and cycle heads in order to keep track of the sharing and circularity. Although some graph operations which manipulate structure incur some overhead in order to keep these fields, there is no overhead imposed on the fast lookup. \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by G. Fox, entitled ``Slide Presentation D'', March 1991. SCCS-71 (Short Code: Fox:91aa).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf White Paper for ONR Neural Supercomputer DoD University Research Initiative} \par\noindent {\it G. C. Fox} \par\noindent SCCS-72\ \ \ \ \ \ \ (Short Code: Fox:91bb) \par Our goal is to develop the basic methodology for constructing neural supercomputers which have ``general'' (what this means will be a product of research) capability and consist of many thousands of individual neural and digital ``nodes'' (chips) linked together. They will be neural analogy of today's Connection Machine, and NCUBE in the digital computer world. The research will cover hardware architecture (topology and interconnection of neural chip to neural chip and parallel neural to parallel digital system). Also, routing techniques (new hardware or can we use neurons), ``programming'', algorithmic implications (we can do an all to all interconnection on a chip but between chips connections will be sparser and slower), and analogies with cortex where rumored that 50\% of neurons are used for routing and not ``computing''. We will follow the same strategy used successfully with hypercube at Caltech and JPL, namely develop the new hardware architectures and general techniques by focussing on a few applications of interest to ONR.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf An Evaluation of High Performance and Parallel Computing Hardware and Software Needs for DoD Applications} \par\noindent SCCS-73\ \ \ \ \ \ \ (Short Code: Fox:91cc) \par Our goal is to provide information to DARPA and the DoD community concerning the suitability of parallel computing for their computational applications. This evaluation will discuss the implications of various aspects of high performance computing, including hardware and software architectures and systems, languages, libraries and tools.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Parallel Genetic Algorithms and Application to Load Balancing for Parallel Computation} \par\noindent {\it Nashat Mansour and Geoffrey C. Fox} \par\noindent SCCS-74\ \ \ \ \ \ \ (Short Code: Mansour:91b) \par A new coarse grain parallel genetic algorithm (PGA) and a new implementation of a data-parallel GA are presented in this paper. They are based on models of natural evolution in which the population is formed of discontinuous or continuous sub-populations. In addition to simulating natural evolution, the intrinsic parallelism in the two PGA's minimizes the possibility of premature convergence that the implementation of GA's often encounters. The PGA's have been implemented on a hypercube and a Connection Machine, and their operation is demonstrated by applying them to the static load balancing problem, also referred to as domain decomposition, in parallel computation. The PGA's have found near-optimal solutions that are better than those found by a sequential GA. In addition to the speed-up due to the parallel implementation, the PGA's evolve fit genotypes in a smaller number of generations by virtue of the intrinsic parallelism of the underlying evolution models. On one hand, the PGA's accentuate the advantage of parallel computers for simulating natural evolution. On the other hand, they represent a new approach to domain decomposition.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by J. Saltz, entitled ``Thoughts of a Great Man'', presented to NPAC, March 1991. SCCS-75 (Short Code: Saltz:91a).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by G. Fox, entitled ``NIH Site Visit Presentation'', March 21, 1991. SCCS-76 (Short Code: Fox:91ee).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Express versus iPSC/2 Primitives: A Performance Comparison} \par\noindent {\it Ishfaq Ahmad and Min-You Wu} \par\noindent SCCS-77\ \ \ \ \ \ \ (Short Code: Ahmad:91a) \par Express, developed by Parasoft, is a software programming environment for writing parallel programs for homogeneous MIMD multiprocessors. It provides a communication system for communicating processes, mechanisms for data sharing, reading files, debugging tools, and performance analyzing tools. An important feature of EXPRESS is that these functionalities are carried out in a user transparent manner. The flexibility of EXPRESS makes it an attractive tool set for developing parallel programs. In addition, it has nice features which allow an application to use appropriate utilities. One of such features is automatic domain decomposition library which can map physical domain of the problem to the underlying topology of the parallel computing system. The performance evaluation tools, using text and graphics, can be effectively used to analyze the run time performance of the program.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Parallel Problem Architectures and Their Implications for Portable Parallel Software Systems} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-78\ \ \ \ \ \ \ (Short Code: Fox:91c) \par We show how the structure or architecture of applications suggest the nature of parallel software systems that will run portably on a variety of parallel machines---both those available now and those expected during the coming decade. The discussion is illustrated by lessons learned from real applications implemented on current MIMD and SIMD machines. These are mainly academic problems and the extrapolation to complex industrial and government applications is unproven but we believe our methodology will still be applicable.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Parallel Problem Architectures and Their Implications for Portable Parallel Software Systems} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-78b\ \ \ \ \ \ \ (Short Code: Fox:91ff) \par We show how the structure or architecture of applications suggest the nature of parallel software systems that will run portably on a variety of parallel machines---both those available now and those expected during the coming decade. The discussion is illustrated by lessons learned from real applications implemented on current MIMD and SIMD machines. These are mainly academic problems and the extrapolation to complex industrial and government applications is unproven but we believe our methodology will still be applicable.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Geoffrey C. Fox, entitled ``Remarks on Fortran 90 and Distributed Memory LAPACK'', presented at BLACS Workshop, Houston, Texas, March 28, 1991. SCCS-79 (Short Code: Fox:91gg). We suggest how Fortran~90D and Fortran~77D could be used to produce a portable linear algebra library for general parallel machines.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Geoffrey C. Fox, entitled ``Physical Computation ASAS Program Office TECHBASE Group at Syracuse University'', presented at APO TECHBASE Meeting, Syracuse NY April 9--10. SCCS-80 (Short Code: Fox:91hh).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Wojtek Furmanski, entitled ``MOVIE and Map Separates'', presented at JTF Semiannual Coordination Meeting, Syracuse NY April 9--10, 1991. SCCS-81 (Short Code: Furmanski:91d).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf MOVIE and Map Separates} \par\noindent {\it Wojtek Furmanski} \par\noindent SCCS-82\ \ \ \ \ \ \ \ (Short Code: Furmanski:91a) \par This paper contains the expanded version of my talk given at the JTF Semiannual Coordination Meeting at Syracuse University, Syracuse, NY, April 9--10, 1991. \par The first part of the paper contains the summary of the current status of the MOVIE system. I start from a brief overview, followed by a short presentation of new elements introduced to the system during the last quarter February--March, 1991. Finally, I discuss the MOVIE distribution issues such as ports to other software and/or hardware platforms, documentation, transfer to other TECHBASE applications and the intended business project.} \par The second part of the paper is devoted to the map separates project. I present first the global strategy, based on the neural expert model, followed by the summary of the current status of the project. In particular, I discuss a class of algorithms for extracting the `primitive separates', based on lateral local networks in the color separates space with the feedback connections from other modalities. Next, I describe the cluster algorithm in the color (RGB) cube, proposed as the dominant `first guess' feedback to the color separates networks. The concept is illustrated on a set of color plates, extracted from the corresponding MOVIE based tool. \bigskip\starline\bigskip \vbox{\par\noindent{\bf Parallel Computing and Education} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-83\ \ \ \ \ \ \ (Short Code: Fox:91a) \par In this issue we have learned that parallel architectures will lead to computers of dramatically greater performance and capability. How and why will this affect education? Will the impact be incremental, as happens when one discovers a new elementary particle, and this appears as an extra chapter in a multivolume book on introductory physics? Or will the impact be dramatic, as in biology where areas such as genetic and molecular biology have revolutionized the core knowledge taught in the field? I believe the latter is a better analogy, and that as educators adopt the use of parallel computers, these machines will lead to major changes, both in the way we teach, and in what we teach. Many agree with this assertion, and a few have started to implement its consequences.} \par Educators can address many educational challenges with parallel computers, and through realistic simulations, help scientists explain their predictions and discoveries. For example, I was impressed by a set of talks about global climate change at a recent conference. Each participant agreed that major, and probably undesirable changes in the environment were inevitable, unless strong action is taken soon. They may or may not have agreed that parallel computers will accurately predict the temperature rises coming from global warming. However, no one could see how to explain to the public, and in particular, to politicians, the urgency of the situation. The key, I believe, lies in widespread implementation of parallel computers in science and education. \bigskip\starline\bigskip \vbox{\par\noindent{\bf Preliminary Analysis Enhanced Clustered Fortran Language Specification} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-85\ \ \ \ \ \ \ (Short Code: Fox:91jj) \par The document describes an enhancement of Fortran~77 to include new primitives which handle both shared (ECF-SH) and/or distributed memory (ECF-DI). The shared memory model presumably follows the Parallel Computing Forum and will not be critiqued in this report. We note that the language does {\it not\/} support portability between shared and distributed memory execution. With known compiler technology, it is possible to emulate a distributed memory with shared memory hardware and so the ECF-DI primitives can be implemented on a shared memory machine. However, users with Fortran codes using ECF-SI will not be able to exploit parallelism on distributed memory hardware.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Joint Tactical Fusion Concurrent Processing Program} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-87\ \ \ \ \ \ \ (Short Code: Fox:91ll) \par We have established a Joint Tactical Fusion Center of Excellence at Syracuse University to study advanced computational methods of relevance to JTF's mission. The work of our first year focused on applications of the distributed software system, MOVIE, and its implementation on transputer arrays. In our second year we will dedicate our efforts to two major program areas; Physical Optimization/Physical Computation and Distributed Computing.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Compiling Fortran90 Programs for Distributed Memory MIMD Parallel Computers} \par\noindent {\it Min-You Wu and Geoffrey C. Fox} \par\noindent SCCS-88\ \ \ \ \ \ \ (Short Code: Wu:91a) \par This paper describes the design and motivation for a Fortran90 compiler, a source-to-source parallelizing compiler, for distributed memory systems. We discuss the methodology of parallelizing Fortran programs and the principle of compiler design. Then we describe compiler directives, data partitioning and sequentialization, communication insertion, and implementation of intrinsic functions. Some basic optimization techniques are also presented. We use an introductory example of Gaussian elimination to explain the compiling techniques. Other sample programs in our test suite, such as FFT and the N-body problem are briefly discussed with their performance.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Fortran90D Compiler for Distributed Memory MIMD Parallel Computers} \par\noindent {\it Min-You Wu and Geoffrey C. Fox} \par\noindent SCCS-88b\ \ \ \ \ \ \ (Short Code: Wu:91b) \par This paper describes the design and motivation for a Fortran90D compiler, a source-to-source parallel compiler for distributed memory systems. We discuss the methodology of parallelizing Fortran programs and the principle of compiler design. Different from Fortran77 parallelizing compilers, a Fortran90D compiler does not {\it parallelize\/} sequential constructs. Only parallelism expressed by Fortran90D parallel constructs is exploited. This makes a Fortran90D compiler much easier to implement. We describe compiler directives, partitioning, sequentialization, communication generation, and optimization techniques. We use an introductory example of Guassian elimination, among other programs in our test suite, to explain the compilation techniques.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} Copies of a Slide Presentation from the APO Techbase Semiannual Coordination Meeting, Geoffrey C. Fox, held at Syracuse University's Center for Science and Technology April 9--10, 1991. SCCS-89 (Short Code: Fox:91mm).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Wojtek Furmanski, entitled ``MOVIE a Software System for Computational Science'', presented to the SCCS, April 17, 1991. SCCS-90 (Short Code: Furmanski:91e).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf FortranD as a Portable Software System for Parallel Computers} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-91\ \ \ \ \ \ \ (Short Code: Fox:91d) \par We discuss how extensions of Fortran in a distributed computing environment may be the basis of a portable software system for a heterogeneous computing network consisting of SIMD and MIMD parallel machines connected with conventional (super) computers.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Approaches to Physical Optimization} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-92\ \ \ \ \ \ \ (Short Code: Fox:91d) \par Physical Optimization is the use of analogies from nature to solve optimization problems. This approach leads to approximate solutions with time complexities that are much lower than traditional exact methods. The algorithms parallelize straightforwardly and promise the solution of many practical problems on the large scale parallel computers that are expected in the future. We describe and contrast four related physical optimization methods;\ the well-known simulated annealing and neural network approaches, as well as the important but less well understood deterministic annealing and elastic network methods.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Note:\ Blocked LU Factorization in Engineering Applications on a Minisupercomputer} \par\noindent {\it A. Gabor Mohamed and Geoffrey C. Fox} \par\noindent SCCS-94\ \ \ \ \ \ \ (Short Code: Mohamed:91b) \par This note discusses methods of implementing the Level~3 BLAS in LU factorization used in engineering applications on the Alliant~FX/80 minisupercomputer. Three ways of expressing the LU factorization in terms of blocked algorithms using Level~3 BLAS are considered. We also compare the performance of the parallelism within the computational kernels using a noblock algorithm that employs Level~1 and Level~2 BLAS with that obtained over the kernels when using blocked LU with the Level~3 BLAS.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Garbage Collection Algorithms for the Connection Machine} \par\noindent {\it William G. O'Farrell} \par\noindent SCCS-95\ \ \ \ \ \ (Short Code: O'Farrell:91b) \par In this dissertation we study garbage collection (GC) on the Connection Machine (CM). In traditional GC, the objective is to identify memory cells which are no longer being used by the running program, so that those cells may be used again. On the CM, what are being collected are processors. Furthermore, traditional methods are inappropriate, as they would run only on the host machine. Therefore we explore massively parallel GC, in which the CM processors themselves are actively used in discovering which processors are garbage and which are not.} \par Two different classes of GC are studied in this dissertation. In the first, a separate GC phase is applied to what we call ``binary graphs'' or B-graphs, which are directed graphs in which the outdegree for nodes is less then or equal to two. If the indegree for nodes is allowed to be greater than one, then the B-graph is said to permit ``structure sharing''. We study B-graphs with and without structure sharing. Counted in ``number of parallel steps'', our algorithm has a running time of $O(\log (n))$ in the worst case (the graph is actually a linear list) and $O(\log (\log (n)))$ in the best case. Here $n$ is the number of nodes in the graph, and a ``parallel step'' may be any operation available on the CM, including general communication. \par The second class of GC studied in this dissertation is GC for Association Graphs (A-graphs), which are a generalization of association lists. A-graphs are directed graphs in which the outdegree of nodes is no greater than one. Unlike conventional association lists, A-graphs may have sharing and cycles. GC for A-graphs does not happen in a separate computational phase --- we incorporate it as part of the basic A-graph operations. Its running time is constant (i.e., $O(1)$). Furthermore, it is constant in a very genuine sense, because A-list operations use only the CM's broadcast and global-or networks. Although the delay time across these networks would grow logarithmically if the CM were scaled to larger and larger sizes, constant time is a reasonable assumption for the bounded size of todays CMs. \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Clive Baillie, entitled ``Monte Carlo Simulation of Dynamically Triangulated Random Surfaces as Models of String Theory'', presented to SCCS/NPAC, April 29, 1991. SCCS-96 (Short Code: Baillie:91e).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Enzio Marinari, entitled ``Random Self-Interacting Chains and Connections to Protein Folding'', presentation of research to NPAC/SCCS members, May 1991. SCCS-97 (Short Code: Marinari:91a).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Evaluating Fortran D for NASA Distributed Supercomputing Applications} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-98\ \ \ \ \ \ (Short Code: Fox:91nn) \par We propose a collaboration between JPL and Syracuse which will both produce portable Fortran90D versions of JPL applications and also feed lessons from these codes into the Fortran90D compiler. The latter is being targeted at MIMD machines including the Intel Delta machine, JPL Mark~IIIfp, and the NCUBE~2. We have chosen two particular JPL applications; one of which is relatively straightforward and should fit into the current Fortran90D definition and one which will help our ongoing research (a Rice, Syracuse and NASA ICASE collaboration) into Fortran90D extensions for irregular loosely synchronous problems.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Evaluation of Parallel Computing for Over the Horizon Radat (OTH)} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-99\ \ \ \ \ \ (Short Code: Fox:91oo) \par We propose a two stage project to evaluate the applicability of parallel processing for OTH and to provide General Electric (GE) the technical support to incorporate parallel computing into future proposals.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Syracuse University School of Management Project} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-100\ \ \ \ \ \ \ (Short Code: Fox:91pp) \par The purpose of this project is to evaluate the potential for high performance computing applications in the field of management and economic modeling. We propose a two step process. First, we will conduct a survey of computing applications in management and economics to identify issues relevant to high performance computing. Second, we will implement 2--3 specific applications on parallel computing systems to illustrate the problem evaluation process and to demonstrate the benefits of parallel computing.} %% Not sure about this one. It might be wrong. See the next SCCS-101 %% %% \bigskip\starline\bigskip %% \vbox{\par\noindent{\bf Investigation of the Two-dimensional O(3) Model %% using the Overrelaxation Algorithm} %% \par\noindent %% {\it John Apostolakis, Clive F. Baillie, and Geoffrey C. Fox} %% \par\noindent %% SCCS-101\ \ \ \ \ \ \ (Short Code: Apostolakis:91a) %% \par %% We investigate the two-dimensional O(3) model with the standard action %% by Monte Carlo simulation at couplings $\beta$ up to 2.05. We measure %% the energy density, mass gap, and susceptibility of the model, and %% gather high statistics on lattices of size $L\le 1024$ using the %% floating point systems T-series vector hypercube. Asymptotic scaling %% does not appear to set in for this action, even at $\beta =20.5$, where %% the correlation length is 304. We observe a 20\% difference between our %% estimate $m/\Lambda_{\bar{MS}} = 3.52(6)$ at this $\beta$ and the recent %% exact analytical result. We use the overrelaxation algorithm %% interleaved with Metropolis updates and show that the decorrelation %% time scales with the correlation length and the number of overrelaxation %% steps per sweep. We determine its effective dynamical critical exponent %% to be $z' = 1.079(10)$; thus critical slowing down is reduced %% significantly for this local algorithm that is vectorizable and %% parallelizable.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf The 2d O(3) Model by Overrelaxation} \par\noindent {\it John Apostolakis, Clive F. Baillie, and Geoffrey C. Fox} \par\noindent SCCS-101\ \ \ \ \ \ \ (Short Code: Apostolakis:91b) \par } %% Lisa to provide abstract %% \bigskip\starline\bigskip \vbox{\par\noindent{\bf Parallel Cluster Algorithms} \par\noindent {\it P. D. Coddington and C. F. Baillie} \par\noindent SCCS-102\ \ \ \ \ \ \ (Short Code: Coddington:90b) \par Cluster update algorithms dramatically reduce critical slowing down in spin models, but unlike the standard Metropolis algorithm, it is not obvious how to implement these algorithms efficiently on parallel or vector computers. Here we present two different parallel implementations of the Swendsen-Wang algorithm which give reasonable efficiencies on various MIMD parallel computers.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf A Multi-Grid Cluster Labeling Scheme} \par\noindent {\it John Apostolakis, Paul Coddington, Enzo Marinari} \par\noindent SCCS-103\ \ \ \ \ \ \ (Short Code: Apostolakis:91c) \par We introduce a simple multi-scale algorithm for connected component labeling on parallel computers, which we apply to the problem of labeling clusters in spin and percolation models. We show that it is only logarithmically slowed down in the critical limit of bond percolation and the Ising model. We also discuss, in light of the proposed Teraflop computers optimized for lattice gauge theories and other lattice problems, the minimum requirements for simple computer switchboard architectures for which one can efficiently implement multi-scale algorithms to fight critical slowing down.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf The MOVIE Software Environment---Description and Applications} \par\noindent {\it R. G. Kennett} \par\noindent SCCS-104\ \ \ \ \ \ \ (Short Code: R. G. Kennett) \par In this paper, I describe MOVIE and four applications. These are the Map Separates problem for which it is currently being used, a High Energy Physics problem for which its use has been proposed and two projects under development, LIGO and EISDIS, for which MOVIE's use has not been considered before. For these, I discuss whether its application would be beneficial.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Thermal Rounding of the Charge Density Wave} \par\noindent {\it A. Alan Middleton} \par\noindent SCCS-108\ \ \ \ \ \ \ (Short Code: Middleton:91a) \par The rounding of the charge density wave depinning transition by thermal noise is examined. Hops by localized modes over small barriers trigger ``avalanches'', resulting in a creep velocity much larger than that expected from comparing thermal energies with typical barriers. For a field equal to the $T=0$ depinning field, the creep velocity is predicted to have a {\it power-law\/} dependence on the temperature $T$; numerical computations confirm this result. The predicted order of magnitude of the thermal rounding of the depinning transition is consistent with rounding seen in experiment.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Scattering Schedule for Problems and Unpredictable Structures} \par\noindent {\it Min-You Wu and Wei Shu} \par\noindent SCCS-109\ \ \ \ \ \ \ (Short Code: Wu:91c) \par An extended scatter scheduling was applied to problems with unpredictable, asynchronous structures. It has been found that with this simple scheduling strategy, good load balance can be reached without incurring much runtime overhead. This scheduling algorithm has been implemented on hypercube machines, and its performance is compared with other scheduling strategies.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Parallel Programming as an Optimization Problem or Intelligent Compilers for Parallel Computers} \par\noindent {\it Geoffrey C. Fox and Vasanth Balasundaram} \par\noindent SCCS-110\ \ \ \ \ \ \ (Short Code: Fox:90d) \par For at least the next few years, FORTRAN will be a critical language for scientific computation. One can view it either as a primary user interface or as a portable ``machine-language'' for which excellent compilers exist and which is a target for more user-friendly systems. Mapping a scientific problem onto a high performance computer can be viewed as a hard optimization problem where one minimizes some combination of user program development time and production program execution time. We consider how expert systems, neural networks, simulated annealing, and related methods can be integrated into a FORTRAN compiler to both give better code and feedback to the user on the appropriateness of a particular problem to particular hardware.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Hardware and Software Architectures for Irregular Problem Architectures} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-111\ \ \ \ \ \ \ (Short Code: Fox:90p) \par We study the hardware and, especially, software support for unstructured scientific simulations in the context of an application classification in terms of a problem architecture. We suggest that extensions of Fortran~90 may provide an overall framework for handling general loosely synchronous problems. These include over 90\% of current scientific simulations.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a SCCS poster session by D. J. Edelsohn and G. C. Fox, entitled ``Hierarchical Tree-Structures for Particle Simulations'', Self-Interacting Chains and Connections to Protein Folding'', May 23, 1991. SCCS-112 (Short Code: Edelsohn:91a).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Geoffrey C. Fox entitled, ``Parallel Computing: Software, Applications and Their Linkage''. Revised Slide Talk A, April 1991. SCCS-113 (Short Code: Fox:91tt).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Geoffrey C. Fox entitled, ``Parallel Computing: Software, Applications and Their Linkage''. Revised Slide Talk A, June 1991. SCCS-113b (Short Code: Fox:91uu).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Geoffrey C. Fox, entitled ``Future of Big Computing: A Triumph for Lilliputions'', slide of article from New York Times, November 25, 1990. SCCS-114 (Short Code: Fox:90mm).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Geoffrey C. Fox entitled, ``Parallel Computing: Some HPC Software Implications that We Have Learned from the Use of Parallel Computers'', Revised Slide Talk B, April 5, 1991. SCCS-115 (Short Code: Fox:91vv).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Geoffrey C. Fox entitled, ``Portable Software for Parallel Computers'' at the SC91 USA/Pacific Supercomputing Conference, Santa Clara, CA, June 20, 1991. SCCS-115b (Short Code: Fox:91ww).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Geoffrey C. Fox entitled, ``Opportunity for a New World-Class Industry; The Supercomputing Corridor''. Revised Slide Talk C, April 16, 1991. SCCS-116 (Short Code: Fox:91xx).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Geoffrey C. Fox entitled, ``The Supercomputing Corridor: the Central New York Advantage'', May 1991. SCCS-117 (Short Code: Fox:91yy).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Geoffrey C. Fox entitled, ``New Computer Services Company'', May 28, 1991. SCCS-118 (Short Code: Fox:91zz).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Geoffrey C. Fox entitled, ``Computational Science'', May 28, 1991. SCCS-119 (Short Code: Fox:91ab).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Geoffrey C. Fox entitled, ``Examples of Physical Computation'', May 28, 1991. SCCS-120 (Short Code: Fox:91ac).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Geoffrey C. Fox entitled, ``Examples of Physical Computation'', May 28, 1991. SCCS-120b (Short Code: Fox:91ad).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a talk by Richard Shapiro entitled, ``UTRC Experience with the Connection Machine'', April 1991. Invited speaker for United Technologies Corporation. SCCS-121 (Short Code: Shapiro:91a).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Physical Optimization Methods for Allocating Data to Multicomputer Nodes} \par\noindent {\it Nashat Mansour and Geoffrey C. Fox} \par\noindent SCCS-122\ \ \ \ \ \ \ (Short Code: Mansour:91d) \par Three optimization methods derived from natural sciences have been applied to the problem of allocating data to multicomputer nodes. These are a Simulated Annealing Algorithm, a Hybrid Genetic Algorithm, and a Bold Neural Network. In this paper, a review of the three methods is presented. Their performances are critically evaluated and compared for a number of examples with different topologies, sizes, and granularities. The performance criteria are solution quality, execution time, robustness, bias, and parallelizability. Simulation results show that the three methods have fairly diverse advantages and drawbacks. Hence, they can be suitable for different requirements. For example, the annealing and genetic algorithms yield comparable solutions and do not show a bias towards particular problem structures. The neural network produces solutions of a lower quality, but in a considerably shorter time.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from the Software Technology for High Performance Architectures Workshop, entitled ``Problems and Software Engineering Solutions to C$^3$I Applications'', June 12--14, 1991. SCCS-123 (Short Code: C3I:91a).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf A Percolation Theory Approach to the Lowering of Ionization Potential in Hot and Dense Plasmas} \par\noindent {\it Jose Stein and David Salzmann} \par\noindent SCCS-125\ \ \ \ \ \ \ (Short Code: Stein:91b) \par Percolation theory in conjunction with a cylindrically symmetric Thomas-Fermi model are introduced to calculate the ionization potential lowering in hot and dense plasmas. The bound electron level shifts due to the interaction with the nearest neighbor are calculated by means of a cylindrically symmetric model. The critical energy for which a negative energy electron wavefunction overlaps a macroscopic portion of the plasma is then computed by percolation theory methods.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf The Effect of Percolation on the Ionization Potential in Hot and Dense Plasmas} \par\noindent {\it Josef Stein and David Salzmann} \par\noindent SCCS-126\ \ \ \ \ \ \ (Short Code: Stein:91c) \par The significant overlap between the ion spheres of randomly distributed neighboring ions in a hot and dense plasma, has important effects on the ionization potential of the ions. In addition to the effects described by the ion-sphere model, this overlap has two opposing effects on the ionization potential:} \par The first one is the lowering of the potential barrier between neighboring ions. This enables electrons with high enough (but still negative) energy to {\it percolate}, i.e. to travel freely over macroscopic distances throughout the plasma, or, using quantum theory terminology, to occupy continuum states. This effect is similar to the delocalization of electrons in random potentials. \par The second effect is the lowering of the bound states due to the superposition of the potentials of nearby ions, thereby raising the ionization potential. \bigskip\starline\bigskip \vbox{\par\noindent{\bf Software Engineering for High Performance C$^3$I Systems} \par\noindent {\it Geoffrey C. Fox, Wojtek Furmanski, and Donald Leskiw} \par\noindent SCCS-129\ \ \ \ \ \ \ (Short Code: Fox:91ag) \par We propose to define and plan the Northeast Software Consortium in High Performance C$^3$I Systems (NSC$^4$I) involving Rome Laboratories, industry, and universities. This consortium will develop and evaluate C$^3$I software engineering tools, which will scale and port to future high performance and parallel computers. We will suggest appropriate model C$^3$I applications as part of a broad survey. We will aid Rom Laboratory's PEEP project by indicating and evaluating hardware and software components.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Coherent Structures on a Boundary Layer in Raleigh-B\'enard Turbulence} \par\noindent {\it Michael J. Shelley and Michael Vinson} \par\noindent SCCS-131\ \ \ \ \ \ \ (Short Code: Shelley:91) \par Observations of Raleigh-B\'enard convection have shown the existence of various coherent structures in the boundary layer region of the convection cell in the regime of hard turbulence. These structures include long-lived waves, plumes, and propagating regions of rotational fluid, termed ``swirls''. Besides providing visualizations of these flow structures, the experimenters have conjectured that the observed waves are normal modes associated with the buoyant shear layer at the wall, and have provided a measured dispersion relation. We study here a simple model of such an unstably stratified boundary layer. The model consists of a two-dimensional layer of constant vorticity against a wall; this models the effect of viscosity and the large scale rolling motion in creating shear at the wall. The effect of buoyancy is included as a sharp density jump across the boundary of the shear layer. We study both the linear analysis of the model, and its nonlinear behavior through numerical simulation. The model reproduces much of the behavior observed in the experiment. We observe the formation of both plumes and swirls, similar in form and dynamics to those in the experiment. We find that plumes and swirls arise from the same instability mechanism, only differing in the amount of shear acting upon them. The linear and nonlinear behavior of the model's neutrally stable waves suggests that the observed waves are not normal modes of the boundary layer; neutrally stable waves of the model obey a different dispersion relation, and are nonlinearly unstable in superharmonic Raleigh-Taylor instabilities. We suggest that the observed waves reflect some other collective action of the system.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a slide presentation by Josef Stein, Mitanu Paul and Geoffrey C. Fox entitled, ``An Outer-Loop Parallelization of Existing, Real-Life Fortran-77 Programs'', July, 1991. SCCS-132 (Short Code: Stein:91d).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf ASAS Quarterly Report} \par\noindent {\it Geoffrey C. Fox and Wojtek Furmanski} \par\noindent SCCS-133\ \ \ \ \ \ \ (Short Code: Fox:91ah) \par In this report we summarize our work within the ASAS Center of Excellence at Syracuse in the second quarter of 1991. Two main thrusts of our ASAS funded research are: Physical Computation and MOVIE system/applications.} \par Within the Physical Computation project we were continuing our work on multivehicle navigation and on genetic algorithms for load balancing. We also performed a comparative analysis of the three basic physical optimization techniques for load balancing: Simulated Annealing, Neural Networks and Genetic Algorithms. \par Within the MOVIE project we were continuing our software engineering work, both in the general purpose (system) sector as well as in the application sector, where we focus on the Map separates problem. \par We also started recently several new collaborations within SCCS, some of them involving use of MOVIE for new computational domains such as integrated databases, Virtual Reality or C$^3$I (Command and Control Center Intelligence). Some of these projects are synergistic with the TECHBASE program and we therefore expect parts of this work to be also of interest to ASAS. We intend to discuss these issues with ASAS during our next semiannual coordination meeting. \par In the following, we describe briefly the progress for both projects in the indicated period. We also list some relevant meetings and workshops and we discuss shortly the new collaborations/projects of potential interest to ASAS. \bigskip\starline\bigskip \vbox{\par\noindent{\bf The Architecture of Problems and Portable Parallel Software Systems} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-134\ \ \ \ \ \ \ (Short Code: Fox:91g) \par We show how the structure or architecture of applications suggest the nature of parallel software systems that will run portably on a variety of parallel machines---both those available now and those expected during the coming decade. The discussion is illustrated by lessons learned from real applications implemented on current MIMD and SIMD machines. These are mainly academic problems and the extrapolation to complex industrial and government applications is unproven but we believe our methodology will still be applicable. We suggest that problems consist of a heterogeneous mixture of at least three distinct classes for which the parallel software support could be different. Further, we need approaches to manage the integration of these different software paradigms.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from Geoffrey C. Fox entitled, ``Parallel Applications, Some Experience, and Some Lessons'' presented at IBM Workshop, Oberlech, Austria, July 18, 1991 (Talk I). SCCS-135 (Short Code: Fox:91ai).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from Harry Jordan entitled, ``Maximizing and Modeling Multiprocessor Performance using the Force'', November 2, 1989. SCCS-136 (Short Code: Jordan:89a).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from Geoffrey Fox as a presentation given to Gil Weigand as part of Site Visit, entitled ``Syracuse Center for Computational Science'', August 30, 1991. SCCS-142 (Short Code: Fox:91aj).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf JTF Quarterly Report} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-143\ \ \ \ \ \ \ (Short Code: Fox:91ak) \par In this report we summarize our work within the JTF Center of Excellence at Syracuse in the first quarter of 1991. Two main thrusts of our JTF funded research are: Physical Computation and MOVIE system/applications.} \par Within the Physical Computation project, our main focus in the indicated period was on developing new genetic algorithms for resource allocation and comparing this method with other techniques such as networks and annealing. \par Within the MOVIE project, we were continuing assembling and integrating interface and computational tools, relevant for Map Separates. In particular, we completed the ISF/Motif interface as the first step of our Open Systems connectivity program and we implemented the multiple inheritance mechanism for the object-oriented model in MOVIE. \par In the following, we describe briefly the progress for both projects in the indicated period. More detailed discussion is given in the attached papers. \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from Geoffrey Fox entitled ``What We Have Learnt About Parallel Scientific Computation'', September 4, 1991 presentation at the Workshop on the Design and Use of Large-Scale Parallel Machines. Held at the University of Toronto and sponsored by IBM. SCCS-148 (Short Code: Fox:91al).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from Kim Mills entitled, ``Stock Option Pricing on the Connection Machine''. Presentation at Los Alamos Symposium, August 22, 1991. SCCS-149 (Short Code: Mills:91b).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Implementing Spatial Environmental Models on Parallel Computing Systems} \par\noindent {\it Kim Mills, Geoffrey C. Fox, and Roy Heimbach} \par\noindent SCCS-150\ \ \ \ \ \ \ (Short Code: Mills:91d) \par Parallel computing will play an important role in integrating spatial environmental models for large-scale systems. Intervisibility analysis with error simulation in the digital elevation model is used to illustrate an approach to developing parallel models, and to demonstrate some benefits of high performance computing. The line of sight algorithms, and data distribution strategies for two intervisibility models run on the Connection Machine are described and partial program listings are appended. Collaboration between environmental and computational scientists will be required to develop more complex spatial environmental models.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from Geoffrey Fox entitled, ``Syracuse-Rice DARPA Project''. Presentation to Gil Weigand during Site Visit, August 30, 1991. SCCS-151 (Short Code: Fox:91am).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf A Comparison of Load Balancing Methods for Parallel Computations} \par\noindent {\it Nashat Mansour and Geoffrey Fox} \par\noindent SCCS-154\ \ \ \ \ \ \ (Short Code: Mansour:91e) \par Three physical optimization methods are considered in this paper for load balancing parallel computations. These are simulated annealing, genetic algorithms, and neural networks. Some design choices and the inclusion of additional steps lead to new versions of the algorithms with different solution qualities and execution times. The performances of these versions are critically evaluated and compared for test cases with different topologies and sizes. Orthogonal recursive coordinate bisection is also included in the comparison as a typical simple deterministic method. Simulation results show that the algorithms have diverse properties. Hence, different algorithms can be applied to different problems and requirements. For example, the annealing and genetic algorithms produce better solutions and do not show a bias towards particular problem structures. But, they are slower than the neural network and recursive bisection. Preprocessing graph contraction is one of the additional steps suggested for the physical methods. It produces a significant reduction in execution time necessary for their applicability to large-scale problems.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Distributed Memory Compiler Methods for Irregular Problems---Data Copy Reuse and Runtime Partitioning} \par\noindent {\it R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis} \par\noindent SCCS-163\ \ \ \ \ \ \ (Short Code: Das:91a) \par This paper outlines two methods which we believe will play an important role in any distributed memory compiler able to handle sparse and unstructured problems. We describe how to link runtime partitioners to distributed memory compilers. In our scheme, programmers can {\it implicitly\/} specify how data and loop iterations are to be distributed between processors. This insulates users from having to deal explicitly with potentially complex algorithms that carry out work and data partitioning.} \par We also describe a viable mechanism for tracking and reusing copies of off-processor data. In many programs, several loops access the same off-processor memory locations. As long as it can be verified that the values assigned to off-processor memory locations remain unmodified, we show that we can effectively reuse stored off-processor data. We present experimental data from a 3-D unstructured Euler solver run on an iPSC/860 to demonstrate the usefulness of our methods. \bigskip\starline\bigskip \vbox{\par\noindent{\bf Zipcode: A Portable Communication Layer for High Performance Multicomputing---Practice and Experience} \par\noindent {\it A. Skjellum and M. Morari} \par\noindent SCCS-172\ \ \ \ \ \ \ (Short Code: Skjellum:91a) \par Sophisticated multicomputer applications require efficient, flexible, convenient underlying communication primitives. Here, the design, development, articulation and evaluation of {\it Zipcode}, a portable communication ``layer'' or library are discussed.} \par The primary research goals of {\it Zipcode\/} were as follows: high efficiency compared to lowest-level message primitives, user-definable message receipt selectivity, as well as abstraction of collections of processes and message selectivity to allow multiple, independently conceived concurrent libraries to work together without conflict. Furthermore, we recognize that there is a need for a portable, vendor-independent standard for message passing, and we are convinced that the notions incorporated by {\it Zipcode\/} are essential for such an application-level standard. \par {\it Zipcode\/} works atop the {\it Reactive Kernel/Cosmic Environment (RK/CE)}, a portable, ``light-weight'', multicomputer node operating system. Presently, the {\it Reactive Kernel\/} is implemented for Intel iPSC/1, iPSC/2, Symult S2010, and iPSC860 Gamma Prototype multicomputers and emulated on shared-memory computers such as the BBN TC2000 as well as networks of homogeneous NFS-connected workstations (e.g., Sun clusters). We see {\it RK\/} primitives as the logical low-level vendor-independent standard for message-passing applications, upon which a higher-level standard such as {\it Zipcode\/} can be built. \par The power of {\it Zipcode\/} and flexibility of its message-passing classes and contexts allows effective emulation of other, perhaps common, less expressive message-passing notations. Such emulators fall into the new package, {\it UPU}, currently under development. While this represents work in progress, a complete {\it UPU\/} emulation of the Livermore LMPS notation is complete. This work will be detailed in a future publication, as will the efforts to emulate and support {\it RK/CE\/} in the BBN TC2000 environment. \bigskip\starline\bigskip \vbox{\par\noindent{\bf Parallel Processing for Computer Vision and Image Understanding} \par\noindent {\it Alok Choudhary and Sanjay Ranka} \par\noindent SCCS-178\ \ \ \ \ \ \ (Short Code: Choudhary:92b) \par What is vision? From one point of view, ``it is the process of recognizing objects of interest from images''. The word ``process'' refers to some form of processing performed on the input data, which may be one or more images. The phrase ``objects of interest'' implies the existence of a context under which this processing must take place and the existence of a representation used in the processing. For example, if we were asked, ``Is there a table in this room?'', we would have some representation of a table in our mind and look for something similar to it in the room. In that process, we would ignore objects that did not look like a table. That is, we started with a model and looked for some instance of the model in the room. On the other hand, if we were asked, ``Describe all the objects in this room'', we would scan the room, form some representation of the objects, and match them to some objects that are part of our ``knowledge base''. However, we may or may not know what to expect in the room. Both problems can be considered ``vision problems''. A general vision problem, therefore, is considered to be an ill-posed problem from a computational perspective. Vision has fascinated researchers from various disciplines such as psychology, neural science, computer science and engineering, etc. for a long time and a good discussion on various aspects of vision can be found.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from Nancy McCracken entitled, ``Advances in Parallel Computing and the ACTION Program'', October 7, 1991. SCCS-179 (Short Code: McCracken:91a).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from Geoffrey C. Fox entitled, ``Undergraduate Research in Computational Science'', presented at ASEE Meeting, October 4, 1991. Slide Talk N, SCCS-180 (Short Code: Fox:91az).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from Geoffrey C. Fox entitled, ``General Remarks and Lessons from CRPC and Contributions of SCCS to Coop'', presented at Rome Laboratory Software Engineering Coop, Rome, New York, October 8--9, 1991. SCCS-182 (Short Code: Fox:91bc).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from Geoffrey C. Fox entitled, ``Parallel Computing: New Era in Computation'', October 18, 1991, Slide Talk O. SCCS-185 (Short Code: Fox:91bd).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from Geoffrey C. Fox entitled, ``Syracuse Center for Computational Science'', November 5, 1991, Slide Talk P, SCCS-186 (Short Code: Fox:91be).} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from Geoffrey C. Fox entitled, ``An Overview: Syracuse Center for Computational Science'', September 17, 1991, Slide Talk J, SCCS-192 (Short Code: Fox:91bj).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Hierarchical Tree-Structures as Adaptive Meshes} \par\noindent {\it David Edelsohn and Geoffrey C. Fox} \par\noindent SCCS-193\ \ \ \ \ \ \ (Short Code: Edelsohn:91b) \par New adaptive mesh refinement algorithms provide an opportunity to utilize the same hierarchical tree-structures developed for multipole-based particle simulations in grid-based simulations of both continuum and particle problems. Representing both a multipole method simulation and an adaptive mesh simulation with this same structure provides a natural formalism with which to unite these two classes of solvers. This paper discusses how both methods exploit the same basic principle of locality evident in many systems such as those governed by Poisson's Equation. The locality of the systems and the resulting algorithms provide important benefits for implementations on massively parallel computers.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf A Semi Distributed Task Allocation Strategy for Large Hypercube Supercomputers} \par\noindent {\it Ishfaq Ahmad and Arif Ghafoor} \par\noindent SCCS-194\ \ \ \ \ \ \ (Short Code: Ahmad:91b) \par This paper presents a semi distributed approach for task scheduling in large parallel and distributed systems which is different from the conventional centralized and fully distributed approaches. The proposed strategy partitions the system into independent regions (spheres) centered at some control points. The central points called schedulers, optimally schedule tasks within their spheres and maintain state information with low overhead. We consider Hypercube systems for evaluation and using its algebraic characteristics, show that identification of spheres and their scheduling points is an NP-complete program. An efficient solution for this problem is presented by making an exclusive use of a combinatorial structure known as Hadamard Matrix. Performance of the proposed strategy was evaluated and compared with an efficient fully distributed strategy, through simulation. In addition to yielding high performance in terms of response time, better resource utilization and throughput, the proposed strategy is shown to incur small overhead in terms of network traffic.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Semi-Distributed Load Balancing for Massively Parallel Multicomputer Systems} \par\noindent {\it Ishfaq Ahmad and Arif Ghafoor} \par\noindent SCCS-195\ \ \ \ \ \ \ (Short Code: Ahmad:91c) \par This paper presents a semi-distributed approach, for load balancing in large parallel and distributed systems, which is different from the conventional centralized and fully distributed approaches. The proposed strategy uses a two-level hierarchical control by partitioning the interconnection structure of a distributed or multiprocessors system into independent symmetric regions (spheres) centered at some control points. The central points, called schedulers, optimally schedule tasks within their spheres and maintain state information with low overhead. We consider interconnection structures belonging to a number of families of distance transitive graphs for evaluation, and using their algebraic characteristics, show that identification of spheres and their scheduling points is, in general, an NP-complete problem. An efficient solution for this problem is presented by making an exclusive use of a combinatorial structure known as the Hadamard Matrix. Performance of the proposed strategy has been evaluated and compared with an efficient fully distributed strategy, through an extensive simulation study. In addition to yielding high performance in terms of response time and better resource utilization, the proposed strategy incurs less overhead in terms of control messages. It is also shown to be less sensitive to the communication delay of the underlying network.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Fault-Tolerant Load Management for Real-Time Distributed Computer Systems} \par\noindent {\it Ishfaq Ahmad and Arif Ghafoor} \par\noindent SCCS-196\ \ \ \ \ \ \ (Short Code: Ahmad:91d) \par This paper presents a fault-tolerant scheme applicable to any decentralized load balancing algorithms used in soft real-time distributed systems. Using the theory of distance-transitive graphs for representing topologies of these systems, the proposed strategy partitions these systems into independent symmetric regions (spheres) centered at some control points. These central points, called fault-control points, provide a two-level task redundancy and efficiently re-distribute the load of failed nodes within their spheres. Using the algebraic characteristics of these topologies, it is shown that the identification of spheres and fault-control points is, in general, an NP-complete problem. An efficient solution for this problem is presented by making an exclusive use of a combinatorial structure known as the Hadamard matrix.} \par Assuming a realistic failure---repair system environment, the performance of the proposed strategy has been evaluated and compared with no fault environment, through an extensive and detailed simulation. For our fault-tolerant strategy, we propose two measures of goodness, namely, the percentage of re-scheduled tasks which meet their deadlines and the overhead incurred for fault management. It is shown that using the proposed strategy, up to 80\% of the tasks can still meet their deadlines. The proposed strategy is general enough to be applicable to many networks, belonging to a number of families of distance transitive graphs. Through simulation, we have analyzed the sensitivity of this strategy to various system parameters and have shown that the performance degradation due to failures does not depend on these parameter. Also, the probability of a task being lost altogether due to multiple failures has been shown to be extremely low. \bigskip\starline\bigskip \vbox{\par\noindent{\bf Evaluating the Performance of PARTI} \par\noindent {\it Ishfaq Ahmad and Min-You Wu} \par\noindent SCCS-197\ \ \ \ \ \ \ (Short Code: Ahmad:91e) \par In this report, we describe the evaluation of overhead associated with PARTI (Parallel Automated Run Time Toolkit at ICASE). PARTI is a set of primitives which are used in loops for irregular problems on distributed memory architectures. When irregular arrays are distributed over the processors of the system, the operations within a loop on some processor may require array elements from the memory of some other processors. Communication for accessing array elements, for multiple times, within a loop can result in significant overhead which in turn results in loss of performance. If the elements which need to be fetched within the loops are accessed before executing the loop, significant savings can be made in the time spent in communication otherwise. This leads to the concept to INSPECTOR and EXECUTOR for problems with irregular structures. An INSPECTOR is a program module which performs pre-processing by examining the loops and generates the set of communication patterns which are required within the loops to access off-processor data elements. The information generated by the INSPECTOR is then used to pre-fetch those data elements. The actual execution of the program is done by the EXECUTOR which uses the pre-fetched data elements within the loops without incurring the penalty for communication overhead during processing.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Semi-Distributed Load Balancing for Massively Parallel Multicomputer Systems} \par\noindent {\it Ishfaq Ahmad and Arif Ghafoor} \par\noindent SCCS-198\ \ \ \ \ \ \ (Short Code: Ahmad:91f) \par This paper presents a semi-distributed approach for load balancing in large parallel and distributed systems, which is different from the conventional centralized and fully distributed approaches. The proposed strategy uses a two-level hierarchical control by partitioning the interconnection structure of a distributed or multiprocessor system into independent symmetric regions (spheres) centered at some control points. The central points, called schedulers, optimally schedule tasks within their spheres and maintain state information with low overhead. We consider interconnection structures belonging to a number of families of distance transitive graphs for evaluation, and using their algebraic characteristics, show that identification of spheres and their scheduling points is in general an NP-complete problem. An efficient solution for this problem is presented by making an exclusive use of a combinatorial structure known as the Hadamard Matrix. Performance of the proposed strategy has been evaluated and compared with an efficient fully distributed strategy, through an extensive simulation study. In addition to yielding high performance in terms of response time and better resource utilization, the proposed strategy incurs less overhead in terms of control messages. It is also shown to be less sensitive to the communication delay of the underlying network.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Performance Prediction, Evaluation and Comparison of Decentralized Load Balancing Strategies} \par\noindent {\it Ishfaq Ahmad, Arif Ghafoor, and Kishan Mehrotra} \par\noindent SCCS-198\ \ \ \ \ \ \ (Short Code: Ahmad:91g) \par A performance evaluation approach for comparing different distributed load balancing schemes on a unified basis is presented. This approach is an integration of an analytical model, statistical analyses and simulation, and takes into account all the system parameters that can possibly affect the performance. First, we show that all the sender---initiated distributed load balancing strategies can be modeled by a central server open queuing network. Next, these load balancing strategies can be characterized by only two queuing parameters---the average execution queue length and the probability that a newly arrived task is executed locally or migrated to another node. To capture the relation between these queuing parameters and various system parameters, a statistical analysis has been carried out on the empirical data obtained through simulation. The analytical queuing model is then employed to predict the response time of the system with any combination of system parameters. For performance evaluation, the task migration cost and scheduling overhead are also taken into account, and the results are again validated through simulation. Experimental results are obtained for six different load balancing schemes and their relative performance is compared. The proposed model provides performance results in a straightforward manner and can be beneficial in assessing the system under varying conditions. Another advantage of this model is that the sensitivity of each load balancing strategy can be determined for each system parameter.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Performance Prediction of Distributed Load Balancing on Multicomputer Systems} \par\noindent {\it Ishfaq Ahmad, Arif Ghafoor, and Kishan Mehrotra} \par\noindent SCCS-200\ \ \ \ \ \ \ (Short Code: Ahmad:91h) \par This paper presents a performance evaluation approach to compare different distributed load balancing schemes on a unified basis. This approach is an integration of simulation, statistical and analytical models, and takes into account the fundamental system parameters that can possibly affect the performance. We show that all the sender-initiated distributed load balancing strategies can be modeled by a central server open queuing network. Furthermore, these load balancing strategies can be characterized by only two queuing parameters---the average execution queue length and the probability that a newly arrived task is executed locally or migrated to another node. To capture the relation between these queuing parameters and various system parameters, a statistical analysis has been carried out on the empirical data obtained through simulation. The analytical queuing model is then used to predict the response time of a system with any combination of system parameters. Experimental results are obtained for six different load balancing strategies. The proposed model provides performance results in a straightforward manner and can be beneficial to the system designers in assessing the system under varying conditions.} \bigskip\starline\bigskip \vbox{\par\noindent {\bf Transparencies} from a seminar talk by Kim Mills entitled, ``NPAC's Financial Modeling Project: Option Pricing on Parallel Computers'', September 24, 1991. SCCS-201 (Short Code: Mills:91f).} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Binomial Approximations of American Call Option Prices with Stochastic Volatilities} \par\noindent {\it T. J. Finucane} \par\noindent SCCS-202\ \ \ \ \ \ \ (Short Code: Finucane:91a) \par This study describes and tests a simple, efficient method of approximating stochastic volatility call option prices that is lattice based, and can easily accommodate the value of early exercise for American options. The proposed model exploits the fact that stochastic volatility call option prices are approximately linear in the degree of correlation between the asset price and the volatility. By calculating binomial prices for perfect and zero correlations, a simple linear function of the binomial prices can be used to accurately estimate call option prices for any other degree of correlation.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Lessons from Massively Parallel Architectures on Message Passing Computers} \par\noindent {\it G. C. Fox} \par\noindent SCCS-214\ \ \ \ \ \ \ (Short Code: Fox:91n) \par We review a decade's work on message passing MIMD parallel computers in the areas of hardware, software and applications. We conclude that distributed memory parallel computing works, and describe the implications of this for future portable software systems.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Complete Mode-Locking in Models of Charge-Density Waves} \par\noindent {\it A. Alan Middleton, Ofer Biham, P. B. Littlewood, and P. Sibani} \par\noindent SCCS-217\ \ \ \ \ \ \ (Short Code: Middleton:91b) \par Mode-locking in a.c. driven charge-density waves (CDW's) is studied numerically in two and three dimensions, using both continuous equations of motion and a cellular automaton model. As the system size is increased, a complete devil's staircase of steps is approached. The fractal dimension of the set of gaps in the staircase is found to be $D_0 = 0.75 \pm 0.04$, $0.88 \pm 0.05$ in two and three dimensions, respectively, in the automaton model. The spectrum of singularities $f(\alpha )$ is calculated and is related to the dynamic critical exponent $\zeta$ for the CDW depinning transition.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Benchmarking the CM-5 for Image Processing Applications} \par\noindent {\it Ravi Shankar, Ravi Ponnusamy, and Sanjay Ranka} \par\noindent SCCS-233\ \ \ \ \ \ \ (Short Code: Shankar:92a) \par This paper presents benchmarking results for image processing algorithms on the Connection Machine model CM-5 and compares them with the results from the CM-2 and the Sun-4. Image processing algorithms with varying communication and computational requirements were implemented, tested and timed. The performance and the scalability of the CM-5 were analyzed and compared with that of the CM-2.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Parallelization of CHARMM Molecular Dynamics Code on Multicomputers} \par\noindent {\it S. Ranka, G. C. Fox, J. Saltz, and R. Das} \par\noindent SCCS-236\ \ \ \ \ \ \ (Short Code: Ranka:92a) \par This paper discusses different algorithms for implementing molecular dynamics in the program CHARMM on multicomputers. The molecular dynamics is the computationally intensive part of CHARMM and requires almost all of the time for a typical application. The molecular dynamics can be divided into four main categories of subroutines:\ the calculation of forces, the integration of equations of motion, the correction of atomic coordinates because of constraints and the generation and update of the non-bonded list. We examine each of them in detail and discuss their scalability on multicomputers.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Simulated Tempering: A New Monte Carlo Scheme} \par\noindent {\it Enzo Marinari and Giorgio Parisi} \par\noindent SCCS-241\ \ \ \ \ \ \ (Short Code: Marinari:92a) \par We propose a new global optimization method (\it Simulated Tempering\/) for simulating effectively a system with a rough free energy landscape (i.e., many coexisting states) at finite non-zero temperature. This method is related to simulated annealing, but here the temperature becomes a dynamic variable, and the system is always kept at equilibrium. We analyze the method on the Random Field Ising Model, and we find a dramatic improvement over conventional Metropolis and cluster methods. We analyze and discuss the conditions under which the method has optimal performances.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Visualization of High Energy Physics Data using AVS} \par\noindent {\it Tomasz Haupt} \par\noindent SCCS-243\ \ \ \ \ \ \ (Short Code: Haupt:92a) \par There is an increasing gap between the computational tools used in High Energy Physics research and achievements in modern hardware and software technologies. More and more often one can hear opinions that physicists should reduce their time spent on generating code and start making use of commercially available high-tech packages. Apparently, this seems to be the only way to fully utilize the available hardware, including a high resolution three-dimensional color graphics, capability of distributed or parallel computing, etc.} \par To verify this, I used the commercially available Application Visualization System (AVS) package by Stardent/AVS, Inc., as a tool for visualization of experimental data. There are many places in the analysis of high energy experiment I found the visualization to be very useful. One example is an event by event analysis. The means provided by the AVS system make it possible to create a very realistic event display, with all geometric aspects preserved. The three-dimensional rotations and continuous zooming capability gives full access to any detail of the analyzed event. \par Another class of applications is the monitoring of a detector's performance. Superimposing of a detector characteristics on a high fidelity visual representation of the detector geometry provides a very clear picture of the detector's performance, an extremely useful tool for monitoring and trouble-shooting. Moreover, the AVS systems enables creation of simple animations to demonstrate changes of the detector parameters with time. \par Maybe, the most interesting application of an advanced visualization system is, however, the off-line analysis of the experimental data. Three dimensional histograms (possibly supplemented by color and animation as a tool to operate on a five-dimensional space of parameters) gives deeper insight into the nature of the investigated problem. In addition, an easy way of applying even very complex cuts by manipulating real or virtual dials and sliders, and making projections onto flat or curvilinear subspaces and finding iso surfaces makes an analysis fast and very efficient. Examples of each category mentioned above are presented in this report, to illustrate capability of the AVS system. However, this should be regarded as a pilot study to determine the feasibility of using the AVS system for visualization of high energy physics data rather than a fully comprehensive catalog of its possible applications. \bigskip\starline\bigskip \vbox{\par\noindent{\bf A Test Suite Approach for Fortran90D Compilers on MIMD Distributed Memory Parallel Computers} \par\noindent {\it Min-You Wu and Geoffrey C. Fox} \par\noindent SCCS-244\ \ \ \ \ \ \ (Short Code: Wu:92b) \par This paper describes a test suite approach for a Fortran90D compiler, a source-to-source parallel compiler for distributed memory systems. Different from Fortran77 parallelizing compilers, a Fortran90D compiler does not parallelize sequential constructs. Only parallelism expressed by Fortran90D parallel constructs is exploited. We discuss compiler directives and the methodology of parallelizing Fortran programs. An introductory example of Gaussian elimination is used, among other programs in our test suite, to explain the compilation techniques.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Concurrent Interstate Conflict Simulations: Testing the Effects of the Serial Assumption} \par\noindent {\it Gavan Duffy} \par\noindent SCCS-250\ \ \ \ \ \ \ (Short Code: Duffy:91a) \par The interstate conflict simulation model of Bremer and Mihalka, as extended by Cusack and Stoll, is reimplemented in a massively parallel computational environment. By observing the impact of controlling parameters upon pluralism in simulated state systems, Cusack and Stoll investigate the consequences of competing propositions of the realist school of international relations theory. The simulation method avoids reliance on the highly unrealistic simplifying assumptions characteristic of deductive assessments of realisms. However, their simulation model relies on the serial assumption:\ that only one conflict can occur at one time. The massively parallel implementation removes this assumption, simulating multiple conflicts concurrently. While some minor differences across the serial and parallel implementations are observed at the system level, sharper differences appear in the survival probabilities of states adopting particular styles of decision making. Additionally, relaxation of the Cusack-Stoll assumption that states misperceive others' capabilities at a constant rate of error produces substantial effects at the level of system endurance.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Compiling Fortran 77D and 90D for MIMD Distributed-Memory Machines} \par\noindent {\it A. Choudhary, G. Fox, S. Ranka, S. Hiranandani, K. Kennedy, C. Koelbel and C-W. Tseng} \par\noindent SCCS-251\ \ \ \ \ \ \ (Short Code: Choudhary:92c) \par Fortran D provides a set of data decomposition specifications for either Fortran~77 of Fortran~90. In this paper we present an integrated approach to compiling Fortran~77D and Fortran~90D programs into SPMD (Single Program Multiple Data) message-passing programs for efficient execution on MIMD distributed-memory machines. The integrated Fortran~D compiler relies on two key observations. First, array constructs may be {\it scalarized\/} by the Fortran~90 front end into {\caps forall} loops without loss of information. Second, {\it loop fusion}, {\it partitioning}, and {\it sectioning\/} optimizations in the Fortran~D back end can be useful for both Fortran~D dialects.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Implementation and Scalability of Fortran~90D Intrinsic Functions on Distributed Memory Machines} \par\noindent {\it I. Ahmad, A. Choudhary, G. Fox, K. Parasuram, R. Ponnusamy, S. Ranka, and R. Thakur} \par\noindent SCCS-256\ \ \ \ \ \ \ (Short Code: Ahmad:92a) \par We are developing a Fortran~90D compiler, which converts Fortran~90D code into Fortran~77 node programs for distributed memory machines. This paper presents the performance results of Fortran~90D intrinsic functions in Intel iPSC/860 and iPSC/2 hypercubes. We discuss the implementation of those intrinsic functions and show that our implementations are scalable.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Prospects for Parallel Computing} \par\noindent {\it Geoffrey C. Fox} \par\noindent SCCS-265\ \ \ \ \ \ \ (Short Code: Fox:92g) \par The best enterprises have both a compelling need pulling them forward and an innovative technology solution pushing them on. In high performance computing, we have the need for increased computational power in many applications and the inevitable long term solution is massively parallelism. In the short term, the relation between pull and push may seem unclear as novel algorithms and software needed to support parallel computing. However, eventually parallelism will be present in all computers---including those in your children's video game, your personal computer or workstation and the central supercomputer.} \bigskip\starline\bigskip \vbox{\par\noindent{\bf Scheduling Regular and Irregular Communication Patterns on the CM-5} \par\noindent {\it Ravi Ponnusamy, Rajeev Thakur, Alok Choudhary, and Geoffrey Fox} \par\noindent SCCS-274\ \ \ \ \ \ \ (Short Code: Ponnusamy:92a) \par In this paper, we study the communication characteristics of the CM-5 and the performance effects of scheduling regular and irregular communication patterns on the CM-5. We first study the impact of path length and message size on communication time. Then we consider the scheduling of regular communication patterns such as complete exchange and broadcast. We have implemented four algorithms for complete exchange and two algorithms for broadcast. We have also implemented four algorithms for scheduling irregular communication patterns and studied their performance on the communication patterns of several synthetic as well as real problems such as the conjugate gradient solver and the Euler solver.} \vfill\eject \bye