\input/home/sashimi/kea/fonts.tex \overfullrule=0pt \hsize=5.5in \hoffset=0.5in \vsize=8.0in \voffset=0.5in \elevenpoint \footline{\ifnum\pageno=1 {\hfil} \else {\hss\tenrm\folio\hss}\fi} \centerline{\bf Concurrency: Practice and Experience} \vskip 12pt \centerline{\bf Accepted Abstracts for 1994} \vskip 0.5in \centerline{\bf Beyond Software Performance Visualization} \medskip \centerline{\it M. Abrams, T. J. Lee, H. T. Cadiz, and K. Ganugapati} \medskip Performance visualization tools of the last decade have yielded new insights into the behavior of sequential, parallel, and distributed programs. However, they have three inherent limitations: (1) They only display what happened in {\it one\/} execution of a program. (This is dangerous when analyzing current applications, which are prone to non-deterministic behavior.) (2) A human uses one or more bandwidth-limited senses with a visualization tool. (This limits the scalability of a visualization tool.) (3) The relationship of ``interesting'' program events are often separated in time by other events; thus discerning time dependent behavior often hinges on finding the ``right'' visualization---a possibly time-consuming activity. CHITRA93 complements visualization systems, while alleviating these limitations. CHITRA93 analyzes a set (or ensemble) of traces by combining the visualization of ensemble to empirical models that capture the time dependent relationships of ``interesting'' program events through application, programming language, and computer architecture independent analysis techniques (addressing (2) and (3)). CHITRA93 incorporates: transforms, such as aggregation, that simplify the ensemble and reduce the state space size of the models generated; a user interface that allows certain transforms to be selected by editing the visualization with a mouse; homogeneity tests that allow partitioning of an ensemble; an efficient semi-Markov model generation algorithm whose computation time is linear in the sum of the lengths of the traces comprising the ensemble; and a CHAID-based model that can fathom non-Markovian relationships among transitions in the traces. The use of CHITRA93 is demonstrated by partitioning ten parallel database traces with nearly 8,000 states into two homogeneous subsets, each modeled by an irreducible, periodic, and hierarchical stochastic process with as few as four states. \vskip 8pt \noindent{\bf Short Code:\ \ }[Abrams:94a] \vskip 5pt \noindent{\bf Reference:\ \ \ \ }Accepted---11/21/94 (C219) \vskip 0.25in \centerline{\bf A Toolkit for Parallel Functional Programming} \medskip \centerline{\it P. H. Hartel, R. F. H. Hofman, K. G. Langendoen, H. L. Muller, W. G. Vree, and L. O. Hertzgerger} \medskip Out toolkit for the design and implementation of parallel functional programs supports the stepwise development of parallel programs from a high-level sequential specification to an optimized parallel implementation. The toolkit is used as follows: \item{1.} The algorithm to be implemented is specified in a functional language. The program is debugged and tested using an interpreter. \item{2.} The program is compiled for a sequential machine. Its performance is analyzed and improved. \item{3.} Annotation driven transformations are applied to the program to indicate parallel tasks. Simulations at task level, basic block level and bus transaction level make it possible to analyze the parallel performance of the program at three levels of detail. \item{4.} When the performance is optimized using the simulators, the program is executed on a genuine parallel machine. Several programs have been developed with the toolkit. A program that simulates tidal flow in an estuary of the North sea is presented as a case study to demonstrate the merits of the toolkit when developing complex parallel programs. The toolkit not only supports the design of parallel applications; it also allows the study of important concepts in parallel computer architecture. These include the behavior of cached memory systems, bus protocols, scheduling algorithms and memory management algorithms. \vskip 8pt \noindent{\bf Short Code:\ \ }[Hartel:94a] \vskip 5pt \noindent{\bf Reference:\ \ \ \ }Accepted---11/21/94 (C225) \vskip 0.25in \centerline{\bf Parallel Solution of Large-Scale Differential-Algebraic Systems} \medskip \centerline{\it R. S. Maier, L. R. Petzold, and W. Rath} \medskip DASPK solves large-scale systems of differential-algebraic equations. It is based on the integration method in DASSL, but instead of a direct method for the associated linear system which arise at each time step, the preconditioned GMRES iteration is applied in combination with an Inexact Newton Method. Two parallel versions of DASPK have been developed: DASPKF90, a Fortran 90 data parallel implementation, and DASPKMP, a message-passing implementation written in Fortran 77 with extended BLAS. The parallel versions have been implemented for the Thinking Machines Corporation TMC CM-5, a massively parallel multiprocessor, keeping the user interface relatively simple while allowing for portability to other massively parallel architectures. The codes have been demonstrated on several large-scale test problems, including three-dimensional formulations of the heat equation, the Cahn-Hilliard equation, and a multi-species reaction-diffusion problem. The formulations are described, including detail on preconditioning the Krylov iteration, timing results and performance analysis. \vskip 8pt \noindent{\bf Short Code:\ \ }[Maier:94a] \vskip 5pt \noindent{\bf Reference:\ \ \ \ }Accepted---11/21/94 (C229) \vskip 0.25in \centerline{\bf Native and Generic Parallel Programming Environments on a Transputer and a PowerPC Platform} \medskip \centerline{\it A. G. Hoekstra, P. M. A. Sloot, F. van der Linden, M. van Muiswinkel, J. J. J. Vesseur, and L. O. Hertzberger} \medskip Genericity of parallel programming environments, enabling development of portable parallel programs, is expected to result in performance penalties. Furthermore, programmability and tool support of programming environments are important issues if a choice between programming environments has to be made. In this paper we propose a methodology to compare native and generic parallel programming environments, taking into account such competing issues as portability and performance. As a case study, this paper compares the Iserver-Occam, Parix, Express, and PVM parallel programming environments on a 512-node Parsytec GCel. Furthermore, we apply our methodology to compare Parix and PVM on a new architecture, a 32-node Parsytec PowerXplorer, which is based on the PowerPC chip. In our approach we start with a representative application and isolate the basic (environment) dependent building blocks. These basic building blocks, which depend on floating point performance and communication capabilities of the environments, are analyzed independently. We have measured point to point communication times, global communication times and floating point performance. All information is combined into a time complexity analysis, allowing the comparison of the environments on different degrees of functionality. Together with demands for portability of the code and development time (i.e., programmability), an overall judgment of the environments is given. \vskip 8pt \noindent{\bf Short Code:\ \ }[Hoekstra:94a] \vskip 5pt \noindent{\bf Reference:\ \ \ \ }Accepted---12/14/94 (C241) \vskip 0.25in \centerline{\bf Exploiting Application Parallelism in Knowledge-based Systems: An Experimental Method} \medskip \centerline{\it J. W. H. Daniel and M. R. Moulding} \medskip A method is presented which aims to enhance the runtime performance of real-time production systems by utilizing natural concurrency in the application knowledge base. This Exploiting Application Parallelism (EAP) method includes an automated analysis of the knowledge base and the use of this analysis information to partition and execute rules on a novel Parallel Production System (PPS) architecture. Prototype analysis tools and a PPS simulator have been developed for the Inference ART environment in order to apply the method to a Naval data-fusion problem. The results of this experimental investigation revealed that an average maximum of 12.06 rule-firings/cycle was possible but, due to serial bottlenecks inherent in the data-fusion problem, up to only 2.14 rule-firings/cycle was achieved overall. Limitations of the EAP method are discussed within the context of the experimental results and an enhanced method is investigated. \vskip 8pt \noindent{\bf Short Code:\ \ }[Daniel:94a] \vskip 5pt \noindent{\bf Reference:\ \ \ \ }Accepted---11/2/94 (C257) \vfill\eject \bye