Enforcing Scalability of Parallel Comprehensive Mine Simulator (CMS)

 

ERDC PET FMS Year 4 Focused Project Technical Report

 

Wojtek Furmanski, David Bernholdt, Geoffrey Fox (contact person at fox@csit.fsu.edu)

NPAC, Syracuse University

 

Syracuse, NY, June 2000

 

Introduction  This project addressed the development of a scalable Parallel CMS system by porting to the Origin2000 the sequential CMS code developed by Ft. Belvoir. CMS is a substantial C++ code with a large number of dynamic objects and a complex memory layout. Such codes are inherently hard to parallelize on current shared memory / NUMA platforms such as the Origin2000, which are better tuned for conventional, more regular data parallel applications.  In consequence, our initial attempts at Parallel CMS, based on standard Origin2000 semi-automatic parallelization techniques (such as compiler pragmas or OpenMP), were not very successful. Other FMS projects reported similar problems: for example, the E-ModSAF effort within CHSSI FMS-3, which aimed at porting ModSAF to the Origin2000, was cancelled due to the unmanageable complexity of the dynamic memory layout of its numerous simulation objects. To address the challenge of Parallel CMS, we performed an in-depth analysis of the CMS source code, experimented with a set of parallelization techniques, and finally succeeded in building a fully scalable Parallel CMS module. In this report, we summarize our approach and present our performance, scalability and load balancing results.

 

 

Comprehensive Mine Simulator by Ft. Belvoir  The Night Vision Lab at Ft. Belvoir, VA conducts R&D in the area of countermine engineering, using the advanced Comprehensive Mine Simulator (CMS) as an experimentation environment for a synthetic battlefield. Developed by the OSD sponsored Joint Countermine Advanced Concepts Technology Demonstration (JCM ACTD), CMS is a state-of-the-art high fidelity minefield simulator with support for a broad range of mine categories, including conventional types such as buried pressure-fuzed mines and antitank mines, and other types including off-route (side attack) and wide-area (top attack) mines. CMS organizes mines in components, which are regular arrays of mines of a particular type. Minefields are represented as heterogeneous collections of such homogeneous components. CMS interoperates via the DIS protocol with ModSAF vehicle simulators. The interaction of a mine with a target is controlled by its fuze. CMS supports several fuze types, including full width fuzes, track width fuzes, off-route fuzes and others. CMS mines can also interact with countermine systems, including both mechanical and explosive countermeasures and detectors.

 

The relevance of HPC for the CMS system stems from the fact that modern warfare can require a million or more mines to be present on the battlefield, such as in the Korean Demilitarized Zone or the Gulf War. The simulation of such battlefield areas requires HPC support. As part of the PET FMS project, Syracuse University analyzed the CMS code and ported the system to the Origin2000 shared memory parallel MPP. Below, we summarize our approach and results.

 

Parallel CMS: Approach  In our first attempt to port CMS to the Origin2000, we identified the performance critical parts of the inner loop, which repeatedly tracks all mines against the current vehicle positions, and we tried to parallelize them using the Origin2000 compiler pragmas (i.e. loop partition and/or data decomposition directives). Unfortunately, this approach delivered only very limited scalability, up to 4 processors. We concluded that pragma-based techniques, while efficient for regular Fortran programs, are not very practical for parallelizing complex and dynamic object-oriented, event driven FMS simulation codes. This holds especially for 'legacy' object-oriented codes such as CMS, which were developed by multiple programming teams over a long period of time and ended up with complex dynamic memory layouts of numerous objects that are now extremely difficult to decipher and properly distribute.
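As a rough illustration of what a pragma-based attempt on the tracking loop looks like, the following C sketch uses an OpenMP directive on the loop over mines. It is not taken from the CMS source: the flat Point arrays, the function name count_mines_in_range and the simple distance test are all assumptions made for this example, and the real CMS objects are far more dynamic, which is exactly why this style did not scale there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat data layout, assumed only for this sketch. */
typedef struct { double x, y; } Point;

/* Count mines that have at least one vehicle within `radius`.
 * The pragma asks an OpenMP compiler to split the outer loop across
 * threads; a compiler without OpenMP support ignores it and the loop
 * runs sequentially with the same result. */
size_t count_mines_in_range(const Point *mines, size_t n_mines,
                            const Point *vehicles, size_t n_vehicles,
                            double radius)
{
    assert(radius >= 0.0);
    size_t hits = 0;
    const double r2 = radius * radius;

    #pragma omp parallel for reduction(+:hits)
    for (long i = 0; i < (long)n_mines; ++i) {
        for (size_t v = 0; v < n_vehicles; ++v) {
            double dx = mines[i].x - vehicles[v].x;
            double dy = mines[i].y - vehicles[v].y;
            if (dx * dx + dy * dy <= r2) {   /* vehicle v is in range */
                ++hits;
                break;                       /* one hit per mine is enough */
            }
        }
    }
    return hits;
}
```

A directive like this works well when the loop body touches contiguous, regular data; it gives the compiler no help with objects scattered across a dynamically allocated heap.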

 

In the follow-on effort, we decided to explore an alternative approach based on a more direct, lower level parallelization technique. Based on our analysis of the SPEEDES simulation kernel, which is known to deliver scalable object-oriented HPC FMS codes on the Origin2000 (such as the Parallel Navy Simulation System under development by Metron), we constructed similar parallel support for CMS. The base concept of this 'micro SPEEDES kernel' approach, borrowed from the SPEEDES engine design but prototyped by us independently of the SPEEDES code, is to use only fully portable UNIX constructs such as fork and shmem for inter-process and inter-processor communication. This makes the code portable across all UNIX platforms, and hence it can be more easily developed, debugged and tested in single-processor multi-threaded mode on sequential UNIX boxes.

 

In our micro-kernel, the parent process allocates a shared memory segment using shmget(), then forks n children, remaps them via execvp(), and passes the shared memory segment descriptor to each child as a command line argument. Each child attaches to its dedicated slice of the shared memory using shmat(), thereby establishing a multi-processor communication framework that avoids MPI overhead and is fully portable (from O2 to O2K). We also developed a simple set of semaphores to synchronize the node programs and to avoid race conditions in critical sections of the code. On a single processor UNIX platform, our kernel, when invoked with n processes, in fact generates n concurrent threads communicating via UNIX shared memory. In an unscheduled Origin2000 run, the number of threads per processor and the number of processors used are undetermined (i.e. under control of the OS). However, when executed under the control of a parallel scheduler such as MISER, each child process forked by our parent is assigned to a different processor, which allows us to regain control over process placement and to realize a natural scalable implementation of parallel CMS.
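The parent/child pattern described above can be sketched in C with the standard System V calls. This is a minimal illustration under stated assumptions, not the micro-kernel itself: the function name run_children and the one-int-per-child slice are invented for the example, the semaphores are left out, and the exec step is skipped, so each child simply inherits the segment id across fork() instead of receiving it on a command line.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork n children that share one System V segment of n ints.
 * Child k attaches, writes k*k into slot k, and exits; the parent waits
 * for all children and copies the slots into out[]. Returns 0 on success.
 * Simplification (assumption of this sketch): no exec, no semaphores. */
int run_children(int n, int *out)
{
    assert(n > 0);
    int shmid = shmget(IPC_PRIVATE, (size_t)n * sizeof(int), IPC_CREAT | 0600);
    if (shmid < 0) return -1;

    int forked = 0;
    for (int k = 0; k < n; ++k) {
        pid_t pid = fork();
        if (pid < 0) break;
        if (pid == 0) {                       /* child k */
            int *slots = shmat(shmid, NULL, 0);
            if (slots == (void *)-1) _exit(1);
            slots[k] = k * k;                 /* touch only its own slice */
            shmdt(slots);
            _exit(0);
        }
        ++forked;
    }

    int ok = (forked == n);
    for (int k = 0; k < forked; ++k) {        /* reap every child */
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = 0;
    }

    int *slots = shmat(shmid, NULL, 0);       /* parent reads the results */
    if (slots == (void *)-1) {
        ok = 0;
    } else {
        for (int k = 0; k < n; ++k) out[k] = slots[k];
        shmdt(slots);
    }
    shmctl(shmid, IPC_RMID, NULL);            /* release the segment */
    return ok ? 0 : -1;
}
```

Because every child writes only to its own slot, this toy version needs no locking; any region written by more than one process would need the semaphore protection mentioned in the text.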

 

Parallel CMS: Architecture  On top of this micro-kernel infrastructure, we put object-oriented wrappers that hide the explicit shmem-based communication behind higher level abstractions, so that each node program in fact behaves as a sequential CMS operating on a subset of the full minefield. The CMS module cooperates with a ModSAF vehicle simulator running on another machine on the network. CMS continuously reads vehicle motion PDUs or the equivalent HLA interaction events from the network, updates vehicle positions and tracks all mines in the minefield in search of possible explosions. In our parallel version, the parent (node 0) reads from the physical network and broadcasts all PDUs via shared memory to the children. Each child reads its PDUs from a virtual network, which is a TCP/IP wrapper over the shmem communication channel.
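One simple way to picture a broadcast over shared memory is a log that the parent appends to and that every child reads with its own private cursor. The C sketch below shows only that data layout and its single-process logic. All names (PduLog, pdu_publish, pdu_poll) and sizes are hypothetical, the report does not describe the actual channel layout, and a real multi-process version would need the semaphores described earlier plus a guarantee that no child lags more than LOG_SLOTS records behind.

```c
#include <assert.h>
#include <string.h>

#define PDU_BYTES 64      /* hypothetical fixed record size */
#define LOG_SLOTS 128     /* hypothetical ring capacity     */

/* In a multi-process setting this struct would live in the shared segment. */
typedef struct {
    unsigned long written;                 /* total PDUs published so far */
    char slot[LOG_SLOTS][PDU_BYTES];
} PduLog;

/* Parent side: copy one PDU into the next ring slot. */
void pdu_publish(PduLog *log, const void *pdu, size_t len)
{
    assert(len <= PDU_BYTES);
    memcpy(log->slot[log->written % LOG_SLOTS], pdu, len);
    log->written++;                        /* count it only after the copy */
}

/* Child side: fetch the next unread PDU and advance the child's private
 * cursor. Returns 1 if a PDU was copied to out (PDU_BYTES bytes),
 * 0 if nothing is pending. */
int pdu_poll(const PduLog *log, unsigned long *cursor, void *out)
{
    if (*cursor >= log->written) return 0;
    memcpy(out, log->slot[*cursor % LOG_SLOTS], PDU_BYTES);
    (*cursor)++;
    return 1;
}
```

Since each child owns its cursor, every child sees every PDU in order, which matches the broadcast semantics described above; a receive-style wrapper around pdu_poll is one way a node program could keep reading as if from a network socket.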

 

Minefield segments are assigned to individual node programs using a scattered/cyclic decomposition, which guarantees reasonable dynamic load balancing regardless of the current number and configuration of vehicles propagating through the minefield. We found the CMS minefield parser and the whole minefield I/O sector difficult to decipher and to modify to support scattered decomposition. We bypassed this problem by constructing our own Java based minefield parser using ANTLR, a powerful new public domain Java parser technology offered by the MageLang Institute. Our parser reads the large sequential minefield file and chops it into n files, each representing a reduced node minefield generated via scattered decomposition. All these files are fetched concurrently by the node programs when the parallel CMS starts, and the subsequent simulation decomposes naturally into node CMS programs operating on scattered sectors of the minefield and communicating via the shmem micro-kernel channel described above.
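The load balancing property of a scattered (cyclic) decomposition can be seen in a few lines of C. The sketch assumes, for illustration only, that the unit being dealt out is a single mine identified by a global index; the function names cyclic_owner and cyclic_tally are hypothetical and are not part of CMS or of the Java parser.

```c
#include <assert.h>
#include <stddef.h>

/* Node that owns mine `index` under scattered (cyclic) decomposition. */
size_t cyclic_owner(size_t index, size_t n_nodes)
{
    assert(n_nodes > 0);
    return index % n_nodes;
}

/* For a contiguous run of mines [first, first + count), e.g. those near
 * a vehicle, tally how many fall on each node. per_node must have
 * n_nodes entries. */
void cyclic_tally(size_t first, size_t count, size_t n_nodes, size_t *per_node)
{
    for (size_t k = 0; k < n_nodes; ++k) per_node[k] = 0;
    for (size_t i = first; i < first + count; ++i)
        per_node[cyclic_owner(i, n_nodes)]++;
}
```

For any contiguous run of mines, the per-node counts differ by at most one, so the work caused by a vehicle crossing one region of the minefield is spread over all nodes instead of landing on the single node that would own that region under a block decomposition.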

 

Parallel CMS: Performance  We performed timing runs of Parallel CMS using the Origin2000 systems at the Naval Research Laboratory in Washington, DC and at the ERDC Major Shared Resource Center at Vicksburg, MS. The performance results are presented in Figs 1 and 2, and they illustrate that we have successfully constructed a fully scalable Parallel CMS for the Origin2000 platform. Figs 1 and 2 present timing results of Parallel CMS for a large minefield of one million mines, simulated on 16, 32 and 64 nodes. The timing histogram in Fig. 1 displays the total simulation time spent by each node in a 16-node run, and it illustrates that we obtained almost perfect load balance. The higher bars in this figure represent a full simulation run with all ModSAF PDUs activated, whereas the lower bars represent a dry CMS run without vehicle updates. The comparison of the two sets illustrates that communication with ModSAF vehicles took on the order of 20-25% of the total simulation time and that both the computation and the communication parts are fully load balanced.
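The two quantities read off the histogram, load balance and the communication share, can be made concrete with the small C sketch below. The definitions used here (slowest node time divided by mean node time, and the relative difference between full-run and dry-run time) are common conventions assumed for this example, not formulas stated in the report, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Ratio of the slowest node's time to the mean node time;
 * 1.0 means perfect load balance. */
double load_imbalance(const double *t, size_t n)
{
    assert(n > 0);
    double max = t[0], sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (t[i] > max) max = t[i];
        sum += t[i];
    }
    return max / (sum / (double)n);
}

/* Share of the full-run time that is not accounted for by the dry run,
 * i.e. the part attributed to handling vehicle PDUs. */
double comm_fraction(double t_full, double t_dry)
{
    assert(t_full > 0.0 && t_dry <= t_full);
    return (t_full - t_dry) / t_full;
}
```

With made-up numbers, a node whose full run takes 100 time units and whose dry run takes 80 has a communication share of 0.2, which is the kind of value the 20-25% estimate above refers to.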

 

Fig. 2 illustrates the speedup measured on 16, 32 and 64 nodes. Instead of T(1)/T(n), we present the un-normalized 1/T(n) in this plot, since we could not measure T(1): an attempt to run the million mine simulation on one node ended in a memory overflow error. The speedup plot illustrates that Parallel CMS offers almost perfect (linear) scaling over a broad range of processors.
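The reason 1/T(n) is an acceptable stand-in for speedup can be written out explicitly. The relative quantities below, normalized to the smallest measured configuration of 16 nodes, are a standard convention added here for illustration; the symbols S_{16} and E_{16} do not appear in the original plots.

```latex
S(n) = \frac{T(1)}{T(n)} \;\propto\; \frac{1}{T(n)}
\quad\text{(since } T(1) \text{ is a constant)},
\qquad
S_{16}(n) = \frac{T(16)}{T(n)},
\qquad
E_{16}(n) = \frac{16\,T(16)}{n\,T(n)} .
```

Perfect linear scaling corresponds to n T(n) being constant, that is S_{16}(32) = 2, S_{16}(64) = 4 and E_{16}(n) = 1, which on a plot of 1/T(n) against n appears as a straight line through the origin.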

 

 

      

Fig. 1: Simulation time spent by each node in a Parallel CMS run for one million mines on a 16-node subset of the Origin2000 at NRL (both for a full run with vehicle PDUs and for a dry CMS-only run without PDUs), illustrating very good load balance.

 

Fig. 2: Speedup of Parallel CMS on the NRL Origin2000 for one million mines and 30 vehicles, measured on 16, 32 and 64 nodes, illustrating almost perfect scalability across a broad processor range.


The timing results described above were obtained during Parallel CMS runs within our WebHLA [1][2] based HPDC / metacomputing environment, which spans four geographically distributed laboratories: ERDC in Vicksburg, MS, NRL in Washington, DC, ARL in Aberdeen, MD and NPAC in Syracuse, NY. We describe our WebHLA environment and the Metacomputing CMS runs in another Year 4 technical report [4] (see also the CRPC book chapter [3]).

 

Summary  In this project, we demonstrated that the current generation of shared memory architectures such as the Origin2000 can be successfully used not only for regular data parallel simulations but also for irregular, more dynamic and object-oriented modeling and simulation codes. Porting such codes cannot be accomplished by semi-automatic parallelization tools; it requires more labor and more insight into the object memory layout of the sequential code. However, it appears that building scalable high performance codes for modeling and simulation is feasible and that the additional parallel code can be cleanly encapsulated from the legacy sequential code in the form of a suitable micro-kernel, as described in this report. Scalable HPC M&S codes such as Parallel CMS and the micro-kernel based parallelization techniques described here, when combined with a plug-and-play HLA based integration architecture (such as the one addressed in our other Year 4 project on WebHLA [4]), can play an important role in facilitating HPC technology insertions into the new generation of large scale M&S programs such as JSIMS, JMASS or JWARS.

 

 

References

 

1.       G. C. Fox, W. Furmanski, G. Krishnamurthy, H. T. Ozdemir, Z. Odcikin-Ozdemir, T. A. Pulikal, K. Rangarajan and A. Sood, “Using WebHLA to Integrate HPC FMS Modules with Web/Commodity based Distributed Object Technologies of CORBA, Java, COM and XML”, In Proceedings of the Advanced Simulation Technologies Conference ASTC 99, San Diego, April 1999.

 

2.       G. Fox, W. Furmanski, G. Krishnamurthy, H. Ozdemir, Z. Ozdemir, T. Pulikal, K. Rangarajan and A. Sood, “WebHLA as Integration Platform for FMS and other Metacomputing Application Domains”, In Proceedings of the DoD HPC Users Group Conference, Monterey, CA, June 8-15, 1999.

3.       CRPC Book Chapter, Morgan-Kaufmann 2000 (in progress): WebHLA based Metacomputing Environment for Forces Modeling and Simulation.

 

4.       “HLA Integration for HPC Applications Applied to CMS”, ERDC FMS PET Year 4 Focused Project – NPAC, Syracuse University Technical Report, June 2000.