Parallel Geant4 (ParGeant4)

Maintained by Gene Cooperman (gene@ccs.neu.edu) and Viet Ha Nguyen (vietha@ccs.neu.edu)

What is ParGeant4?

ParGeant4 [1] is a parallel version of Geant4 that implements event-level parallelism: separate events are simulated on remote processors. Typical simulations demonstrate a nearly linear speedup in running time as the number of remote processors increases. The needed enhancements of Geant4 are included in the examples/extended/parallel directory of the Geant4 distribution.

Why is ParGeant4 useful?

When doing a large Geant4 simulation, one often wishes to run on many processors to reduce the overall time. Traditionally, this has been done by splitting the events into multiple groups and running Geant4 independently on each processor for its own group of events. This requires restarting a run if a processor goes down. It also requires saving the histogram files from each run and merging the files before using the analysis tool. The human effort in this is considerable.

ParGeant4 provides a much simpler mechanism. After setting up ParGeant4, one builds and runs the sequential Geant4 application exactly as before, except that it is additionally linked with some parallel libraries. Upon execution, the ParGeant4 master process on the console sends out events to slave processes, collects all hits, and calls any analysis tool, exactly as one would do in the sequential case. There is no need to split events into separate groups, track whether one of the processors crashed, merge histogram files, etc. If a slave processor crashes, ParGeant4 automatically re-sends the events of that slave processor to a new slave processor for re-execution.

What is the performance of ParGeant4?

As a rule of thumb, speedup will be nearly linear when each event simulation lasts for at least several milliseconds. ParGeant4 has been tested extensively on parallelizations of examples/novice/N02 and of examples/advanced/underground_physics.
On N02, we see a speedup of 27 for 50 nodes and a speedup of 33 for 100 nodes. When using the --TOPC-aggregated-tasks=50 option (see below), the speedup improves to 35 for 50 nodes and 60 for 100 nodes. In tests of underground_physics, events are longer and we see nearly linear speedup (a speedup of 94 with 100 nodes).

Getting started

Detailed information is under extended/parallel/ParN02/docs/000README. There are four steps:

1. Install TOP-C [2].
2. Compile ParN02 by running gmake.
3. Make sure the "procgroup" file is correct and copy it to the directory containing the executable binary file (for example, $G4BIN/Linux-g++).
4. Run the parallel binary program.

What is involved in setting up ParGeant4?

To set up ParGeant4, one needs TOP-C [2] and Marshalgen [4] (free, open source software). If one is parallelizing a new Geant4 application, one must then add or modify approximately 20 lines of annotations (C++ comments that indicate shallow vs. deep copying of pointers, etc.) in the .h files for each hit type defined by the application. For details of the annotations, refer to the manual of the Marshalgen package.

Next, in the main routine of the application, one replaces the call to the G4RunManager constructor by a call to the ParRunManager constructor. (ParRunManager is a derived class of G4RunManager.) After this, one invokes the already provided GNUmakefile (a slightly modified version of the Geant4 example GNUmakefile) to create the parallel application.

Finally, one writes a "procgroup" file, which declares the names of the remote hosts to use in the parallel computation. Optionally, one may also specify filenames (e.g. slave1.out, slave2.out, ...) to store the printout from each slave process. One then calls the ParGeant4 binary exactly as one would call the Geant4 binary, and the results appear as normal, only faster.

Are there examples of using ParGeant4?

Yes. ParGeant4 includes parallelizations of other examples from the Geant4 distribution.
Specifically, ParGeant4 includes example parallelizations of novice/N02, novice/N04, and advanced/underground_physics.

What are some of the features of ParGeant4?

ParGeant4 includes all of the features of TOP-C. In particular, after building a binary, "parMySimulation", one might call:

./parMySimulation --TOPC-help
    Display command options, and then exit.

./parMySimulation --TOPC-trace=0
    By default, ParGeant4 traces each time a new event is sent to a task. This turns it off.

./parMySimulation --TOPC-verbose=0
    By default, ParGeant4 provides statistics indicating what process was run, when it was run, on what machine, the running times and elapsed times of the master and the average slave, and other information. This turns off the statistics.

./parMySimulation --TOPC-aggregated-tasks=10
    By default, ParGeant4 sends one event to one remote process before turning to the next process. This option sends 10 events to a single remote process in one message. This is useful when events are relatively short and the network latency of sending a message is starting to dominate the running time.

./parMySimulation --TOPC-slave-timeout=7200
    By default, if a remote process has not communicated with the master after 1800 seconds (a half hour), the slave process will kill itself. This prevents runaway processes that may be in an infinite loop for some event, or may have lost their socket to communicate with the master process. In this example, we allow 7200 seconds (two hours) because we expect simulation of some events to last up to (but not more than) 7200 seconds.

By default, ParGeant4 uses its own subset implementation of MPI (MPINU), and it uses "ssh" to set up remote processes. ParGeant4 adds approximately 50 KB to the "footprint" of the binary executable. Those who wish to use their own MPI (perhaps because a batch cluster requires a specific MPI) may do so. (See "Configuring a Different MPI" in the TOP-C manual [3].)
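To put the effect of --TOPC-aggregated-tasks in perspective, the speedups quoted in the performance section can be converted to parallel efficiency, that is, speedup divided by the number of nodes. The following is only a small illustrative calculation on the numbers reported in this document; the names REPORTED, parallel_efficiency and efficiency_table are made up for this sketch and are not part of ParGeant4 or TOP-C.

```python
# Speedups quoted in this document, as (configuration, nodes, speedup).
# The labels are descriptive only; they are not ParGeant4 identifiers.
REPORTED = [
    ("N02", 50, 27),
    ("N02", 100, 33),
    ("N02, aggregated-tasks=50", 50, 35),
    ("N02, aggregated-tasks=50", 100, 60),
    ("underground_physics", 100, 94),
]


def parallel_efficiency(speedup, nodes):
    """Fraction of ideal (linear) speedup achieved: speedup / nodes."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return speedup / nodes


def efficiency_table(rows):
    """Return (label, nodes, speedup, efficiency) for each reported row."""
    return [
        (label, nodes, speedup, parallel_efficiency(speedup, nodes))
        for label, nodes, speedup in rows
    ]


if __name__ == "__main__":
    for label, nodes, speedup, eff in efficiency_table(REPORTED):
        print(f"{label:26s} {nodes:4d} nodes  speedup {speedup:3d}  efficiency {eff:.0%}")
```

For the reported figures this gives 54% and 33% efficiency for N02 without aggregation, 70% and 60% with 50 aggregated tasks, and 94% for underground_physics. This is consistent with the rule of thumb that short events benefit most from aggregation, while long events are already close to linear.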
References:

[1] ParGeant4: http://www.ccs.neu.edu/home/gene/pargeant4.html
[2] TOP-C: http://www.ccs.neu.edu/home/gene/topc.html
[3] TOP-C manual: http://www.ccs.neu.edu/home/gene/topc/topc_toc.html
[4] Marshalgen: http://www.ccs.neu.edu/home/gene/marshalgen.html