|
Geant4 User's Guide
For Application Developers
Examples |
Parallel Geant4 (ParGeant4)
by Gene Cooperman (gene@ccs.neu.edu)
and Viet Ha Nguyen (vietha@ccs.neu.edu)
What is ParGeant4 ?
ParGeant4
is a parallel version of Geant4 that implements event-level
parallelism to simulate separate events on remote processors.
Typical simulations demonstrate a nearly linear speedup in running
time as the number of remote processors increases. The needed
enhancements of Geant4 are included in the
examples/extended/parallel directory of the Geant4 distribution.
Why is ParGeant4 useful?
When doing a large Geant4 simulation, one often wishes to run
on many processors to reduce the overall time. Traditionally,
this has been done by splitting the events into multiple groups,
and running Geant4 independently on each processor for its own
group of events. This requires restarting a run if a processor
goes down. It also requires saving the histogram files from
each run, and merging the files prior to using the analysis tool.
The human effort in this is considerable.
ParGeant4 provides a much simpler mechanism. After setting up
ParGeant4 one links and runs the sequential Geant4 application
exactly as before, but additionally linking with some parallel libraries.
Upon execution, ParGeant4 on the console sends out events to
slave processes, collects all hits, and calls any analysis tool --
exactly as one would do in the sequential case.
There is no need to split events into separate groups, track
whether one of the processors crashed, merge histogram files, etc.
If a slave processor crashes, ParGeant automatically re-sends the
events of that slave processor to a new slave processor for re-execution.
What is the performance of ParGeant4?
As a rule of thumb, speedup will be nearly linear when each event
simulation lasts for at least several milliseconds.
ParGeant4 has been tested extensively on parallelizations of
examples/novice/N02 and of examples/advanced/underground_physics .
On N02, we see a speedup of 27 for 50 nodes and a speedup of
33 for 100 nodes. When using the --aggregated-tasks=50
option (see below)
the speedup improves to 35 for 50 nodes and 60 for 100 nodes.
In tests of underground_physics, events are longer and we see nearly
linear speedup (94 times speedup with 100 nodes).
Getting started
Detailed information is under extended/parallel/ParN02/docs/000README.
There are four steps:
- Install TOP-C.
- Compile ParN02 by running gmake.
- Make sure the "procgroup" file is correct and copy it to
directory of the executable binary file (for example,
$G4BIN/Linux-g++).
- Run the parallel binary program.
What is involved in setting up ParGeant4?
To set up ParGeant4, one needs
TOP-C and
Marshalgen
(free, open source software).
If one is parallelizing a new Geant4 application, one must then add/modify
approximately 20 lines of annotations (C++ comments
to indicate shallow vs. deep copying of pointers, etc.) in the .h files
for each hit type being defined
by the application. For details of the annotations, refer to the manual of the Marshalgen package.
Finally, in the main routine of the application,
one replaces the call to the G4RunManager constructor by a call
to the ParRunManager constructor. (ParRunManger is a derived class
of G4RunManager.)
After this, one invokes the already provided
GNUMakefile (a slightly modified version of the Geant4 example GNUMakefile)
to create the parallel application. Finally, one writes a
"procgroup" file, which declares the names of the remote hosts
to use in the parallel computation. Optionally, one may also
specify filenames (e.g. slave1.out, slave2.out, ...) to store
the printout from each slave process. One then calls the
ParGeant4 binary exactly as one would call the Geant4 binary,
and the results appear as normal, only faster.
Are there examples of using ParGeant4?
Yes. ParGeant4 includes parallelizations of
other examples from the Geant4 distribution. Specifically, ParGeant4
includes example parallelizations of novice/N02, novice/N04, and
advanced/underground_physics .
What are some of the features of ParGeant4?
ParGeant4 includes all of the features of TOP-C. In particular,
after building a binary, "parMySimulation",
one might call:
-
./parMySimulation --TOPC-help
-
Display command options, and then exit.
-
./parMySimulation --TOPC-trace=0
-
By default, ParGeant4 traces each time a new event is sent to
a task. This turns it off.
-
./parMySimulation --TOPC-verbose=0
-
By default, ParGeant4 provides statistics indicating what
process was run, when it was run, what machine, the running
times and elapsed times of master and the average slave, and
other information. This turns off the statistics.
-
./parMySimulation --TOPC-aggregated-tasks=10
-
By default, ParGeant4 sends one event to one remote process before
turning to the next process. This option sends 10 events
to a single remote process in one message. This is useful
when events are relatively short, and the network latency
of sending a message is starting to dominate the running time.
-
./parMySimulation --TOPC-slave-timeout=3600
-
By default, if a remote process has not communicated with the
master after 1800 seconds (a half hour), the slave process will
kill itself. This prevents runaway processes that may be in an
infinite loop for some event, or may have lost their socket to
communicate with the master process. In this example, we allow
7200 seconds (two hours) because we expect simulation of some
events to last up to (but not more than) 7200 seconds.
By default, ParGeant4 uses its own subset implementation of MPI (MPINU).
ParGeant4 adds approximately 50 KB to the "footprint" of the binary
executable. By default, ParGeant4 uses "ssh" to set up remote processes.
Those who wish to use their own MPI (perhaps if a batch cluster requires
a specific MPI) may do so. (See "Configuring a Different `MPI')"
in the
TOP-C manual.)
About
the authors