Usage of mpi-cluster-uniprocess and mpi-cluster-smp backends
by: Dan Bonachea

--- Basic Usage ---

The mpi-cluster-uniprocess and mpi-cluster-smp backends allow your Titanium
application to be compiled into an executable that (as far as the system is
concerned) is indistinguishable from MPI-based executables written directly in
C or Fortran, and should therefore be run in the same ways.

To run a Titanium app compiled with the mpi-cluster-uniprocess or
mpi-cluster-smp backend, use the standard MPI run script provided by the
system administrators on your computing site. Consult your site-specific
documentation to find the correct mantra for running MPI programs on your
machine. A common example is the MPICH run script, which generally looks
something like this (to start a 4-node job):

  mpirun -np 4 ./HelloWorld {program args...}

Another example is the POE system used on many IBM SP systems:

  poe ./HelloWorld {program args...} -nodes 4 -tasks_per_node 1 -rmpool 1 -euilib us

(note: -nodes specifies the number of boxes, and -tasks_per_node indicates the
number of MPI processes to start on each node)

To use the mpi-cluster-smp backend, you'll also need to set TI_THREADS as
usual to indicate how many Titanium threads to run on each cluster node. Note
that on a machine where multiple CPUs share a memory space (i.e. a CLUMP
machine like the IBM SP Power3), the optimal configuration is the
mpi-cluster-smp backend, which you can configure to use MPI between nodes and
pthreads (shared-memory transfers) within a node. This provides higher
performance for intra-node transfers than merely running each Titanium thread
on a node as a separate MPI process (although that usage is also supported if
you have a reason to want it). For example, you might set TI_THREADS to
"4 4 2 2" and use either of the commands above to spawn a 4-node (12-thread)
job utilizing 2 quad-processor SMPs and 2 dual-processor SMPs. Note that in
the case of poe we still specify -tasks_per_node 1, because there is still
only one MPI process per node - the multiple Titanium threads on that node
share a single MPI interface for off-node transfers.
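As a concrete illustration, here is a minimal sketch of spawning the 4-node
(12-thread) mpi-cluster-smp job described above with an MPICH-style mpirun. It
assumes a Bourne-style shell and that your MPI run script propagates
environment variables to the worker nodes (if it does not, see the section on
setting environment variables below); the exact spawner flags vary by site.

  # hypothetical example - adjust the spawner invocation to your site
  export TI_THREADS="4 4 2 2"    # 2 quad-processor SMPs + 2 dual-processor SMPs
  mpirun -np 4 ./HelloWorld {program args...}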
--- Viewing communication performance statistics ---

The mpi-* backends include a feature that monitors the communication behavior
of your distributed application and outputs a list of performance statistics
at program exit. This feature is useful for investigating the communication
behavior of Titanium apps. To enable it, set the environment variable
TI_AMSTATS and run your distributed application on 2 or more nodes. (The
program must return or throw an exception from main, not just call
System.exit().)

Example output (from the commperf test running between nodes on seaborg, an
IBM SP Power3):

  --------------------------------------------------
  Global AM2 usage statistics:

   Requests: 646742 sent, 646742 received
   Replies:  646742 sent, 646742 received
   Returned messages: 0

   Message Breakdown:        Requests   Replies   Average data payload
    Small  (<=    32 bytes)    404022    404682        10.158 bytes
    Medium (<= 65032 bytes)    242720    242060      1050.162 bytes
    Large  (<= 65032 bytes)         0         0         0.000 bytes
    Total                                              399.937 bytes

   Data bytes sent:      517311784 bytes
   Total bytes sent:     550452040 bytes (incl. AM overhead)
   Bandwidth overhead:   6.02 %
   Average packet size:  425.558 bytes (incl. AM overhead)

   Packets unaccounted for: 0
  --------------------------------------------------

Note that the accounting statistics (number of requests/replies, data bytes,
etc.) may differ somewhat from the other AM-based distributed Titanium
backends (such as udp-* or now-*) because AMMPI supports larger medium-sized
messages than these other AM layers (up to 64KB, rather than the usual
512-byte limit).

--- Setting environment variables ---

Note that the MPI run script on many systems (especially when used
interactively) may fail to correctly propagate environment variable values
from your console to the worker nodes. This means that if you set environment
variables such as TI_THREADS and TI_AMSTATS on your console and interactively
run an MPI job, those settings may not be reflected in the environment of the
worker nodes, causing them to be ignored.

Titanium includes a feature especially intended to solve this problem, whereby
you can specify environment variable settings via the file system: create a
file called ".tienv" in the same directory as the Titanium application
executable and place an environment variable assignment on each line, of the
form:

  name = value

(where name and value may be "quoted" and value may be omitted)

The settings in this file are added to the environment at program startup,
overriding any settings already present if there are conflicts. The file is
read very early at program startup, so it may include job configuration
variables (such as TI_THREADS). This feature works on all the tic backends,
but is especially useful for distributed backends (especially the mpi-*
backends) on clusters whose job-spawning mechanism doesn't correctly propagate
environment variables from the console node to the worker nodes (for example,
the mpi-launch script on the ROCKS cluster).
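For instance, a .tienv file covering the example job above might look like the
sketch below. The settings shown are illustrative; in particular, the value 1
for TI_AMSTATS is an assumption here (the documentation above only says the
variable must be set), so consult your Titanium release notes for the exact
value expected.

  # hypothetical example - a .tienv file placed next to ./HelloWorld
  # (assumes setting TI_AMSTATS to any value, e.g. 1, enables statistics)
  $ cat .tienv
  TI_THREADS = "4 4 2 2"
  TI_AMSTATS = 1

With such a file in place, the mpirun and poe commands shown earlier need no
environment setup on the console, since every node picks up these settings
from .tienv at program startup.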