Usage of udp-cluster-uniprocess and udp-cluster-smp backends
by: Dan Bonachea

--- Basic Usage ---

To run a Titanium app compiled with the udp-cluster-uniprocess or
udp-cluster-smp backend, just run the app directly and pass the number of
cluster nodes to use as the first argument:

  ./HelloWorld 4 {program args...}

On the udp-cluster-smp backend, you'll also need to set TI_THREADS as usual
to indicate how many Titanium threads to run on each cluster node. For
example, for the command above you might set:

  setenv TI_THREADS "4 4 2 2"

to specify a 4-node (12-thread) job running on 2 quad-processor SMPs and
2 dual-processor SMPs.

--- Configuring the job spawn mechanism ---

The Titanium runtime library needs a way to spawn the remote jobs on the
worker nodes of your cluster. The udp-* backends support several common
mechanisms for doing this. By default, the runtime uses glurun or rexec if
these are available; otherwise it falls back to an ssh remote shell. To
explicitly control the spawning mechanism, set TI_SPAWNFN to one of the
following characters:

  G)lurun, R)exec, S)sh remote shell, C)ustom spawn fn,
  L)ocal fork()/exec() (for debugging)

Documentation for each option appears in the following sections.

* Glurun spawn mechanism:

This is the spawn mechanism used on the Berkeley NOW cluster. No special
setup is required. However, if you wish, you can control the nodes chosen
to run your job by setting the environment variable GLUNIX_NODES to a list
or range of hostnames (e.g. "u2,u3,u4", or "u2..u15").

If you're having trouble, you can test whether glurun is working correctly
by issuing the command "glurun -4 uname -a" to run uname on 4 cluster
nodes.

* Rexec spawn mechanism:

This is the spawn mechanism used on the Berkeley Millennium cluster. See
http://www.millennium.berkeley.edu/rexec/ for instructions on setting up
your environment to work with rexec. By default, rexec will connect to a
load balancer specified by VEXEC_SVRS to select lightly-loaded nodes to
run your job.
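For example, a default rexec-based launch might be set up as follows. This is a sketch in csh syntax (matching the other examples in this document); the load-balancer server names shown are hypothetical:

```shell
# csh/tcsh syntax; server names below are hypothetical examples
setenv TI_SPAWNFN R            # force the rexec spawn mechanism
setenv VEXEC_SVRS "mm1 mm2"    # load-balancer servers rexec contacts
./HelloWorld 4                 # run the app on 4 cluster nodes
```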
However, if you wish, you can explicitly control the nodes chosen to run
your job by setting the environment variable REXEC_NODES to a list of
hostnames (e.g. "mm2 mm4 mm8 mm19").

Environment variables recognized:

 * option          default   description
 * --------------------------------------------------
 * REXEC_SVRS      none      list of servers to use
 * REXEC_CMD       "rexec"   rexec command to use
 * REXEC_OPTIONS   ""        additional options to give rexec

If you're having trouble, you can test whether rexec is working correctly
by issuing the command "rexec -n 4 uname -a" to run uname on 4 cluster
nodes. You should also try "rexec -n 4 ls" from the directory containing
the Titanium executable, to ensure that all nodes are correctly mounting
your working directory.

* Ssh remote shell spawn mechanism:

This job spawning mechanism uses ssh remote shell sessions to spawn jobs
on the worker nodes. In order to use the ssh job spawning mechanism, the
user must create an RSA keypair, which will be used to perform automatic
authentication when connecting to the worker nodes (in lieu of typing a
password for each node). The user needs to create his/her RSA key pair by
running ssh-keygen(1). This stores the private key in $HOME/.ssh/identity
and the public key in $HOME/.ssh/identity.pub in the user's home
directory. The user should then copy identity.pub to
$HOME/.ssh/authorized_keys in his/her home directory on the remote machine
(the authorized_keys file corresponds to the conventional $HOME/.rhosts
file, and has one key per line, though the lines can be very long).

Then, the user should set these variables in their environment:

  setenv SSH_SERVERS "compute-0-0 compute-0-1 compute-0-2 compute-0-3"
  setenv SSH_NO_PASSWD

If you're using the ssh spawn mechanism, you must set SSH_SERVERS to a
list of hostnames to be used as worker nodes for running the cluster job
(the four listed above correspond to four of the meteor cluster compute
nodes). Setting SSH_NO_PASSWD disables the ssh password prompt on some
systems.
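The key-setup steps above might be carried out as follows. This is a sketch: the worker hostname is hypothetical, and the key file paths follow the ssh1-style layout described above (newer OpenSSH releases use different default filenames such as id_rsa, so adjust to match what ssh-keygen produces on your system):

```shell
# Generate an RSA keypair (use an empty passphrase if you want fully
# automatic authentication):
ssh-keygen

# Append the public key to authorized_keys on each worker node
# ("compute-0-0" is a hypothetical worker hostname):
cat $HOME/.ssh/identity.pub | ssh compute-0-0 'cat >> .ssh/authorized_keys'

# Then point the Titanium runtime at the workers (csh syntax):
setenv SSH_SERVERS "compute-0-0 compute-0-1 compute-0-2 compute-0-3"
setenv SSH_NO_PASSWD
```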
Environment variables recognized:

 * option            default                  description
 * ----------------------------------------------------------------
 * SSH_SERVERS       none - must be provided  list of servers to use
 * SSH_CMD           "ssh"                    ssh command to use
 * SSH_OPTIONS       ""                       additional options to give
 *                                            ssh
 * SSH_REMOTE_PATH   current working dir.     the directory to use on the
 *                                            remote machine that contains
 *                                            the application executable

The ssh shell does a cd to SSH_REMOTE_PATH before starting your app, so a
copy of the application must be mounted in that directory on all the
remote nodes.

* C)ustom spawn mechanism:

This spawn mechanism is a general spawn facility that can be easily
configured at runtime to use whatever site-specific spawn scripts may be
available. The user provides environment settings that dictate the shell
command to be run by the master node when it is ready to spawn worker
nodes. Ideally, the person installing the Titanium compiler on a new
system that requires the custom spawn function would figure out the
correct settings for these variables to make things work on the given
system, and publish the settings to other Titanium users at the site.
The following environment variables are recognized:

 * option                   default   description
 * ----------------------------------------------------------------
 * TI_CSPAWN_SERVERS        none      list of servers to use - only
 *                                    required if %M is used below
 * TI_CSPAWN_CMD            none      command to call for spawning - the
 *                                    following replacements will be made:
 *                                      %M => list of servers taken from
 *                                            TI_CSPAWN_SERVERS (the first
 *                                            nproc are taken)
 *                                      %N => number of worker nodes
 *                                            requested
 *                                      %C => AMUDP command that should be
 *                                            run by each worker node
 *                                      %D => the current working
 *                                            directory
 *                                      %% => %
 * TI_CSPAWN_ROUTE_OUTPUT   unset     set this variable to request
 *                                    stdout/stderr routing of workers to
 *                                    the console

In order to use the custom spawn function, the user _must_ set the
TI_CSPAWN_CMD variable to the command that should be run by the master to
accomplish the launching of the worker nodes. The application will expand
the recognized strings listed above (e.g. %N) to the appropriate values
for the current run to provide input parameters to the spawning script.
TI_CSPAWN_CMD _must_ include the %C string somewhere, which expands to a
command that runs the worker version of the application with some special
arguments to enable bootstrapping. As far as AMUDP is concerned, the only
requirement on the native spawn script being invoked is that it must run
the command given by %C on each of the %N worker nodes to be used for
computation.

The optional TI_CSPAWN_SERVERS variable provides an arbitrarily long list
of server names, which the master node uses when expanding the optional %M
string (it replaces %M with the first N servers from TI_CSPAWN_SERVERS,
where N is the number of nodes required for the current run). The
TI_CSPAWN_ROUTE_OUTPUT variable directs AMUDP to handle routing worker
stdout/stderr to the console node, for native spawn scripts that aren't
equipped to do that.
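To make the substitution rules concrete, here is a small POSIX-shell sketch that mimics, outside of AMUDP, how the master would expand a TI_CSPAWN_CMD template for a 2-node run. The server names, directory, and worker command below are made-up stand-ins: in a real run AMUDP supplies these values itself (in particular, the %C expansion is generated internally), and the %% => % rule is omitted from this sketch:

```shell
# Hypothetical illustration of AMUDP's % substitutions in TI_CSPAWN_CMD.
# (POSIX sh; values below are made-up stand-ins for what AMUDP supplies.)
TEMPLATE="mpirun -np %N -m %M 'cd %D ; %C'"

N=2                                # number of worker nodes requested
M="compute-0-2,compute-0-22"       # first N entries of TI_CSPAWN_SERVERS
D="/home/user/app"                 # current working directory
C="./HelloWorld --worker-args"     # worker command (built by AMUDP)

# Expand the template (sed stands in for AMUDP's internal substitution):
CMD=$(echo "$TEMPLATE" | sed -e "s|%N|$N|g" -e "s|%M|$M|g" \
                             -e "s|%D|$D|g" -e "s|%C|$C|g")
echo "$CMD"
# prints: mpirun -np 2 -m compute-0-2,compute-0-22 'cd /home/user/app ; ./HelloWorld --worker-args'
```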
Here is a basic example:

  setenv TI_CSPAWN_SERVERS "compute-0-2,compute-0-22,compute-0-3"
  setenv TI_CSPAWN_CMD "mpirun -np %N -m %M 'cd %D ; %C'"

At runtime (assuming a 2-node job), this will result in the master node
executing a command of the following form using a system() call:

  mpirun -np 2 -m compute-0-2,compute-0-22 'cd <dir> ; <worker-cmd>'

where <dir> is the current working directory and <worker-cmd> is a shell
command to be run on each worker, which includes a call to the current
application binary.

* L)ocal fork()/exec() spawn mechanism:

This spawn mechanism forks all worker processes on the local machine, but
the worker processes still communicate using UDP over the localhost
network to simulate a cluster configuration. It's a useful debugging tool,
but should probably never be used for production jobs (use the smp backend
instead for faster communication in single-machine environments).

--- Viewing communication performance statistics ---

The udp-* backends include a feature that monitors the communication
behavior of your distributed application and outputs a list of performance
statistics at program exit. This feature is useful for investigating the
communication behavior of Titanium apps.
To enable this feature, set the environment variable TI_AMSTATS and run
your distributed application on 2 or more nodes. (The program must return
or throw an exception from main, not just call System.exit().)

Example output (from arrayCopyTest running on Millennium, 4 P3 SMPs
connected via 100 Mbit ethernet):

--------------------------------------------------
Global AM2 usage statistics:
  Requests: 457266 sent, 80 retransmitted, 457346 received
  Replies:  457266 sent, 80 retransmitted, 457346 received
  Returned messages: 0
  Latency (request sent to reply received):
    min: 56 microseconds
    max: 104617 microseconds
    avg: 150 microseconds
  Message Breakdown:        Requests   Replies   Average data payload
    Small  (<=    32 bytes)   453565    453730        11.968 bytes
    Medium (<=   544 bytes)     3701      3536       513.974 bytes
    Large  (<= 65032 bytes)        0         0         0.000 bytes
    Total                                             15.940 bytes
  Data bytes sent:      14577856 bytes
  Total bytes sent:     60371016 bytes (incl. AM overhead)
  Bandwidth overhead:   75.85 %
  Average packet size:  66.001 bytes (incl. AM & transport-layer overhead)
  Packets lost: 0
--------------------------------------------------

The same program on NOW, 4 UltraSPARC-1/170's connected via 10 Mbit
ethernet:

--------------------------------------------------
Global AM2 usage statistics:
  Requests: 916344 sent, 34732 retransmitted, 951076 received
  Replies:  916344 sent, 34732 retransmitted, 951076 received
  Returned messages: 0
  Latency (request sent to reply received):
    min: 173 microseconds
    max: 3039327 microseconds
    avg: 2213 microseconds
  Message Breakdown:        Requests   Replies   Average data payload
    Small  (<=    32 bytes)   908587    909276        11.951 bytes
    Medium (<=   544 bytes)     7757      7068       503.337 bytes
    Large  (<= 65032 bytes)        0         0         0.000 bytes
    Total                                             15.926 bytes
  Data bytes sent:      29187084 bytes
  Total bytes sent:    150380952 bytes (incl. AM overhead)
  Bandwidth overhead:   80.59 %
  Average packet size:  79.058 bytes (incl. AM & transport-layer overhead)
  Packets lost: 0
--------------------------------------------------

Note that the latency statistics may differ significantly on different
hardware and systems, and will also differ significantly on other AM
implementations that are built on lower-level, higher-performance
networking layers than UDP. However, the accounting statistics (number of
requests/replies, data bytes, etc.) should be nearly the same on all the
AM-based distributed Titanium backends.
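For reference, enabling the statistics report might look like this (csh syntax, as in the examples above; the document only says TI_AMSTATS must be set, so the value "1" is an arbitrary choice):

```shell
# csh/tcsh syntax, matching the other examples in this document
setenv TI_AMSTATS 1      # any setting enables the statistics report
./arrayCopyTest 4        # run on 2 or more nodes; stats print at exit
```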