This section presents examples of two alternate methods of using the PVM module. The source code for these examples is included in the examples subdirectory of the PVM module source code distribution. The first method uses PVM library routines to manage a simple distributed application. The second method uses the higher-level master-slave interface. This interface can provide a high degree of tolerance to failure of slave machines, which proves useful in long-running distributed applications.
In programming language tutorials, the first example is usually a program which simply prints out a message such as Hello World and then exits. The intent of such a trivial example is to illustrate all the steps involved in writing and running a program in that language.
To write a Hello World program using the PVM module, we will write two programs: the master (hello_master) and the slave (hello_slave). The master process will spawn a slave process on a different host and then wait for a message from that slave process. When the slave runs, it sends a message to the master, or parent, and then exits. For the purpose of this example, we will assume that the PVM consists of two hosts, named vex and pirx, and that the slave process will run on pirx.
The hello_master program

First, consider the master process, hello_master.
Conceptually, it must specify the full path to the slave
executable and then send that information to the slave host
(pirx
). For this example, we assume that the
master and slave executables are in the same directory and
that the master process is started in that directory. With
this assumption, we can construct the path to the slave
executable using the getcwd
and path_concat
functions. We then send this information to the slave host
using the pvm_spawn
function:
path = path_concat (getcwd(), "hello_slave");
slave_tid = pvm_spawn (path, PvmTaskHost, "pirx", 1);
The first argument to pvm_spawn
specifies the full path
to the slave executable. The second argument is a bit mask
specifying options associated with spawning the slave process.
The PvmTaskHost
option indicates that the slave process
is to be started on a specific host. The third argument gives
the name of the slave host and the last argument indicates how
many copies of this process should be started. The return
value of pvm_spawn
is an array of task identifiers for
each of the slave processes; negative values indicate that an
error occurred.
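Since errors are reported as negative task identifiers, it is prudent to check the return value before waiting for a message. A minimal sketch (the error handling shown here is illustrative, not taken from the example source):

```slang
slave_tid = pvm_spawn (path, PvmTaskHost, "pirx", 1);
if (slave_tid[0] < 0)
{
   % A negative task identifier indicates that the spawn failed
   vmessage ("pvm_spawn failed with code %d", slave_tid[0]);
   pvm_exit ();
   exit (1);
}
```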
Having spawned the hello_slave process on pirx, the master process calls the pvm_recv function to receive a message from the slave.
bufid = pvm_recv (-1, -1);
The first argument to pvm_recv
specifies the task
identifier of the slave process expected to send the message
and the second argument specifies the type of message that is
expected. A slave task identifier of -1 means that a message from any slave will be accepted. Similarly, a message identifier of -1 means that any type of message will be accepted. In this example, we could have specified the slave task id and the message identifier explicitly:
bufid = pvm_recv (slave_tid, 1);
When a suitable message is received, the contents of the
message are stored in a PVM buffer and pvm_recv
returns
the buffer identifier which may be used by the PVM application
to retrieve the contents of the buffer.
Retrieving the contents of the buffer normally requires knowing the format in which the information is stored. In this case, because we accepted all types of messages from the slave, we may need to examine the message buffer to find out what kind of message was actually received. The pvm_bufinfo function is used to obtain information about the contents of the buffer.
(,msgid,) = pvm_bufinfo (bufid);
Given the buffer identifier, pvm_bufinfo returns the number of bytes, the message identifier, and the task identifier of the process that sent the message.
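All three values can be retrieved at once; for example:

```slang
variable nbytes, msgid, tid;
% pvm_bufinfo returns the byte count, message id, and sender's task id
(nbytes, msgid, tid) = pvm_bufinfo (bufid);
vmessage ("received %d bytes, msgid=%d, from tid=%d", nbytes, msgid, tid);
```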
Because we know that the slave process sent a single object of
Struct_Type
, we retrieve it by calling the
pvm_recv_obj
function.
variable obj = pvm_recv_obj();
vmessage ("%s says %s", obj.from, obj.msg);
This function is not part of the PVM package but is a higher level function provided by the PVM module. It simplifies the process of sending S-lang objects between hosts by handling some of the bookkeeping required by the lower level PVM interface. Having retrieved an S-lang object from the message buffer, we can then print out the message.
Running
hello_master, we see:
vex> ./hello_master
pirx says Hello World
Note that before exiting, all PVM processes should call the
pvm_exit
function to inform the pvmd
daemon of
the change in PVM status.
pvm_exit();
exit(0);
At this point, the script may exit normally.
The hello_slave program

Now, consider the slave process, hello_slave. Conceptually, it must first determine the location of its parent process, then create and send a message to that process.
The task identifier of the parent process is obtained using
the pvm_parent
function.
variable ptid = pvm_parent();
For this example, we will send a message consisting of a
S-lang structure with two fields, one containing the name of
the slave host and the other containing the string
"Hello World"
.
We use the pvm_send_obj function to send this message because it automatically handles packaging all the separate structure fields into a PVM message buffer and also sends along the structure field names and data types so that the structure can be automatically re-assembled by the receiving process. This makes it possible to write code which transparently sends S-lang objects from one host to another. To create and send the structure:
variable s = struct {msg, from};
s.msg = "Hello World";
s.from = getenv ("HOST");
pvm_send_obj (ptid, 1, s);
The first argument to pvm_send_obj
specifies the task
identifier of the destination process, the second argument is
a message identifier which is used to indicate what kind of
message has been sent. The remaining arguments contain the
data objects to be included in the message.
Having sent a message to the parent process, the slave process
then calls pvm_exit
to inform the pvmd
daemon
that its work is complete. This allows pvmd
to notify
the parent process that a slave process has exited. The slave
then exits normally.
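The closing lines of hello_slave mirror those of the master:

```slang
% Inform the pvmd daemon that this task is done, then exit normally
pvm_exit ();
exit (0);
```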
The PVM
module provides a higher level interface to
support the master-slave paradigm for distributed
computations. The symbols associated with this interface have
the pvm_ms
prefix to distinguish them from those
symbols associated with the PVM package itself.
The pvm_ms interface provides a means for handling computations which consist of a predetermined list of tasks, each performed by running an arbitrary slave process that takes command-line arguments. The interface provides a high degree of robustness, allowing one to add or delete hosts from the PVM while the distributed process is running, and ensuring that the task list will be completed even if one or more slave hosts fail (e.g. crash) during the computation.
Experience has shown that this failure tolerance is surprisingly important. Long-running distributed computations encounter the failure of one or more hosts with some regularity, and it is essential that such failures do not require restarting the entire distributed computation from the beginning.
Scripts using this interface must initialize it by loading
the pvm_ms
package via, e.g.
require ("pvm_ms");
As an example of how to use this interface, we examine the
scripts
master and
slave.
The master program

The master script first builds a list of tasks, each consisting of an array of strings which provide the command line for each slave process that will be spawned on the PVM. For this simple example, the same command line will be executed a specified number of times. First, the script constructs the path to the slave executable (Slave_Pgm) and then the command line (Cmd) that each slave instance will invoke. Then the array of tasks is constructed:
variable pgm_argvs = Array_Type[N];
variable pgm_argv = [Slave_Pgm, Cmd];
pgm_argvs[*] = pgm_argv;
The distribution of these tasks across the available PVM is
automatically handled by the pvm_ms
interface. The
interface will simultaneously start as many tasks as possible
up to some maximum number of processes per host. Here we
specify that a maximum of two processes per host may run
simultaneously and then submit the list of tasks to the PVM:
pvm_ms_set_num_processes_per_host (2);
exit_status = pvm_ms_run_master (pgm_argvs);
As each slave process completes, its exit status is recorded along with any messages printed to stdout during the execution. When the entire list of tasks is complete, an array of structures is returned containing status information for each task that was executed. In this example, the master process simply prints out this information.
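For example, the returned array might be examined with a loop along these lines (the structure field name used here is an illustrative assumption; consult the pvm_ms documentation for the actual layout):

```slang
variable s;
foreach s (exit_status)
{
   % The exit_status field name is assumed for illustration
   vmessage ("task exit status = %d", s.exit_status);
}
```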
The slave program

The slave process in this example is relatively simple. Its command line arguments provide the task to be completed. These arguments are then passed to pvm_ms_run_slave:
pvm_ms_run_slave (__argv[[1:]]);
which spawns a subshell, runs the specified command,
communicates the task completion status to the parent process
and exits.
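Putting the pieces together, a minimal slave script might look like this:

```slang
require ("pvm_ms");

% Forward all command-line arguments after the script name;
% pvm_ms_run_slave runs the specified command in a subshell,
% reports the completion status to the parent process, and exits.
pvm_ms_run_slave (__argv[[1:]]);
```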