Parallel Algorithm Configurations

by D.W. Hyatt

Three Different Algorithm Approaches in PVM

There are a number of possible algorithm configurations that one could use when designing a PVM program. We will present three such configurations here: 1) Master - Slave Approach, 2) Relay Approach (or Loop ), and 3) Tree Approach. Take a look at the following example programs and how the different algorithms are used.

Master - Slave Approach
The master-slave approach is the most common method for a PVM computation. It utilizes the power of parallel programming more easily than the other methods because some computational task can be divided into sub-tasks, and then these smaller units can be worked on independently by the slave tasks, and the results sent back to the master when complete.
The following code shows how a master program sends an array of data out to a number of slave programs, each of which calculates the sum of a specific row. The slaves return the sum of an individual row of data which the master then records. The data is sent out in what is known as a "multicast" approach, where one packet is sent, yet all slaves listen and receive simultaneously.
master2.c slave2.c
In a multi-cast situation, it is not inherently clear whether the program would be more efficient if the master program only sent the specific row of data to be calculated to just the appropriate slave process because there would have to be a number of smaller yet different packets of data sent out in sequence rather than one that is sent to all. Whether the overhead for the extra pvm_pack( ) and pvm_send( ) routines is greater than the overhead of the surplus data sent out is not clear. The relationship between the two factors is an interesting thing to investigate.

Relay or Pipeline Approach
The rationale for the relay or pipeline approach parallels some common approaches used in computer architecture for speeding up computations. In this situation, if there are a number of steps that although discrete must be computed in a certain sequence, an algorithm similar to a "vector pipeline" might be used. This is an assembly line approach where the first phase is done by the first processor, afterwhich it passes the results along to the next processor in line. Meanwhile, the first processor can begin working on another batch of data. While the second processor is doing its task. When the second processor is finished, it can pass the data onto a third, while the second begins working on the next thing from process one, and the first process can begin on yet another piece of data.
Eventually, the pipeline gets filled and the throughput is significantly increased because each stage of the assembly line is attacking the task in parallel. If the amount of work do be done is small, the pipeline approach is not efficient because of the overhead required to fill the pipeline. If the different stages have some hardware advantage, then the pipeline might be good for improving throughput.
relay0.c relay1.c
Tree Approach
The tree computation approach is more useful for a "divide and conquer" type algorithm where a task is sent out to multiple slave-type processes, which in turn send out off portions to one or more additional slaves. When the calculations are eventually complete, the data is passed back up the nodes of the tree and returned to the originator of the task.
One potential problem in this recursive approach is the assurance that there is a terminating condition, otherwise a number of processes can continue to be spawned by sub processes, resulting in an endless propagation of child tasks, eventually overwhelming the system resources. Take a look at the program below that allows the generation of a tree with a maximum depth of four levels.
tree4.c

Multiple Processes on the Linux Cluster

The Linux Cluster uses a different programming approach. Instead of manually deciding how each independent machine will handle a separate program, the general approach is that a single program makes multiple copies of iteself with often different parameters by "forking" processes on the main node of the cluster. The operating system then distributes those programs to the various nodes of the cluster and each node completes what it was supposed to do.

The following two pages give examples of programs on the Cluster:

A Simple Cluster Program
Working with The Linux Cluster: POV-Ray Animation

Parallel Algorithm Configurations

by D.W. Hyatt

Three Different Algorithm Approaches in PVM

Master - Slave Approach

Relay or Pipeline Approach

Tree Approach

Multiple Processes on the Linux Cluster