The following code shows how a master program sends an array of data to a number of slave programs, each of which calculates the sum of one specific row. Each slave returns the sum of its row, which the master then records. The data is sent using what is known as a "multicast" approach: one packet is sent, and all slaves listen and receive it simultaneously.
master2.c   slave2.c  
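The division of work described above can be sketched in plain C without any PVM calls. This is only a serial stand-in, not the contents of master2.c or slave2.c: the names row_sum and collect_row_sums are hypothetical, and the loop in collect_row_sums plays the part of the slaves that would really run as separate tasks.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from master2.c or slave2.c.
 * As in the multicast approach, the slave is handed the whole array
 * but sums only the one row it is responsible for. */
long row_sum(const int *data, size_t ncols, size_t my_row)
{
    const int *row = data + my_row * ncols;  /* start of this slave's row */
    long sum = 0;
    for (size_t c = 0; c < ncols; c++)
        sum += row[c];
    return sum;
}

/* Serial stand-in for the master: records one sum per row, as if each
 * value had been returned by a different slave. */
void collect_row_sums(const int *data, size_t nrows, size_t ncols, long *sums)
{
    for (size_t r = 0; r < nrows; r++)
        sums[r] = row_sum(data, ncols, r);
}
```

For a 3 x 4 array stored row by row, calling collect_row_sums(data, 3, 4, sums) fills sums with one entry per row. In a real PVM program each call to row_sum would happen in a separate slave task, and the result would travel back to the master in a message.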
In a multicast situation, it is not inherently clear whether the program would be more efficient if the master instead sent each row only to the slave process that needs it. That alternative requires a number of smaller, different packets to be sent out in sequence rather than one packet that is sent to all. Whether the overhead of the extra pvm_pack( ) and pvm_send( ) calls is greater than the overhead of the surplus data sent out is not obvious. The relationship between the two factors is an interesting thing to investigate.
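One way to start such an investigation is with a rough cost model. The sketch below is an assumption, not a measurement: it supposes that every extra pack-and-send call costs a fixed amount (per_call_cost) and that every element a slave receives but does not need costs a fixed amount (per_element_cost), in the same time units. All function and parameter names are hypothetical.

```c
/* Toy cost model; both costs are assumed to be linear. */

/* Per-row sending needs nrows pack/send calls instead of one,
 * so it pays for (nrows - 1) extra calls. */
double extra_call_overhead(int nrows, double per_call_cost)
{
    return (nrows - 1) * per_call_cost;
}

/* With multicast, each slave receives (nrows - 1) rows it does not need.
 * The slaves handle their copies in parallel, so the surplus is
 * counted once rather than once per slave. */
double surplus_data_overhead(int nrows, int ncols, double per_element_cost)
{
    return (double)(nrows - 1) * ncols * per_element_cost;
}

/* Returns 1 if, under this model, the single multicast is cheaper. */
int multicast_is_cheaper(int nrows, int ncols,
                         double per_call_cost, double per_element_cost)
{
    return surplus_data_overhead(nrows, ncols, per_element_cost)
         < extra_call_overhead(nrows, per_call_cost);
}
```

Under these assumptions the break-even point is reached when ncols * per_element_cost equals per_call_cost: short rows favor the single multicast, long rows favor sending each slave only its own row. Real values for the two costs would have to be measured on the cluster.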
Eventually the pipeline fills, and throughput increases significantly because every stage of the assembly line is working on the task in parallel. If the amount of work to be done is small, the pipeline approach is not efficient because of the overhead required to fill the pipeline. If the different stages have some hardware advantage, then a pipeline can be a good way to improve throughput.
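The effect of filling the pipeline can be estimated with a standard idealized model. The sketch below assumes that every stage takes the same time per item and ignores communication and task start-up costs, which a real PVM pipeline would also pay. The function names are hypothetical.

```c
/* Idealized pipeline timing: equal stage times, no communication cost. */

/* One worker performs every stage on every item, one after another. */
double serial_time(int stages, int items, double stage_time)
{
    return (double)stages * items * stage_time;
}

/* The first item needs `stages` steps to pass through the pipeline;
 * after that, one finished item leaves the pipeline per step. */
double pipeline_time(int stages, int items, double stage_time)
{
    return (double)(stages + items - 1) * stage_time;
}

/* Speedup of the pipeline over the serial version. */
double pipeline_speedup(int stages, int items)
{
    return serial_time(stages, items, 1.0) / pipeline_time(stages, items, 1.0);
}
```

With four stages and a single item, both versions take four steps and nothing is gained. With four stages and 100 items, the serial version takes 400 steps and the pipeline takes 103, so the speedup approaches the number of stages as the amount of work grows.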
One potential problem with this recursive approach is ensuring that there is a terminating condition. Without one, sub-processes can continue to spawn further processes, resulting in an endless propagation of child tasks that eventually overwhelms the system resources. Take a look at the program below, which generates a tree with a maximum depth of four levels.
tree4.c  
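The role of the terminating condition can be sketched in plain C, with an ordinary recursive call standing in for spawning a child task. This is not the contents of tree4.c: the fan-out of two children per task and the names MAX_DEPTH, FANOUT and spawn_tree are assumptions made for illustration.

```c
#define MAX_DEPTH 4  /* levels in the tree; the root is level 1 */
#define FANOUT    2  /* assumed children per task, not taken from tree4.c */

/* Returns the number of tasks in the subtree rooted at a task on `level`.
 * Each recursive call stands in for spawning one child task. */
int spawn_tree(int level)
{
    int count = 1;               /* this task itself */

    if (level >= MAX_DEPTH)      /* terminating condition: leaves spawn nothing */
        return count;

    for (int i = 0; i < FANOUT; i++)
        count += spawn_tree(level + 1);

    return count;
}
```

Starting from spawn_tree(1), the recursion stops at level four after creating 1 + 2 + 4 + 8 = 15 tasks. Without the depth check the recursion would never end. In a real PVM program each child would need to be told its level by its parent, for example in a message, so that it can apply the same check.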
The following two pages give examples of programs on the Cluster: