Open MPI convenience functions

I could start by lecturing you on the basics of usage and principles of operation of MPI, and Open MPI in particular, but I don’t think that would be a good idea, because you (or, even more probably, me 😉) would get bored really quickly. To cut a long story short: MPI (Message Passing Interface) is a way of distributing computations over a grid of computers connected by a network. Unlike in OpenMP, the machines do not physically share the same memory (RAM and/or HDD). Instead, data has to be exchanged over the network, which obviously becomes this solution’s bottleneck. Open MPI strives to maximize utilization of the available network bandwidth and provide you with the best performance possible.
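To make that concrete, here is a minimal Boost.MPI sketch (just an illustration, not part of the library below) of what such an exchange over the network looks like: rank 0 sends an integer to rank 1.

#include <iostream>
#include <boost/mpi.hpp>

int main(int argc, char *argv[])
{
	boost::mpi::environment env(argc, argv);
	boost::mpi::communicator world;

	if (world.rank() == 0)
	{
		int payload = 42;
		world.send(1, 0, payload); // destination rank 1, message tag 0
	}
	else if (world.rank() == 1)
	{
		int payload;
		world.recv(0, 0, payload); // source rank 0, message tag 0
		std::cout << "Received " << payload << std::endl;
	}

	return 0;
}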

Now, I don’t use Open MPI all that much in my day-to-day work, but from my very first moments with the library I noticed the following facts: 1) the C API is very powerful, but it’s also very C-ish ;), 2) the boost::mpi wrapper is cool, but it lacks some convenience functions, and 3) the convenience functions I missed the most were basically those I had seen before in the QtConcurrent namespace from the Qt Framework. Now, what does a good coder do when he spots a situation like this? 😉 Of course, he starts coding the missing stuff 😀

After some thinking I figured out that what I needed on a regular basis could be achieved by implementing just one of QtConcurrent’s functions, namely map(Sequence&, MapFunction). This function normally takes a sequence (e.g. a std::vector) and distributes the computations (calls to MapFunction) on its elements between many threads on the same machine. Obviously its MPI counterpart should do exactly the same, only distributing the load between many separate computers.
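For comparison, this is roughly how the QtConcurrent original is used (a sketch assuming Qt 4-style includes; the mapped function modifies each element in place across a thread pool):

#include <QVector>
#include <QtConcurrentMap>

void square(int &x)
{
	x = x * x;
}

int main()
{
	QVector<int> values;
	values << 2 << 4 << 6 << 8 << 10;

	// map() returns a QFuture; waitForFinished() blocks until all worker
	// threads have finished modifying the sequence in place.
	QtConcurrent::map(values, square).waitForFinished();

	return 0;
}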

I had one more decision to make. Open MPI starts all the processes in parallel (unlike OpenMP, which parallelizes only explicitly marked parts of the code). Thanks to this, as long as the data vector is initialized in every process, it doesn’t matter that the machines do not physically share memory, and the implementation of map() can omit the initial step of scattering data between processes, because they already have all the required fragments. Nevertheless, for starters I decided to implement the version with scattering, just in case. It allows data to be read in the master process and distributed between the slaves, an approach more intuitive to most programmers. The other version will follow; a rough sketch of it appears right after the listing below.

#include <iostream>
#include <vector>
#include <boost/mpi.hpp>

// OpenMP-like helpers: SEQ(comm) opens a block executed only by the master
// process (rank 0) and PAR closes it.
#define SEQ(comm) if(comm.rank() == 0) {
#define PAR }

namespace mpi
{
	template<class ValueType, class MapFunction> void map(const boost::mpi::communicator &comm, std::vector<ValueType> &values, MapFunction func)
	{
		int count, oldCount;

		SEQ(comm)
		// Pad the vector so that it divides evenly between the processes.
		oldCount = count = values.size();
		if (count % comm.size() != 0) count += comm.size() - count % comm.size();
		values.resize(count);

		PAR
		boost::mpi::broadcast(comm, count, 0);

		int recvcount = count / comm.size();

		// Scatter an equal chunk to every process, map it locally, then
		// gather the results back at the master.
		std::vector<ValueType> vec(recvcount);
		boost::mpi::scatter(comm, values, &vec[0], recvcount, 0);
		for (int i = 0; i < recvcount; i++)
		{
			vec[i] = func(vec[i]);
		}

		boost::mpi::gather(comm, &vec[0], recvcount, values, 0);

		SEQ(comm)
		// Drop the padding elements again.
		values.resize(oldCount);

		PAR
	}
}
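By the way, the scatter-free variant mentioned earlier could look more or less like this (a rough sketch under the assumption that every process initializes the full vector itself; the name map_prescattered is mine, not part of the library):

namespace mpi
{
	template<class ValueType, class MapFunction> void map_prescattered(const boost::mpi::communicator &comm, std::vector<ValueType> &values, MapFunction func)
	{
		// Every process performs the same padding, so no broadcast is needed.
		int oldCount = values.size();
		int count = oldCount;
		if (count % comm.size() != 0) count += comm.size() - count % comm.size();
		values.resize(count);

		int chunk = count / comm.size();
		int begin = comm.rank() * chunk;

		// Map only this process's own slice; the initial scatter is omitted.
		for (int i = begin; i < begin + chunk; i++)
		{
			values[i] = func(values[i]);
		}

		// Collect the mapped slices at the master and drop the padding there.
		std::vector<ValueType> result;
		boost::mpi::gather(comm, &values[begin], chunk, result, 0);

		SEQ(comm)
		values = result;
		values.resize(oldCount);

		PAR
	}
}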

Now, a trivial usage example:

template<class ValueType> class square
{
public:
	ValueType operator()(const ValueType &val)
	{
		return val * val;
	}
};

int main(int argc, char *argv[])
{
	boost::mpi::environment env(argc, argv);
	boost::mpi::communicator world;

	std::vector<int> test;

	SEQ(world)
	test.push_back(2);
	test.push_back(4);
	test.push_back(6);
	test.push_back(8);
	test.push_back(10);
	std::cout << "Hello world!" << std::endl;

	PAR
	mpi::map(world, test, square<int>());

	SEQ(world)
	std::cout << "Results:" << std::endl;
	for (std::size_t i = 0; i < test.size(); i++)
	{
		std::cout << test[i] << std::endl;
	}

	PAR

	return 0;
}

You can test it like this: mpic++ mpitest.cpp -o mpitest -lboost_mpi && mpirun -np 3 ./mpitest
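If everything works, all the printing happens in the master process, so the output should look like this:

Hello world!
Results:
4
16
36
64
100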

Obviously there is no performance gain in this case, but there wasn’t supposed to be one 😉 If you know regular Open MPI and boost::mpi, then after reading the code above you can surely appreciate the higher (more OpenMP-like) expressiveness of this new approach. The code is BSD-licensed, as usual. Have fun.
