Merge file sort and swap iterator

During my struggles with Project Euler Problem #302 I made an attempt to semi-brute force it. Unfortunately it required huge amounts of RAM (about 2e9 * 16 bytes) so I figured I could use some help from permanent storage. I implemented two helper classes, namely MergeSort and SwapIterator which look like this:

template<typename Record> class MergeSort
{
public:
    int sort(const string &inFileName, const string &outFileName, size_t blockSize);
};

template<typename Record> class SwapIterator
{
public:
    SwapIterator(const string &fileName, size_t blockSize);
    SwapIterator(FILE *handle, size_t blockSize);
    ~SwapIterator();
    bool opened();
    bool eof();
    const Record& next();
    const vector<Record>& nextBlock();
    size_t available();
    void reset();
};

In both cases blockSize means the number of Record elements which will be processed in-memory at a time.

MergeSort::sort() first splits the source file into several sorted files of size equal to or less than blockSize * sizeof(Record). Then a parallel merging algorithm is run on all the sorted fragments producing the final sorted output file. This way you can sort amounts of data which many times exceed the size of your RAM.

SwapIterator can be used for example to read back the huge amounts of sorted data, reading data in chunks and then providing single instances of Record.

All in all it’s a pretty nice solution that wasn’t readily available on the net. I’m not sure if it can help me with PE#302 but I feel it was worthwhile anyway 😉

I will publish the source code as soon as I finish that goddamn PE#302 😉

Leave a Reply

Your email address will not be published. Required fields are marked *