If you’ve ever used an SVM to classify 3D images, you’re probably a big fan of the kernel trick, because it improves SVM performance significantly when working with such huge feature sets. I’m working in neuroscience at the moment, and I think it’s fair to say that using SVMs this way is very common. So far I’ve been using the “Kernel from Images” utility from the DARTEL toolbox of SPM8, but I found its performance quite unsatisfactory. Initially I blamed MATLAB and thought the slowness came from the overhead of its interpreted language. After a short series of tests, however, it turned out that the problem is mostly I/O- and memory-access-bound.

The I/O part can hardly be helped: with 3000 images of 8 MB each, every image has to be read into memory at least once. Once the data is in memory, you have to compute the dot products. For every pair of images, each pair of corresponding voxels has to be multiplied and the results accumulated. Using multiple threads helps only so much, because you hit the memory-bandwidth limit pretty soon; in my case, running more than two threads didn’t bring much benefit. What remains is actual CPU computation. This can be optimized with the SSE4.1 instruction DPPS, which computes the dot product of two 4-element single-precision vectors in one instruction.

With this combination I managed to reduce the running time from 45 minutes with Kernel from Images to 16 minutes with fast_dotprod. It’s useful to me, and I hope it helps you as well. Cheers!
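For illustration, here is a minimal C++ sketch of the computation described above. It is not the actual fast_dotprod.cpp source. It assumes every image is already loaded as an equal-length vector of floats. With 3000 images of 8 MB (roughly 24 GB), a real tool would more likely process the images in blocks. The names dot_scalar, dot_dpps, dot and gram_matrix are hypothetical. The runtime CPU check and scalar fallback are additions for portability, so the sketch also builds on non-x86 machines. DPPS is reached through the _mm_dp_ps intrinsic, and the outer loop is split across threads with OpenMP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <smmintrin.h>  // SSE4.1 intrinsics, including _mm_dp_ps (DPPS)
#define FAST_DOTPROD_HAVE_SSE41 1  // hypothetical macro name
#endif

// Portable fallback: plain multiply-accumulate.
static float dot_scalar(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

#ifdef FAST_DOTPROD_HAVE_SSE41
// DPPS version. The target attribute (GCC/Clang) lets this one function use
// SSE4.1 even if the rest of the file is compiled without -msse4.
__attribute__((target("sse4.1")))
static float dot_dpps(const float* a, const float* b, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        // Mask 0xF1: multiply all four lanes, write their sum to lane 0 only.
        acc = _mm_add_ss(acc, _mm_dp_ps(va, vb, 0xF1));
    }
    float sum = _mm_cvtss_f32(acc);
    for (; i < n; ++i) sum += a[i] * b[i];  // 0-3 leftover voxels
    return sum;
}
#endif

// Uses DPPS when the CPU supports it, otherwise the scalar loop.
static float dot(const float* a, const float* b, std::size_t n) {
#ifdef FAST_DOTPROD_HAVE_SSE41
    if (__builtin_cpu_supports("sse4.1")) return dot_dpps(a, b, n);
#endif
    return dot_scalar(a, b, n);
}

// Linear-kernel (Gram) matrix, row-major: K[i*N + j] = <image i, image j>.
std::vector<float> gram_matrix(const std::vector<std::vector<float>>& imgs) {
    const std::size_t N = imgs.size();
    std::vector<float> K(N * N, 0.0f);
    // K is symmetric, so only j >= i is computed and then mirrored. Rows get
    // shorter as i grows, hence dynamic scheduling across OpenMP threads.
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < static_cast<long>(N); ++i) {
        const std::size_t r = static_cast<std::size_t>(i);
        for (std::size_t j = r; j < N; ++j) {
            assert(imgs[j].size() == imgs[r].size());  // equal voxel counts
            const float d = dot(imgs[r].data(), imgs[j].data(), imgs[r].size());
            K[r * N + j] = d;
            K[j * N + r] = d;
        }
    }
    return K;
}
```

When built with -fopenmp, the thread count can be capped through the standard OMP_NUM_THREADS environment variable (for example OMP_NUM_THREADS=2). That matches the observation that more than two threads brought little benefit. Without -fopenmp, the pragma is ignored and the loop runs on one thread.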
Compile it with:

g++ -fno-strict-aliasing -O3 -msse4 -fopenmp fast_dotprod.cpp -o fast_dotprod

and run it as:

./fast_dotprod list_of_files.txt output.raw
list_of_files.txt must contain one single-quoted file name per line; lines that do not both start and end with a single quote are ignored. The output is the N × N kernel matrix (N being the number of images), stored as raw 32-bit floats. To read it in MATLAB, I use:
f = fopen('output.raw', 'r');
x = fread(f, 'float32');
fclose(f);
n = sqrt(numel(x));
Phi = reshape(x, n, n);
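To build the file list or consume output.raw outside MATLAB, the rules above translate into a few lines of C++. This is a sketch under two assumptions. First, the parser applies the quoting rule literally, with no trimming of spaces or trailing carriage returns. Second, output.raw holds native-endian float32 values, which is also what MATLAB's fread assumes by default. parse_file_list and read_square_raw are hypothetical names, not part of fast_dotprod.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>  // std::istringstream also works as input, e.g. for tests
#include <string>
#include <vector>

// Keeps only lines of the form 'some/path.img' and returns the text between
// the quotes. Any other line (blank, comment, unquoted) is skipped.
std::vector<std::string> parse_file_list(std::istream& in) {
    std::vector<std::string> names;
    std::string line;
    while (std::getline(in, line)) {
        if (line.size() >= 2 && line.front() == '\'' && line.back() == '\'')
            names.push_back(line.substr(1, line.size() - 2));
    }
    return names;
}

// Reads native-endian float32 values until EOF and infers N from N*N values.
std::vector<float> read_square_raw(std::istream& in, std::size_t& n) {
    std::vector<float> data;
    float v;
    while (in.read(reinterpret_cast<char*>(&v), sizeof v)) data.push_back(v);
    n = static_cast<std::size_t>(std::llround(std::sqrt(static_cast<double>(data.size()))));
    assert(n * n == data.size() && "output is not a square matrix");
    return data;
}

// Convenience overload taking a path; binary mode matters on Windows.
std::vector<float> read_square_raw(const std::string& path, std::size_t& n) {
    std::ifstream in(path, std::ios::binary);
    return read_square_raw(in, n);
}
```

If the file stores the full symmetric matrix, it does not matter whether the values are read row-major or column-major. That is why a single reshape is enough in the MATLAB snippet.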