2009/10/12

Matlab vectorisation

Wikipedia tells me that Donald Knuth said:

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”

(I figure you don’t need a citation when you mention “Wikipedia” in the sentence.)

Back in the old days, the fast way to do things in Matlab was to use “vectorised” code, which operated on entire arrays rather than on their individual elements; loops were the devil. More recently (2003-ish), Matlab gained a just-in-time compiler, eliminating the old bottleneck. (Update: Thanks, Ben, for pointing out my mistake there; not sure what I was thinking when I wrote 2007. Perhaps I didn’t get access to Matlab-with-JIT until some time later; I forget.)

But you still sometimes see advice to use vectorised code whenever possible. In short, vectorising on performance grounds alone is a bad idea.

For example, the above-linked advice gave the trivial example:

% Extremely slow:
for i = 1:length(x)
   x(i) = 2*x(i);
end

% Extremely fast:
x = 2*x;

Interested, I tested this out.

[Update: Ha, “I tested this out” completely incorrectly, because I was hasty and hadn’t used Matlab in a while. So don’t mind me on that particular point. However, the following still stands:]

Your rule of thumb should be: write the code that makes the most sense when you’re writing it. If it’s slow, try and fix it then. Vectorised code can get damned hard to write and harder to read. It’s only worth it if it saves you real time running the code. And I’m talking hours and hours of time difference here.
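If you do want to check whether a loop is actually costing you anything, a rough timing sketch along these lines is enough. The array size and the variable names (y_loop, y_vec, t_loop, t_vec) are my own assumptions for illustration, and the numbers will depend on your Matlab version and machine, so treat it as a starting point rather than a proper benchmark.

```matlab
% Rough timing sketch (illustrative only): loop vs. vectorised doubling.
x = rand(1e6, 1);              % assumed test data; the size is arbitrary

% Loop version, working on a copy so that x is left untouched.
y_loop = x;
tic
for i = 1:numel(y_loop)
    y_loop(i) = 2*y_loop(i);
end
t_loop = toc;

% Vectorised version.
tic
y_vec = 2*x;
t_vec = toc;

fprintf('loop: %.4f s, vectorised: %.4f s\n', t_loop, t_vec);

% Sanity check: both forms should produce identical results.
assert(isequal(y_loop, y_vec));
```

A single tic/toc run is noisy, so run it a few times before drawing conclusions. Either way, the point is to measure first and only then decide whether the rewrite is worth it.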

When you write x=2*x, you should do so simply because that’s the clear logical representation of the operation “multiply each element of x by two”. But just because you use vectorised code here doesn’t mean you always should.