A brief comparison between for loops and vectorization in R
A short post to illustrate how vectorization in R is much faster than using the common for loop.
In this example I create two vectors, a and b, which hold random numbers.
I’ll compute the sum of a and b using both the for loop and the vectorization approach, and then compare the execution time of the two methods.
I’ll repeat this test 10 times, each time with a different vector size. In the first test the vectors a and b contain only 10 elements; in the last test they contain 10 million elements.
Here is the code for the loop version. When i=1, n=10, so we loop 10 times; when i=10, n=10,000,000, so we loop 10 million times.
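A minimal sketch of what such a loop test might look like. The post only states the first and last vector sizes, so the intermediate values in `sizes` are placeholders, and I am assuming that "sum of a and b" means the element-wise sum and that the random numbers come from `runif`. Only the name `c.loop.time` is taken from the post.

```r
# Hypothetical test sizes: only the first (10) and the last (10 million)
# are stated in the post; the values in between are placeholders.
sizes <- c(10, 100, 1e3, 1e4, 1e5, 5e5, 1e6, 2e6, 5e6, 1e7)

set.seed(1)                              # reproducible random inputs
c.loop.time <- numeric(length(sizes))    # elapsed seconds for each test

for (i in seq_along(sizes)) {
  n <- sizes[i]
  a <- runif(n)                          # assumption: uniform random numbers
  b <- runif(n)
  s <- numeric(n)                        # preallocate the result vector

  c.loop.time[i] <- system.time({
    for (j in seq_len(n)) {
      s[j] <- a[j] + b[j]                # one R-level addition and assignment per element
    }
  })["elapsed"]
}

c.loop.time[length(sizes)]               # elapsed time for n = 10 million
```

Timings depend heavily on the machine and the R version, so the absolute numbers will probably differ from the ones quoted in this post.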
I’ve stored the execution time of each test in the vector c.loop.time and printed the last one, for n=10 million. It took around 11 seconds to compute the sum of 10 million values. Let’s see if we can do better with the vectorization method.
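The vectorized counterpart could be sketched as follows, under the same assumptions as the loop test (placeholder sizes, `runif` inputs, element-wise sum). The name `c.vec.time` is hypothetical, chosen to mirror `c.loop.time`.

```r
# Hypothetical test sizes: only the first (10) and the last (10 million)
# are stated in the post; the values in between are placeholders.
sizes <- c(10, 100, 1e3, 1e4, 1e5, 5e5, 1e6, 2e6, 5e6, 1e7)

set.seed(1)                              # reproducible random inputs
c.vec.time <- numeric(length(sizes))     # hypothetical name, mirrors c.loop.time

for (i in seq_along(sizes)) {
  n <- sizes[i]
  a <- runif(n)                          # assumption: uniform random numbers
  b <- runif(n)

  # A single call to `+` adds all n pairs of elements at once.
  c.vec.time[i] <- system.time(s <- a + b)["elapsed"]
}

c.vec.time[length(sizes)]                # elapsed time for n = 10 million
```

The only change from the loop test is the timed expression: the inner `for` loop is replaced by one vectorized addition.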
With the vectorization method it took only around 0.05 seconds, just five hundredths of a second. That is roughly 200 times faster than the for loop version!
This massive difference arises mainly because the for loop method does the computation at the R level, which results in many more function calls that each have to be interpreted (especially the variable assignment, which occurs 10 million times).
In the vectorization method the computation happens within compiled code (C, in the case of R's built-in arithmetic), so R has far fewer function calls to interpret and makes far fewer calls to compiled code.
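To make the difference in call counts concrete, here is a small sketch on three-element vectors (the values are made up for illustration). It checks that `+` is a primitive, meaning its body lives in R's compiled code rather than in R code, and shows that both approaches produce the same result.

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)

# `+` is a primitive function implemented in R's compiled code.
is.primitive(`+`)                        # TRUE

# Vectorized: a single call to `+` handles every element.
s_vec <- a + b

# Loop: every iteration makes its own R-level calls
# (two subsetting calls, one addition, one subset-assignment).
s_loop <- numeric(length(a))
for (j in seq_along(a)) {
  s_loop[j] <- a[j] + b[j]
}

identical(s_vec, s_loop)                 # both approaches give the same vector
```

With 10 million elements the loop repeats those per-element calls 10 million times, whereas the vectorized form still makes just one call.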