Saturday, May 18, 2013

Bubble sort tuning

I was reading Paul Hiemsta's blogpost on Much more efficient bubble sort in R using the Rcpp and inline packages, went back to his first post  Bubble sort implemented in pure R and thought, surely we can do it better in pure R. So I cleaned inner loops, removed a recursion made some variations and finally made a big improvement as I vectorized it. I don't need to try Rcpp to know that is and will be faster, but surely closed the gap a bit.

Original code

This is how it was originally in the blog in pure R. If there are changes it is layout.

larger = function(pair) {
  if(pair[1] > pair[2]) return(TRUE) else return(FALSE)
}
swap_if_larger = function(pair) {
  if(larger(pair)) {
    return(rev(pair)) 
  } else {
    return(pair)
  }
}
swap_pass = function(vec) { 
  for(i in seq(1, length(vec)-1)) {
    vec[i:(i+1)] = swap_if_larger(vec[i:(i+1)])
  }
  return(vec)
}
bubble_sort = function(vec) {
  new_vec = swap_pass(vec)
  if(isTRUE(all.equal(vec, new_vec))) { 
    return(new_vec) 
  } else {
    return(bubble_sort(new_vec))
  }
}

Improve inside of loops

This is essentially the same code, but cleaned the first two functions. 
larger1 = function(pair) pair[1] > pair[2]

swap_if_larger1 = function(pair) {
  if(larger1(pair)) rev(pair) 
  else pair
}

swap_pass1 = function(vec) { 
  for(i in seq(1, length(vec)-1)) {
    vec[i:(i+1)] = swap_if_larger1(vec[i:(i+1)])
  }
  return(vec)
}

bubble_sort1 = function(vec) {
  new_vec = swap_pass1(vec)
  if(isTRUE(all.equal(vec, new_vec))) { 
    return(new_vec) 
  } else {
    return(bubble_sort1(new_vec))
  }
}

Improve outside loops

I cleaned the outside loop, then decided that the two inside functions were not needed any more, because it was reduced to one statement.
swap_pass2 = function(vec) { 
  for(i in 1:(length(vec)-1)) {
    if (vec[i] > vec[i+1] ) vec[c(i,i+1)] <- vec[c(i+1,i)]
  }
  return(vec)
}

bubble_sort2 = function(vec) {
  new_vec = swap_pass2(vec)
  if(identical(vec, new_vec)) new_vec 
  else bubble_sort2(new_vec)
}

No recursion

This was actually not intended a speed update, but R was complaining about too deep recursion. The same swapp_pass2 as before is used. 
bubble_sort3 = function(vec) {
  new_vec = swap_pass2(vec)
  while (!identical(vec, new_vec)) {
    vec <- new_vec
    new_vec = swap_pass2(vec)
  } 
  new_vec 
}

A different setup

The way I understood bubblesort is different from what is programmed so far. So far the sort is completed when no improvements are made, which is checked via a vector comparison. I always understood you don't check, the first bubble goes to the end, at which point the last element is the maximum. The second bubble then stops at end-1 etc. The final bubble is only the first two elements, after which the algorithm is finished. 
swap_pass4 = function(vec,iend) { 
  for(i in 1:iend) {
    if (vec[i] > vec[i+1] ) vec[c(i,i+1)] <- vec[c(i+1,i)]
  }
  return(vec)
}
bubble_sort4 = function(vec) {
  for (iend in (length(vec)-1):1) vec <- swap_pass4(vec,iend)
  vec
}

Tuning Paul's original algorithm

I was a bit disappointed with the improvements from using the true bubble sort. Apparently there is some gain by checking if the process is completed. Which is logical since the bubbles do some sorting of the intermediate elements. So, what about bubbles that go up and go down combined with a check on no improvements?
swap_pass3b = function(vec) { 
  for(i in length(vec):2) {
    if (vec[i] < vec[i-1] ) vec[c(i,i-1)] <- vec[c(i-1,i)]
  }
  return(vec)
}

bubble_sort3b = function(vec) {
  new_vec = swap_pass2(vec)
  while (!identical(vec, new_vec)) {
    new_vec = swap_pass2(vec)
    vec <- swap_pass3b(new_vec)
  } 
  vec 
}

Vectorizing the bubbles

When you look at the bubblesort without checking, it seems it is only assumed that the first step brings the highest element at the last position. This can be achieved by just pulling the highest element and placing it at the end, without the intermediate swaps.
bubble_sort5 = function(vec) {
  wm <- which.max(vec)
  vec <- c(vec[-wm],vec[wm])
  for (iend in ((length(vec)-1):2)) {
    wm <- which.max(vec[1:iend])
    vec <- c(vec[1:iend][-wm],vec[1:iend][wm],vec[(iend+1):length(vec)])
  }
  vec
}

Results

First a demonstration that they work;
test_vec = round(runif(10, 0, 100))
cbind(
    bubble_sort(test_vec),
    bubble_sort1(test_vec),
    bubble_sort2(test_vec),
    bubble_sort3(test_vec),
    bubble_sort4(test_vec),
    bubble_sort3b(test_vec),
    bubble_sort5(test_vec)
)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]   28   28   28   28   28   28   28
 [2,]   41   41   41   41   41   41   41
 [3,]   42   42   42   42   42   42   42
 [4,]   46   46   46   46   46   46   46
 [5,]   52   52   52   52   52   52   52
 [6,]   68   68   68   68   68   68   68
 [7,]   89   89   89   89   89   89   89
 [8,]   93   93   93   93   93   93   93
 [9,]   98   98   98   98   98   98   98
[10,]  100  100  100  100  100  100  100

Speed

Now the timing. Cleaning the outer loops saves a third of the time. The bubble as I remembered it and the up-and-down bubble improve a bit. We are now at 13% of the original time. The final version, which uses the vectorization, blows them all out of the water. 
test_vec = runif(1000, 0, 100)
system.time(bubble_sort(test_vec))
#   user  system elapsed 
#  25.48    0.02   25.69 
system.time(bubble_sort1(test_vec))
#   user  system elapsed 
#  17.88    0.00   17.91 
system.time(bubble_sort2(test_vec))
#   user  system elapsed 
#   4.75    0.00    4.75 
system.time(bubble_sort3(test_vec))
#   user  system elapsed 
#   4.75    0.00    4.74 
system.time(bubble_sort4(test_vec))
#   user  system elapsed 
#   3.45    0.00    3.44 
system.time(bubble_sort3b(test_vec))
#   user  system elapsed 
#   3.57    0.00    3.59 
system.time(bubble_sort5(test_vec))
#   user  system elapsed 
#   0.06    0.00    0.06 

How about sort()?

Sort is much faster and scales much better, even more so after subtracting the time for the sample() call.. But I don't need to do the exercise to know that a not optimal algorithm in an interpreted language cannot compete with modern algorithms in a compiled language.
library(microbenchmark)
vector_size = 100                                                                           
print(microbenchmark(bubble_sort5(sample(vector_size)),                                                                          
        sort(sample(vector_size)),
        sample(vector_size))  )
Unit: microseconds
                              expr      min       lq    median       uq
 bubble_sort5(sample(vector_size)) 1730.041 1776.957 1806.2795 1866.758
         sort(sample(vector_size))  104.096  114.359  126.4540  180.334
               sample(vector_size)    5.865    8.797    9.8965   11.729
      max neval
 9978.521   100
  233.849   100
   21.992   100

No comments:

Post a Comment