Saturday, November 24, 2012

Secret Santa - unfinished business

Last week I wrote:
This is actually a more difficult calculation (or I forgot too much probability). Luckily a bit of brute force comes in handy. To reiterate, in general simulated data shows 0.54 redraws because of the first person etc.
[1] 0.54276 0.40735 0.31383 0.24834 0.20415
In this last week I figured out how to finish this more elegantly/without too much brute force.
To reiterate, the chance within a particular round of drawing names are:
(p.onedraw <- c(colSums(redraw)/nrow(redraw),p.succes) )
[1] 0.200 0.150 0.117 0.0917 0.075 0.367
This means that a 'round' of secret santa with 5 persons can have 6 outcomes: The first person draws self, the second draws self, same for person three , four and five. Sixth outcome: a valid drawing of names. If a person self draws, a next round occurs. This means an infinite number of rounds and drawings may occur. The brute force solution was to follow a lot of solutions and do this again and again. But, we can calculate the probability of of a finish after twice a fail at person 1 and three times fail at person 3. It is the chance of this happening: 0.22*0.1173*0.367 times the number of ways this can happen: (2+3)!/(2!*3!). Obviously, this is a generalization of some of the formulas for binomial. 
Now, to do this for all reasonable possibilities of results. The latter may be obtained e.g. by 
ncalc <- 8 xx <- expand.grid(0:ncalc,0:ncalc,0:ncalc,0:ncalc,0:ncalc)
[1] 59049

Implementation phase 1

There are quite a number of probabilities to calculate, and as this will be wholesale, every shortcut is welcome. The main approach is to do the whole calculation on a logarithmic scale and transform back at the end. 
With a bit of preparation log(factorial) is just getting a number out of a vector.
lnumb <- c(0,log(1:40))
clnumb <- cumsum(lnumb)
lfac <- function(x) clnumb[x+1]
[1] 24
to store logs for chances: 
lp1d <- log(p.onedraw)
Combine it all into functions
lpoccur <- function(x,lpvals) {
  x <- as.numeric(x)
  crossprod(c(x,1),lpvals) + lfac(sum(x))-sum(sapply(x,lfac))
poccur <- function(x,lpvals) exp(lpoccur(x,lpvals))
And call it over the matrix xx
chances <- sapply(1:nrow(xx),function(x) poccur(xx[x,],lp1d))
newcalc.res <-,chances))
newcalc.res <- newcalc.res[order(-newcalc.res$chances),]
     Var1 Var2 Var3 Var4 Var5    chances
1       0    0    0    0    0 0.36666667
2       1    0    0    0    0 0.07333333
10      0    1    0    0    0 0.05500000
82      0    0    1    0    0 0.04277778
730     0    0    0    1    0 0.03361111
6562    0    0    0    0    1 0.02750000

Implementation phase 2

The idea was to not store matrix xx, but rather generate the rows on the fly, because I thought xx would be too large. But is seems that is not needed and a suitable xx fits into memory. 
[1] 0.9998979
Hence the calculation can be vectorised a bit more, reducing calculation time again:
ntrial <- rowSums(xx)
accu <- lfac(ntrial) + crossprod(t(as.matrix(xx)),lp1d[1:(length(lp1d)-1)])
for (i in 1:ncol(xx) ) accu <- accu-lfac(xx[,i])
accu <- accu+lp1d[length(lp1d)]
chance2 <- exp(accu)
newcalc2.res <- cbind(xx,chance2)
newcalc2.res <- newcalc2.res[order(-newcalc2.res$chance2),]
So, the values from simulation are recreated
[1] 0.5445788
[1] 0.2044005
Calculation done! In the end it was not so much a difficult or long calculation. However, a simulation is much easy to construct and clearly pointed out directions.


  1. Please note that for readers of r-bloggers, they have no clue what post you are referring to.

    Please make sure to give a link back to yourself next time :)

    I've updated your post to reflect that.