Wiekvoet: Secret Santa - unfinished business

Last week I wrote:
This is actually a more difficult calculation (or I forgot too much probability). Luckily a bit of brute force comes in handy. To reiterate, in general simulated data shows 0.54 redraws because of the first person etc.

colSums(countstop)/nrep

[1] 0.54276 0.40735 0.31383 0.24834 0.20415

In this last week I figured out how to finish this more elegantly/without too much brute force.

To reiterate, the chance within a particular round of drawing names are:

(p.onedraw <- c(colSums(redraw)/nrow(redraw),p.succes) )

[1] 0.200 0.150 0.117 0.0917 0.075 0.367

This means that a 'round' of secret santa with 5 persons can have 6 outcomes: The first person draws self, the second draws self, same for person three , four and five. Sixth outcome: a valid drawing of names. If a person self draws, a next round occurs. This means an infinite number of rounds and drawings may occur. The brute force solution was to follow a lot of solutions and do this again and again. But, we can calculate the probability of of a finish after twice a fail at person 1 and three times fail at person 3. It is the chance of this happening: 0.2²*0.117³*0.367 times the number of ways this can happen: (2+3)!/(2!*3!). Obviously, this is a generalization of some of the formulas for binomial.

Now, to do this for all reasonable possibilities of results. The latter may be obtained e.g. by

ncalc <- 8 xx <- expand.grid(0:ncalc,0:ncalc,0:ncalc,0:ncalc,0:ncalc)
nrow(xx)

[1] 59049

Implementation phase 1

There are quite a number of probabilities to calculate, and as this will be wholesale, every shortcut is welcome. The main approach is to do the whole calculation on a logarithmic scale and transform back at the end.

With a bit of preparation log(factorial) is just getting a number out of a vector.

lnumb <- c(0,log(1:40))

clnumb <- cumsum(lnumb)

lfac <- function(x) clnumb[x+1]

exp(lfac(4))

[1] 24

to store logs for chances:

lp1d <- log(p.onedraw)

Combine it all into functions

lpoccur <- function(x,lpvals) {

x <- as.numeric(x)

crossprod(c(x,1),lpvals) + lfac(sum(x))-sum(sapply(x,lfac))

}

poccur <- function(x,lpvals) exp(lpoccur(x,lpvals))

And call it over the matrix xx

chances <- sapply(1:nrow(xx),function(x) poccur(xx[x,],lp1d))

newcalc.res <- as.data.frame(cbind(xx,chances))

newcalc.res <- newcalc.res[order(-newcalc.res$chances),]

head(newcalc.res)

     Var1 Var2 Var3 Var4 Var5    chances

1       0    0    0    0    0 0.36666667

2       1    0    0    0    0 0.07333333

10      0    1    0    0    0 0.05500000

82      0    0    1    0    0 0.04277778

730     0    0    0    1    0 0.03361111

6562    0    0    0    0    1 0.02750000

Implementation phase 2

The idea was to not store matrix xx, but rather generate the rows on the fly, because I thought xx would be too large. But is seems that is not needed and a suitable xx fits into memory.

sum(chances)

[1] 0.9998979

Hence the calculation can be vectorised a bit more, reducing calculation time again:

ntrial <- rowSums(xx)

accu <- lfac(ntrial) + crossprod(t(as.matrix(xx)),lp1d[1:(length(lp1d)-1)])

for (i in 1:ncol(xx) ) accu <- accu-lfac(xx[,i])

accu <- accu+lp1d[length(lp1d)]

chance2 <- exp(accu)

newcalc2.res <- cbind(xx,chance2)

newcalc2.res <- newcalc2.res[order(-newcalc2.res$chance2),]

So, the values from simulation are recreated

sum(newcalc2.res$Var1*newcalc2.res$chance2)

[1] 0.5445788

sum(newcalc2.res$Var5*newcalc2.res$chance2)

[1] 0.2044005

Calculation done! In the end it was not so much a difficult or long calculation. However, a simulation is much easy to construct and clearly pointed out directions.

Wiekvoet

Saturday, November 24, 2012

Secret Santa - unfinished business

Implementation phase 1

Implementation phase 2

2 comments: