Sunday, November 18, 2012

Secret Santa - again

Based on comments by cellocgw I decided to look at last week's Secret Santa again. This time, the moment a person, whoever that is, draws his/her own name, the drawing starts again at the first person.

Introduction

A group of n persons draws sequentially names for Secret Santa. Each person may not draw his/her own name. If a person draws his/her own name then all names are returned and the drawing starts again. Questions are such as: How often do you draw names?

Simulation 

To understand the problem a simulation is build. The simulation draws the names, with restarts if needed. These are placed into outcome. On top of that it counts where a person self-draws.
selsan1 <- function(persons) {
  startoutcome <- rep(0,persons)
  countstop <- rep(0,persons)
  outcome <- startoutcome
  done <- FALSE
  possible <- 1:persons
  who <- 1           
  while(!done) {
    remaining <- possible[!(possible %in% outcome)]
    if (length(remaining)==1) {
      if (who==remaining) {
        countstop[who] <- countstop[who]+1
        outcome <- startoutcome
        who <- 1
      } else {
        outcome[who] <- remaining
        who <- who+1
        done <- TRUE
      }
    }     else {
      select <- sample(possible[!(possible %in% outcome)],1)
      if (who==select) {
        countstop[who] <- countstop[who]+1
        outcome <- startoutcome
        who <- 1
      } else {
        outcome[who] <- select
        who <- who+1
      }
    }
  }     
  return(list(outcome=outcome,countstop=countstop))
}
In an unlucky draw for five persons there were four restarts, at person 1, 2, 3 and 5.
selsan1(5)
$outcome
[1] 2 1 5 3 4

$countstop
[1] 1 1 1 0 1
The aim of the simulation is to do this a lot of times and draw conclusions from there. 
nrep=1e5
simulations <- lapply(1:nrep,function(x) selsan1(5))
The first question concerns the outcomes. There are 44 allowed results. They are obtained about equally often.
outcomes <- sapply(simulations,function(x) paste(x$outcome,collapse=' '))
as.data.frame(table(outcomes))
    outcomes Freq
1  2 1 4 5 3 2301
2  2 1 5 3 4 2263
3  2 3 1 5 4 2192
4  2 3 4 5 1 2323
5  2 3 5 1 4 2268
6  2 4 1 5 3 2230
7  2 4 5 1 3 2280
8  2 4 5 3 1 2242
9  2 5 1 3 4 2203
10 2 5 4 1 3 2247
11 2 5 4 3 1 2327
12 3 1 2 5 4 2321
13 3 1 4 5 2 2243
14 3 1 5 2 4 2183
15 3 4 1 5 2 2346
16 3 4 2 5 1 2264
17 3 4 5 1 2 2216
18 3 4 5 2 1 2269
19 3 5 1 2 4 2301
20 3 5 2 1 4 2359
21 3 5 4 1 2 2301
22 3 5 4 2 1 2206
23 4 1 2 5 3 2291
24 4 1 5 2 3 2249
25 4 1 5 3 2 2263
26 4 3 1 5 2 2321
27 4 3 2 5 1 2265
28 4 3 5 1 2 2318
29 4 3 5 2 1 2329
30 4 5 1 2 3 2220
31 4 5 1 3 2 2309
32 4 5 2 1 3 2228
33 4 5 2 3 1 2291
34 5 1 2 3 4 2284
35 5 1 4 2 3 2314
36 5 1 4 3 2 2304
37 5 3 1 2 4 2339
38 5 3 2 1 4 2213
39 5 3 4 1 2 2285
40 5 3 4 2 1 2194
41 5 4 1 2 3 2255
42 5 4 1 3 2 2336
43 5 4 2 1 3 2218
44 5 4 2 3 1 2289
The number of times a redraw is taken can also be obtained. 
countstop <- t(sapply(simulations,function(x) x$countstop))
table(rowSums(countstop))/nrep
      0       1       2       3       4       5       6       7       8       9 
0.36938 0.23036 0.14788 0.09337 0.05929 0.03673 0.02220 0.01440 0.00988 0.00603 
     10      11      12      13      14      15      16      17      18      19 
0.00396 0.00240 0.00168 0.00089 0.00053 0.00041 0.00021 0.00014 0.00012 0.00005 
     20      21      22      28 
0.00004 0.00003 0.00001 0.00001 
We can also extract where the redraws occur. In general there is 0.5 redraw because of the first person, 0.4 because of the second, etc. The numbers do not add to 1, they are not chances. 
colSums(countstop)/nrep
[1] 0.54276 0.40735 0.31383 0.24834 0.20415

Calculations

The question is, can we achieve the same with a calculation. For this we obtain the chances of various results. For this we build three matrices. All permutations in pp. Continuation of a sequence is in permitted. Finally, redraw contains the person where a person causes a new draw. Trick here is that if person 2 causes a redraw, then no subsequent persons cause a redraw, hence only 2 is marked in redraw.
pp <- randtoolbox::permut(nn)
redraw <- matrix(FALSE,nrow(pp),nn)
permitted <- redraw
redraw[,1] <- pp[,1] ==1
permitted[,1] <- pp[,1]!=1

for(i in 2:nn) {
  permitted[,i] <- pp[,i]!=i & permitted[,i-1]
  redraw[,i] <- pp[,i] == i & permitted[,i-1]
}
The sequences start like this.
head(pp)
     i i i    
[1,] 5 4 3 1 2
[2,] 5 4 3 2 1
[3,] 5 4 1 3 2
[4,] 5 4 2 3 1
[5,] 5 4 1 2 3
[6,] 5 4 2 1 3
head(permitted)
     [,1] [,2]  [,3]  [,4]  [,5]
[1,] TRUE TRUE FALSE FALSE FALSE
[2,] TRUE TRUE FALSE FALSE FALSE
[3,] TRUE TRUE  TRUE  TRUE  TRUE
[4,] TRUE TRUE  TRUE  TRUE  TRUE
[5,] TRUE TRUE  TRUE  TRUE  TRUE
[6,] TRUE TRUE  TRUE  TRUE  TRUE
head(redraw)
      [,1]  [,2]  [,3]  [,4]  [,5]
[1,] FALSE FALSE  TRUE FALSE FALSE
[2,] FALSE FALSE  TRUE FALSE FALSE
[3,] FALSE FALSE FALSE FALSE FALSE
[4,] FALSE FALSE FALSE FALSE FALSE
[5,] FALSE FALSE FALSE FALSE FALSE
[6,] FALSE FALSE FALSE FALSE FALSE

Chance of succes

The chance of a success drawing is the mean of the last column in permitted. Below a comparison with the simulation result. First the observed proportions.
byrow <- as.data.frame(table(rowSums(countstop))/nrep)
head(byrow)
  Var1    Freq
1    0 0.36938
2    1 0.23036
3    2 0.14788
4    3 0.09337
5    4 0.05929
6    5 0.03673
Now the matching calculation. The numbers can be calculated easily.
(p.succes <- mean(permitted[,nn]))
[1] 0.3666667
byrow$n <- as.numeric(levels(byrow$Var1)[byrow$Var1])
byrow$p <- sapply(byrow$n,function(x) p.succes*(1-p.succes)^x)
byrow[,c(1,3,4,2)]
   Var1  n            p    Freq
1     0  0 3.666667e-01 0.36938
2     1  1 2.322222e-01 0.23036
3     2  2 1.470741e-01 0.14788
4     3  3 9.314691e-02 0.09337
5     4  4 5.899305e-02 0.05929
6     5  5 3.736226e-02 0.03673
7     6  6 2.366277e-02 0.02220
8     7  7 1.498642e-02 0.01440
9     8  8 9.491398e-03 0.00988
10    9  9 6.011219e-03 0.00603
11   10 10 3.807105e-03 0.00396
12   11 11 2.411167e-03 0.00240
13   12 12 1.527072e-03 0.00168
14   13 13 9.671458e-04 0.00089
15   14 14 6.125256e-04 0.00053
16   15 15 3.879329e-04 0.00041
17   16 16 2.456908e-04 0.00021
18   17 17 1.556042e-04 0.00014
19   18 18 9.854933e-05 0.00012
20   19 19 6.241457e-05 0.00005
21   20 20 3.952923e-05 0.00004
22   21 21 2.503518e-05 0.00003
23   22 22 1.585561e-05 0.00001
24   28 28 1.023239e-06 0.00001

Where fall the redraws

This is actually a more difficult calculation (or I forgot too much probability). Luckily a bit of brute force comes in handy. To reiterate, in general simulated data shows 0.54 redraws because of the first person etc.
colSums(countstop)/nrep
[1] 0.54276 0.40735 0.31383 0.24834 0.20415
So, what happens in a drawing? The outcomes follow from the matrix redraw. There is 20% chance the first person draws 1, 25% chance the second person draws a 2 etc. Finally, as established, the chance is 36% to have a good draw.
(p.onedraw <- c(colSums(redraw)/nrow(redraw),p.succes) )
[1] 0.20000000 0.15000000 0.11666667 0.09166667 0.07500000 0.36666667
The function below takes these numbers and a locator vector to return a data frame with chances, location of fail and success status in column finish
one.draw <- function(status.now,p.now,p.onedraw) {
  la <- lapply(1:(nn+1),function(x) {
        status <- status.now
        if (x>nn) finish=TRUE
        else {          
          status[x] <- status[x] +1
          finish=FALSE}
        list(status=status,p=p.onedraw[x],finish=finish)
      })
  res <- as.data.frame(do.call(rbind, lapply(la,function(x) x$status)))
  res$p <- sapply(la,function(x) x$p*p.now)
  res$finish <- sapply(la,function(x) x$finish)
  res
}
status.now <- rep(0,nn)
names(status.now) <- paste('person',1:5,sep='')
od <- one.draw(status.now,1,p.onedraw)
od
  person1 person2 person3 person4 person5          p finish
1       1       0       0       0       0 0.20000000  FALSE
2       0       1       0       0       0 0.15000000  FALSE
3       0       0       1       0       0 0.11666667  FALSE
4       0       0       0       1       0 0.09166667  FALSE
5       0       0       0       0       1 0.07500000  FALSE
6       0       0       0       0       0 0.36666667   TRUE
Where the sequence is not finished, the same chances apply again. For this a second function is build. Same outcomes are combined to restrict the number of outcomes.
one.draw.wrap <- function(x,p.pnedraw) {
  if (x$finish) return(x)
  one.draw(x[grep('person',names(x))],x$p,p.onedraw)
  }
  
new.draw <- function(od,p.onedraw) {
  todo <- od[!od$finish,]
  done <- od[od$finish,]
  la <- lapply(1:nrow(todo),function(x) one.draw.wrap(todo[x,],p.onedraw))
  snd <- do.call(rbind,la)
  snd <- snd[do.call(order,snd),]
  i <- 1
  while(i<nrow(snd) ) {
    if(all(snd[i,!(colnames(snd)=='p')] == snd[i+1,!(colnames(snd)=='p')])) {
      snd$p[i] <- snd$p[i] + snd$p[i+1]
      snd <- snd[-(i+1),]
    }
    else i=i+1
  }
  snd <- rbind(snd, done)
  snd[order(-snd$p),]
head(new.draw(od))

   person1 person2 person3 person4 person5          p finish
61       0       0       0       0       0 0.36666667   TRUE
6        1       0       0       0       0 0.07333333   TRUE
2        1       1       0       0       0 0.06000000  FALSE
25       0       1       0       0       0 0.05500000   TRUE
3        1       0       1       0       0 0.04666667  FALSE
35       0       0       1       0       0 0.04277778   TRUE
There are still quite some unfinished drawings, so we can redo that a few times. This is where a fast processor is practical.
myfin <- list(od=od)
for (i in 1:15) myfin[[i+1]] <- new.draw(myfin[[i]])
Even after 15 steps there are some unfinished sequences. Nevertheless I stop here.
xtabs(p ~finish,data=myfin[[15]])
finish
      FALSE        TRUE 
0.001057999 0.998942001 
In the finished series there are in on average 0.54 redraws at person 1. That is pretty close to 0.54 from the simulation.
with(myfin[[15]],sum(person1[finish]*p[finish]))
[1] 0.5398659
However, the 0.1% unfinished sequences would contribute a lot. Even if they were finished at step 15.
with(myfin[[15]],sum(person1*p))
0.5448775


No comments:

Post a Comment