Saturday, November 10, 2012

Secret Santa

On reddit somebody asked For n individuals, what's the probability that the last person to pick during a round of Secret Santa name picking, will pick their own name.. "With each person picking in turn, and re-picking if they pick out their own name.Two simulations were proposed. After some thinking I think it can be calculated rather easily.

Problem consideration

I think, it is possible to estimate the chance five people together draw any sequence. It is also possible to enumerate all sequences. Categorize the sequences, add their chances. Problem solved.

So, for example the sequence 2 5 1 3 4.
The first person has has 1/4 chance to select #2, as 2, 3, 4 and 5 can be selected with equal chances.
The second person has 1/4 chance to select #5, as 1, 3, 4 and 5 can be selected with equal chances.
The third person has 1/2 chance to select #1, as 1 or 4 can be selected.
The fourth person selects 3, as there is no alternative (4 results in a repick).
The fifth person selects 4, being the remaining number.
Chance to get this sequence: 1/32.
It is possible to recurse through all these trees, but why? After all we can easily enumerate all these sequences.
nn <-5
pp <- randtoolbox::permut(nn)
for(i in 1:(nn-1)) pp<- pp[!(pp[,i]==i),]
head(pp)
     i i i    
[1,] 5 4 1 3 2
[2,] 5 4 2 3 1
[3,] 5 4 1 2 3
[4,] 5 4 2 1 3
[5,] 5 3 4 1 2
[6,] 5 3 4 2 1
Then, the chances. For each column it is just 1/#possible_selections. That is one(!) statement.
la <- lapply(1:(nn-1),function(x) 1/rowSums(pp[,x:nn]!=x))
str(la)

List of 4
 $ : num [1:53] 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 ...
 $ : num [1:53] 0.333 0.333 0.333 0.333 0.333 ...
 $ : num [1:53] 0.5 0.5 0.5 0.5 0.333 ...
 $ : num [1:53] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1 1 ...
Having these chances, they must be multiplied
prob <- Reduce('*',la)
sum(prob)
[1] 1
Luckily they add to 1. So, what is the chance the last person draws a 5? 0.13, exactly the same as a simulation number found on reddit. 
sum(prob[pp[,nn]==nn])
[1] 0.1319444

To get a bit more results, we can put it in a function and get the answers up to 9 persons:

secsan1 <- function(nn) {
  pp <- randtoolbox::permut(nn)
  for(i in 1:(nn-1)) pp<- pp[!(pp[,i]==i),]
  la <- lapply(1:(nn-1),function(x) 1/rowSums(pp[,x:nn]!=x))
  prob <- Reduce('*',la)
  sum(prob[pp[,nn]==nn])
}
data.frame(n=3:9,sapply(3:9,secsan1))
  n sapply.3.9..secsan1.
1 3           0.25000000
2 4           0.13888889
3 5           0.13194444
4 6           0.11277778
5 7           0.10053241
6 8           0.09049461
7 9           0.08238237
These numbers are close to one of the simulations.

The second simulation shows the chances the last person has to draw a person.
secsan2 <- function(nn) {
  pp <- randtoolbox::permut(nn)
  for(i in 1:(nn-1)) pp<- pp[!(pp[,i]==i),]
  la <- lapply(1:(nn-1),function(x) 1/rowSums(pp[,x:nn]!=x))
  prob <- Reduce('*',la)
  xtabs(prob ~ pp[,nn])
}
as.data.frame(secsan2(8))
  pp...nn.       Freq
1        1 0.10485639
2        2 0.10729167
3        3 0.11072279
4        4 0.11591789
5        5 0.12471088
6        6 0.14283588
7        7 0.20316988
8        8 0.09049461
These numbers are not equal to simulation numbers on Reddit. This is because this simulation contains an error. It reads:
selsan <- function(who,persons) {
  if (length(persons)==1) return(persons)
  sel <- sample(persons[persons!=who],1)
  return(c(sel,selsan(who+1,persons[persons!=sel])))
}
#selsan(1,1:5)
finselsan <- function(n){
  selsan(1,1:n)[n]
}
nrep=1e4
sa <- sapply(1:nrep,function(x) finselsan(8))
table(sa)/nrep
What happens is that sample() is pulling a trick on us. Most of the time sample() gets some values and selects one of those. It is possible however, that the second last person has also himself to draw. At which point sample thinks only one number is available. To quote the manual: If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x. Note that this convenience feature may lead to undesired behaviour when x is of varying length in calls such as sample(x). See the examples. It is even documented. But who reads these all the time? A corrected simulation avoids this:
selsan <- function(who,persons) {

  if (length(persons)==1) return(persons)
  if (length(persons[persons!=who])==1) return(c(persons[persons!=who],who))
  sel <- sample(persons[persons!=who],1)
  return(c(sel,selsan(who+1,persons[persons!=sel])))
}
finselsan <- function(n){
  selsan(1,1:n)[n]
}
nrep=1e4
sa <- sapply(1:nrep,function(x) finselsan(8))
as.data.frame(table(sa)/nrep)
  sa   Freq
1  1 0.1033
2  2 0.1024
3  3 0.1108
4  4 0.1123
5  5 0.1270
6  6 0.1498
7  7 0.2024
8  8 0.0920
This shows the exact calculation gives similar results as two simulations.

Extensions

What are the chances for any person to get anybody? It is simple now.  
nn <- 6
pp <- randtoolbox::permut(nn)
for(i in 1:(nn-1)) pp<- pp[!(pp[,i]==i),]
la <- lapply(1:(nn-1),function(x) 1/rowSums(pp[,x:nn]!=x))
prob <- Reduce('*',la)
sapply(1:nn,function(x) xtabs(prob ~ factor(pp[,x],levels=1:nn)))
  [,1] [,2]      [,3]      [,4]      [,5]      [,6]
1  0.0 0.24 0.2250000 0.2083333 0.1875000 0.1391667
2  0.2 0.00 0.2375000 0.2194444 0.1972222 0.1458333
3  0.2 0.19 0.0000000 0.2388889 0.2138889 0.1572222
4  0.2 0.19 0.1791667 0.0000000 0.2500000 0.1808333
5  0.2 0.19 0.1791667 0.1666667 0.0000000 0.2641667
6  0.2 0.19 0.1791667 0.1666667 0.1513889 0.1127778
The persons are columns, the rows are presents. Obviously 0 on the diagonal, except the last person. Closer to the last person, the chances get more unequal.

Conclusion

Don't do this. Get everybody to draw, then determine if a redraw is needed.

5 comments:

  1. I'm not sure this works, since most of the trees are discarded when someone in [1:(j-1)] picks his own name. Next: if person[j-1] draws his own name, then he re-draws, and person[j] is guaranteed NOT to draw his own name. So I'm not convinced your comment on the reddit sim is correct.
    And finally, I'm assuming your "Conclusion" is a joke, as it clearly violates the problem definition.

    ReplyDelete
    Replies
    1. This comment has been removed by the author.

      Delete
    2. I am not sure I understand what you are saying. How does a redraw by person j-1 change the probability for person j? If person j-1 draws j-1, returns the name, then draws i, how is that different from drawing i at once?

      The reddit sim is incorrect. For the n-1 th person it can draw him/herself. This happens if the n-1th person has only n and n-1. In that case the n-1 th person gets sample(n,1). This can be anything from 1 to n. n-1 would be a violation of the rules. n would be ok. Anything else will get somebody a second present and the nth person the choice sample(n-1,1). Subsequently persons n+1 and maybe n+2 havve to buy presents.

      Delete
    3. The wording, as in all probability puzzles, is critical. You wrote, "...and re-picking if they pick out their own name...." . This means that the probability of persons 1 thru (n-1) is irrelevant, since they will re-pick until they get NOT(their_own_name). So all that matters is the probability that people (1) thru (n-2) do not pick name(n). If that's what you were saying, then I apologize for not following your argument correctly. I have noticed that various puzzle websites word this problem differently, e.g. "each person draws a name [no replacement]. What is the chance that nobody has his own name" , and other variants.

      Delete
  2. I think that's what I meant. English is a bit too sloppy to use exactly. Besides, it is a foreign language for me, I might have missed a nuance.

    Now I am wondering how the alternative might work out. If person x draws own name then everybody returns the names and it starts at person 1 again.

    ReplyDelete