Data structure
The starting data looks like this:
head(Jold)
Seizoen Datum Thuisclub Uitclub Thuisscore Uitscore
1 2011-2012 2011-08-05 Excelsior Feyenoord 0 2
2 2011-2012 2011-08-06 RKC Waalwijk Heracles Almelo 2 2
3 2011-2012 2011-08-06 Roda JC FC Groningen 2 1
4 2011-2012 2011-08-06 SC Heerenveen NEC 2 2
5 2011-2012 2011-08-06 VVV-Venlo FC Utrecht 0 0
6 2011-2012 2011-08-07 ADO Den Haag Vitesse 0 0
The data has to be made ready to be placed into JAGS. All factors have to be made numbers. Note that each game is still one row. The two outcomes will be resolved in JAGS
JAGSData <- with(Jold,list(
N=length(Datum),
nclub = nlevels(Thuisclub),
HomeClub=(1:nlevels(Thuisclub))[Thuisclub],
OutClub=(1:nlevels(Uitclub))[Uitclub],
HomeMadeGoals=Thuisscore,
OutMadeGoals=Uitscore))
str(JAGSData)
List of 6
$ N : int 306
$ nclub : int 18
$ HomeClub : int [1:306] 5 14 15 16 18 1 3 4 11 7 ...
$ OutClub : int [1:306] 9 10 6 12 8 17 13 2 7 3 ...
$ HomeMadeGoals: int [1:306] 0 2 2 2 0 0 3 1 0 2 ...
$ OutMadeGoals : int [1:306] 2 2 1 2 0 0 1 4 1 0 ...
Model
The model in JAGS has to contain all the details and all distributions. That makes it much longer, but also gives the opportunity to fiddle around with details. For now it is reasonably simple. Each club has two properties, an attack strength (Astr) and a defense strength (Dstr). Based on the combination of these strengths is the number of goals, again Poisson distributed. The interesting part is in the way attack and defense strength are constructed. These are created from an overall strength (TopStr) and an attack/defense balance (AD).fbmodel1 <- function() {
for (i in 1:N) {
HomeMadeGoals[i] ~ dpois(HS[i])
OutMadeGoals[i] ~ dpois(OS[i])
log(HS[i]) <- Home + Astr[HomeClub[i]] + Dstr[OutClub[i]]
log(OS[i]) <- Dstr[HomeClub[i]] + Astr[OutClub[i]]
}
for (i in 1:nclub) {
Astr[i] <- TopStr[i]+AD[i]
Dstr[i] <- TopStr[i]-AD[i]
TopStr[i] ~ dnorm(0,tauTop)
AD[i] ~ dnorm(0,tauAD)
}
tauAD <- pow(sigmaAD,-2)
sigmaAD ~ dunif(0,100)
tauTop <- pow(sigmaTop,-2)
sigmaTop ~ dunif(0,100)
Home ~ dnorm(0,.0001)
}
To make this model work it needs a series on incantations in R. To keep things short only a basic output plot is given
params <- c("TopStr","AD","sigmaAD","sigmaTop","Home")
inits <- function(){
list(TopStr=rnorm(JAGSData$nclub),
AD=rnorm(JAGSData$nclub),
sigmaAD=runif(1),
sigmaTop=runif(0,.5),
Home=rnorm(1)
)
}
jagsfit <- jags(JAGSData, model=fbmodel1, inits=inits,
parameters=params,progress.bar="gui",
n.iter=4000)
plot(jagsfit)
inits <- function(){
list(TopStr=rnorm(JAGSData$nclub),
AD=rnorm(JAGSData$nclub),
sigmaAD=runif(1),
sigmaTop=runif(0,.5),
Home=rnorm(1)
)
}
jagsfit <- jags(JAGSData, model=fbmodel1, inits=inits,
parameters=params,progress.bar="gui",
n.iter=4000)
plot(jagsfit)
Parameter extraction
As the interest is in the teams, their properties are extracted. As Gianluca wrote we considered a slightly more complex structure in which we included information on each team's propensity to be "good", "average", or "poor". This helped avoid overshrinkage in the estimations To examine this I looked at the teams relative scores.The plots look like they have a shoulder, so there is a point there.
jags.smat <- jagsfit$BUGSoutput$summary
for (i in 1:nlevels(Jold$Thuisclub))
rownames(jags.smat) <- sub(paste('[',i,']',sep=''),levels(Jold$Thuisclub)[i],
rownames(jags.smat),fixed=TRUE)
plot(density(jags.smat[grep('^Top',rownames(jags.smat)),1])
,main='Distribution of Strength')
points(x=jags.smat[grep('^Top',rownames(jags.smat)),1],
y=rep(0,nlevels(Jold$Thuisclub)))
plot(density(jags.smat[grep('^AD',rownames(jags.smat)),1]),
main='Distribution of Attack / Defense')
points(x=jags.smat[ grep('^AD',rownames(jags.smat)),1],
y=rep(0,nlevels(Jold$Thuisclub)))
model variation
There are a number of ways to create distributions for attack and defense strength (Astr and Dstr). One variation is to make them independent, but also to give them fatter tails, which is handled by a t distribution rather than a normal distribution. This fatter tail could be accommodate teams much better or worse than the majority. Unfortunately this did not work, the number of degrees of freedom for the t distribution varied a lot and was too high to make this an alternative model.
# section replaced in fbmodel1
for (i in 1:nclub) {
Astr[i] ~ dt(0,tauStr,nuStr)
Dstr[i] ~ dt(0,tauStr,nuStr)
}
tauStr <- pow(sigmaStr,-2)
sigmaStr ~ dunif(0,100)
nuStr <- 1/InuStr
InuStr ~ dunif(0,.5)
No comments:
Post a Comment