The conclusion is that it is difficult: the amount of uncertainty is too large for parts of the data. In particular, the multiplicative effect is difficult to retrieve.
Setting up the data
The data setup is the same as before: I take the same design as previously used, the chocolate data from the SensoMineR package, and the same kind of parameters as obtained from the chocolate data.
data(chocolates)
design <- sensochoc[,1:4]
# hyper parameters
sdPanelmean <- 0.978
sdRound <- 0.383
sdSessionPanelist <- 0.191
mu.lsdP <- 0.387
sigma.lsdP <- 0.323
sPanelistmean <- 1.06
sPanelistsd <- 0.22
Using the same formulas as in the data analysis, the responses are simulated. For an explanation of the terms involved I refer to the posts from May, in which the model was built; I won't repeat the model here either.
# sizes taken from the design
nPanelist <- nlevels(sensochoc$Panelist)
nSession <- length(unique(sensochoc$Session))
nRound <- length(unique(sensochoc$Rank))
# simulate panelist, round, and session-within-panelist effects
mPanelist <- rnorm(nPanelist, sd = sdPanelmean)
mround <- rnorm(nRound, sd = sdRound)
mSessionPanelist <- rnorm(nPanelist * nSession, sd = sdSessionPanelist)
meanProduct <- c(6.98, 6.60, 4.84, 6.36, 6.66, 6.48)
# each panelist's multiplicative scale use
sPanelist <- rnorm(nPanelist, mean = sPanelistmean, sd = sPanelistsd)
grandmean <- mean(meanProduct)
mProduct <- meanProduct - grandmean
# each panelist's residual standard deviation is lognormal
sdPanelist <- rlnorm(nPanelist, meanlog = mu.lsdP, sdlog = sigma.lsdP)
# assemble the scores from the simulated effects
design$score <- grandmean + mPanelist[design$Panelist] +
    mSessionPanelist[interaction(design$Session, design$Panelist)] +
    sPanelist[design$Panelist] *
    (mProduct[design$Product] + mround[design$Rank])
# add panelist-specific noise
design$score <- design$score + rnorm(nrow(design), sd = sdPanelist[design$Panelist])
# data_list, inits, model.file and parameters are as defined in the earlier posts
jagsfit <- jags(data = data_list, inits = inits, model.file = model.file,
    parameters.to.save = parameters, n.chains = 4, DIC = FALSE, n.iter = 20000)
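The plots and tables below use a summary of the sampler output, fitsummary. Something along these lines produces an object with the $statistics and $quantiles components used later (my notation; as.mcmc on the R2jags fit comes from that package):
# summarize the chains; coda's summary gives $statistics and $quantiles
library(coda)
fitsummary <- summary(as.mcmc(jagsfit))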
Results
Hyperparameters
For the hyperparameters, the table below shows the input values together with the mean and sd estimated by the sampler.
parameter | input | estimated mean | estimated sd
sdPanelmean | 0.978 | 0.883 | 0.206
sdRound | 0.383 | 0.262 | 0.142
sdSessionPanelist | 0.191 | 0.198 | 0.138
mu.lsdP | 0.387 | 0.332 | 0.08
sigma.lsdP | 0.323 | 0.358 | 0.076
It would seem from these data that the model does reasonably well, except for the two parameters involved in lsdP. As a reminder, these are the hyperparameters for the panelists' individual accuracy.
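For reference, such a table can be pulled from the sampler summary along these lines (a sketch; it assumes the hyperparameters were saved under exactly these names):
# input values next to the posterior mean and sd of each hyperparameter
hyper <- c('sdPanelmean', 'sdRound', 'sdSessionPanelist', 'mu.lsdP', 'sigma.lsdP')
input <- c(0.978, 0.383, 0.191, 0.387, 0.323)
data.frame(input, fitsummary$statistics[hyper, c('Mean', 'SD')])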
Low level parameters
The low level parameters can be extracted and plotted with a structure such as this one:
ProductMean <- fitsummary$quantiles[
    grep('meanProduct', rownames(fitsummary$quantiles)), ]
colnames(ProductMean) <- c('x2.5', 'x25', 'x50', 'x75', 'x97.5')
ProductMean <- as.data.frame(ProductMean)
ProductMean <- cbind(meanProduct, ProductMean)
# plot posterior medians with 95% intervals against the input means
library(ggplot2)
p <- ggplot(ProductMean, aes(y = x50, x = meanProduct))
p + geom_point() +
    geom_errorbar(aes(ymax = x97.5, ymin = x2.5), width = 0.2) +
    scale_y_continuous('Product mean') +
    scale_x_continuous('Input mean') +
    geom_abline(slope = 1)
I feel sure people will forgive me for not repeating this script four times with different parameters; a reusable version is sketched below.
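A helper along these lines could replace the repeated script (a sketch; plotPar is my own name, and it assumes each parameter vector was saved under a name grep can find):
# posterior median with 95% interval against the simulation input,
# for any saved parameter vector
plotPar <- function(fitsummary, parname, input) {
    qs <- fitsummary$quantiles[
        grep(parname, rownames(fitsummary$quantiles)), , drop = FALSE]
    df <- data.frame(input = input, x2.5 = qs[, 1], x50 = qs[, 3], x97.5 = qs[, 5])
    ggplot(df, aes(x = input, y = x50)) +
        geom_point() +
        geom_errorbar(aes(ymin = x2.5, ymax = x97.5), width = 0.2) +
        scale_y_continuous('Estimated') +
        scale_x_continuous('Input') +
        geom_abline(slope = 1)
}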
Product mean scores
It would seem that the product means are retrieved reasonably well.
Panelist mean scores
Panelist mean scores are retrieved reasonably well. It appears the panelist means are estimated accurately enough that some of them are clearly different from 0. The same happened in the chocolate data, which was used to provide the hyperparameters for the simulation, so this seems a good sign.
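As an illustration, the panelists whose 95% intervals exclude 0 could be listed along these lines (assuming the panelist effects were saved as mPanelist):
# panelist means whose 95% interval excludes 0
pq <- fitsummary$quantiles[grep('mPanelist', rownames(fitsummary$quantiles)), ]
rownames(pq)[pq[, '2.5%'] > 0 | pq[, '97.5%'] < 0]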
Panelist scale use (multiplicative factor)
The panelist scale is a difficult thing to estimate. On top of that, these parameters seem closer to each other than the parameters obtained from the chocolate data, and they have large errors. On reflection, this is not strange. Look at the product means: one product is quite different from the remainder, so the score for that product largely determines a panelist's scale. In regression terms, it is a leverage point; the centered means shown below make this visible. Even though every product is scored twice by each panelist, this is not good news for determining panelists' scale usage. Maybe this is only possible when the products have quite different scores, but in that case any model will do the trick of finding product differences. When the differences between products are small, the model may not perform better than a simpler model after all.
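The leverage point is easy to see in the centered product means used in the simulation:
# centered product means; the third product dominates the contrast
# that identifies each panelist's scale
round(meanProduct - mean(meanProduct), 2)
# 0.66 0.28 -1.48 0.04 0.34 0.16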
Panelist standard error
The panelists' standard errors can be retrieved reasonably well. It could be better, but the panelists with the highest estimated values do indeed have the highest input values; they are the worst performers. It would seem that having this parameter in the model is useful, even though its hyperparameters are difficult to estimate.
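With the helper sketched earlier, this comparison is a one-liner ('sdPanelist' as the saved parameter name is again an assumption):
# simulated panelist sds against their posterior estimates
plotPar(fitsummary, 'sdPanelist', sdPanelist)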
Discussion
The model performs reasonably well. However, a case can be made for removing the panelists' scale use parameters: the information to estimate them does not seem to be present in this kind of data. The panelists' standard error parameter, in contrast, is valuable. This is a bit disappointing. I wanted a structural component which says that a panelist shows a particular systematic behaviour, but that seems to be too difficult. What I get is a model which says a bit about the quality of panelists, without the ability to split this into its components.