Building the model
JAGS, (but also WinBugs and OpenBugs) are programs which can be used to provide samples from posterior distributions. It is up to the user to provide data and model to JAGS and interpret the samples. Luckily, R provides infrastructure to help both in setting up models and data and in posterior analysis of the samples. Still, there are some steps to be done, before the analysis can be executed; Setting up data, defining model, initializing variables and deciding which parameters of the model are interesting. Only then JAGS can be called.Setting up data
Data is any data which goes into JAGS. This includes experimental design, measurements, but also number of rows in the data. For this post I have added some extra data, since I want to compare differences between product means. As I want to compare those, I need to have samples from these specific distributions. All the data needs to go into one big list, which will be given to JAGS later on.
# get libraries
library(SensoMineR)
library(coda)
library(R2jags)
# infrastructure for contrasts
FullContrast <- function(n) {
UseMethod('FullContrast',n)
}
FullContrast.default <- function(n) stop('FullContrast only takes integer and factors')
FullContrast.integer <- function(n) {
mygrid <- expand.grid(v1=1:n,v2=1:n)
mygrid <- mygrid[mygrid$v1<mygrid$v2,]
rownames(mygrid) <- 1:nrow(mygrid)
as.matrix(mygrid)
}
FullContrast.factor <- function(n) {
FullContrast(nlevels(n))
}
# reading data
data(chocolates)
# setting up experimental data
data_list <- with(sensochoc,list(Panelist = as.numeric(Panelist),
nPanelist = nlevels(Panelist),
Product = as.numeric(Product),
nProduct = nlevels(Product),
y=CocoaA,
N=nrow(sensochoc)))
# contrasts
data_list$Panelistcontr <- FullContrast(sensochoc$Panelist)
data_list$nPanelistcontr <- nrow(data_list$Panelistcontr)
data_list$Productcontr <- FullContrast(sensochoc$Product)
data_list$nProductcontr <- nrow(data_list$Productcontr)
Model definition
The model can be written in 'plain' R and then given to JAGS. However, JAGS does not have vector operations, hence there are a lot of for loops which would be unacceptable for normal R usage. Besides the additive effects in the first part of the model, there are quite some extras. all the means in the model are coming out of hyperdistributions. In this case these are normal distributed. The precision (and hence variance) of these hyperdistributions are determined on basis of the data. The final part of the model translates the internal parameters into something which is sensible to interpret. The model also needs to be written into a file so JAGS can use it later on, these are the last two lines.mymodel <- function() {
# core of the model
for (i in 1:N) {
fit[i] <- grandmean + mPanelist[Panelist[i]] + mProduct[Product[i]] + mPanelistProduct[Panelist[i],Product[i]]
y[i] ~ dnorm(fit[i],tau)
}
# grand mean and residual
tau ~ dgamma(0.001,0.001)
gsd <- sqrt(1/tau)
grandmean ~ dnorm(0,.001)
# variable Panelist distribution
mPanelist[1] <- 0
for (i in 2:nPanelist) {
mPanelist[i] ~ dnorm(offsetPanelist,tauPanelist)
}
offsetPanelist ~ dnorm(0,.001)
tauPanelist ~ dgamma(0.001,0.001)
sdPanelist <- sqrt(1/tauPanelist)
# Product distribution
mProduct[1] <- 0
for (i in 2:nProduct) {
mProduct[i] ~ dnorm(offsetProduct,tauProduct)
}
offsetProduct ~ dnorm(0,0.001)
tauProduct ~ dgamma(0.001,0.001)
sdProduct <- sqrt( 1/tauProduct)
# interaction distribution
for (i in 1:nPanelist) {
mPanelistProduct[i,1] <- 0
}
for (i in 2:nProduct) {
mPanelistProduct[1,i] <- 0
}
for (iPa in 2:nPanelist) {
for (iPr in 2:nProduct) {
mPanelistProduct[iPa,iPr] ~dnorm(offsetPP,tauPP)
}
}
offsetPP ~dnorm(0,0.001)
tauPP ~dgamma(0.001,0.001)
sdPP <- 1/sqrt(tauPP)
# getting the interesting data
# true means for Panelist
for (i in 1:nPanelist) {
meanPanelist[i] <- grandmean + mPanelist[i] + mean(mPanelistProduct[i,1:nProduct]) + mean(mProduct[1:nProduct])
}
# true means for Product
for (i in 1:nProduct) {
meanProduct[i] <- grandmean + mProduct[i] + mean(mPanelistProduct[1:nPanelist,i]) + mean(mPanelist[1:nPanelist])
}
for (i in 1:nPanelistcontr) {
Panelistdiff[i] <- meanPanelist[Panelistcontr[i,1]]-meanPanelist[Panelistcontr[i,2]]
}
for (i in 1:nProductcontr) {
Productdiff[i] <- meanProduct[Productcontr[i,1]]-meanProduct[Productcontr[i,2]]
}
}
model.file <- file.path(tempdir(),'mijnmodel.txt')
write.model(mymodel,con=model.file)
Initializing variables
Anything values in the model which are not provided by the data, needs to be initialized. It is most convenient to setup a little model which can be used to get these values.inits <- function() list(
grandmean = rnorm(1,3,1),
mPanelist = c(0,rnorm(data_list$nPanelist-1)) ,
mProduct = c(0,rnorm(data_list$nProduct-1)) ,
mPanelistProduct = rbind(rep(0,data_list$nProduct),cbind(rep(0,data_list$nPanelist-1),matrix(rnorm((data_list$nPanelist-1)*(data_list$nProduct-1)),nrow=data_list$nPanelist-1,ncol=data_list$nProduct-1)))
,
tau = runif(1,1,2),
tauPanelist = runif(1,1,3),
tauProduct = runif(1,1,3)
)
parameters of interest and calling JAGS
The parameters of interest is basically anything which we want know anything about. The JAGS call, is just listing all the parts provided before to JAGS. In this case, the model runs fairly quick, so I decided to have some extra iterations (n.iter) and an extra chain. For this moment, I decided not to calculate DIC.parameters <- c('sdPanelist','sdProduct','gsd','meanPanelist',
'meanProduct','Productdiff','sdPP')
jagsfit <- jags(data=data_list,inits=inits,model.file=model.file,
parameters.to.save=parameters,n.chains=4,DIC=FALSE,n.iter=10000)
Result
The first part of the result can be obtained via a simple print of jagsfit
jagsfit
Inference for Bugs model at "/tmp/Rtmp0VFW9g/mijnmodel.txt", fit using jags,
4 chains, each with 10000 iterations (first 5000 discarded), n.thin = 5
n.sims = 4000 iterations saved
mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
Productdiff[1] 0.562 0.312 -0.037 0.356 0.558 0.773 1.186 1.001 4000
Productdiff[2] 2.303 0.317 1.697 2.087 2.299 2.512 2.932 1.001 4000
Productdiff[3] 1.741 0.314 1.118 1.530 1.748 1.948 2.346 1.001 3700
Productdiff[4] 0.836 0.307 0.224 0.630 0.838 1.044 1.424 1.002 2000
Productdiff[5] 0.274 0.303 -0.322 0.069 0.273 0.478 0.870 1.001 2600
Productdiff[6] -1.467 0.313 -2.081 -1.671 -1.465 -1.257 -0.865 1.001 2700
Productdiff[7] 0.339 0.308 -0.263 0.130 0.338 0.537 0.951 1.001 4000
Productdiff[8] -0.223 0.299 -0.807 -0.430 -0.218 -0.021 0.349 1.001 4000
Productdiff[9] -1.965 0.319 -2.587 -2.183 -1.966 -1.757 -1.322 1.001 4000
Productdiff[10] -0.497 0.294 -1.068 -0.699 -0.495 -0.300 0.082 1.002 1600
Productdiff[11] 0.740 0.317 0.130 0.525 0.733 0.952 1.374 1.001 4000
Productdiff[12] 0.177 0.300 -0.412 -0.019 0.178 0.382 0.753 1.001 4000
Productdiff[13] -1.564 0.322 -2.186 -1.776 -1.570 -1.349 -0.913 1.001 3300
Productdiff[14] -0.097 0.300 -0.686 -0.300 -0.090 0.102 0.505 1.002 2100
Productdiff[15] 0.401 0.300 -0.180 0.197 0.405 0.597 0.997 1.001 4000
gsd 1.689 0.067 1.563 1.644 1.687 1.734 1.821 1.002 1700
meanPanelist[1] 6.667 0.492 5.680 6.334 6.659 7.002 7.601 1.001 3100
meanPanelist[2] 5.513 0.436 4.660 5.219 5.512 5.808 6.369 1.001 4000
meanPanelist[3] 6.325 0.443 5.439 6.031 6.331 6.624 7.171 1.001 4000
meanPanelist[4] 7.394 0.443 6.542 7.094 7.400 7.703 8.262 1.002 1800
meanPanelist[5] 7.104 0.445 6.227 6.807 7.104 7.405 7.982 1.001 4000
meanPanelist[6] 6.201 0.436 5.328 5.912 6.208 6.497 7.034 1.001 4000
meanPanelist[7] 6.386 0.444 5.521 6.078 6.384 6.688 7.248 1.003 1100
meanPanelist[8] 6.451 0.443 5.562 6.160 6.452 6.760 7.301 1.001 4000
meanPanelist[9] 5.184 0.441 4.334 4.882 5.184 5.480 6.050 1.002 2300
meanPanelist[10] 8.110 0.460 7.212 7.815 8.107 8.411 9.020 1.001 3700
meanPanelist[11] 5.122 0.448 4.227 4.811 5.121 5.433 5.989 1.002 2100
meanPanelist[12] 7.191 0.441 6.327 6.891 7.181 7.492 8.061 1.001 2900
meanPanelist[13] 6.719 0.438 5.893 6.421 6.720 7.013 7.563 1.001 4000
meanPanelist[14] 6.266 0.440 5.396 5.978 6.266 6.553 7.138 1.002 1700
meanPanelist[15] 6.859 0.434 6.002 6.571 6.867 7.148 7.716 1.002 2200
meanPanelist[16] 6.079 0.434 5.238 5.783 6.082 6.371 6.930 1.002 2100
meanPanelist[17] 6.054 0.433 5.213 5.769 6.051 6.345 6.893 1.002 1800
meanPanelist[18] 6.122 0.420 5.281 5.844 6.129 6.398 6.939 1.002 2200
meanPanelist[19] 6.185 0.438 5.324 5.888 6.182 6.479 7.064 1.002 1500
meanPanelist[20] 4.988 0.456 4.088 4.680 4.997 5.292 5.873 1.001 4000
meanPanelist[21] 4.994 0.455 4.064 4.689 4.996 5.305 5.884 1.001 3400
meanPanelist[22] 5.062 0.451 4.177 4.763 5.065 5.370 5.944 1.001 2600
meanPanelist[23] 7.171 0.453 6.296 6.855 7.172 7.480 8.057 1.001 4000
meanPanelist[24] 5.663 0.442 4.799 5.368 5.671 5.959 6.532 1.002 2400
meanPanelist[25] 7.514 0.455 6.627 7.210 7.518 7.815 8.425 1.001 4000
meanPanelist[26] 6.320 0.439 5.432 6.023 6.325 6.624 7.156 1.001 2900
meanPanelist[27] 6.853 0.431 6.013 6.562 6.843 7.144 7.705 1.001 4000
meanPanelist[28] 7.045 0.440 6.197 6.741 7.043 7.340 7.926 1.001 4000
meanPanelist[29] 4.789 0.456 3.915 4.472 4.789 5.096 5.687 1.001 4000
meanProduct[1] 7.084 0.223 6.653 6.937 7.081 7.230 7.539 1.001 4000
meanProduct[2] 6.522 0.215 6.097 6.375 6.524 6.669 6.928 1.001 4000
meanProduct[3] 4.781 0.227 4.332 4.626 4.778 4.927 5.227 1.001 4000
meanProduct[4] 6.248 0.213 5.838 6.101 6.248 6.394 6.660 1.002 1500
meanProduct[5] 6.745 0.217 6.317 6.605 6.748 6.885 7.171 1.001 4000
meanProduct[6] 6.344 0.221 5.910 6.199 6.347 6.494 6.772 1.001 4000
sdPP 0.140 0.123 0.024 0.050 0.096 0.193 0.461 1.010 290
sdPanelist 0.996 0.178 0.700 0.869 0.978 1.106 1.393 1.001 4000
sdProduct 0.996 0.536 0.426 0.670 0.864 1.164 2.293 1.001 4000
For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
Variables gsd and sdPanelist might be used to examine panel performance, but to examine this better, they should be compared with results from other descriptors.
plot(jagsfit)
Details about products
A main question if obviously, which products are different? For this we can extract some data from a summaryjagsfit.mc <- as.mcmc(jagsfit)
# plot(jagsfit.mc) # this plot give too many figures for the blog
fitsummary <- summary(jagsfit.mc)
# extract differences
Productdiff <- fitsummary$quantiles[ grep('Productdiff',rownames(fitsummary$quantiles)),]
# extract differences different from 0
data_list$Productcontr[Productdiff[,1]>0 | Productdiff[,5]<0,]
# get the product means
ProductMean <- fitsummary$statistics[ grep('meanProduct',rownames(fitsummary$quantiles)),]
rownames(ProductMean) <- levels(sensochoc$Product)
ProductMean
The result shows us a table of product pairs which are different; most of these are related to product 3, but also product 1 is different from 4 and 6.
> jagsfit.mc <- as.mcmc(jagsfit)
> # plot(jagsfit.mc)
> fitsummary <- summary(jagsfit.mc)
> # extract differences
> Productdiff <- fitsummary$quantiles[ grep('Productdiff',rownames(fitsummary$quantiles)),]
> # extract differences different from 0
> data_list$Productcontr[Productdiff[,1]>0 | Productdiff[,5]<0,]
v1 v2
2 1 3
3 2 3
4 1 4
6 3 4
9 3 5
11 1 6
13 3 6
> # get the product means
> ProductMean <- fitsummary$statistics[ grep('meanProduct',rownames(fitsummary$quantiles)),]
> rownames(ProductMean) <- levels(sensochoc$Product)
> ProductMean
Mean SD Naive SE Time-series SE
choc1 7.083966 0.2225106 0.003518201 0.003233152
choc2 6.521746 0.2150476 0.003400201 0.003988800
choc3 4.780643 0.2272578 0.003593261 0.003553964
choc4 6.247723 0.2128971 0.003366198 0.003382249
choc5 6.745146 0.2165525 0.003423996 0.003230432
choc6 6.344260 0.2206372 0.003488580 0.003662731
Comparison with ANOVA
A few lines in R will give the standard analysis. The product means are very close. This ANOVA shows only differences involving product 3. This is probably due to usage of TukeyHSD, which can be a bit conservative in the ANOVA while the comparison in the Bayesian model is unprotected.
> library(car)
> aa <- aov(CocoaA ~ Product * Panelist,data=sensochoc)
> Anova(aa)
Anova Table (Type II tests)
Response: CocoaA
Sum Sq Df F value Pr(>F)
Product 207.54 5 12.6045 1.876e-10 ***
Panelist 390.43 28 4.2343 1.670e-09 ***
Product:Panelist 322.29 140 0.6991 0.9861
Residuals 573.00 174
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> TukeyHSD(aa,'Product')
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = CocoaA ~ Product * Panelist, data = sensochoc)
$Product
diff lwr upr p adj
choc2-choc1 -0.5344828 -1.5055716 0.4366061 0.6088175
choc3-choc1 -2.4137931 -3.3848819 -1.4427043 0.0000000
choc4-choc1 -0.8275862 -1.7986750 0.1435026 0.1433035
choc5-choc1 -0.2931034 -1.2641923 0.6779854 0.9531915
choc6-choc1 -0.7241379 -1.6952268 0.2469509 0.2674547
choc3-choc2 -1.8793103 -2.8503992 -0.9082215 0.0000014
choc4-choc2 -0.2931034 -1.2641923 0.6779854 0.9531915
choc5-choc2 0.2413793 -0.7297095 1.2124682 0.9797619
choc6-choc2 -0.1896552 -1.1607440 0.7814337 0.9932446
choc4-choc3 1.5862069 0.6151181 2.5572957 0.0000742
choc5-choc3 2.1206897 1.1496008 3.0917785 0.0000000
choc6-choc3 1.6896552 0.7185663 2.6607440 0.0000192
choc5-choc4 0.5344828 -0.4366061 1.5055716 0.6088175
choc6-choc4 0.1034483 -0.8676406 1.0745371 0.9996304
choc6-choc5 -0.4310345 -1.4021233 0.5400544 0.7960685
> model.tables(aa,type='mean',cterms='Product')
Tables of means
Grand mean
6.287356
Product
Product
choc1 choc2 choc3 choc4 choc5 choc6
7.086 6.552 4.672 6.259 6.793 6.362
Conclusion
JAGS can be used to analyzed sensory profiling data. However, if a simple model such as two way ANOVA is used, it does not seem to be worth the trouble.
No comments:
Post a Comment