## Saturday, March 30, 2013

### More ordinal data display

The past two weeks I made a post regarding analyzing ordinal data with R and JAGS. The calculations in the second part made me realize I could actually get top two box intervals out of R. This demonstrated here. For that I needed the inverse logistic transformation. This I pulled out the ARM package, and I now realize the latter also contains a function to analyse ordinal data, which is the second part of this post.

### Data

The data is as before the cheese data. I will pick up at calculating the model and printing the model summary. The subsequent calculation is actually fairly simple.
The first parameters in the summary refer to the the difference between cheese A and cheeses B, C and D. The second part are the thresholds between categories for cheese A. So, when I take the threshold of category 7|8 for cheese A (1.54) and add the differences, I get the 7|8 thresholds for the other three cheeses. Parameter values and their variances can be readily obtained. Associated standard deviations follow from the vcov matrix. Take the inverse logit and the desired result is there. It is almost too simple.
Res.clm <- clm(FResponse ~Cheese,data=cheese2)
summary(Res.clm)
formula: FResponse ~ Cheese
data:    cheese2
logit flexible  208  -355.67 733.35 6(0)  3.14e-11 8.8e+01

Coefficients:
Estimate Std. Error z value Pr(>|z|)
CheeseB  -3.3518     0.4287  -7.819 5.34e-15 ***
CheeseC  -1.7099     0.3715  -4.603 4.16e-06 ***
CheeseD   1.6128     0.3805   4.238 2.25e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Threshold coefficients:
Estimate Std. Error z value
1|2 -5.46738    0.52363 -10.441
2|3 -4.41219    0.42776 -10.315
3|4 -3.31262    0.37004  -8.952
4|5 -2.24401    0.32674  -6.868
5|6 -0.90776    0.28335  -3.204
6|7  0.04425    0.26457   0.167
7|8  1.54592    0.30168   5.124
8|9  3.10577    0.40573   7.655
co <- coef(Res.clm)[c('7|8','CheeseB','CheeseC','CheeseD')]
vc <- vcov(Res.clm)[c('7|8','CheeseB','CheeseC','CheeseD'),
c('7|8','CheeseB','CheeseC','CheeseD')]
names(co) <- levels(cheese2\$Cheese)
sd <- sqrt(c(vc[1,1],diag(vc)[-1]+vc[1,1]-2*vc[1,-1]))
data.frame(
`top 2 box`=arm::invlogit(c(-co[1],-co[1]+co[-1])),
`2.5% limit`=arm::invlogit(c(-co[1],-co[1]+co[-1])+qnorm(.025)*sd),
`97.5% limit`=arm::invlogit(c(-co[1],-co[1]+co[-1])+qnorm(.975)*sd),
check.names=FALSE
)
top 2 box    2.5% limit    97.5% limit
A 0.175676950   0.105533878     0.27795336
B 0.007407959   0.003235069     0.01687231
C 0.037118760   0.018617885     0.07264338
D 0.516712572   0.384663489     0.64646825

### ARM

ARM (Data Analysis Using Regression and Multilevel/Hierarchical Models) is the package associated with the similar titled book (Gelman & Hill) which is certainly value for money. Having said that, I did not remember this being in the book. The package also contains a simulation function, which I used to get some posterior results for further processing.
library(arm)
Res.arm <- bayespolr(FResponse ~Cheese,data=cheese2)
Res.arm
bayespolr(formula = FResponse ~ Cheese, data = cheese2)
coef.est coef.se
CheeseB -3.25     0.42
CheeseC -1.63     0.37
CheeseD  1.59     0.38
1|2     -5.36     0.52
2|3     -4.32     0.42
3|4     -3.23     0.36
4|5     -2.18     0.32
5|6     -0.86     0.28
6|7      0.07     0.26
7|8      1.55     0.30
8|9      3.09     0.40
---
n = 208, k = 11 (including 8 intercepts)
residual deviance = 727.1, null deviance is not computed by polr
sims <- sim(Res.arm,n.sims=1000)
cosims <- coef(sims)
mycoef <- cbind(-cosims[,'7|8'],
-cosims[,'7|8']+cosims[,grep('Cheese',colnames(cosims),value=TRUE)])
mycoef <- invlogit(mycoef)
coefplot(rev(apply(mycoef,2,mean)),rev(apply(mycoef,2,sd)),
varnames=rev(levels(cheese2\$Cheese)),
main='Estimated Proportion Top 2 Box')

## Monday, March 25, 2013

### Ordinal data with JAGS

Last week is had a look at the standard R routines for estimating models for ordinal data. This week, I want to have a look at JAGS for examining the same data. To be honest, most of it is taking an example (inhaler) and removing code. To my surprise this was almost as difficult as adding complexity in models. I did add some output from JAGS. The data, as before, is the cheese data.

### Data preparation and model

The data is same as last week. Hence I will pick up at a created dataframe cheese2. One new item is added, a dataframe difp which is used to index differences between cheese means. The model is a simplified version of inhaler. Note that loads of pre processing of data is removed from JAGS example and placed into R. On the other hand, to minimize the amount of output, some post processing is added, hence not done in R. I guess that's a matter of taste.
Other than that is a fairly straightforward JAGS run with the usual preparation steps.
ordmodel <- function() {
for (i in 1:N) {
for (j in 1:Ncut) {
# Cumulative probability of worse response than j
logit(QQ[i,j]) <- -(a[j] - mu[cheese[i]] )
}
p[i,1] <- 1 - QQ[i,1];
for (j in 2:Ncut) {
p[i,j] <- QQ[i,j-1] - QQ[i,j]
}
p[i,(Ncut+1)] <- QQ[i,Ncut]
response[i] ~ dcat(p[i,1:(Ncut+1)])
}
# Fixed effects
for (g in 2:G) {
# logistic mean for group
mu[g] ~ dnorm(0, 1.0E-06)
}
mu[1] <- 0
# ordered cut points for underlying continuous latent variable
a <- sort(a0)
for(i in 1:(Ncut)) {
a0[i] ~ dnorm(0, 1.0E-6)
}
#top 2 box
for (g in 1:G) {
logit(ttb[g]) <- -(a[Ncut-1] - mu[g] )
}
# Difference between categories
for (i in 1:ndifp) {
dif[i] <- mu[difpa[i]]-mu[difpb[i]]
}
}

difp <- expand.grid(a=1:4,b=1:4)
difp <- difp[difp\$a>difp\$b,]

datain <- list(response=cheese2\$Response, N=nrow(cheese2),
cheese=c(1:4)[cheese2\$Cheese],G=nlevels(cheese2\$Cheese),
Ncut=nlevels(cheese2\$FResponse)-1,difpa=difp\$a,difpb=difp\$b,ndifp =nrow(difp))
params <- c('mu','ttb','dif')
inits <- function() {
list(mu = c(NA,rnorm(3,0,1)),
a0= rnorm(8,0,1))
}

jagsfit <- jags(datain, model=ordmodel, inits=inits, parameters=params,
progress.bar="gui",n.iter=10000)

### Results

The results, happily, are similar to those obtained via polr and clm. All results seem well within variation due to sampling rater than deterministic calculation. MCMCoprobit had completely different results. In hindsight I think those were originally incorrectly estimated and I corrected the post processing in last week's post. Note that MCMCoprobit uses probit rather than logit, the practical difference is minimal, much less than variation in parameters.

Legend for output:
dif: differences between cheese parameters.
mu: mean cheese values as used within the model
ttb: top two box
jagsfit
Inference for Bugs model at "C:/Users/.../Local/Temp/RtmpwPgnSx/model4d41cdc653a.txt", fit using jags,
3 chains, each with 10000 iterations (first 5000 discarded), n.thin = 5
n.sims = 3000 iterations saved
mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
dif[1]    -3.408   0.420  -4.254  -3.691  -3.404  -3.111  -2.606 1.001  2500
dif[2]    -1.729   0.368  -2.444  -1.978  -1.733  -1.478  -1.003 1.001  2300
dif[3]     1.669   0.387   0.943   1.411   1.658   1.920   2.453 1.002  1300
dif[4]     1.679   0.381   0.962   1.420   1.663   1.952   2.420 1.001  3000
dif[5]     5.077   0.481   4.164   4.740   5.063   5.405   6.008 1.001  3000
dif[6]     3.398   0.423   2.561   3.115   3.401   3.669   4.235 1.001  3000
mu[1]      0.000   0.000   0.000   0.000   0.000   0.000   0.000 1.000     1
mu[2]     -3.408   0.420  -4.254  -3.691  -3.404  -3.111  -2.606 1.001  2500
mu[3]     -1.729   0.368  -2.444  -1.978  -1.733  -1.478  -1.003 1.001  2300
mu[4]      1.669   0.387   0.943   1.411   1.658   1.920   2.453 1.002  1300
ttb[1]     0.173   0.043   0.100   0.143   0.170   0.199   0.271 1.003   800
ttb[2]     0.007   0.003   0.003   0.005   0.007   0.009   0.015 1.002  1700
ttb[3]     0.037   0.013   0.017   0.028   0.035   0.044   0.066 1.002  1600
ttb[4]     0.519   0.068   0.386   0.472   0.521   0.565   0.650 1.002  1400
deviance 722.375   4.703 715.176 718.981 721.657 725.071 733.254 1.001  3000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 11.1 and DIC = 733.4
DIC is an estimate of expected predictive error (lower deviance is better).

## Sunday, March 17, 2013

### Ordinal Data

I expect to be getting some ordinal data, from 5 or 9 point rating scales, pretty soon, so I am having a look ahead how to treat those. Often ANOVA is used, even though it is well known not to be ideal fro a statistical point of view, so that is the starting point. Next stops are polr (MASS), clm (ordinal) and MCMCoprobit (MCMCpack). Vglm (VGAM) is skipped. The viewpoint I am using is as somebody who needs to deliver summary results to a project manager or program manager, fully knowing that sales and/or marketing may be borrowing slides too. Keep It Simple, Stupid. The data used for now is the cheese data, (McCullagh & Nelder), but I intend to look for more complex/real data in a future post.

### Data

Data are responses to 4 cheeses on 9 response categories. 1=strong dislike, 9=excellent taste. To be honest, I had forgotten about those data, and ran into them https://onlinecourses.science.psu.edu/stat504/node/177. For compactness (of R script) the data is imported a bit different. I also reformatted to one observation per row. Then a plot so we can see what the data looks like.
cheese <- data.frame(Cheese=rep(LETTERS[1:4],each=9),
Response=rep(1:9,4),N=as.integer(
c(0,0,1,7,8,8,19,8,1,6,9,12,
11,7,6,1,0,0,1,1,6,8,23,7,5
,1,0,0,0,0,1,3,7,14,16,11)))
cheese\$FResponse <- ordered(cheese\$Response)
#one record(row) for each observation
cheese2  <- cheese[cheese\$N>0,]
cheese2 <- lapply(1:nrow(cheese2),function(x)
do.call(rbind,
replicate(cheese2\$N[x],cheese2[x,c(1:2,4)],
simplify=FALSE)
)
)
cheese2 <- do.call(rbind,cheese2)
mosaicplot(
xtabs( ~Cheese + factor(Response,levels=9:1) ,data=cheese2),
color=colorRampPalette(c('green','red'))(9),
main='',
ylab='Response')

### ANOVA

As explained, first approach is ANOVA. From statistical point of view not ideal, ignoring the fact that these are nine categories. However, when explaining what you did, it saves a bit. I would not trust a sales person to understand ANOVA, but then the other side of the table is used to it, so they won't ask.  Anyway, below a number of variations on the result.
library(multcomp)
Res.aov <- aov( Response ~ Cheese, data=cheese2)
anova(Res.aov)
Analysis of Variance Table

Response: Response
Df Sum Sq Mean Sq F value    Pr(>F)
Cheese      3 450.48 150.160  76.296 < 2.2e-16 ***
Residuals 204 401.50   1.968
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(Res.aov)
Tukey multiple comparisons of means
95% family-wise confidence level

Fit: aov(formula = Response ~ Cheese, data = cheese2)

\$Cheese
B-A -2.750000 -3.4626832 -2.0373168 0.0000000
C-A -1.384615 -2.0972986 -0.6719321 0.0000063
D-A  1.173077  0.4603937  1.8857602 0.0001793
C-B  1.365385  0.6527014  2.0780679 0.0000087
D-B  3.923077  3.2103937  4.6357602 0.0000000
D-C  2.557692  1.8450091  3.2703755 0.0000000
par(mar=c(5,4,6,2)+.1)
plot(cld(glht(Res.aov,linfct=mcp(Cheese='Tukey'))))
par(mar=c(5,4,4,2)+.1)
mt <- model.tables(Res.aov,type='means',cterms='Cheese')
data.frame(dimnames(mt\$tables\$Cheese)[1],
Response=as.numeric(mt\$tables\$Cheese),
LSD=cld(glht(Res.aov,linfct=mcp(Cheese='Tukey'))
)\$mcletters\$monospacedLetters
)
Cheese Response  LSD
A      A 6.250000   c
B      B 3.500000 a
C      C 4.865385  b
D      D 7.423077    d

### polr

The first statistically correct way to look at the data is polr from the MASS package. Unfortunately I dislike the output a bit. For instance, base level (cheese A) is ignored in some output. I am not so sure how to interpret the difference between cheese A and cheese B as -3 except for the observation that it is significant and cheese A is better. A thing not obvious documented is the possibility to get proportions for the categories as predictions. So, I can learn that the model has for cheese A 17% of the people in the top 2 box. I am sure top 2 box is a parameter I won't have to explain to my product manager. No confidence interval for it though.
library(MASS)
Res.polr <- polr( FResponse ~ Cheese, data=cheese2 )
summary(Res.polr)
Call:
polr(formula = FResponse ~ Cheese, data = cheese2)

Coefficients:
Value Std. Error t value
CheeseB -3.352     0.4287  -7.819
CheeseC -1.710     0.3715  -4.603
CheeseD  1.613     0.3805   4.238

Intercepts:
Value    Std. Error t value
1|2  -5.4674   0.5236   -10.4413
2|3  -4.4122   0.4278   -10.3148
3|4  -3.3126   0.3700    -8.9522
4|5  -2.2440   0.3267    -6.8680
5|6  -0.9078   0.2833    -3.2037
6|7   0.0443   0.2646     0.1673
7|8   1.5459   0.3017     5.1244
8|9   3.1058   0.4057     7.6547

Residual Deviance: 711.3479
AIC: 733.3479
confint(Res.polr)
2.5 %     97.5 %
CheeseB -4.2134700 -2.5304266
CheeseC -2.4508895 -0.9919797
CheeseD  0.8794083  2.3746894
confint(glht(Res.polr,linfct=mcp(Cheese='Tukey')))
Simultaneous Confidence Intervals

Multiple Comparisons of Means: Tukey Contrasts

Fit: polr(formula = FResponse ~ Cheese, data = cheese2)

Quantile = 2.5617
95% family-wise confidence level

Linear Hypotheses:
Estimate lwr     upr
B - A == 0 -3.3518  -4.4500 -2.2537
C - A == 0 -1.7099  -2.6615 -0.7583
D - A == 0  1.6128   0.6379  2.5876
C - B == 0  1.6419   0.6806  2.6033
D - B == 0  4.9646   3.7434  6.1858
D - C == 0  3.3227   2.2421  4.4033
topred <- data.frame(Cheese=levels(cheese2\$Cheese))
predict(Res.polr,topred,type='prob')
1           2          3          4         5          6
1 0.0042045373 0.007778488 0.02315719 0.06072687 0.1915897 0.22360710
2 0.1075966845 0.149644074 0.25256118 0.24192345 0.1684022 0.04745525
3 0.0228101657 0.040027267 0.10476248 0.20195874 0.3208724 0.16204645
4 0.0008409326 0.001570815 0.00479563 0.01349081 0.0537320 0.09799769
7           8           9
1 0.31326180 0.132805278 0.042868991
2 0.02500937 0.005841786 0.001566035
3 0.11040482 0.029081216 0.008036485
4 0.31086686 0.333235095 0.183470165
topred\$Top2Box.polr <- rowSums(predict(Res.polr,topred,type='prob')[,8:9])
topred
Cheese Top2Box.polr
1      A  0.175674269
2      B  0.007407821
3      C  0.037117701
4      D  0.516705260

### clm

Clm is from the ordinal package. As this package is dedicated to ordinal data it is clearly a bit more advanced than polr. The package has the possibility to use mixed models and multiplicative scale effects. These are things I won't use now, but would like to use or look at once I have panelist data. For now clm function is enough. I am now able to make an overall statement about product differences (polr did not calculate the equivalent of model Res.clm0), although I might use 3 degrees of freedom rather than 1. There is also a confidence interval for the proportion subjects selecting a category.
library(ordinal)
Res.clm <- clm(FResponse ~Cheese,data=cheese2)
summary(Res.clm)
formula: FResponse ~ Cheese
data:    cheese2

logit flexible  208  -355.67 733.35 6(0)  3.14e-11 8.8e+01

Coefficients:
Estimate Std. Error z value Pr(>|z|)
CheeseB  -3.3518     0.4287  -7.819 5.34e-15 ***
CheeseC  -1.7099     0.3715  -4.603 4.16e-06 ***
CheeseD   1.6128     0.3805   4.238 2.25e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Threshold coefficients:
Estimate Std. Error z value
1|2 -5.46738    0.52363 -10.441
2|3 -4.41219    0.42776 -10.315
3|4 -3.31262    0.37004  -8.952
4|5 -2.24401    0.32674  -6.868
5|6 -0.90776    0.28335  -3.204
6|7  0.04425    0.26457   0.167
7|8  1.54592    0.30168   5.124
8|9  3.10577    0.40573   7.655
confint(Res.clm)
2.5 %     97.5 %
CheeseB -4.2134421 -2.5304102
CheeseC -2.4508720 -0.9919723
CheeseD  0.8793992  2.3746663
Res.clm0 <- clm(FResponse ~ .,data=cheese2)
anova(Res.clm0,Res.clm)
Likelihood ratio tests of cumulative link models:

Res.clm  FResponse ~ Cheese logit flexible
Res.clm0 FResponse ~ .      logit flexible

no.par    AIC      logLik LR.stat df Pr(>Chisq)
Res.clm      11 733.35 -3.5567e+02
Res.clm0     12  24.00 -7.8504e-06  711.35  1  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
predict(Res.clm,topred)
\$fit
1           2           3          4         5          6
1 0.0042045475 0.007778711 0.023157362 0.06072674 0.1915909 0.22360309
2 0.1075968958 0.149647575 0.252560377 0.24192109 0.1684021 0.04745442
3 0.0228099720 0.040027964 0.104762084 0.20195678 0.3208734 0.16204462
4 0.0008409256 0.001570844 0.004795616 0.01349065 0.0537319 0.09799495
7           8           9
1 0.31326168 0.132806881 0.042870069
2 0.02500956 0.005841882 0.001566077
3 0.11040645 0.029081977 0.008036783
4 0.31086254 0.333236858 0.183475714
rowSums(predict(Res.clm,topred)\$fit[,8:9])
1           2           3           4
0.175676950 0.007407959 0.037118760 0.516712572
1-predict(Res.clm,topred,type='cum.prob')\$cprob2[,8]
1           2           3           4
0.175676950 0.007407959 0.037118760 0.516712572
topred2 <- data.frame(Cheese=levels(cheese2\$Cheese),
FResponse=ordered(8,levels=1:9))
predict(Res.clm,topred2,type='prob',interval=TRUE)
\$fit
[1] 0.132806881 0.005841882 0.029081977 0.333236858
\$lwr
[1] 0.078727094 0.002502026 0.014310133 0.230268677
\$upr
[1] 0.21535183 0.01357929 0.05820190 0.45502986

### VGAM

I like the VGAM package. It shows great promise for many situations where a vector of observations is measured. This model is but a simple thing for it. On the face of it however, ordinal seems to have extensions which are more appropriate for my current problem. Also, in the help it says 'This package is undergoing continual development and improvement. Until my monograph comes out and this package is released as version 1.0-0 the user should treat everything subject to change'. The data needs to be entered a bit differently and the script is commented, in case anybody wants to know how it might work.
#library(VGAM)
#nobs <- nrow(cheese2)/4
#resD <- resC <- resB <- resA <- matrix(0,9,nobs)
#resA[cheese2\$Response[cheese2\$Cheese=='A']+(0:(nobs-1))*9] <- 1
#resB[cheese2\$Response[cheese2\$Cheese=='B']+(0:(nobs-1))*9] <- 1
#resC[cheese2\$Response[cheese2\$Cheese=='C']+(0:(nobs-1))*9] <- 1
#resD[cheese2\$Response[cheese2\$Cheese=='D']+(0:(nobs-1))*9] <- 1
#cheesevglm <- as.data.frame(t(cbind(resA,resB,resC,resD)))
#cheesevglm\$Cheese <- factor(rep(levels(cheese2\$Cheese),each=nobs))
#
#Res.vglm <- vglm(cbind(V1,V2,V3,V4,V5,V6,V7,V8,V9) ~ Cheese,cumulative,data=cheesevglm)
#summary(Res.vglm)
#predict(Res.vglm,topred,type='response')

### MCMCoprobit

MCMCpack has Bayesian roots. It has many functions, ordinal data is but one of them. It does not rely on JAGS/Winbugs/Openbugs.  A difference between MCMCoprobit and the previous functions is the use of probit rather than logit as the link function. I don't think this should stop me from using it.The big disadvantage is that you get raw samples, so additional work is needed to get presentable results. Luckily this allows the calculation of proportions top 2 box I have been yapping about. A new question is how to interpret the big interval for cheese C.
library(MCMCpack)
res.MCMCpack <- MCMCoprobit(Response ~ Cheese,data=cheese2)
summary(res.MCMCpack)
Iterations = 1001:11000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:

Mean      SD  Naive SE Time-series SE
(Intercept)  3.2423 0.16115 0.0016115       0.016761
CheeseB     -1.9334 0.22376 0.0022376       0.019444
CheeseC     -0.9782 0.20786 0.0020786       0.008635
CheeseD      0.9382 0.20754 0.0020754       0.005334
gamma2       0.6401 0.13234 0.0013234       0.064949
gamma3       1.3265 0.14265 0.0014265       0.072361
gamma4       1.9697 0.10030 0.0010030       0.036723
gamma5       2.7635 0.09922 0.0009922       0.035434
gamma6       3.3180 0.11913 0.0011913       0.051066
gamma7       4.1606 0.12626 0.0012626       0.057679
gamma8       4.9981 0.11750 0.0011750       0.049354

2. Quantiles for each variable:

2.5%     25%     50%     75%   97.5%
(Intercept)  2.9288  3.1358  3.2430  3.3497  3.5576
CheeseB     -2.3606 -2.0869 -1.9372 -1.7850 -1.4926
CheeseC     -1.3895 -1.1200 -0.9763 -0.8394 -0.5730
CheeseD      0.5378  0.7979  0.9380  1.0760  1.3434
gamma2       0.4428  0.5227  0.6221  0.7498  0.8878
gamma3       1.1034  1.2156  1.3013  1.4400  1.6292
gamma4       1.7679  1.9017  1.9779  2.0465  2.1424
gamma5       2.5821  2.6867  2.7606  2.8439  2.9328
gamma6       3.1207  3.2393  3.2912  3.3880  3.5808
gamma7       3.8694  4.0840  4.1776  4.2596  4.3487
gamma8       4.8306  4.9177  4.9788  5.0353  5.2884

### Correction

The original analysis here was incorrect. It is now corrected. It now shows cheese D as most liked, rather than least liked as before. Note that it was obvious wrong, the first figure in this posting (in green/red) had cheese D as most liked with approximately 50% of the rates in top two box. My apologies for this error.
top2box <- data.frame(probability = c(
pnorm(res.MCMCpack[,'gamma7']-(res.MCMCpack[,1]),lower.tail=FALSE),
pnorm(res.MCMCpack[,'gamma7']-(res.MCMCpack[,1]+res.MCMCpack[,'CheeseB']),lower.tail=FALSE),
pnorm(res.MCMCpack[,'gamma7']-(res.MCMCpack[,1]+res.MCMCpack[,'CheeseC']),lower.tail=FALSE),
pnorm(res.MCMCpack[,'gamma7']-(res.MCMCpack[,1]+res.MCMCpack[,'CheeseD']),lower.tail=FALSE)),
Cheese=rep(levels(cheese2\$Cheese),each=nrow(res.MCMCpack)))
densityplot(~ probability | Cheese,data=top2box,xlab='Proportion Top 2 Box',
panel= function(x, ...) {
panel.densityplot(x,plot.points='rug' , ...)
} )

## Sunday, March 10, 2013

### More sequential testing for triangle tests

I looked before at triangle tests and at sequential testing in triangle tests (blog entry). In the latter post it was demonstrated that a sequential test is possible, without costs in desired error of the first kind. The latter because the desired alpha (e.g. 5%) is never exactly obtained in any test based on the binomial distribution. So, e.g. alpha=3% is obtained, leaving 2% to spare. This new post demonstrates a similar intermediate decision is possible with regards to H1 and ends with a demonstration simulation combining all.

### Triangle tests

As mentioned before, a triangle test is a simple frequently used sensory test. The aim is to determine if two products are different. Subjects get three samples, two of one kind, one of the other kind. The subjects need to determine which of the three samples is the odd one. If a sufficient number of tests have the correct answer the products are different.

#### True discriminators

One additional concept is used, that is the concept of true discriminators. This is the idea that there are two kinds of subjects. People who recognize the odd product, and people who make a random guess. The former always select the odd sample, the latter one third of the time. It is easy to go from true discriminators to correct choices. True discriminators is just what sensory specialists like to use.

library(ggplot2)
True2Correct <- function(x) 1/3 + x*2/3
Correct2True <- function(x) x-(1-x)/2

### Power for 35 subjects

I chose to use 35 subjects as this is quite normal. It gives an alpha of 4.4% and with 33% true discriminators a power of 84%. The plot shows that 35 is reasonable for getting 30% to 50% true discriminators, which is more or less the region of interest.
nObs <-35
(limit <- qbinom(.95,nObs,1/3))
[1] 161-pbinom(limit,nObs,1/3) # alpha
[1] 0.04419916#plot power
qplot(y=1-pbinom(limit,nObs,True2Correct(seq(0,.6,length.out=30))),
x=seq(0,.6,length.out=30),geom='path',
xlab='Proportion True Discriminators',
ylab='Power') +
geom_vline(aes(xintercept = PropH1TDisc,colour='blue'))
ggsave('een.png')

# choose H1, hence power
PropH1TDisc <- 1/3
1-pbinom(limit,nObs,True2Correct(PropH1TDisc))
[1] 0.8417316

### Adding a hypothesis test after 16 triangles

First the expected distributions under H0 and H1 are plotted. The plot shows us that after 16 triangle tests, there are outcomes which will almost only occur under H0, and outcomes which will almost only occur under H1.
Subsequently a routine is made which gives the chance to go over the limit (16) for the full test, given a certain number correct after 16 triangles. For example, if the first 16 have 0 correct, this is the probability of getting 0 in the first 16, times the probability of getting more than the remaining correct (16-0) to get to the  limit in the second set of tests. Add the values for 0 and 1, and we know the chance to get over the limit in the first test, given 0 or 1 correct in the first observations. So, the cumulative of these values provides the costs in power in case a certain cut-off is used after 16 triangles to decide H0 won't be rejected (the products are not different).
It appears that up to 5 correct these costs are low. The remaining power is 83%.
nObs1 <- 16
# expected dsitributions
qplot(y=dbinom(0:nObs1,nObs1,True2Correct(PropH1TDisc)),
x=0:nObs1,colour='H1, 33% Discriminators',
xlab='Number of correct',
ylab='Probability')  +
geom_point(aes(y=dbinom(0:nObs1,nObs1,1/3),
x=0:nObs1,colour='H0, No Discriminators')) +
labs(colour = "Distribution") +
theme(legend.position='bottom')
ggsave('twee.png')

Powercost <- function(nstep1,nObs1,limit,nObs,PropH1TDisc) {
nObs2 <- nObs-nObs1
Correct2Limit <- limit-nstep1
ProbCorrect <- True2Correct(PropH1TDisc)
ifelse (Correct2Limit < 0,0,
dbinom(nstep1,nObs1,ProbCorrect) *
pbinom(Correct2Limit,nObs2,ProbCorrect,lower.tail=FALSE)
)
}

PowerExpect <- data.frame(
nCorrect1=0:nObs1,
pH1=pbinom(0:nObs1,nObs1,True2Correct(PropH1TDisc)),
Powercost=cumsum(Powercost(0:nObs1,nObs1,limit,nObs,PropH1TDisc))
)

PowerExpect\$Powerremaining <- 1-pbinom(limit,nObs,True2Correct(PropH1TDisc))-PowerExpect\$Powercost
PowerExpect[1:9,]

nCorrect1          pH1    Powercost Powerremaining
1         0 2.317820e-06 4.111782e-09      0.8417316
2         1 4.867422e-05 4.110800e-07      0.8417312
3         2 4.832655e-04 1.396838e-05      0.8417176
4         3 3.018381e-03 2.294403e-04      0.8415021
5         4 1.331729e-02 2.139068e-03      0.8395925
6         5 4.421401e-02 1.247786e-02      0.8292537
7         6 1.150190e-01 4.884815e-02      0.7928834
8         7 2.414565e-01 1.359299e-01      0.7058016
9         8 4.192592e-01 2.832904e-01      0.5584411

### A simulation

Validation of the calculation comes from a simulation. Two functions are used. First, a function simulating number correct triangle tests for a given number of rue discriminators. The second function calls for the triangles to be executed in two phases, and gives the verdict. Either a decision after 16 triangles, or a decision after a full run. The decision after 16 is H0 not reject when less than 6 correct are observed, H0 reject, (the limit of 9 is from the previous sequential testing post) or the remainder of the data is used. Finally, this function is called hundred thousand times under both distributions and the results are tabulated.
It appears that almost half of the time a decision can be made after 16 tests. Chance of error of the first kind approximately 5%. Power is approximately 83%
rNCTriangle <- function(nObs,PropTDisc) {
PropCorrect <- True2Correct(PropTDisc)
sum(sample(0:1,nObs,c(1-PropCorrect,PropCorrect),replace=TRUE))
}

TrialRun <- function(PropTDisc) {
test1 <- rNCTriangle(16,PropTDisc)
test2 <- rNCTriangle(19,PropTDisc)
ans <- if (test1+test2>limit) 'Reject H0 end' else 'Not Reject H0 end'
if (test1<6) ans <- 'Not Reject H0 fase 1'
if (test1>9) ans <- 'Reject H0 fase 1'
ans
}

Nsamp <- 1e5
DecisionH0 <- sapply(1:Nsamp,function(x) TrialRun(0))
as.data.frame(table(DecisionH0)/Nsamp)

DecisionH0    Freq
1    Not Reject H0 end 0.40487
2 Not Reject H0 fase 1 0.54435
3        Reject H0 end 0.03488
4     Reject H0 fase 1 0.01590
DecisionH1 <- sapply(1:Nsamp,function(x) TrialRun(1/3))

as.data.frame(table(DecisionH1)/Nsamp)

DecisionH1    Freq
1    Not Reject H0 end 0.11983
2 Not Reject H0 fase 1 0.04316
3        Reject H0 end 0.45319
4     Reject H0 fase 1 0.38382