Sunday, April 21, 2013

Ordinal data, models with observers

I recently made three posts regarding analysis of ordinal data. A post looking at all methods I could find in R, a post with an additional method and a post using JAGS. Common in all three was using the cheese data, a data set where only product data was available, observers were not there. The real world is different, there are observers and they are quite different. So, in this post I have added observers and look at two models again; Simple ANOVA and, complex, an ordinal model with random observers.

Data

To spare myself looking for data, and keeping in mind I may want to run some simulations, I have created a data generator. In this generator I entered some of the things I could imagine regarding the observers. First of all they all have different sensitivities. In other words, what is very salt for some, is not at all salt for others. This is expressed in an additive and a multiplicative effect. Second, they have observation error. The same product does not always give the same response. Third is the categories, observers use them slightly different. I am absolutely aware these are much more properties than I might estimate. After all, a typical consumer takes two or three products in a test, while I have many more parameters. Finally, the outer limits (categories 1 and 9) are placed a bit outward. This is because there is an aversion to using these categories.
num2scale <- function(x,scale) {
  cut(x,c(-Inf,scale,Inf),1:(length(scale)+1))
}
pop.limits2ind.limits <- function(scale,sd=.5) {
  newscale <- scale+rnorm(length(scale),sd=sd)
  newscale[order(newscale)]
}

obs.score <- function(obs,pop.score,pop.limits,scale.sd=0.5,
    sensitivity.sd=.1, precision.sd=1,
    additive.sd=2,center.scale=5,
    labels=LETTERS[1:length(pop.score)]) {
  # individual sensitivity (multiplicative)
  obs.errorfreeintensity <- center.scale + 
      (pop.score-center.scale)*rlnorm(1,sd=sensitivity.sd)
  #individual (additive) 
  obs.errorfreeintensity <- obs.errorfreeintensity +
      rnorm(1,mean=0,sd=additive.sd)
  # individual observation error 
  obs.intensity <- obs.errorfreeintensity+
      rnorm(length(pop.score),sd=precision.sd)
  # individual cut offs between categories  
  obs.limits <- pop.limits2ind.limits(pop.limits)
  obs.score <- num2scale(obs.intensity,obs.limits)
  data.frame(obs=obs,
      score = obs.score,
      product=labels
  )
}

panel.score <- function(n=100,pop.score,pop.limits,scale.sd=0.5,
    sensitivity.sd=.1,precision.sd=1,
    additive.sd=2,center.scale=5,
    labels=LETTERS[1:length(pop.score)]) {
  la <- lapply(1:n,function(x) {
        obs.score(x,pop.score,pop.limits,scale.sd,sensitivity.sd,
            precision.sd,labels=labels)
      })
  dc <- do.call(rbind,la)
  dc$obs <- factor(dc$obs)
  dc
}

pop.limits <- c(1,3:8,10)
prodmeans <- c(7,7,8)
scores <- panel.score(40,prodmeans,pop.limits)
scores$numresponse <- as.numeric(levels(scores$score))[scores$score]

Analysis - ANOVA

The first analysis method is ANOVA. Plain ANOVA. Not because I dislike variance components or such, but because this is what is done most commonly. On top of that, I cannot imagine somebody taking these data, pulling it through a mixed model, while forgetting it is not from a continuous scale.
library(multcomp)
Res.aov <- aov( numresponse ~ obs + product , data=scores)
anova(Res.aov)

Analysis of Variance Table

Response: numresponse
          Df  Sum Sq Mean Sq F value    Pr(>F)    
obs       39 239.300  6.1359  6.3686 2.321e-12 ***
product    2   9.517  4.7583  4.9388   0.00956 ** 
Residuals 78  75.150  0.9635                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
mt <- model.tables(Res.aov,type='means',cterms='product') 

data.frame(dimnames(mt$tables$product)[1],
    Response=as.numeric(mt$tables$product),
    LSD=cld(glht(Res.aov,linfct=mcp(product='Tukey'))
    )$mcletters$monospacedLetters
)    
  product Response LSD
A       A    6.725  a 
B       B    6.850  a 
C       C    7.375   b

Analysis - cumulative link mixed model (clmm)

The alternative, is the other extreme. Do it as correct as available, using both a cumulative link and making observers random. This model is standard available in the ordinal package. This means two intermediate models are not shown. Both of these models lack the simplicity of ANOVA while being less correct than clmm, hence inferior to both clmm and ANOVA.
library(ordinal) 
Res.clmm <- clmm(score  ~ product + (1|obs),data=scores)
summary(Res.clmm)

Cumulative Link Mixed Model fitted with the Laplace approximation

formula: score ~ product + (1 | obs)
data:    scores

 link  threshold nobs logLik  AIC    niter    max.grad cond.H
 logit flexible  120  -179.57 379.14 37(5046) 8.77e-06 Inf   

Random effects:
      Var Std.Dev
obs 8.425   2.903
Number of groups:  obs 40 

Coefficients:
         Estimate Std. Error z value Pr(>|z|)   
productB   0.1309     0.4225   0.310  0.75680   
productC   1.4640     0.4718   3.103  0.00192 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Threshold coefficients:
    Estimate Std. Error z value
2|3  -6.0078         NA      NA
3|4  -5.6340         NA      NA
4|5  -4.1059     0.4528  -9.069
5|6  -3.0521     0.4769  -6.400
6|7  -1.0248     0.4970  -2.062
7|8   0.5499     0.5303   1.037
8|9   4.0583     0.7429   5.463
Res.clmm2 <- clmm(score ~ 1 + (1|obs),data=scores)

anova(Res.clmm2,Res.clmm)

Likelihood ratio tests of cumulative link models:

          formula:                    link: threshold:
Res.clmm2 score ~ 1 + (1 | obs)       logit flexible  
Res.clmm  score ~ product + (1 | obs) logit flexible  

          no.par    AIC  logLik LR.stat df Pr(>Chisq)   
Res.clmm2      8 387.41 -185.70                         
Res.clmm      10 379.14 -179.57  12.266  2    0.00217 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
co <- coef(Res.clmm)[c('7|8','productB','productC')]

vc <- vcov(Res.clmm)[c('7|8','productB','productC'),
    c('7|8','productB','productC')]
names(co) <- levels(scores$product)
sd <- sqrt(c(vc[1,1],diag(vc)[-1]+vc[1,1]-2*vc[1,-1]))
data.frame(
    `top 2 box`=arm::invlogit(c(-co[1],-co[1]+co[-1])),
    `2.5% limit`=arm::invlogit(c(-co[1],-co[1]+co[-1])+qnorm(.025)*sd),
    `97.5% limit`=arm::invlogit(c(-co[1],-co[1]+co[-1])+qnorm(.975)*sd),
    check.names=FALSE
)

  top 2 box 2.5% limit 97.5% limit
A 0.3658831  0.1694873   0.6199713
B 0.3967401  0.1753347   0.6704337
C 0.7138311  0.4498042   0.8838691

No comments:

Post a Comment