data
To quote the SAS/STAT User guide: 'a one-way classification model with heterogeneous variances is fit. The data, (...), represent amounts of different types of fat absorbed by batches of doughnuts during cooking, measured in grams'. Data are read below. The final statement is just ordering the data similar to SAS, which is convenient for comparison on my side.
r1 <- read.table(textConnection('
1 164 1 172 1 168 1 177 1 156 1 195
2 178 2 191 2 197 2 182 2 185 2 177
3 175 3 193 3 178 3 171 3 163 3 176
4 155 4 166 4 149 4 164 4 170 4 168
') ,col.names=paste('a',1:12,sep=''))
r2 <- data.frame(FatType=factor(as.matrix(r1[,seq(1,11,by=2)])),
Absorbed=as.numeric(as.matrix(r1[,seq(2,12,by=2)])))
r2 <- r2[c(seq(1,24,by=4),seq(2,24,by=4),seq(3,24,by=4),seq(4,24,by=4)),]
nlme
For a change, the gls() function is used from the nlme package. A few preparations are needed. A variable x is used as dummy variable in the varIdent sub-statement. Variable i is used in influence diagnostic plotting. SAS contrasts speak for themselves.
r2$x <- 1
r2$i <-1:24
options(contrasts=c( 'contr.SAS','contr.SAS'))
library(nlme)
lme1 <- gls(Absorbed ~ FatType,
weights=varIdent(form=~x | FatType),data=r2 )
summary(lme1)
Generalized least squares fit by REML
Model: Absorbed ~ FatType
Data: r2
AIC BIC logLik
170.3109 178.2767 -77.15543
Variance function:
Structure: Different standard deviations per stratum
Formula: ~x | FatType
Parameter estimates:
1 2 3 4
1.0000000 0.5825169 0.7404827 0.6162593
Coefficients:
Value Std.Error t-value p-value
(Intercept) 162 3.356586 48.26332 0.0000
FatType1 10 6.397916 1.56301 0.1337
FatType2 23 4.618803 4.97965 0.0001
FatType3 14 5.247222 2.66808 0.0148
Correlation:
(Intr) FtTyp1 FtTyp2
FatType1 -0.525
FatType2 -0.727 0.381
FatType3 -0.640 0.336 0.465
Standardized residuals:
Min Q1 Med Q3 Max
-1.581138e+00 -6.625646e-01 -6.533961e-15 5.473172e-01 1.723923e+00
Residual standard error: 13.34166
Degrees of freedom: 24 total; 20 residual
As ever, nlme shows results differently from SAS. The missing bit is the variances. These can be displayed by this statement.
(lme1$sigma*
coef(lme1$modelStruct$varStruct,
uncons = FALSE, allCoef = TRUE))^2
1 2 3 4
177.99996 60.39999 97.59999 67.60003
diagnostics
Having set up the model, leave-one-out statistics is just a sapply() away. For brevity only a few are listed.
sa <- sapply(1:nrow(r2) ,function(x) {
rx <- r2
rx$Absorbed[x] <- NA
lmex <- gls(Absorbed ~ FatType,
weights=varIdent(form=~x | FatType),data=rx,na.action=na.omit )
cc <- coef(lmex$modelStruct$varStruct,uncons = FALSE, allCoef = TRUE)
cc <- cc[order(names(cc))]
c(coef(lmex),
(lmex$sigma*cc)^2)
})
tsa <- as.data.frame( t(sa))
head(tsa)
(Intercept) FatType1 FatType2 FatType3 1 2 3 4
1 162 11.6 23 14 203.29918 60.40017 97.59971 67.60023
2 162 10.0 23 14 222.50049 60.39999 97.59995 67.59992
3 162 10.8 23 14 217.70003 60.40000 97.59999 67.59999
4 162 9.0 23 14 214.99969 60.40013 97.59964 67.60018
5 162 13.2 23 14 145.70021 60.39986 97.60000 67.60007
6 162 5.4 23 14 63.79997 60.40006 97.59999 67.59996
Plots are more interesting.
tsa$i <- 1:nrow(tsa)
library(lattice)
par(mfrow=c(2,2))
plot(`(Intercept)`~i,data=tsa,xlab='')
plot(FatType1~i,data=tsa,xlab='')
plot(FatType2~i,data=tsa,xlab='')
plot(FatType3~i,data=tsa,xlab='')
It is a bit funny in influences, since fat-types are displayed equivalent to intercept. In terms of influence on the FatTypes means a plot not shown in SAS/STAT:
names(tsa)[1] <- 'Intercept'
xyplot(Intercept + I(Intercept+FatType1)
+ I(Intercept+FatType2)+ I(Intercept+FatType3) ~ i ,
data=tsa,
outer=TRUE,
strip=function(..., factor.levels)
strip.default( ...,factor.levels=paste('FatType',c(4,1:3))))
The next plot shows influence on the estimated variances
names(tsa)[5:8] <- paste('FT',1:4,sep='')
xyplot(FT1 + FT2 + FT3 + FT4 ~ i ,data=tsa,outer=TRUE)
residuals
The ground work for residuals is also present. Note that the studentized residuals come out slightly different; there seems to be a factor (5/6)^2 difference.
resdf <- data.frame(Absorbed=r2$Absorbed,
predicted=predict(lme1),
residual=residuals(lme1),
std=attr(residuals(lme1),'std')
)
resdf$lofit <- 0
resdf$lofit[r2$FatType=='1'] <-
rowSums(tsa[,c('Intercept','FatType1')])[r2$FatType=='1']
resdf$lofit[r2$FatType=='2'] <-
rowSums(tsa[,c('Intercept','FatType2')])[r2$FatType=='2']
resdf$lofit[r2$FatType=='3'] <-
rowSums(tsa[,c('Intercept','FatType3')])[r2$FatType=='3']
resdf$lofit[r2$FatType=='4'] <- tsa[,c('Intercept')][r2$FatType=='4']
resdf$PRESSres <- resdf$Absorbed-resdf$lofit
resdf$StudRes <- resdf$residual/resdf$std
head(resdf)
Absorbed predicted residual std lofit PRESSres StudRes
1 164 172 -8.000000e+00 13.34166 173.6 -9.600000e+00 -5.996254e-01
5 172 172 -2.842171e-14 13.34166 172.0 5.684342e-14 -2.130297e-15
9 168 172 -4.000000e+00 13.34166 172.8 -4.800000e+00 -2.998127e-01
13 177 172 5.000000e+00 13.34166 171.0 6.000000e+00 3.747659e-01
17 156 172 -1.600000e+01 13.34166 175.2 -1.920000e+01 -1.199251e+00
21 195 172 2.300000e+01 13.34166 167.4 2.760000e+01 1.723923e+00
nlme's own tools
nlme is not without tools for diagnostics as shown below. I labelled a number of points mentions in SAS/STAT Guide. The cut-off for labeling is determined by id. The value of 0.15 coincides with a two sided 85% interval, is chosen so as to label the four points mentioned in SAS/STAT Guide. It is certainly not a limit I would choose and I have some doubt about four influential points on a total of 24 points.
plot(lme1,form= resid(., type = "p") ~ i | FatType,id=.15)
Second diagnostic plot contains normal plots. The marked points seem reasonable in line with the others.
qqnorm(lme1, ~ resid(., type = "p") | FatType,id=.15)
MCMCglmm
While the nlme sections is really long, MCMCglmm is really short. To run the model there were some tricks. An additional variable avar is nested under FatTypes. MCMCglmm gives a units variance, which does not make sense in the context of this model. I did not know how to remove it, but a suitable prior was used to suppress it.
library(MCMCglmm)
r2$avar <- factor(rep(1:6,4))
prior1 <- list(R=list(V=diag(1)*.1,nu=10),
G=list(G1=list(V=diag(4),nu=.01) )
)
mc1 <- MCMCglmm(Absorbed ~ FatType,
random=~ idh(FatType):avar,
data=r2 ,pr=TRUE,verbose=FALSE,
prior=prior1, nitt=50000)
summary(mc1)
Iterations = 3001:49991
Thinning interval = 10
Sample size = 4700
DIC: 36.68339
G-structure: ~idh(FatType):avar
post.mean l-95% CI u-95% CI eff.samp
1.avar 295.5 42.52 773.6 4700
2.avar 100.1 14.96 259.5 4712
3.avar 164.9 21.59 443.2 4700
4.avar 110.4 18.25 291.0 4700
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.1256 0.03751 0.2588 4359
Location effects: Absorbed ~ FatType
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 161.9363 153.0445 169.8837 4700 < 2e-04 ***
FatType1 10.0627 -6.9373 25.8132 4969 0.18681
FatType2 22.9649 12.0911 34.9963 4700 0.00128 **
FatType3 13.9831 0.8961 27.4354 4700 0.04043 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
posterior.mode(mc1$VCV)
1.avar 2.avar 3.avar 4.avar units
133.71520455 42.92154371 73.72997958 51.18475597 0.09115953
Influence diagnostics
It is easy enough to extract parameter estimates from the simulations, so a sapply() to get leave-one-out estimates similar to nlme is obvious. However, residuals() is not yet implemented, so that is far away. It was my choice not to do this.
No comments:
Post a Comment