Data
It may seem a funny way to create data, but the presentation contains the data in these four lines. The code after that, which puts everything in a data.frame, is mine.
T1 <- c(28.39,33.36,24.29,28.61,59.49,42.30)
T2 <- c(49.42,36.78,34.81,45.54,28.23,25.71)
R1 <- c(39.86,32.75,34.97,45.44,27.87,24.26)
R2 <- c(35.44,33.40,24.65,31.77,65.29,37.01)
datain <- data.frame(
aval=c(T1,R2,R1,T2),
subjid=factor(c(
rep(c(1,4,6,7,9,12),2),
rep(c(2,3,5,8,10,11),2))),
seqa=rep(c('TR','RT'),each=12),
trta=rep(c('T','R','R','T'),each=6),
aperiod=factor(rep(c(1,2,1,2),each=6))
)
datain <- datain[order(datain$subjid,datain$aperiod),]
datain$logAval <- log(datain$aval)
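As a quick sanity check (my addition, not in the presentation), a cross-tabulation confirms the 2x2 crossover layout: six observations in every realised sequence-by-period cell, with treatment fully determined by sequence and period.
# my addition: verify the crossover layout; each realised cell should hold 6
xtabs(~ seqa + aperiod + trta, data = datain)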
Anova
With these balanced data, ANOVA gives no problems.
A1 <- aov(logAval ~ aperiod +seqa+ trta + Error(subjid),
data=datain)
summary(A1)
Error: subjid
Df Sum Sq Mean Sq F value Pr(>F)
seqa 1 0.0023 0.0023 0.014 0.907
Residuals 10 1.5943 0.1594
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
aperiod 1 0.02050 0.020501 3.784 0.0804 .
trta 1 0.00040 0.000397 0.073 0.7921
Residuals 10 0.05417 0.005417
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Extraction of parameters of interest
The degrees of freedom are calculated outside the ANOVA. This particular bit should be correct even with unbalanced data, but that is untested; a small check is sketched after the output below.
wi <- summary(A1)$`Error: Within`[[1]]
countn <- rowSums(table(datain$seqa,datain$subjid)>0)
sn <- sqrt(1/(2*countn[1]) + 1/(2*countn[2]))
SE <- sqrt(wi$`Mean Sq`[rownames(wi)=='Residuals'])*sn
td <- qt(.05,wi$`Df`[rownames(wi)=='Residuals'])
cat('Point estimate (%)',100*exp(coef(A1)$Within['trtaT']),'\n',
'90% interval (%) ' ,
100*exp(coef(A1)$Within['trtaT']+td*SE),'to ',
100*exp(coef(A1)$Within['trtaT']-td*SE),
'\n',
'CVIntra (%) ',
100*sqrt(exp(wi$`Mean Sq`[rownames(wi)=='Residuals'])-1),
'\n')
Point estimate (%) 100.8168
90% interval (%) 95.47312 to 106.4596
CVIntra (%) 7.370138
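The unbalanced case itself I have not run; a minimal sketch would be to drop one subject (subject 12, an arbitrary choice on my part) and check that countn picks up the unequal sequence sizes. This only exercises the counting step, not the full analysis.
# untested sketch: make the design unbalanced by dropping subject 12
datain2 <- datain[datain$subjid != '12', ]
countn2 <- rowSums(table(datain2$seqa, datain2$subjid) > 0)
countn2   # should give 6 for RT and 5 for TR
sn2 <- sqrt(1/(2*countn2[1]) + 1/(2*countn2[2]))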
nlme
nlme gave no problems; the parameters are the same.
library(nlme)
L1 <- lme(logAval ~ aperiod + seqa + trta,
random=~1|subjid,
data=datain)
summary(L1)
Linear mixed-effects model fit by REML
Data: datain
AIC BIC logLik
6.767892 12.74229 2.616054
Random effects:
Formula: ~1 | subjid
(Intercept) Residual
StdDev: 0.2775045 0.07360159
Fixed effects: logAval ~ aperiod + seqa + trta
Value Std.Error DF t-value p-value
(Intercept) 3.510257 0.11720776 10 29.949011 0.0000
aperiod2 0.058454 0.03004772 10 1.945360 0.0804
seqaTR 0.019577 0.16301059 10 0.120097 0.9068
trtaT 0.008135 0.03004772 10 0.270732 0.7921
Correlation:
(Intr) aperd2 seqaTR
aperiod2 -0.128
seqaTR -0.695 0.000
trtaT -0.128 0.000 0.000
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.21397837 -0.41862241 -0.05987476 0.37505780 1.30243785
Number of Observations: 24
Number of Groups: 12
Extraction of parameters of interest
One noticeable thing here is that VarCorr() returns its values as character, so the residual variance is converted with as.numeric(); a one-line demonstration follows the output below.
coL1 <- fixef(L1)
SEL1 <- sqrt(vcov(L1)['trtaT','trtaT'])
dfL1 <- L1$fixDF$terms[names(L1$fixDF$terms)=='trta']
tdL1 <- qt(.05,dfL1)
cat('Point estimate (%)',100*exp(coL1['trtaT']),'\n',
'90% interval (%) ' ,
100*exp(coL1['trtaT']+tdL1*SEL1),'to ',
100*exp(coL1['trtaT']-tdL1*SEL1),
'\n',
'CVIntra (%) ',
100*sqrt(exp(as.numeric(VarCorr(L1)['Residual','Variance']))-1),
'\n')
Point estimate (%) 100.8168
90% interval (%) 95.47312 to 106.4596
CVIntra (%) 7.370138
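The character issue itself is easy to demonstrate (my addition):
# VarCorr() on an lme fit stores its values as character strings
is.character(VarCorr(L1)['Residual','Variance'])   # TRUE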
MCMCglmm
Definitely not the standard or expected calculation, but the answers are really close to the expected values.
library(MCMCglmm)
M1 <- MCMCglmm(logAval ~ aperiod + seqa + trta,
random= ~ subjid ,
data=datain,family='gaussian',nitt=500000,thin=20,
burnin=50000,verbose=FALSE)
summary(M1)
Iterations = 50001:499981
Thinning interval = 20
Sample size = 22500
DIC: -41.6716
G-structure: ~subjid
post.mean l-95% CI u-95% CI eff.samp
subjid 0.09517 0.02514 0.1989 22500
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.006832 0.00187 0.01401 22500
Location effects: logAval ~ aperiod + seqa + trta
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 3.510768 3.245017 3.764446 22518 <4e-05 ***
aperiod2 0.058452 -0.008879 0.125118 22500 0.0838 .
seqaTR 0.018367 -0.326948 0.397998 22500 0.9148
trtaT 0.007888 -0.056976 0.077410 22500 0.7985
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Extraction of parameters of interest
Fairly straightforward: it is all in the samples. Notice that the CVIntra is larger than expected; this was observed before in previous REML exercises. By and large, getting decent (point) estimates for variances is a major weakness of MCMCglmm; a hedged alternative using the posterior mode is sketched after the output below.
M1est <- 100*exp(quantile(M1$Sol[,'trtaT'],c(.05,.5,.95)))
cat('Point estimate (%)',M1est[2],'\n',
'90% interval (%) ' ,
M1est[1],'to ',M1est[3],
'\n',
'CVIntra (%) ',
100*sqrt(exp(mean(M1$VCV[,'units']))-1),
'\n')
Point estimate (%) 100.7712
90% interval (%) 95.45551 to 106.4557
CVIntra (%) 8.279958
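One possible mitigation (my suggestion, not from the presentation) is to summarise the residual variance by its posterior mode rather than its mean, using posterior.mode() from MCMCglmm; for a right-skewed variance posterior the mode may sit closer to the REML estimate.
# posterior mode as an alternative point estimate of the residual variance
100*sqrt(exp(posterior.mode(M1$VCV[,'units']))-1)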
lme4
lme4 was non-cooperative. I expected to use mcmcsamp(), but that function has disappeared in my version of lme4. Since I was a minor version behind on R I updated, which led to updating Eclipse from Juno to Kepler, which led to updating Java while I was at it, but mcmcsamp() was gone. As a note, I had an old version of R and lme4 sitting around; the values out of mcmcsamp() were quite different from expected. I am using version 0.999999-2 for these calculations.
library(lme4)
# note: this reuses the name R1, overwriting the data vector defined earlier;
# harmless here because datain has already been built
R1 <- lmer(logAval ~ aperiod + seqa + trta + (1|subjid),
data=datain)
summary(R1)
Linear mixed model fit by REML ['lmerMod']
Formula: logAval ~ aperiod + seqa + trta + (1 | subjid)
Data: datain
REML criterion at convergence: -5.2321
Random effects:
Groups Name Variance Std.Dev.
subjid (Intercept) 0.077009 0.2775
Residual 0.005417 0.0736
Number of obs: 24, groups: subjid, 12
Fixed effects:
Estimate Std. Error t value
(Intercept) 3.510257 0.117208 29.949
aperiod2 0.058454 0.030048 1.945
seqaTR 0.019577 0.163011 0.120
trtaT 0.008135 0.030048 0.271
Correlation of Fixed Effects:
(Intr) aperd2 seqaTR
aperiod2 -0.128
seqaTR -0.695 0.000
trtaT -0.128 0.000 0.000
Extraction of parameters of interest
The easy way out, as detailed in http://glmm.wikidot.com/faq, would be to impute the df value from another program, such as aov(). This would mean 10 df, and all the values would be equal to the presentation; that route is sketched after the output below. In this blog I want to try something else: I wonder if this combination of simulate() and refit() is legit? The 90% interval is slightly narrower than the one in the presentation.
simeff <- function(m0) {
s <- simulate(m0)
fixef(refit(m0,s))['trtaT']
}
ff <- replicate(1000,simeff(R1))
R1est <- 100*exp(quantile(ff,c(.05,.5,.95)))
cat('Point estimate (%)',R1est[2],'\n',
'90% interval (%) ' ,
R1est[1],'to ',R1est[3],
'\n',
'CVIntra (%) ',
100*sqrt(exp(sigma(R1)^2)-1),
'\n')
Point estimate (%) 100.7082
90% interval (%) 95.75779 to 105.7563
CVIntra (%) 7.370137
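For completeness, a minimal sketch of the "easy way out" mentioned above: impute the 10 residual df from aov() and form the usual t-based interval around the lmer estimate. This is my addition; it should reproduce the values in the presentation.
# imputed-df route: 10 df borrowed from the aov() analysis above
seR1 <- sqrt(as.matrix(vcov(R1))['trtaT','trtaT'])
tdR1 <- qt(.05, 10)
100*exp(fixef(R1)['trtaT'])               # point estimate
100*exp(fixef(R1)['trtaT'] + tdR1*seR1)   # lower 90% limit (tdR1 < 0)
100*exp(fixef(R1)['trtaT'] - tdR1*seR1)   # upper 90% limit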
Comments

What the bear package does can be done in so many different ways; why do we need it?

I don't know the bear package, but looking at the help files it is a complete toolbox, starting with concentration data and ending with bioequivalence. I feel sure that choices in dosing regime and blood-sampling regime merit choices in analysis. Hence many different ways of analysis are indeed needed.

The admittedly funny coding arises from the educational context (denoting test and reference treatments in the two periods). I wouldn't use it in the real world either…

In bioequivalence one has to follow regulatory requirements. For the EMA that means all effects fixed, and for the FDA a mixed-effects model (with Satterthwaite's df). Bad luck for the latter in R. Maybe (dunno) the FDA accepts Kenward-Roger (available in the package pbkrtest).

Your resampling approach would not be acceptable to regulatory agencies, since the confidence interval is narrower, which would translate into a higher chance of approving an inequivalent treatment (increased consumer's risk). First I guessed that it might be a convergence issue, but going from 1e3 to 1e5 improved only the PE (from 100.7082 to 100.8164); the CI is still too narrow (95.93121, 105.9267).
All the best,
Helmut