Data
There are 6 data-sets in the analysis.
- The first data from Schütz.
- The 'Dataset from Chow & Liu'. This is a two-period data set. Schütz did these data in three versions. This version is complete (balanced).
- Data set 2; incomplete (one subject misses one evaluation). Note the EU guide states: 'subjects in a crossover trial who do not provide evaluable data for both of the test and reference products should not be included'.
- Data set 2; unbalanced (one subject is missing).
- EU data set 1: 'a four period crossover study where subjects receive both test and reference twice, with some subjects providing data for only a subset of the treatment periods'
- EU data set 2: 'a three period crossover study where all subjects receive reference twice and test once'
Bio-equivalence analysis
There are five algorithms. I wrapped each in a function and made a unified print() method, so as to ease comparison. These functions have two input parameters; the data and the index for the level of the reference treatment.
- The code from Schütz slide 14. Note this only did two period data, and not the incomplete data.
- ANOVA. In contrast to my previous post subject is now fixed. For the interval I use just the residual degrees of freedom.
- nlme This should deliver what proc mixed would deliver
- MCMCglmm. A Bayesian MCMC sampler.
- lme4. This is traditional mixed model with point estimate and interval via a simulation
Result comparison
Data set 1
Method Point_estimate p5CI p95CI CVIntra
[1,] "Schutz" 100.8168 95.47312 106.4596 7.370138
[2,] "ANOVA" 100.8168 95.47312 106.4596 7.370138
[3,] "nlme" 100.8168 95.47312 106.4596 7.370138
[4,] "MCMCglmm" 100.8011 95.38906 106.5084 8.336442
[5,] "lme4" 100.71 96.03672 105.9925 7.370137
Data set 2
Method Point_estimate p5CI p95CI CVIntra
[1,] "Schutz" 97.17545 88.3128 106.9275 19.47357
[2,] "ANOVA" 97.17545 88.3128 106.9275 19.47357
[3,] "nlme" 97.17545 88.31279 106.9275 19.47359
[4,] "MCMCglmm" 97.09287 86.66635 108.7019 24.21701
[5,] "lme4" 97.51018 89.10346 106.7388 19.47351
Data set 3: Inbalanced
Method Point_estimate p5CI p95CI CVIntra
[1,] "Schutz" 95.79137 87.02926 105.4356 19.05692
[2,] "ANOVA" 95.60894 86.86353 105.2348 19.05692
[3,] "nlme" 95.60894 86.86352 105.2349 19.05693
[4,] "MCMCglmm" 95.6052 85.20512 107.4927 24.13791
[5,] "lme4" 95.80032 86.8732 103.6471 19.05691
Data set 4: Incomplete
Method Point_estimate p5CI p95CI CVIntra
[1,] "ANOVA" 95.60894 86.86353 105.2348 19.05692
[2,] "nlme" 96.46814 87.6157 106.215 19.2212
[3,] "MCMCglmm" 96.78571 86.78392 108.6978 23.77217
[4,] "lme4" 96.76121 88.38716 105.469 19.22122
Data set 5: four period crossover
Method Point_estimate p5CI p95CI CVIntra
[1,] "ANOVA" 115.6587 107.1057 124.8948 41.65396
[2,] "nlme" 115.7298 107.1707 124.9725 41.66876
[3,] "MCMCglmm" 115.7334 107.0849 125.0422 41.87724
[4,] "lme4" 115.7831 106.6829 124.6334 41.66873
Data set 6: three period crossover
Method Point_estimate p5CI p95CI CVIntra
[1,] "ANOVA" 102.2644 97.31555 107.4649 11.85557
[2,] "nlme" 102.2644 97.31555 107.4649 11.85557
[3,] "MCMCglmm" 102.2624 97.29063 107.4589 12.17123
[4,] "lme4" 102.3081 97.30974 107.192 11.85559
- Only to use the Schütz code for balanced two period crossover data. But in that case nlme and ANOVA provide the same answers. Hence it is purely educational
- Given the width of the intervals, the point estimates are close enough so differences can be ignored.
- For all data except the incomplete data the difference between nlme and ANOVA is absent or very small.
- MCMCglmm has some differences, compared to other methods it over estimates CVIntra. MCMCglmm gives a slightly wider confidence interval for bioequivalence.
- lme4 had in most cases a slightly narrower interval.
Code
###############################################################################
# Data
###############################################################################
#Schutz 1
T1 <- c(28.39,33.36,24.29,28.61,59.49,42.30)
T2 <- c(49.42,36.78,34.81,45.54,28.23,25.71)
R1 <- c(39.86,32.75,34.97,45.44,27.87,24.26)
R2 <- c(35.44,33.40,24.65,31.77,65.29,37.01)
S1 <- data.frame(
aval=c(T1,R2,R1,T2),
subjid=factor(c(
rep(c(1,4,6,7,9,12),2),
rep(c(2,3,5,8,10,11),2))),
seqa=rep(c('TR','RT'),each=12),
trta=rep(c('T','R','R','T'),each=6),
aperiod=factor(rep(c(1,2,1,2),each=6))
)
S1 <- S1[order(S1$subjid,S1$aperiod),]
S1$logAval <- log(S1$aval)
S2 <- read.table(textConnection('
Subject Period Sequence Formulation AUC Subject Period Sequence Formulation AUC
1 1 RT Reference 74.675 13 1 TR Test 122.45
1 2 RT Test 73.675 13 2 TR Reference 124.975
2 1 TR Test 74.825 14 1 TR Test 99.075
2 2 TR Reference 37.35 14 2 TR Reference 85.225
3 1 TR Test 86.875 15 1 RT Reference 69.725
3 2 TR Reference 51.925 15 2 RT Test 59.425
4 1 RT Reference 96.4 16 1 RT Reference 86.275
4 2 RT Test 93.25 16 2 RT Test 76.125
5 1 RT Reference 101.95 17 1 TR Test 86.35
5 2 RT Test 102.125 17 2 TR Reference 95.925
6 1 RT Reference 79.05 18 1 TR Test 49.925
6 2 RT Test 69.45 18 2 TR Reference 67.1
7 1 TR Test 81.675 19 1 RT Reference 112.675
7 2 TR Reference 72.175 19 2 RT Test 114.875
8 1 TR Test 92.7 20 1 RT Reference 99.525
8 2 TR Reference 77.5 20 2 RT Test 116.25
9 1 TR Test 50.45 21 1 TR Test 42.7
9 2 TR Reference 71.875 21 2 TR Reference 59.425
10 1 TR Test 66.125 22 1 TR Test 91.725
10 2 TR Reference 94.025 22 2 TR Reference 114.05
11 1 RT Reference 79.05 23 1 RT Reference 89.425
11 2 RT Test 69.025 23 2 RT Test 64.175
12 1 RT Reference 85.95 24 1 RT Reference 55.175
12 2 RT Test 68.7 24 2 RT Test 74.575
'),header=TRUE,stringsAsFactors=FALSE)
S2 <- with(S2,data.frame(
aval=c(AUC,AUC.1),
subjid=factor(c(Subject,Subject.1)),
seqa=factor(c(Sequence,Sequence.1)),
trta=factor(c(Formulation,Formulation.1)),
aperiod=factor(c(Period,Period.1))))
S2$logAval <- log(S2$aval)
S2InC <- S2[!(S2$subjid=='24' & S2$aperiod=='2'),]
S2InB <- S2[!(S2$subjid=='24'),]
##
#The EU data sets I copied-pasted into .csv prior to analysis.
#The EU data sets I copied-pasted into .csv prior to analysis.
#EU1
EU1 <- read.csv('eu1.csv')
EU1$subjid <- factor(EU1$SUBJECT)
EU1$aperiod <- factor(EU1$PERIOD)
EU1$aval <- EU1$DATA
EU1$seqa <- EU1$SEQUENCE
EU1$trta <- EU1$FORMULATION
EU1$logAval <- log(EU1$aval)
#EU2
EU2 <- read.csv('eu2.csv')
EU2$subjid <- factor(EU2$SUBJECT)
EU2$aperiod <- factor(EU2$PERIOD)
EU2$aval <- EU2$DATA
EU2$seqa <- EU2$SEQUENCE
EU2$trta <- EU2$FORMULATION
EU2$logAval <- log(EU2$aval)
###############################################################################
# Analysis functions
###############################################################################
print.BEQ <- function(x,...) {
result <- paste(
" Back transformed (raw data scale)\n",
"Point estimate:",
round(x$Point_estimate, digits=2),"%\n",
"90 % confidence interval:",
round(x$p5CI,digits=2), "% to",
round(x$p95CI, digits=2),"%\n",
"CVIntra:",round(x$CVIntra,digits=2),
"%\n")
cat(result)
invisible(x)
}
BESchutz <- function(datain,iref) {
prepdata <- datain[order(datain$seqa,datain$subjid,datain$aperiod),]
T1 <- with(prepdata,aval[seqa==levels(seqa)[iref] & aperiod==1])
R1 <- with(prepdata,aval[seqa==levels(seqa)[3-iref] & aperiod==1])
R2 <- with(prepdata,aval[seqa==levels(seqa)[iref] & aperiod==2])
T2 <- with(prepdata,aval[seqa==levels(seqa)[3-iref] & aperiod==2])
RT <- log(R1) - log(T2)
TR <- log(R2) - log(T1)
n1 <- length(RT)
mRT <- mean(RT)
vRT <- var(RT)
n2 <- length(TR)
mTR <- mean(TR)
vTR <- var(TR)
mD <- mean(log(c(T1,T2))) - mean(log(c(R1,R2)))
MSE <- (((n1-1)*vRT + (n2-1)*vTR)/(n1+n2-2))/2
alpha <- 0.05
lo <- mD - qt(1-alpha,n1+n2-2)*sqrt(MSE)*
sqrt((1/(2*n1) + 1/(2*n2)))
hi <- mD + qt(1-alpha,n1+n2-2)*sqrt(MSE)*
sqrt((1/(2*n1) + 1/(2*n2)))
#result <- paste(
# paste(" Back transformed (raw data scale)\n",
# "Point estimate:",
# round(100*exp(mD), digits=2),"%\n"),
# paste("90 % confidence interval:"),
# paste(round(100*exp(lo), digits=2), "% to"),
# paste(round(100*exp(hi), digits=2),"%\n",
# paste("CVintra:",round(100*sqrt(exp(MSE)-1),
# digits=2),"%\n")))
result <- list(Method='Schutz',
Point_estimate=100*exp(mD),
p5CI=100*exp(lo),
p95CI=100*exp(hi),
CVIntra=100*sqrt(exp(MSE)-1))
class(result) <- c('BEQ',class(result))
result
}
#######
# plain anova
#######
BEanova <- function(datain,iref) {
trtastring <- paste('trta',levels(datain$trta)[iref],sep='')
A1 <- aov(logAval ~ aperiod +seqa+ trta + subjid,
data=datain)
SE <- sqrt(vcov(A1)[trtastring,trtastring])
td <- qt(.05,A1$df.residual)
result <- list(Method='ANOVA',
Point_estimate=100*exp(coef(A1)[trtastring]),
p5CI=100*exp(coef(A1)[trtastring]+td*SE),
p95CI=100*exp(coef(A1)[trtastring]-td*SE),
CVIntra=100*sqrt(exp(sum(A1$residuals^2)/A1$df.residual)-1))
class(result) <- c('BEQ',class(result))
result
}
########
# nlme
########
library(nlme)
BEnlme <- function(datain,iref) {
trtastring <- paste('trta',levels(datain$trta)[iref],sep='')
L1 <- lme(logAval ~ aperiod + seqa + trta,
random=~1|subjid,
data=datain)
coL1 <- fixef(L1)
SEL1 <- sqrt(vcov(L1)[trtastring,trtastring])
dfL1 <- L1$fixDF$terms[names(L1$fixDF$terms)=='trta']
tdL1 <- qt(.05,dfL1)
result <- list(Method='nlme',
Point_estimate=100*exp(coL1[trtastring]),
p5CI=100*exp(coL1[trtastring]+tdL1*SEL1),
p95CI=100*exp(coL1[trtastring]-tdL1*SEL1),
CVIntra=100*sqrt(exp(as.numeric(VarCorr(L1)['Residual','Variance']))-1))
class(result) <- c('BEQ',class(result))
result
}
######
# MCMCglm
######
library(MCMCglmm)
BEmcmcglmm <- function(datain,iref) {
trtastring <- paste('trta',levels(datain$trta)[iref],sep='')
M1 <- MCMCglmm(logAval ~ aperiod + seqa + trta,
random= ~ subjid ,
data=datain,family='gaussian',nitt=500000,thin=20,
burnin=50000,verbose=FALSE)
M1est <- 100*exp(quantile(M1$Sol[,trtastring],c(.05,.5,.95)))
result=list(Method='MCMCglmm',
Point_estimate=M1est[2],
p5CI=M1est[1],
p95CI=M1est[3],
CVIntra= 100*sqrt(exp(mean(M1$VCV[,'units']))-1))
class(result) <- c('BEQ',class(result))
result
}
#######
# lme4
######
library(lme4)
BElmer <- function(datain,iref) {
trtastring <- paste('trta',levels(datain$trta)[iref],sep='')
R1 <- lmer(logAval ~ aperiod + seqa + trta + (1|subjid),
data=datain)
simeff <- function(m0) {
s <- simulate(m0)
fixef(refit(m0,s))[trtastring]
}
ff <- replicate(1000,simeff(R1))
R1est <- 100*exp(quantile(ff,c(.05,.5,.95)))
result <- list(Method='lme4',
Point_estimate=R1est[2],
p5CI=R1est[1],
p95CI=R1est[3],
CVIntra=100*sqrt(exp(sigma(R1)^2)-1))
class(result) <- c('BEQ',class(result))
result
}
###############################################################################
# Result summary
###############################################################################
t(cbind(BESchutz(S1,2)
, BEanova(S1,2)
, BEnlme(S1,2)
, BEmcmcglmm(S1,2)
, BElmer(S1,2)
))
t(cbind(BESchutz(S2,2)
, BEanova(S2,2)
, BEnlme(S2,2)
, BEmcmcglmm(S2,2)
, BElmer(S2,2)
))
t(cbind( BESchutz(S2InB,2)
, BEanova(S2InB,2)
, BEnlme(S2InB,2)
, BEmcmcglmm(S2InB,2)
, BElmer(S2InB,2)
))
t(cbind( BEanova(S2InC,2)
, BEnlme(S2InC,2)
, BEmcmcglmm(S2InC,2)
, BElmer(S2InC,2)
))
t(cbind( BEanova(EU1,2)
, BEnlme(EU1,2)
, BEmcmcglmm(EU1,2)
, BElmer(EU1,2)
))
t(cbind( BEanova(EU2,2)
, BEnlme(EU2,2)
, BEmcmcglmm(EU2,2)
, BElmer(EU2,2)
))
Hi,
ReplyDeleteI don’t think that your setup of the replicated studies (data sets 5 & 6) is correct. See the results reported in EMA’s Q&A-document for ‘Method C’ – which is also the coding preferred by the FDA.
Data set 5
PE 115.6577% (90% CI: 107.1044 – 124.8939%)
Data set 6
PE 102.2644% (90% CI: 97.0532 – 107.7554%)
These results are obtained in SAS and Pharsight’s Phoenix/WinNonlin.
R is a nasty beast.
Helmut
True. My results ANOVA and nlme are more like 'Method A' and 'Method B' as given by EMA. I don't know if the FDA model can be run in R.
Delete