Sunday, February 22, 2015

Priors on odds and probability of success

In Bayesian Approaches to Clinical Trials and Health-Care Evaluation (David J. Spiegelhalter, Keith R. Abrams, Jonathan P. Myles) they mention that an non informative prior should be uniform over the range of interest. They combine this with the desire that Odds ratio has the same prior as 1/Odds ratio and from this they conclude log Odds ratio should be uniform distributed. They write it is equivalent to Beta(0,0), hence not a proper distribution, but use it anyway. In this post looks at the prior both at log odds scale and probability scale.
As a side, this exercise forced me to look at the Jacobian, I needed it to transform a density to the other scale. For this yacas came in handy, it has been a while since I differentiated by hand.

Analysis of prior

In general, I will try to give both the distribution according to a MCMC sampler (black lines) and an analytical solution (green line). The MCMC sampler used is LaplacesDemon, since it is fully in R it allows reusing analytical code and easy adding of the Jacobian.
As a side, in many of the plots I made the smoothing bandwidth a bit smaller, so some of the details could be captured.
Four priors are used. Probability of success Beta(0.5,0.5) and Beta(1,1). Log odds uniform on -10 to 10 and normal(0,10).
I think LaplacesDemon with the standard algorithm has some difficulty with the beta(0.5,0.5) distribution.



Using Data

After observing data, one success and five failure again same prior. For the priors on log odds there is no exact formulas to obtain the exact likelihood in the book. I used a formula for normal approximation of log odds ratio (2.30) and simplified from there. I am actually surprised how close that is to the MCMC sampler.



Code

# general code
library(magrittr)
library('LaplacesDemon')

#conversion and Jacobian
prob2Lo <- function(prob) log(prob/(1-prob))
Lo2prob <- function(Lo) exp(Lo)/(1+exp(Lo))
Jprob2Lo <- function(prob) prob*(1-prob)
JLo2prob <- function(Lo) (exp(Lo)+1)^2/exp(Lo)

# wrapper for Jacobian
dprob2dLo <- function(Lo,dprob,log=FALSE) {
  prob <- Lo2prob(Lo)
  if (log) {dprob(prob)+log(Jprob2Lo(prob))
  } else dprob(prob)*Jprob2Lo(prob)
}

dLo2dprob <- function(prob,dLo,log=FALSE) {
  Lo <- prob2Lo(prob)
  if (log) {dLo(Lo)+log(JLo2prob(Lo))
  } else dLo(Lo)*JLo2prob(Lo)
}

# helper function
densplot <- function (x,adjust=1,...) {
  density(x,adjust) %>%
      plot(.,...)
}

mon.names <- "LP"
parm.names <- c('prob')
MyData <- list(mon.names=mon.names,
    parm.names=parm.names)
N <-1
Model <- function(parm, Data)
{
  LL <- 1
  yhat <- 1
  LP <- dbeta(parm,.5,.5,log=TRUE)+LL
  Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LL,
      yhat=1, parm=parm)
  return(Modelout)
}
Initial.Values <- c(0.5)
Fit <- LaplacesDemon(Model, Data=MyData,
    Initial.Values,
    Iteration=1e5,Status=1e4)
png('B55.png')
par(mfrow=c(1,2))
densplot(prob2Lo(Fit$Posterior2),
    adjust=.2,
    main='Prior: Probability ~ Beta .5 .5',
    xlab='Log odds')
Lo <- seq(-10,10,.1)
dLo <- dprob2dLo(Lo,dprob=function(prob)
      dbeta(prob,1,1))
lines(x=Lo,y=dLo,col='green')
##
densplot(Fit$Posterior2,
    adjust=.01,
    main='Prior: Probability ~ Beta .5 .5',
    xlab='Probability Success')
prob <- seq(0.01,.99,.01)
dprob <- dbeta(prob,.5,.5)
lines(x=prob,y=dprob,col='green')
dev.off()
###
Model <- function(parm, Data)
{
  LL <- 1
  yhat <- 1
  LP <- dbeta(parm,1,1,log=TRUE)+LL
  Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LL,
      yhat=1, parm=parm)
  return(Modelout)
}
Initial.Values <- c(0.5)
Fit <- LaplacesDemon(Model,
    Data=MyData,
    Initial.Values,
    Iteration=1e5,Status=1e4)
png('B11.png')
par(mfrow=c(1,2))
densplot(prob2Lo(Fit$Posterior2),
    adjust=.1,
    main='Prior: Prob ~ Beta 1 1',
    xlab='Log odds')
Lo <- seq(-10,10,.1)
dLo <- dprob2dLo(Lo,dprob=function(prob)
      dbeta(prob,1,1))
lines(x=Lo,y=dLo,col='green')
densplot(Fit$Posterior2,adj=.01,
    main='Prior: Prob ~ Beta 1 1',
    xlab='Probability Success')
prob <- seq(0.01,.99,.01)
dprob <- dbeta(prob,1,1)
lines(x=prob,y=dprob,col='green')
dev.off()
###
# prior on log odds
##
Model <- function(parm, Data)
{
  LL <- 1
  yhat <- 1
  LP <- dunif(parm,-10,10,log=TRUE)+LL
  Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LL,
      yhat=1, parm=parm)
  return(Modelout)
}
Initial.Values <- c(0)
Fit <- LaplacesDemon(Model,
    Data=MyData,
    Initial.Values,
    Iteration=1e5,
    Status=1e4)
png('u1010.png')
par(mfrow=c(1,2))
densplot(Fit$Posterior2,
    adjust=.1,
    main='Prior: Log Odds ~ Unif -10 10',
    xlab='Log odds')
Lo <- seq(-10.1,10.1,.1)
dLo <- dunif(Lo,-10,10)
lines(x=Lo,y=dLo,col='green')
densplot(Lo2prob(Fit$Posterior2),
    adjust=.01,
    main='Prior: Log Odds ~ Unif -10 10',
    xlab='Probability Success')
prob <- seq(0.01,.99,.01)
dprob <- dLo2dprob(prob,dLo=function(Lo)
      dunif(Lo,-10,10))
lines(x=prob,y=dprob,col='green')
dev.off()
#
Model <- function(parm, Data)
{
  LL <- 1
  yhat <- 1
  LP <- dnorm(parm,0,10,log=TRUE)+LL
  Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LL,
      yhat=1, parm=parm)
  return(Modelout)
}
Initial.Values <- c(0)
Fit <- LaplacesDemon(Model,
    Data=MyData,
    Initial.Values,
    Iteration=1e5,
    Status=1e4)
png('N010.png')
par(mfrow=c(1,2))
densplot(Fit$Posterior2,
    adjust=.1,
    main='Prior: Log Odds ~ norm 0,10',
    xlab='Log odds')
Lo <- seq(-20.1,20.1,.1)
dLo <- dnorm(Lo,0,10)
lines(x=Lo,y=dLo,col='green')
##
densplot(Lo2prob(Fit$Posterior2),
    adjust=.01,
    main='Prior: Log Odds ~ norm 0,10',
    xlab='Probability Success')
prob <- seq(0.01,.99,.01)
dprob <- dLo2dprob(prob,dLo=function(Lo) dnorm(Lo,0,10))
lines(x=prob,y=dprob,col='green')
dev.off()

#############
# with data
#############
# 1 out of 6
parm.names <- c('prob')
MyData <- list(mon.names=mon.names,
    parm.names=parm.names,
    n=6,s=1)

Model <- function(parm, Data)
{
  prob <- parm
  LL <- dbinom(prob,x=Data$s,size=Data$n,log=TRUE)
  LP <- dbeta(prob,.5,.5,log=TRUE)+LL
  Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LL,
      yhat=1, parm=parm)
  return(Modelout)
}
Initial.Values <- c(.5)
Fit <- LaplacesDemon(Model, Data=MyData,
    Initial.Values,
    Iteration=1e5,Status=1e4)
png('db55.png')
par(mfrow=c(1,2))
densplot(prob2Lo(Fit$Posterior2),
    adjust=.1,
    main='Prior: Probability ~ Beta .5 .5',
    xlab='Log odds')
Lo <- seq(-10,10,.1)
dLo <- dprob2dLo(Lo,dprob=function(prob)
      dbeta(prob,1.5,5.5))
lines(x=Lo,y=dLo,col='green')
##
densplot(Fit$Posterior2,
    adjust=.01,
    main='Prior Probability ~ Beta .5 .5',
    xlab='Probability Success',
    xlim=c(0,1))
prob <- seq(0.01,.99,.01)
dprob <- dbeta(prob,1.5,5.5)
lines(x=prob,y=dprob,col='green')
dev.off()
# 1 1
parm.names <- c('prob')
MyData <- list(mon.names=mon.names,
    parm.names=parm.names,n=6,s=1)

Model <- function(parm, Data)
{
  prob <- parm
  LL <- dbinom(prob,x=Data$s,size=Data$n,log=TRUE)
  LP <- dbeta(prob,1,1,log=TRUE)+LL
  Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LL,
      yhat=1, parm=parm)
  return(Modelout)
}
Initial.Values <- c(.5)
Fit <- LaplacesDemon(Model, Data=MyData,
    Initial.Values,
    Iteration=1e5,Status=1e4)
png('db11.png')
par(mfrow=c(1,2))
densplot(prob2Lo(Fit$Posterior2),
    adjust=.1,
    main='Prior: Probability ~ Beta 1 1',
    xlab='Log odds')
Lo <- seq(-10,10,.1)
dLo <- dprob2dLo(Lo,dprob=function(prob) dbeta(prob,2,6))
lines(x=Lo,y=dLo,col='green')
densplot(Fit$Posterior2,
    adjust=.01,
    main='Prior Probability ~ Beta 1 1',
    xlab='Probability Success',
    xlim=c(0,1))
prob <- seq(0.01,.99,.01)
dprob <- dbeta(prob,2,6)
lines(x=prob,y=dprob,col='green')
dev.off()
##u -10 10 #########
mon.names <- "LP"
parm.names <- c('Lo')
MyData <- list(mon.names=mon.names,
    parm.names=parm.names,
    n=6,s=1)

Model <- function(parm, Data)
{
  Lo <- parm
  LL <-dprob2dLo(Lo,
      dprob=function(prob)
        dbinom(prob,
            x=Data$s,
            size=Data$n ,
            log=TRUE),
      log=TRUE)
  yhat <- Lo
  LP <- dunif(Lo,-10,10,log=TRUE)+LL
  Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LL,
      yhat=1, parm=parm)
  return(Modelout)
}
Initial.Values <- c(.1)
Fit <- LaplacesDemon(Model,
    Data=MyData,
    Initial.Values,
    Iteration=1e5,
    Status=1e4)
png('du1010.png')
par(mfrow=c(1,2))
densplot(Fit$Posterior2,
    adjust=.1,
    main='Prior: Log Odds ~ Unif -10 10',
    xlab='Log odds')
Lo <- seq(-20.1,20.1,.1)
dLo <- dnorm(Lo,log(1.5/5.5),sqrt(1/1.5+1/5.5))
lines(x=Lo,y=dLo,col='green')
densplot(Lo2prob(Fit$Posterior2),
    adjust=.01,
    main='Prior: Log Odds ~ Unif -10 10',
    xlab='Probability Success')
prob <- seq(0.01,.99,.01)
dprob <- dLo2dprob(prob,dLo=function(Lo) dnorm(Lo,log(1.5/5.5),sqrt(1/1.5+1/5.5)))
lines(x=prob,y=dprob,col='green')
dev.off()
## N 0 10
parm.names <- c('Lo')
MyData <- list(mon.names=mon.names,
    parm.names=parm.names,
    n=6,s=1)

Model <- function(parm, Data)
{
  Lo <- parm
  LL <-dprob2dLo(Lo,
      dprob=function(prob)
        dbinom(prob,
            x=Data$s,
            size=Data$n,
            log=TRUE),
      log=TRUE)
  yhat <- Lo
  LP <- dnorm(Lo,0,10,log=TRUE)+LL
  Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LL,
      yhat=1, parm=parm)
  return(Modelout)
}
Initial.Values <- c(.1)
Fit <- LaplacesDemon(Model,
    Data=MyData,
    Initial.Values,
    Iteration=1e5,
    Status=1e4)
png('dn010.png')
par(mfrow=c(1,2))

t1 <- 0
t2 <- log(1.5/5.5)
var1 <- 100
var2 <- 1/1.5+1/5.5
tt <- (t1/var1+t2/var2)/(1/var1+1/var2)
vv <- 1/(1/var1+1/var2)
densplot(Fit$Posterior2,
    adjust=.1,
    main='Prior: Log Odds ~ norm 0,10',
    xlab='Log odds')
Lo <- seq(-20.1,20.1,.1)
dLo <- dnorm(Lo,tt,sqrt(vv))
lines(x=Lo,y=dLo,col='green')
densplot(Lo2prob(Fit$Posterior2),
    adjust=.01,
    main='Prior: Log Odds ~ norm 0,10',
    xlab='Probability Success')
prob <- seq(0.01,.99,.01)
dprob <- dLo2dprob(prob,dLo=function(Lo) dnorm(Lo,tt,sqrt(vv)))
lines(x=prob,y=dprob,col='green')
dev.off()

Sunday, February 15, 2015

Trial until first succes

Studying on in Bayesian Approaches to Clinical Trials and Health-Care Evaluation (David J. Spiegelhalter, Keith R. Abrams, Jonathan P. Myles) they state that if you have one success in a trial, that a design is needed to know what that means. For example with six trials and one success, Were there six trials and one success, or were there trials until the first success? Just for fun, I am trying what I can learn with that latter setup.

One sided hypothesis

For a hypothesis of 50% chance of success, it is pretty obvious that only the one sided hypothesis of less than 50% success can be alternative. The hypothesis would be rejected if it takes enough trials to get the first success. This can be simulated pretty easy. It can also be calculated, but for that it seems better to estimate the chances of 1 to 5 trials and take the remainder rather then make a sum of a potential infinite number of fails before the first success. For comparison the binomial test is added. Again one sided, probability of upto one success.
empprob <- function(p,n=1000) {
  samps <- sample(c(FALSE,TRUE),n,replace=TRUE,p=c(1-p,p))
  ww <- which(samps)
  dd <- diff(c(1,ww))
  sum(dd>=6)/length(dd)
}
pr <- seq(0,1,.005)
sa <- sapply(pr,empprob,n=1e4)

dbin <- function(size,prob,log=FALSE) {
  if (log)
    log(prob)+(size-1)*log(1-prob) else
    prob*(1-prob)^(size-1)
}
pbin <- function(size,prob) {
  1-sum(dbin(1:(size-1),prob))
}
sa2 <- sapply(pr,function(x) pbin(6,x))
sa3 <- pbinom(1,6,pr)

plot(x=pr,
    y=sa,
    type='l',
    ylab='Alpha',
    xlab='probability under H0: pr>p',
    main='Five fail one succes')
lines(x=pr,
    y=sa2,
    type='l',
    col='blue')
lines(x=pr,
    y=sa3,
    type='l',
    col='red')
legend(x='topright',col=c('black','blue','red'),
    lty=1,legend=c('simulation, trials until first succes',
        'calculation, trials until first succes',
        'Binomial')) 

Bayes estimate

Obviously, a Bayes estimate is interesting. Using LaplacesDemon I can reuse the density which I developed previously. Note that in hindsight for one experiment it should be equivalent to a binomial. After all, there is a factor six (six permutations) in the density, but that cancels out.
library('LaplacesDemon')
mon.names <- "LP"
parm.names <- c('x','y')
MyData <- list(mon.names=mon.names, parm.names=parm.names)
N <-1
Model <- function(parm, Data)
{
  x <- parm[1]
  y <- parm[2]
  LL <- dbin(6,x,log=TRUE)+dbinom(1,6,y,log=TRUE)
  LP <- sum(dbeta(parm,.5,.5,log=TRUE))+LL
  Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LL,
      yhat=x, parm=parm)
  return(Modelout)
}
Initial.Values <- c(.5,.5)
Fit <- LaplacesDemon(Model, Data=MyData, Initial.Values)

Fit 
Summary of Stationary Samples
               Mean        SD        MCSE      ESS          LB     Median
x         0.2184365 0.1409504 0.011367733 277.9335  0.02194684  0.1991762
y         0.2087487 0.1358410 0.009702183 314.1335  0.02145930  0.1855887
Deviance  8.8469564 1.7228087 0.124345091 341.6582  7.30631061  8.1527719
LP       -4.4234782 0.8614044 0.062172546 341.6582 -6.82652015 -4.0763859
                 UB
x         0.5671549
y         0.5254163
Deviance 13.6530403
LP       -3.6531553


SAS

Proc MCMC has a general distribution where one can use an own distribution. Hence it is easy to do the same Bayes estimate in SAS too. Hence I took this as a toy example to try this feature.
data a;
    size=6;
    output;
run;

proc mcmc data=a;
    parm prob .5;
    prior prob ~uniform(0, 1);
    ll=log(1-prob)+(size-1)*log(prob);
    model size ~ dgeneral(ll);
run;

Sunday, February 8, 2015

Hierarchical log odds model example

I am working through Bayesian Approaches to Clinical Trials and Health-Care Evaluation (David J. Spiegelhalter, Keith R. Abrams, Jonathan P. Myles) (referred to as BACTHCE from here on). In chapter three I saw an example (3.13) where I wanted to do the calculations myself. The example concerns a hierarchical model of death rates which is calculated via a normal approximation of odds ratios.

Data 

According to BACTHCE: 'Epidemiology, animal models and biochemical studies suggested intravenous magnesium sulphate may have a protective effect after acute myocardial infarction (AMI), articularly through preventing serious arrhythmias'. A small set of randomized trials was performed. We will analyze those data. In the data below MgD and ControlD refers to deaths for Mg treated and control patients, while MgN and ControlN refers to the number of patients enrolled in each group.
r1 <- read.csv(text="
        Trial      MgD  MgN ControlD ControlN
        Morton       1   40   2   36 
        Rasmussen    9  135  23  135
        Smith        2  200   7  200
        Abraham      1   48   1   46
        Feldstedt   10  150   8  148
        Shechter     1   59   9   56
        Ceremuzynski 1   25   3   23
        LIMIT-2     90 1159 118 1157
        ",skip=1,sep='')

Analysis

Odds rations

The first analysis will be to reproduce all the derived variables, such as log odds ratios. This will reproduce the essentials of table 3.8. All formulas are in BACTHCE. Equivalent N is the N if this were normal distributed data with SD=2.
logoddrat <- function(
    a1,b1=N1-a1,N1=a1+b1,
    a2,b2=N2-a2,N2=a2+b2) {
  log(((a1+.5)/(b1+.5))/
          ((a2+.5)/(b2+.5)))
}
sdlogoddrat <- function(
    a1,b1=N1-a1,N1=a1+b1,
    a2,b2=N2-a2,N2=a2+b2) {
  sqrt(1/(a1+.5)+1/(b1+.5) +1
          /(a2+.5)+1/(b2+.5))
}

r1$logoddsratio <- with(r1,
    logoddrat(a1=MgD,N1=MgN,a2=ControlD,N2=ControlN))
r1$sdodds <- with(r1,
    sdlogoddrat(a1=MgD,N1=MgN,a2=ControlD,N2=ControlN))
r1$equivN <- (2/r1$sdodds)^2
r1
         Trial MgD  MgN ControlD ControlN logoddsratio    sdodds     equivN
1       Morton   1   40        2       36  -0.64616697 1.0587581   3.568342
2    Rasmussen   9  135       23      135  -1.02299771 0.4057220  24.299805
3        Smith   2  200        7      200  -1.12412388 0.7372510   7.359177
4      Abraham   1   48        1       46  -0.04301739 1.1731854   2.906208
5    Feldstedt  10  150        8      148   0.21130909 0.4765711  17.611833
6     Shechter   1   59        9       56  -2.05412373 0.9000425   4.937805
7 Ceremuzynski   1   25        3       23  -1.02554609 1.0207731   3.838854
8      LIMIT-2  90 1159      118     1157  -0.29801453 0.1462380 187.042101

Odds ratios in plot

The plot shows the odds ratios as function of standard deviation of the hyper priors (tau). Mu is the overall mean. Note that I could not find formula for the overall mean in BACTHCE, and just assumed that the value for By in shrinkage was applied. 
mut <- function(tau,r1) {  
  yy <- r1$logoddsratio
  By <- r1$sdodds^2/(r1$sdodds^2+tau^2)
  mus <- sum(yy*r1$equivN*By)/sum(r1$equivN*By)
  yy <- r1$logoddsratio*(1-By)+mus*By
  data.frame(Trial=c(as.character(r1$Trial),'Mu'),tau=tau,mu=c(yy,mus)) 
}

la <- lapply(seq(0,2,.01),function(x) mut(x,r1))
lac <- do.call(rbind,la) 
lac$Trial <- factor(lac$Trial,levels=c(levels(r1$Trial),'Mu'))

ggplot(lac,aes(x=tau,y=mu,col=Trial)) +
    geom_line() +
    theme(legend.position="bottom")  +
    guides(col=guide_legend(ncol=3)) +
    scale_y_continuous('Log odds ratio')

Beta binomial hierarchical model

I wanted to know how that would compare if I did that calculation as a beta-binomial, for which I used JAGS. Lor is the estimated log odds ratio, which with -0.7 is close to what is found in BACTHCE using a method of moments estimator for tau. It can be seen that tau in the neighborhood of 0 is possible. The theta parameter represents the probability of death in both groups. What I noticed here is that the control group has a slightly larger standard deviation in theta. Is that a reason for heterogeneity in tau which was mentioned in BACTHCE?
library(R2jags)
datain <- as.list(r1[,2:5])
datain$n <- nrow(r1)
model1 <- function() {
  for (i in 1:2) {
    kappa_log[i] ~  dpar(1,1.5)
    kappa[i] <- exp(kappa_log[i])
    theta[i] ~ dnorm(thetam[i],thetatau[i])
    thetam[i] ~ dunif(0,1)
    thetatau[i] <- pow(thetasd[i],-2) 
    thetasd[i] ~ dnorm(0,.3) %_% I(0,) 
    a[i] <- kappa[i] * theta[i]
    b[i] <- kappa[i] * (1 - theta[i]) 
  } 
  for (i in 1:n) {
    MgD[i] ~ dbetabin(a[1],b[1],MgN[i])
    ControlD[i] ~ dbetabin(a[2],b[2],ControlN[i])
  }
  lor <- log((theta[1]/(1-theta[1]))/
          (theta[2]/(1-theta[2])))
}

parameters <- c('theta','lor')

jj <- jags(model.file=model1,data=datain,parameters=parameters,DIC=FALSE)
jj 
Inference for Bugs model at "C:/Users/Kees/AppData/Local/Temp/Rtmp6h4pKI/model1814317e3684.txt", fit using jags,
 3 chains, each with 2000 iterations (first 1000 discarded)
 n.sims = 3000 iterations saved
         mu.vect sd.vect   2.5%    25%    50%    75% 97.5%  Rhat n.eff
lor       -0.701   0.485 -1.637 -1.019 -0.706 -0.388 0.298 1.001  2500
theta[1]   0.057   0.021  0.029  0.043  0.053  0.065 0.116 1.003  1200
theta[2]   0.106   0.031  0.060  0.085  0.101  0.123 0.183 1.001  3000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

Beta binomial in SAS

In clinical trials, SAS is often the tool of choice. Hence the beta binomial in PROC MCMC. The results are a bit different from JAGS, but then I had some difficulty creating the model in PROC MCMC and reruns gave somewhat varying results. I already did increase the number of samples, but there might be better tricks to get stable results.
data r1;
input  Trial $ MgD  MgN ControlD ControlN;
cards;
        Morton       1   40   2   36 
        Rasmussen    9  135  23  135
        Smith        2  200   7  200
        Abraham      1   48   1   46
        Feldstedt   10  150   8  148
        Shechter     1   59   9   56
        Ceremuzynski 1   25   3   23
        LIMIT-2     90 1159 118 1157
;;;;
run;

proc mcmc data=r1 monitor=(theta1 theta2 lor) 
     ntu=2000 nbi=4000 nmc=8000 nthin=8;
      parms (kappa_log1 kappa_log2) 1.8 (theta1 theta2) .1;
      prior kappa_log1 ~ uniform(0,5);
      prior kappa_log2 ~ uniform(0,5);
      prior theta1 ~ uniform(0,1);
      prior theta2 ~ uniform(0,1);
      kappa1 = exp(kappa_log1);
      kappa2 = exp(kappa_log2);
      a1 = kappa1*theta1;
      b1 = kappa1*(1-theta1);
      a2 = kappa2*theta2;
      b2 = kappa2*(1-theta2);
      lor = log((a1/b1)/(a2/b2));
      random Mgp ~beta(a1,b1) subject=trial;
      random Controlp ~beta(a2,b2) subject=trial;
      model MgD ~  binomial(MgN,Mgp);
      model ControlD ~ binomial(ControlN,Controlp); 
 run;  
Posterior Summaries and Intervals
ParameterNMeanStandard
Deviation
95% HPD Interval
theta110000.05190.01570.02440.0841
theta210000.09590.02240.05960.1404
lor1000-0.67900.3880-1.48620.0872



Effective Sample Sizes
ParameterESSAutocorrelation
Time
Efficiency
theta1591.51.69070.5915
theta2247.04.04830.2470
lor343.72.90970.3437

Sunday, February 1, 2015

Unemployment

Beginning 2013 I made a post plotting unemployment in Europe. Last year I did the same. Now that the unemployment numbers of December are on Eurostat, I am making them again. The plots shown are unemployment and its first derivative, both smoothed.
You can discuss these data extensively. Eurostat has a page for those who are interested. What I am personally interested in is a feeling for the trend where we are going in the future. In many places things are improving, for which youth unemployment is a good indicator. However, a number of changes have occurred in the last month(s). Cheap oil, dropping Euro prices, quantitative easing starting in the euro zone and ending in the USA, so I probably should not speculate.

Unemployment


First Derivative


 




Code

This code has bee lightly adapted from last years. This had mostly to do with a change in countries, the Euro Area has grown, and it seems last year I managed to get all indicator variables in columns, while this year the values were in three age columns.
library(ggplot2)
library(KernSmooth)
library(plyr)
library(scales) # to access breaks/formatting functions

r1 <- read.csv("une_rt_m.csv",na.strings=':',check.names=FALSE)
names(r1)[2] <- 'GEO'
r1 <- reshape(r1,
    varying=list(names(r1)[3:5]),
    v.names='Value',
    timevar='AGE',
    idvar=names(r1)[1:2],
    times=names(r1)[3:5],
    direction='long')
rownames(r1) <- 1:nrow(r1)
levels(r1$GEO) <- sub(' countries)',')' ,levels(r1$GEO),fixed=TRUE)
levels(r1$GEO) <- sub('European Union','EU' ,levels(r1$GEO))
levels(r1$GEO)[levels(r1$GEO)=='Euro area (EA11-2000, EA12-2006, EA13-2007, EA15-2008, EA16-2010, EA17-2013, EA18-2014, EA19)'] <- "Euro Area"
levels(r1$GEO)[levels(r1$GEO)=='United Kingdom'] <- 'UK'
levels(r1$GEO)[levels(r1$GEO)=='United States'] <- 'US'
levels(r1$GEO)[levels(r1$GEO)=='Germany (until 1990 former territory of the FRG)'] <- 'Germany'
levels(r1$GEO)
grep('12|13|15|16|17|18|25|27',x=levels(r1$GEO),value=TRUE)
r1 <- r1[!(r1$GEO %in% grep('12|13|15|16|17|18|25|27',x=levels(r1$GEO),value=TRUE)),]
r1$GEO <- factor(r1$GEO)
r1$Age <- factor(r1$AGE,levels=unique(r1$AGE))
r1$Date <- as.Date(paste(gsub('M','-',as.character(r1$TIME)),'-01',sep=''))

#
maxi <- aggregate(r1$Value,by=list(GEO=r1$GEO),FUN=max,na.rm=TRUE)
parts <- data.frame(
    low = maxi$GEO[maxi$x<quantile(maxi$x,1/3)]
    ,middle = maxi$GEO[maxi$x>quantile(maxi$x,1/3) & maxi$x<quantile(maxi$x,2/3)]
    ,high = maxi$GEO[maxi$x>quantile(maxi$x,2/3)]
)
#ggplot(r1[r1$GEO %in% low,],aes(x=Date,y=Value,colour=Age)) +
#        facet_wrap( ~ GEO, drop=TRUE) +
#        geom_line()  +
#        theme(legend.position = "bottom")
#        ylab('% Unemployment') + xlab('Year')


r1$class <- interaction(r1$GEO,r1$Age)
head(r1)
r3 <- r1[complete.cases(r1),]
r3$class <- factor(r3$class)
Perc <- ddply(.data=r3,.variables=.(class),
    function(piece,...) {
        lp <- locpoly(x=as.numeric(piece$Date),y=piece$Value,
            drv=0,bandwidth=90)
        sdf <- data.frame(Date=as.Date(lp$x,origin='1970-01-01'),
            sPerc=lp$y,Age=piece$Age[1],GEO=piece$GEO[1])}
    ,.inform=FALSE
)
for (i in c('low','middle','high')) {
    png(paste(i,'.png',sep=''))
    print(
        ggplot(Perc[Perc$GEO %in% parts[,i] ,],
                aes(x=Date,y=sPerc,colour=Age)) +
            facet_wrap( ~ GEO, drop=TRUE) +
            geom_line()  +
            theme(legend.position = "bottom")+
            ylab('% Unemployment') + xlab('Year') +
            scale_x_date(breaks = date_breaks("5 years"),
                labels = date_format("%y"))
    )
    dev.off()
}

dPerc <- ddply(.data=r3,.variables=.(class),
    function(piece,...) {
        lp <- locpoly(x=as.numeric(piece$Date),y=piece$Value,
            drv=1,bandwidth=365/2)
        sdf <- data.frame(Date=as.Date(lp$x,origin='1970-01-01'),
            dPerc=lp$y,Age=piece$Age[1],GEO=piece$GEO[1])}
    ,.inform=FALSE
)

for (i in c('low','middle','high')) {
    png(paste('d',i,'.png',sep=''))
    print(
        ggplot(dPerc[dPerc$GEO %in% parts[,i] ,],
                aes(x=Date,y=dPerc,colour=Age)) +
            facet_wrap( ~ GEO, drop=TRUE) +
            geom_line()  +
            theme(legend.position = "bottom")+
            ylab('Change in % Unemployment') + xlab('Year')+
            scale_x_date(breaks = date_breaks("5 years"),
                labels = date_format("%y"))
    )
    dev.off()
}


Sunday, January 25, 2015

Odds, odds ratio, language and intuition

I was reading a statistics book the other day. I am at the beginning. In this section I read '(we) report results as odds ratios, which is more intuitive'. I must have read sentences stating this any number of times. But I don't agree.
It may be my background. My mother tongue is Dutch, which does not have a word for odds. Hence odds is a word which I only learned when I got into statistics. It is an abstract mathematical construct. The odds ratio then, is a ratio between two abstract numbers and initially was just a number. Since I never did much in odds and odds ratio, they never gained meaning either.
Now I am just wondering, what proportion of people reading this kind of books does not have an intuition for odds?