Sunday, October 5, 2014

Bayes models from SAS PROC MCMC in R, post 2

This is my second post in converting SAS's PROC MCMC examples in R. The task in his week is determining the transformation parameter in a Box-Cox transformation. SAS only determines Lambda, but I am not so sure about that. What I used to do was get an interval for Lambda, select an interpretable value (e.g. 1/2 is square root, -1 is inverse) and progress using this parameter. However, if that is my approach in a Bayesian context, maybe I should make my prior on lambda only those values I find acceptable. On the other hand, if I model lambda continuous, should not the uncertainty in Lambda also effect the uncertainty in the remainder of the model? In this post I will consider them all continuous. Hence the first step is to get SAS to give me those values, for which I added those parameters in the code and ran it through SAS Studio. Subsequently find the code for the various R approaches. For readability I stuck the code next to text, while output is at the bottom of the post. 

Note on previous post

Last week I failed to get some samplers. I now realize that some of these might have been solved with the ones trick or the zero trick. In addition, LaplacesDemon has dnormm() which is more elegant than my approach.

SAS

This is just the model from the SAS manual with extra parameters monitored.
proc mcmc data=boxcox nmc=50000 propcov=quanew seed=12567 monitor=(lda beta0 beta1 sd);
    ods select PostSumInt TADpanel;
    parms beta0 0 beta1 0 lda 1 s2 1;
    beginnodata;
    prior beta: ~ general(0);
    prior s2 ~ gamma(shape=3, scale=2);
    prior lda ~ unif(-2, 2);
    sd=sqrt(s2);
    endnodata;
    ys=(y**lda-1)/lda;
    mu=beta0+beta1*x;
    ll=(lda-1)*log(y)+lpdfnorm(ys, mu, sd);
    model general(ll);
run;

STAN

Rstan has the Jacobian needed for this exercise. It was fairly straightforward to fit it all together.
library(plyr)
library(dplyr)
# stan
library(rstan)
datain <- list(n=100,x=set2$x,y=set2$y)
parameters <- c('lda','beta0','beta1','std')
fit <- '
    data {
    int <lower=1> n;
    vector[n] y;
    vector[n] x;
    }
    parameters {
    real <lower=-2,upper=2> lda;
    real beta0;
    real beta1;
    real <lower=0> s2;
    }
    transformed parameters {
    vector[n] ys;
    vector[n] ll;
    vector[n] mu;
    real <lower=0> std;
    std <- sqrt(s2);
    for (i in 1:n) {
    ys[i] <- (pow(y[i],lda)-1)/lda;
    }
    mu <- beta0+beta1*x;
    ll <- (lda-1)*log(y);
    }
    model {
    s2 ~ gamma(3, 0.5);
    increment_log_prob(ll);
    increment_log_prob(normal_log(ys,mu,std));
    }
    ' %>%
stan(model_code = .,
    data = datain,
    pars=parameters)
fit

JAGS 

The JAGS code is not straightforward for two reasons.
The first reason is that Jags does not have the concept of Jacobian. However, upon some investigation I found the zeros trick. Which basically means that an arbitrary contribution to the deviance might be added by transforming that deviance and pretending to JAGS the values are from a Poisson distribution.
The second reason is that JAGS has directed acyclic graphs. At least, I think that is the reason. One cannot first calculate transformed values and then make them dependent on a distribution. My solution was to subtract transformed and estimated values and make the result normal around 0.
To get acceptable results in terms of n.eff and Rhat I increased the number of samples to 10000 per chain.
library(R2jags)
datain <- list(n=100,
    x=set2$x,
    y=set2$y,
    zero1 = rep(0,100),
    zero2 = rep(0,100))
parameters <- c('LDA','beta0','beta1','std')

jmodel <- function() {
    tau <- 1/s2
    std <- sqrt(s2)
    for (i in 1:n) {       
        mu[i] <- beta0+x[i]*beta1
        ys[i] <- (pow(y[i],LDA)-1)/LDA -mu[i]
        zero1[i] ~ dnorm(ys[i],tau)
        ll[i] <- -((LDA-1)*log(y[i]))+100
        zero2[i] ~ dpois(ll[i])
    }
    beta0 ~ dnorm(0,.0001)
    beta1 ~ dnorm(0,.0001)
    s2 ~ dgamma(3,.5)
    LDA ~ dunif(-2,2)
}
inits  <- function() {
    list(beta0=rnorm(1),
        beta1=rnorm(1),
        LDA=runif(1,-1,1),
        s2=runif(1,.1,3))
}
jj <- jags(model.file=jmodel,
    data=datain,
    inits=inits,
    n.iter=10000,
    parameters=parameters)
jj

MCMC

As such it is relatively easy to build the model for MCMC. Based on the demo vignette I was also able to set scale to obtain an acceptance rate about 0.2. After some additional calculations I came to batch length of 10000, and chose for 20 batches to keep computation time somewhat limited. All in all it took quite some attempts to get to these settings in terms of nbatch and batch length. MCMC only gives summary statistics based on each separate batch. These are subsequently summarized. The results seem a bit different from SAS and JAGS. In addition a small tweak was needed to get SD as output rather than internal variable S2.
mydens <- function(x,...) {
    if (x[4]<0 ) return(-Inf)
    std <- sqrt(x[4])
    mu <- x[2]+set2$x*x[3]
    ys <- (set2$y^x[1]-1)/x[1]
    ll <- (x[1]-1)*log(set2$y)
    sum(dnorm(ys,mu,std,log=TRUE)+ll) +dgamma(x[4],shape=3,scale=2)
}
sam <-  metrop(mydens,
    initial=c(.46,-2,1.8,1.3),
    nbatch=20,
    blen=10000,
    scale=.025,
    outfun=function(z, ...) {
        z[4] <- sqrt(z[4])
        c(z, z^2)}
)
sam$accept
foo <- apply(sam$batch, 2, mean)
mu <- foo[1:4]
sigmasq <- foo[5:8] - mu^2
data.frame(param=c('LDA','beta0','beta1','SD'),mu,sigmasq)



MCMCpack

Obviously, the deviance code is same as MCMC. Based on the ACF a thinning of 20 was chosen. I did not see an obvious reason to change from the default 20000 samples.
library(MCMCpack)
library(coda)
mydens <- function(x,...) {
    if (x[4]<0 ) return(-Inf)
    std <- sqrt(x[4])
    mu <- x[2]+set2$x*x[3]
    ys <- (set2$y^x[1]-1)/x[1]
    ll <- (x[1]-1)*log(set2$y)
    sum(dnorm(ys,mu,std,log=TRUE)+ll) +dgamma(x[4],shape=3,scale=2)
}
sam <- MCMCmetrop1R(mydens,
    theta.init=c(.46,-2,1.8,1.3),
    tune=1.5,thin=20)
summary(sam)

rcppbugs

Rcppbugs has many of the same limits as JAGS in terms of available distributions. The same tricks as used in JAGS are hence used in rcppbugs, with one exception, where JAGS has a Poison distribution, rcppbugs lacks it, hence the ones trick is used, which is the equivalent to zeros trick, but using Bernoulli distribution. The ACF gave quite a big lag, hence I upped both the number of samples and increased thinning.

library(rcppbugs)
x <- set2$x
y <- set2$y
one <- rep(1,100)
yzero <- rep(0,100)
beta <- mcmc.normal(c(.1,.1),mu=0,tau=0.0001)
LDA <- mcmc.uniform(.5,-2,2)
tau.y <- mcmc.gamma(sd(as.vector(y)),alpha=0.1,beta=0.1)
y.hat <- deterministic(function(x,beta,y,LDA) {
        ypred=beta[1]+beta[2]*x
        ytrans=(y^LDA-1)/LDA
        ypred-ytrans
    }, x, beta,y,LDA)
Jac <- deterministic(function(LDA,y)
        exp((LDA-1)*log(y))/100,LDA,y)
cJac <- mcmc.bernoulli(one,Jac,observed=TRUE)
y.lik <- mcmc.normal(yzero,mu=y.hat,tau=tau.y,observed=TRUE)

m <- create.model(beta,LDA, tau.y,
    y.hat,
    y.lik,
    Jac,
    cJac)
ans <- run.model(m, iterations=1e6L, burn=1e4L, adapt=1e3L, thin=1000L)

mysum <- function(x) c(mean=mean(x),sd=sd(x),quantile(x,c(.025,.5,.975)))
rbind(LDA=mysum(ans$LDA),
    beta0=mysum(ans$b[,1]),
    beta1=mysum(ans$b[,2]),
    sd=mysum(1/sqrt(ans$tau.y)))

LaplacesDemon

LaplacesDemon also needed a lot of samples. At a later attempt I tried the HARM sampler which worked much better. Getting the right sampler is important.
library('LaplacesDemon')
mon.names <- "LP"
parm.names <- c('beta0','beta1','LDA','std')
PGF <- function(Data) c(rnorm(2,0,1),runif(1,.4,.5),runif(1,.5,2))

MyData <- list(mon.names=mon.names,
    parm.names=parm.names,
    PGF=PGF,
    y=set2$y,
    x=set2$x)
Model <- function(parm, Data)
{
    ### Parameters
    std <- interval(parm[4], 1e-100, Inf)
    parm[4] <- std
    s2 <- parm[4]^2
    mu <- parm[1]+Data$x*parm[2]
    ys <- (Data$y^parm[3]-1)/parm[3]
    ll <- (parm[3]-1)*log(Data$y)
    LL <- sum(dnorm(ys,mu,std,log=TRUE)+ll)
    LP <- LL+dgamma(s2,shape=3,scale=2)
    Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LP,
        yhat=1,
        parm=parm)
    return(Modelout)
}
# very dependent on initial values
Initial.Values <- GIV(Model, MyData, PGF=TRUE)
Fit <- LaplacesDemon(Model,
    Data=MyData,
    Algorithm='HARM',
    Specs=list(alpha.star=0.234, B=NULL),
    Initial.Values = Initial.Values,
    Iterations=40000,Status=2000,Thinning=30
)
Fit

Output

SAS


                              Standard 
Parameter     N     Mean      Deviation     95% HPD Interval
lda        50000    0.4701     0.0290     0.4142     0.5281
beta0      50000   -2.2064     0.4215    -3.0564    -1.4201
beta1      50000    1.9260     0.1806     1.5854     2.2826
sd         50000    1.3275     0.1482     1.0648     1.6328


STAN

Inference for Stan model: __LHS.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

         mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
lda      0.47    0.00 0.03    0.41    0.45    0.47    0.49    0.53   595 1.01
beta0   -2.20    0.02 0.40   -3.01   -2.45   -2.18   -1.94   -1.44   684 1.01
beta1    1.92    0.01 0.17    1.59    1.81    1.91    2.02    2.28   571 1.01
std      1.32    0.01 0.15    1.06    1.22    1.31    1.42    1.65   691 1.00
lp__  -309.43    0.05 1.45 -313.02 -310.13 -309.09 -308.36 -307.64   977 1.00

Samples were drawn using NUTS(diag_e) at Sat Oct  4 15:14:25 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

JAGS 


Inference for Bugs model at "/tmp/RtmpSgjWNR/modelef330519173.txt", fit using jags,
 3 chains, each with 10000 iterations (first 5000 discarded), n.thin = 5
 n.sims = 3000 iterations saved
           mu.vect sd.vect      2.5%       25%       50%       75%     97.5%
LDA          0.472   0.030     0.413     0.451     0.472     0.491     0.535
beta0       -2.229   0.443    -3.200    -2.512    -2.199    -1.920    -1.450
beta1        1.938   0.192     1.608     1.805     1.925     2.054     2.356
std          1.336   0.155     1.079     1.227     1.319     1.433     1.680
deviance 20620.595   3.365 20616.435 20618.142 20619.763 20622.197 20629.012
          Rhat n.eff
LDA      1.009  2900
beta0    1.003  1400
beta1    1.008  1500
std      1.002  3000
deviance 1.011   480

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 5.6 and DIC = 20626.2
DIC is an estimate of expected predictive error (lower deviance is better).

MCMC


  param         mu      sigmasq
1   LDA  0.4614172 0.0007537545
2 beta0 -2.0808097 0.1414263712
3 beta1  1.8693639 0.0267559710
4    SD  1.2832247 0.0180115043

MCMCpack

Note that MCMCpack has s2 rather than sd in output

Iterations = 501:20481
Thinning interval = 20
Number of chains = 1
Sample size per chain = 1000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

        Mean      SD  Naive SE Time-series SE
[1,]  0.4657 0.02927 0.0009256      0.0009256
[2,] -2.1565 0.40274 0.0127358      0.0113576
[3,]  1.8988 0.17764 0.0056175      0.0052342
[4,]  1.7048 0.36930 0.0116782      0.0133882

2. Quantiles for each variable:

        2.5%     25%    50%     75%   97.5%
var1  0.4058  0.4479  0.466  0.4854  0.5215
var2 -2.9944 -2.4064 -2.146 -1.8799 -1.4346
var3  1.5606  1.7810  1.890  2.0099  2.2647
var4  1.1204  1.4473  1.652  1.9261  2.5331

rcppbugs


            mean         sd       2.5%        50%      97.5%
LDA    0.4597165 0.02875052  0.4050122  0.4592264  0.5207108
beta0 -2.0937414 0.39740763 -2.9511520 -2.0668905 -1.4015904
beta1  1.8647679 0.17411536  1.5640795  1.8566258  2.2509009
sd     1.2612126 0.14011464  1.0228554  1.2502009  1.5634482

LaplacesDemon

Call:
LaplacesDemon(Model = Model, Data = MyData, Initial.Values = Initial.Values,
    Iterations = 40000, Status = 2000, Thinning = 30, Algorithm = "HARM",
    Specs = list(alpha.star = 0.234, B = NULL))

Acceptance Rate: 0.22825
Algorithm: Hit-And-Run Metropolis
Covariance Matrix: (NOT SHOWN HERE; diagonal shown instead)
       beta0        beta1          LDA          std
0.1329756466 0.0283948376 0.0007961449 0.0174295808

Covariance (Diagonal) History: (NOT SHOWN HERE)
Deviance Information Criterion (DIC):
         All Stationary
Dbar 619.952    619.768
pD     9.531      3.530
DIC  629.483    623.298
Initial Values:
[1] -0.2820944  0.3190142  0.4778030  1.3380326

Iterations: 40000
Log(Marginal Likelihood): -313.4286
Minutes of run-time: 0.12
Model: (NOT SHOWN HERE)
Monitor: (NOT SHOWN HERE)
Parameters (Number of): 4
Posterior1: (NOT SHOWN HERE)
Posterior2: (NOT SHOWN HERE)
Recommended Burn-In of Thinned Samples: 532
Recommended Burn-In of Un-thinned Samples: 15960
Recommended Thinning: 31
Specs: (NOT SHOWN HERE)
Status is displayed every 2000 iterations
Summary1: (SHOWN BELOW)
Summary2: (SHOWN BELOW)
Thinned Samples: 1333
Thinning: 30


Summary of All Samples
                 Mean        SD        MCSE       ESS           LB       Median
beta0      -2.0923727 0.3614098 0.091343057  17.08625   -2.8552455   -2.0861434
beta1       1.8711745 0.1631220 0.037952180  22.13164    1.5969508    1.8620763
LDA         0.4612769 0.0282230 0.004279672  34.00465    0.4086439    0.4616171
std         1.2653917 0.1320557 0.020805981  98.82246    1.0352842    1.2574597
Deviance  619.9523160 4.3660058 0.415674235 188.93047  616.3618665  619.2464350
LP       -309.9042345 2.1778219 0.206890192 190.00073 -313.2903260 -309.5529196
                   UB
beta0      -1.4732055
beta1       2.2047884
LDA         0.5171681
std         1.5495557
Deviance  626.7398118
LP       -308.1117717


Summary of Stationary Samples
                 Mean         SD        MCSE        ESS           LB
beta0      -2.0498690 0.36586611 0.118090993   8.836406   -2.9348999
beta1       1.8509120 0.16921671 0.050214255  15.798638    1.5936363
LDA         0.4577846 0.02886563 0.006775378  21.788913    0.4078682
std         1.2449204 0.12994746 0.027000671  55.838883    1.0332563
Deviance  619.7681778 2.65715568 0.326080727 121.032423  616.3311632
LP       -309.8148748 1.32309998 0.206890192 123.122008 -313.0799005
               Median           UB
beta0      -1.9902222   -1.5178152
beta1       1.8236876    2.2326418
LDA         0.4558953    0.5206561
std         1.2320277    1.5344502
Deviance  619.3065650  626.3929182
LP       -309.5806926 -308.1020531

Appendix: data used


s1 <- scan(what=list(double(),double()),text='
10.0  3.0  72.6  8.3  59.7  8.1  20.1  4.8  90.1  9.8   1.1  0.9
78.2  8.5  87.4  9.0   9.5  3.4   0.1  1.4   0.1  1.1  42.5  5.1
57.0  7.5   9.9  1.9   0.5  1.0 121.1  9.9  37.5  5.9  49.5  6.7
8.3  1.8   0.6  1.8  53.0  6.7 112.8 10.0  40.7  6.4   5.1  2.4
73.3  9.5 122.4  9.9  87.2  9.4 121.2  9.9  23.1  4.3   7.1  3.5
12.4  3.3   5.6  2.7 113.0  9.6 110.5 10.0   3.1  1.5  52.4  7.9
80.4  8.1   0.6  1.6 115.1  9.1  15.9  3.1  56.5  7.3  85.4  9.8
32.5  5.8  43.0  6.2   0.1  0.8  21.8  5.2  15.2  3.5   5.2  3.0
0.2  0.8  73.5  8.2   4.9  3.2   0.2  0.3  69.0  9.2   3.6  3.5
0.2  0.9 101.3  9.9  10.0  3.7  16.9  3.0  11.2  5.0   0.2  0.4
80.8  9.4  24.9  5.7 113.5  9.7   6.2  2.1  12.5  3.2   4.8  1.8
80.1  8.3  26.4  4.8  13.4  3.8  99.8  9.7  44.1  6.2  15.3  3.8
2.2  1.5  10.3  2.7  13.8  4.7  38.6  4.5  79.1  9.8  33.6  5.8
9.1  4.5  89.3  9.1   5.5  2.6  20.0  4.8   2.9  2.9  82.9  8.4
7.0  3.5  14.5  2.9  16.0  3.7  29.3  6.1  48.9  6.3   1.6  1.9
34.7  6.2  33.5  6.5  26.0  5.6  12.7  3.1   0.1  0.3  15.4  4.2
2.6  1.8  58.6  7.9  81.2  8.1  37.2  6.9
')
set2 <- data.frame(x=s1[[2]],y=s1[[1]])

2 comments:

  1. I think you should provide the dataset in link. This way we can test your scripts.

    ReplyDelete
    Replies
    1. It is at the very bottom of the post.

      Delete