Priors
There were two proposals. A common step was 'parameterizing in terms of mean theta = (alpha / (alpha + beta)) and total count kappa = (alpha + beta)'. In my own data there are nine regions hence nine sets of theta, kappa, alpha, beta. As hyperprior over the nine regions' incidence I chose a normal distribution for theta. The difference between the proposals is in kappa. I made no hyperpriors over the nine values of kappa.Prior: Uniform on log scale
'Andrew (...) suggests taking the priors on alpha and beta to be uniform on the log scale.' In code this means log kappa is uniform distributed.Prior: p(alpha,beta) prop to (alpha + beta)(-5/2)
Apparently in Bayesian Data Analysis 3 it is proposed to make p(alpha,beta) prop to (alpha + beta)(-5/2). This is handled via a Pareto distribution.Sampling kappa priors
Via a small program I had a look what this means in terms of samples. The uniform log scale seemed to be reasonable even for the range of my beta parameters. On the other hand the p(alpha,beta) prop to (alpha + beta)(-5/2) proposal kept kappa way too low for the data at hand. This seems informative for a different range of kappa and hence beta rather than uninformative. There I made a different variation.modelk <- '
parameters {
real kappa_log1;
real kappa2;
real <lower=10^6> kappa3;
}
transformed parameters {
real kappa1;
kappa1 <- exp(kappa_log1);
}
model {
kappa_log1 ~ uniform(0,18);
kappa2 ~ pareto(0.1,1.5);
kappa3 ~ pareto(10^6,1.5);
}
'
parametersk <- c('kappa1','kappa2','kappa3')
initsk <- function() {
list(
kappa_log1=runif(1,0,18)
, kappa2 = runif(1,.2,10)
, kappa3 = runif(1,2e6,1e7)
)
}
kfits <- stan(model_code = modelk,
pars=parametersk,
inits=initsk)
kfits
Inference for Stan model: modelk.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50%
kappa1 1994621.64 550416.10 7754182.44 1.57 82.85 4772.51
kappa2 0.24 0.02 0.27 0.10 0.12 0.16
kappa3 2602295.43 296845.16 4314974.09 1021790.77 1209898.43 1558773.41
lp__ -18.81 0.16 1.65 -23.11 -19.64 -18.41
75% 97.5% n_eff Rhat
kappa1 160223.83 25951765.02 198 1.03
kappa2 0.25 0.89 123 1.03
kappa3 2490623.45 11991112.38 211 1.02
lp__ -17.59 -16.83 112 1.03
Samples were drawn using NUTS(diag_e) at Sat Aug 30 14:12:51 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Effect in the whole model
Prior: Uniform on log scale
This works reasonable well. With significantly less samples than last week there are reasonable estimates. The only thing worrying is that incidence[1] (it was named betamean[1] in the previous models, but that was actually a bit of a misnomer) is a bit too high.model1 <- '
data {
int<lower=0> n;
int<lower=0> nregion;
int count[n];
int<lower=1,upper=nregion> Region[n];
int Population[n];
real scale;
}
parameters {
vector <lower=0,upper=1> [nregion] theta;
vector [nregion] kappa_log;
vector <lower=0,upper=1> [n] p1;
real <lower=0,upper=1> thetam;
real <lower=0> thetasd;
}
transformed parameters {
vector [nregion] kappa;
vector [nregion] a;
vector [nregion] b;
kappa <- exp(kappa_log);
for (i in 1:nregion) {
a[i] <- kappa[i] * theta[i];
b[i] <- kappa[i] * (1 - theta[i]);
}
}
model {
kappa_log ~ uniform(12,18);
for (i in 1:n) {
p1[i] ~ beta(a[Region[i]],b[Region[i]]);
}
count ~ binomial(Population,p1);
theta ~ normal(thetam,thetasd);
thetam ~ uniform(0,1);
thetasd ~ uniform(0,.1);
}
generated quantities {
vector [n] pp;
vector [nregion] incidence;
real meaninc;
incidence <- scale*theta;
pp <- p1 * scale;
meaninc <- scale*thetam;
}
'
inits1 <- function()
list(theta=rnorm(datain$nregion,1e-6,1e-7),
thetam = rnorm(1,1e-6,1e-7),
thetasd = rnorm(1,1e-6,1e-7),
kappa_log=runif(datain$nregion,15,18),
p1=rnorm(datain$n,1e-7,1e-8))
parameters1 <- c('a','b','incidence','kappa','kappa_log','meaninc')
fits1 <- stan(model_code = model1,
data = datain,
pars=parameters1 ,
init=inits1)
print(fits1,probs=NA)
Inference for Stan model: model1.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd n_eff Rhat
a[1] 7.95 1.32 13.67 NA 107 1.04
a[2] 9.16 1.43 16.76 NA 138 1.02
a[3] 50.52 2.61 32.13 NA 151 1.03
a[4] 35.68 2.25 32.61 NA 210 1.01
a[5] 9.10 1.32 13.99 NA 113 1.04
a[6] 11.99 1.48 17.40 NA 139 1.01
a[7] 13.64 1.83 20.56 NA 126 1.01
a[8] 14.19 2.56 20.35 NA 63 1.05
a[9] 12.14 2.51 21.39 NA 73 1.05
b[1] 5956080.76 1213799.11 10903417.54 NA 81 1.04
b[2] 5293475.42 844235.36 9984631.29 NA 140 1.01
b[3] 28376424.43 1399820.31 17900739.23 NA 164 1.03
b[4] 19757543.83 1195288.60 17665183.92 NA 218 1.02
b[5] 5493903.33 897750.46 9136674.30 NA 104 1.05
b[6] 6449567.62 787646.39 9190268.32 NA 136 1.01
b[7] 8500594.37 1172609.55 12531293.24 NA 114 1.01
b[8] 8901106.72 1698740.11 12644831.14 NA 55 1.06
b[9] 7267575.15 1406258.66 12121859.82 NA 74 1.05
incidence[1] 0.10 0.00 0.02 NA 84 1.07
incidence[2] 0.11 0.00 0.02 NA 177 1.02
incidence[3] 0.11 0.00 0.01 NA 153 1.01
incidence[4] 0.11 0.00 0.01 NA 178 1.02
incidence[5] 0.11 0.00 0.02 NA 216 1.01
incidence[6] 0.12 0.00 0.02 NA 121 1.03
incidence[7] 0.10 0.00 0.02 NA 161 1.02
incidence[8] 0.10 0.00 0.02 NA 81 1.06
incidence[9] 0.10 0.00 0.02 NA 119 1.04
kappa[1] 5956088.71 1213800.44 10903430.55 NA 81 1.04
kappa[2] 5293484.58 844236.78 9984647.90 NA 140 1.01
kappa[3] 28376474.95 1399822.89 17900770.95 NA 164 1.03
kappa[4] 19757579.51 1195290.81 17665216.05 NA 218 1.02
kappa[5] 5493912.43 897751.76 9136688.06 NA 104 1.05
kappa[6] 6449579.61 787647.86 9190285.60 NA 136 1.01
kappa[7] 8500608.01 1172611.40 12531313.60 NA 114 1.01
kappa[8] 8901120.91 1698742.67 12644851.22 NA 55 1.06
kappa[9] 7267587.30 1406261.16 12121881.04 NA 74 1.05
kappa_log[1] 14.56 0.18 1.40 NA 60 1.08
kappa_log[2] 14.37 0.13 1.42 NA 122 1.01
kappa_log[3] 16.88 0.07 0.87 NA 167 1.03
kappa_log[4] 16.28 0.08 1.15 NA 210 1.01
kappa_log[5] 14.76 0.11 1.17 NA 119 1.03
kappa_log[6] 15.10 0.08 1.03 NA 153 1.01
kappa_log[7] 15.08 0.17 1.36 NA 61 1.03
kappa_log[8] 15.12 0.22 1.38 NA 41 1.08
kappa_log[9] 14.76 0.17 1.45 NA 70 1.04
meaninc 0.11 0.00 0.01 NA 107 1.02
lp__ -7659.18 1.14 11.63 NA 105 1.01
Samples were drawn using NUTS(diag_e) at Sat Aug 30 14:24:52 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Prior: p(alpha,beta) prop to (alpha + beta)(-5/2)
This model did quite well. I wish all my models worked as well.
model2 <- '
data {
int<lower=0> n;
int<lower=0> nregion;
int count[n];
int<lower=1,upper=nregion> Region[n];
int Population[n];
real scale;
}
parameters {
vector <lower=0,upper=1> [nregion] theta;
vector <lower=1e6> [nregion] kappa;
vector <lower=0,upper=1> [n] p1;
real <lower=0,upper=1> thetam;
real <lower=0> thetasd;
}
transformed parameters {
vector [nregion] a;
vector [nregion] b;
for (i in 1:nregion) {
a[i] <- kappa[i] * theta[i];
b[i] <- kappa[i] * (1 - theta[i]);
}
}
model {
kappa ~ pareto(10^6,1.5);
for (i in 1:n) {
p1[i] ~ beta(a[Region[i]],b[Region[i]]);
}
count ~ binomial(Population,p1);
theta ~ normal(thetam,thetasd);
thetam ~ uniform(0,1);
thetasd ~ uniform(0,.1);
}
generated quantities {
vector [n] pp;
vector [nregion] incidence;
real meaninc;
incidence <- scale*theta;
pp <- p1 * scale;
meaninc <- scale*thetam;
}
'
inits2 <- function()
list(theta=rnorm(datain$nregion,1e-6,1e-7),
thetam = rnorm(1,1e-6,1e-7),
thetasd = rnorm(1,1e-6,1e-7),
kappa=runif(datain$nregion,1e6,1e7),
p1=rnorm(datain$n,1e-7,1e-8))
parameters2 <- c('a','b','incidence','kappa','meaninc')
fits2 <- stan(model_code = model2,
data = datain,
pars=parameters2,
init=inits2)
print(fits2,probs=NA)
Inference for Stan model: model2.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd n_eff Rhat
a[1] 3.03 0.06 2.54 NA 1851 1
a[2] 3.36 0.05 2.74 NA 2746 1
a[3] 22.28 1.39 39.52 NA 807 1
a[4] 7.27 0.27 12.39 NA 2173 1
a[5] 3.61 0.08 3.50 NA 2025 1
a[6] 4.38 0.09 3.70 NA 1743 1
a[7] 3.58 0.13 4.14 NA 957 1
a[8] 3.54 0.11 3.60 NA 1125 1
a[9] 3.20 0.10 3.81 NA 1346 1
b[1] 2132903.53 47586.20 2085059.70 NA 1920 1
b[2] 1804395.74 29404.95 1540741.77 NA 2745 1
b[3] 12482567.37 782922.81 22226255.17 NA 806 1
b[4] 4057579.18 146369.50 6818624.58 NA 2170 1
b[5] 2080275.79 44577.27 2020891.35 NA 2055 1
b[6] 2368569.05 48762.30 1922950.01 NA 1555 1
b[7] 2281906.54 81326.60 2583506.11 NA 1009 1
b[8] 2319737.52 71109.80 2408437.85 NA 1147 1
b[9] 2092476.48 61180.23 2313166.69 NA 1430 1
incidence[1] 0.09 0.00 0.02 NA 746 1
incidence[2] 0.12 0.00 0.02 NA 1173 1
incidence[3] 0.11 0.00 0.02 NA 1209 1
incidence[4] 0.11 0.00 0.02 NA 1387 1
incidence[5] 0.11 0.00 0.02 NA 1406 1
incidence[6] 0.12 0.00 0.02 NA 1248 1
incidence[7] 0.10 0.00 0.02 NA 999 1
incidence[8] 0.10 0.00 0.02 NA 1063 1
incidence[9] 0.10 0.00 0.02 NA 935 1
kappa[1] 2132906.56 47586.25 2085062.04 NA 1920 1
kappa[2] 1804399.10 29405.01 1540744.41 NA 2745 1
kappa[3] 12482589.65 782924.19 22226294.41 NA 806 1
kappa[4] 4057586.45 146369.76 6818636.82 NA 2170 1
kappa[5] 2080279.40 44577.35 2020894.78 NA 2055 1
kappa[6] 2368573.43 48762.38 1922953.62 NA 1555 1
kappa[7] 2281910.12 81326.73 2583510.17 NA 1009 1
kappa[8] 2319741.06 71109.91 2408441.36 NA 1147 1
kappa[9] 2092479.69 61180.33 2313170.41 NA 1430 1
meaninc 0.11 0.00 0.01 NA 901 1
lp__ -7881.32 0.42 8.66 NA 428 1
Samples were drawn using NUTS(diag_e) at Sat Aug 30 14:34:49 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).