Saturday, April 25, 2015

Unemployment of Europe in 2014 by NUTS 2 region

During the Christmas break I worked on some code to show unemployment by NUTS 2 region. At that point no 2014 data was available. When I noticed the 214 was available I dug up the code and plotted again.

Data and Code

As written, the code was made beginning this year. At that point it seemed Eurostat's directories had all moved and hence SmarterPoland was not used. Consequently, it is not used here either. However it seems to be much more easily than doing the data extraction work myself.
The mapping is created based on code found at rpubs.com: Mapping Data from Eurostat using R. The main change I made was using as continuous a scale as possible. I wanted to see details in southern Germany, but also let the scale show details in the worst regions. It appears not Greece but southern Spain has the worst unemployment. But since Spain has some better regions overall as country it seems a bit better off than Greece. The color then, are a trade off between red is bad, green is good and an ordering where colors are unique. Hence orange sits between yellow and blue to avoid yellow and blue mixing to deliver green again. I have the nagging feeling that it will be horrific for colorblind people.
In addition, I removed France's overseas parts and added a legend.
library("rgdal")
library("RColorBrewer")
library("classInt")
#library("SmarterPoland")
library(fields)

# create a new empty object called 'temp' in which to store a zip file
# containing boundary data
# temp <- tempfile(fileext = ".zip")
# now download the zip file from its location on the Eurostat website and
# put it into the temp object
# new Eurostat website
# old: http://epp.eurostat.ec.europa.eu
# new: http://ec.europa.eu/eurostat

# download.file(
# "http://ec.europa.eu/eurostat/cache/GISCO/geodatafiles/NUTS_2010_60M_SH.zip",temp)
# now unzip the boundary data
# unzip(temp)

EU_NUTS <- readOGR(dsn = "./NUTS_2010_60M_SH/data", layer = "NUTS_RG_60M_2010")
ToRemove <- EU_NUTS@data$STAT_LEVL!=2 | grepl('FR9',EU_NUTS@data$NUTS_ID)
EUN <- EU_NUTS[!ToRemove,]

## OGR data source with driver: ESRI Shapefile 
## Source: "./NUTS_2010_60M_SH/data", layer: "NUTS_RG_60M_2010"
## with 1920 features and 4 fields
## Feature type: wkbPolygon with 2 dimensions
#plot(EU_NUTS)

myunempl <- read.csv('lfst_r_lfu3rt.csv',na=':',skip=10)

#EU_NUTS <- spTransform(EU_NUTS, CRS("+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs"))

EUN@data = data.frame(EUN@data[,1:4], myunempl[
        match(EUN@data[, "NUTS_ID"],myunempl[, "GEO.TIME"]),   ])

EUN <- EUN[!is.na(EUN@data$X2014),]

plot <- plot(EUN, col = rgb(colorRamp(
            c('darkGreen','springgreen','Yellow','Orange',
                'Dark Blue','Light Blue','Purple','Red'))
 ((EUN@data$X2014-min(EUN@data$X2014))/
     (max(EUN@data$X2014)-min(EUN@data$X2014)))/255), 
 axes = FALSE, border = NA)    

image.plot(add=TRUE,
    zlim=c(min(EUN@data$X2014),max(EUN@data$X2014)),
    col=rgb(colorRamp(
        c('darkGreen','springgreen','Yellow','Orange',
            'Dark Blue','Light Blue','Purple','Red'))(seq(0,1,length.out=50))/255),
    legend.only=TRUE,
    smallplot=c(.02,.06,.15,.85))

Sunday, April 19, 2015

PK analysis: Theoph again

This week I wanted to repeat the Theoph PK analysis of two weeks ago in Stan. It also suddenly dawned on me. For a non-linear model in classical statistics we think it normal to provide good initializations. In contrast, for those MCMC samples I remember reading to start with highly dispersed initial values. So, for non-linear models in Bayesian context, we got two contrasting views on initialization. From my experience, I probably should abandon those highly dispersed initializations as they do not lead to convergence.

Model and data

The data is the same as two weeks ago. The only difference is that I realized that a dose in mg/kg should be converted to total dose in order to get a sensible idea of clearance (CL). The model then, is a slightly modified version, and runs through Stan.
The problem with this model is not so much the formulation, but rather, as indicated, the initialization.  En even then, it seems to be getting unacceptable samples at a high rate. I don't think this is a critique of the model. The estimates seem quite reasonable. 

library(MEMSS)
library(ggplot2)

library(rstan)

Theoph$dose <- Theoph$Dose*Theoph$Wt
datain <-  list(
    Time=Theoph$Time,
    conc=Theoph$conc,
    n=nrow(Theoph),
    subject =c(1:nlevels(Theoph$Subject))[Theoph$Subject],
    nsubject = nlevels(Theoph$Subject),
    dose=unique(subset(Theoph,, c(Subject,dose)))$dose
)

smodel <- '
data {
    int <lower=0> n;
    int <lower=0> nsubject;
    int <lower=1,upper=nsubject>  subject[n];
    real  dose[nsubject];
    real  Time[n];
    real  conc[n];
}
parameters {
    real <lower=0> sdr;
    real <lower=0> kas[nsubject];
    real <lower=0> kes[nsubject];
    real <lower=0> CLs[nsubject];
    real lke;
    real lka;
    real <lower=0> CL;
    real kesd;
    real kasd;
    real CLsd;
}
transformed parameters {
    real pred[n];
    real <lower=0> c0star[nsubject];
    for (i in 1:nsubject) {
       c0star[i] <- dose[i]*((kes[i]*kas[i])/CLs[i])/
      (kas[i]-kes[i]);
    }
    for (i in 1:n) {
       pred[i] <- c0star[subject[i]]*
          (exp(-kes[subject[i]]*Time[i])-exp(-kas[subject[i]]*Time[i]));
    }
}
model {
    conc ~ normal(pred,sdr);
    kes ~ lognormal(lke,kesd);
    kas ~ lognormal(lka,kasd);
    lke ~ uniform(-3,3);
    lka ~ uniform(-3,3);
    CL ~ uniform(0.01,300);
    CLs ~ normal(CL,CLsd);
    kesd ~ uniform(0.01,2);
    kasd ~ uniform(0.01,2);
    CLsd ~ uniform(0.01,10);
}
generated quantities {
    real Ka;
    real Ke;
    Ka <- exp(lka);
    Ke <- exp(lke);
}'
# (kes[i]<kas[i])*
fstan <- stan(model_code = smodel,
    data = datain,
    pars=c('CL','kes','kas','CLs',
        'c0star','Ka','Ke'),
    init= function() {
      list(lka = rnorm(1,log(2),.3),
          kas=   rnorm(12,2,.2),
          lke=   rnorm(1,log(.08),.01),
          kes=   rnorm(12,.08,.01),
          CLs = rnorm(12,1,.1),
          CL = rnorm(1,1,.1),
          kesd=runif(1,0.04,0.1),
          kasd=runif(1,0.2,2),
          sdr= runif(1,1,3),
          CLsd=runif(1,.1,1)
      )})
fstan

Inference for Stan model: smodel.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

            mean se_mean    sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
CL          2.80    0.01  0.20  2.41  2.66  2.80  2.92  3.23   990 1.01
kes[1]      0.08    0.00  0.01  0.06  0.07  0.08  0.09  0.09   324 1.01
kes[2]      0.08    0.00  0.01  0.07  0.08  0.09  0.09  0.10   904 1.01
kes[3]      0.09    0.00  0.01  0.07  0.08  0.09  0.09  0.10  1314 1.00
kes[4]      0.09    0.00  0.01  0.08  0.09  0.09  0.10  0.12   675 1.01
kes[5]      0.09    0.00  0.01  0.08  0.08  0.09  0.09  0.11   995 1.00
kes[6]      0.09    0.00  0.01  0.07  0.08  0.09  0.09  0.10   914 1.00
kes[7]      0.09    0.00  0.01  0.07  0.08  0.09  0.09  0.10  1231 1.00
kes[8]      0.09    0.00  0.01  0.07  0.08  0.09  0.09  0.11  1326 1.01
kes[9]      0.08    0.00  0.01  0.07  0.08  0.08  0.09  0.10  1085 1.01
kes[10]     0.09    0.00  0.01  0.07  0.08  0.09  0.09  0.11  1129 1.00
kes[11]     0.09    0.00  0.01  0.07  0.08  0.09  0.09  0.11  1044 1.01
kes[12]     0.09    0.00  0.01  0.07  0.08  0.09  0.09  0.10   835 1.01
kas[1]      1.48    0.01  0.22  1.08  1.32  1.46  1.61  1.94   338 1.01
kas[2]      0.66    0.00  0.09  0.51  0.60  0.66  0.72  0.86  1262 1.00
kas[3]      4.00    0.03  0.90  2.67  3.40  3.87  4.41  6.16  1214 1.00
kas[4]      0.95    0.00  0.12  0.73  0.87  0.94  1.03  1.21  1476 1.00
kas[5]      2.12    0.01  0.32  1.56  1.90  2.10  2.34  2.78  1021 1.01
kas[6]      2.41    0.02  0.47  1.58  2.08  2.37  2.67  3.42   570 1.00
kas[7]      1.21    0.00  0.17  0.90  1.10  1.20  1.31  1.58  1277 1.00
kas[8]      1.50    0.01  0.18  1.17  1.39  1.50  1.61  1.87   739 1.01
kas[9]      1.27    0.01  0.24  0.88  1.10  1.26  1.41  1.81  1108 1.00
kas[10]     0.80    0.00  0.13  0.57  0.71  0.79  0.88  1.10  1264 1.00
kas[11]     1.45    0.01  0.24  1.03  1.27  1.43  1.60  1.98  1243 1.01
kas[12]     8.14    0.13  4.24  4.31  5.79  7.22  8.84 18.78  1001 1.00
CLs[1]      2.07    0.01  0.17  1.71  1.97  2.09  2.17  2.36   462 1.01
CLs[2]      2.07    0.00  0.13  1.80  1.98  2.06  2.15  2.34  1187 1.00
CLs[3]      3.38    0.01  0.25  2.91  3.21  3.38  3.54  3.91   860 1.00
CLs[4]      2.37    0.01  0.16  2.09  2.26  2.35  2.46  2.74   635 1.00
CLs[5]      2.99    0.01  0.22  2.61  2.84  2.97  3.11  3.46   819 1.00
CLs[6]      2.88    0.01  0.21  2.48  2.75  2.87  3.01  3.32   785 1.00
CLs[7]      2.73    0.01  0.19  2.37  2.60  2.73  2.84  3.11  1020 1.01
CLs[8]      2.39    0.01  0.16  2.09  2.28  2.38  2.49  2.73   680 1.02
CLs[9]      3.60    0.01  0.30  3.04  3.40  3.59  3.78  4.23  1085 1.00
CLs[10]     3.07    0.01  0.23  2.62  2.92  3.06  3.21  3.57   810 1.01
CLs[11]     3.15    0.01  0.23  2.70  2.99  3.15  3.30  3.62   944 1.01
CLs[12]     2.82    0.01  0.21  2.42  2.69  2.81  2.95  3.27   889 1.00
c0star[1]  12.85    0.07  0.84 11.26 12.26 12.84 13.45 14.45   148 1.01
c0star[2]  15.18    0.04  1.36 12.74 14.28 15.15 15.93 18.21  1225 1.00
c0star[3]   8.43    0.01  0.49  7.52  8.10  8.40  8.74  9.49  1415 1.01
c0star[4]  13.76    0.04  1.15 11.92 13.02 13.58 14.30 16.57   873 1.00
c0star[5]   9.96    0.02  0.62  8.78  9.57  9.91 10.32 11.32  1128 1.00
c0star[6]   9.87    0.03  0.63  8.73  9.43  9.85 10.27 11.25   603 1.00
c0star[7]  10.99    0.02  0.76  9.64 10.50 10.93 11.45 12.57  1373 1.00
c0star[8]  12.54    0.03  0.74 11.20 12.02 12.50 12.99 14.11   720 1.00
c0star[9]   8.08    0.02  0.68  6.82  7.62  8.06  8.46  9.45   793 1.01
c0star[10] 10.24    0.02  1.01  8.53  9.57 10.19 10.77 12.61  1730 1.00
c0star[11]  9.37    0.02  0.69  8.13  8.92  9.32  9.76 10.87  1187 1.00
c0star[12]  8.40    0.01  0.40  7.63  8.14  8.39  8.65  9.22  1757 1.00
Ka          1.63    0.02  0.42  0.95  1.34  1.58  1.87  2.57   516 1.01
Ke          0.09    0.00  0.01  0.08  0.08  0.09  0.09  0.10   516 1.02
lp__       17.34    1.53 10.27  0.41 10.66 16.13 22.63 45.26    45 1.07

Samples were drawn using NUTS(diag_e) at Sun Apr 19 09:57:07 2015.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

Plot

This is just what I used last week, adapted for Stan. In addition, to get correct scale, mean dose in mg has been used in the calculations.

stansampls <- extract(fstan,c('CL','Ke','Ka'))
Time <- seq(1,max(datain$Time))

la2 <- sapply(1:nrow(stansampls$CL),
    function(i) {
      CL <- stansampls$CL[i]
      Ka <- stansampls$Ka[i]
      Ke <- stansampls$Ke[i]
      c0star <- mean(datain$dose)*((Ke*Ka)/CL)/(Ka-Ke)
      y<- c0star* (exp(-Ke*Time)-exp(-Ka*Time))
    })

res2 <- data.frame(
    Time=Time,
    Conc=apply(la2,1,mean),
    C05=apply(la2,1,FUN=function(x) quantile(x,0.05)),
    C95=apply(la2,1,FUN=function(x) quantile(x,0.95)))

ggplot(res2, aes(x=Time)) +
    geom_line(aes(y=Conc))+
    scale_y_log10('theophylline (mg/L)')+
    geom_line(aes(y=conc,x=Time,col=Subject),alpha=1,data=Theoph)+
    geom_ribbon(aes(ymin=C05, ymax=C95),alpha=.2)+
    theme(legend.position='bottom')

Sunday, April 12, 2015

Cefamandole: PK after IV with six subjects in JAGS

After last week's post on Theoph I noticed the MEMSS library contained more PK data sets. The Cefamandole data is an IV data set with six subjects. It is somewhat challenging, since there seem to be several elimination rates.

Data

Data are displayed below. Notice that the lines seem somewhat curved, there is clearly more going on that just simple elimination.

Models

The first model fits concentration as function of two eliminations. The problem here is that the eliminations need to be ordered. All faster eliminations should have one prior, all slower another. After much fiddling around I found that I cannot force the ordering at subject level with priors or limits. Hence I chose a different way out. The model gets an extra component, it will predict 0 if the parameters are incorrectly ordered. This indeed gave the desired result in terms of model behavior.

The model itself though, was less to my linking. It goes way too low at the later times. Upon consideration, since the model assumes homoscedastic error it does not matter to the likelihoood if the fit at the lower end is not so good. After all, an error of 1 is nothing compared to errors of 10 or 20 at the upper end of the scale. In addition it can be argued that the error is not homoscedastic. If it were homoscedastic, the underlying data would show way more variation at the lower end of the scale. It would need confirmation from the measuring team, but for now I will assume proportional error.

Model 2: Proportional error

The proportional error will be implemented by using a log normal distribution for concentration. This was pretty simple to code. The log of the expected concentration is used in combination with dlnorm(). The only issue remaining is that expectation 0 for incorrectly ordered parameters has to be shifted a bit, log(0) is a bit of an issue.
This model looks pretty decent. There are still some things to do, similar to last week, translate the parameters c1star and c2star into the parameters which can be interpreted from PK perspective.

Model outputs

Model 1

Inference for Bugs model at "C:/Users/Kees/AppData/Local/Temp/RtmpSe5nAn/modele207c187661.txt", fit using jags,
 4 chains, each with 10000 iterations (first 5000 discarded), n.thin = 10
 n.sims = 2000 iterations saved
          mu.vect sd.vect    2.5%     25%     50%     75%    97.5%  Rhat n.eff
C[1]       91.473  24.508  51.690  76.444  88.708 103.600  144.987 1.016   160
C[2]      324.947 132.978 147.584 253.433 304.913 366.175  626.711 1.007   460
Ke[1]       0.020   0.002   0.016   0.018   0.019   0.021    0.024 1.045    76
Ke[2]       0.141   0.192   0.081   0.116   0.132   0.147    0.202 1.072   180
c1star[1]  58.783   8.381  43.892  53.222  58.514  63.406   76.601 1.047    64
c1star[2]  59.841  18.962  32.887  47.487  56.324  67.606  108.087 1.048    72
c1star[3]  92.337  10.569  70.647  85.720  92.805  99.469  112.338 1.041    74
c1star[4]  89.554  12.223  67.975  81.086  88.831  96.932  116.699 1.027   110
c1star[5] 146.419  21.504 106.395 132.153 145.199 159.149  193.558 1.032    84
c1star[6] 121.228  14.872  87.712 112.706 122.379 130.819  148.106 1.092    42
c2star[1] 426.241 107.777 278.572 355.121 406.845 471.583  703.121 1.038    92
c2star[2] 201.118  35.861 149.478 179.179 197.081 217.084  278.248 1.033   830
c2star[3] 489.294 218.162 268.493 349.208 419.178 537.325 1125.062 1.088    42
c2star[4] 449.402  68.164 340.627 402.410 440.666 487.652  603.535 1.038    75
c2star[5] 402.518  43.391 335.760 373.724 397.703 423.902  498.253 1.017   380
c2star[6] 140.893  46.737  79.548 114.343 133.240 157.967  237.080 1.024   650
ke1[1]      0.020   0.002   0.016   0.018   0.020   0.021    0.025 1.042    74
ke1[2]      0.021   0.004   0.016   0.019   0.020   0.023    0.032 1.060    57
ke1[3]      0.018   0.002   0.014   0.017   0.018   0.019    0.022 1.031    99
ke1[4]      0.020   0.002   0.017   0.019   0.020   0.021    0.025 1.023   120
ke1[5]      0.020   0.002   0.016   0.019   0.020   0.021    0.024 1.036    74
ke1[6]      0.019   0.002   0.015   0.018   0.019   0.020    0.022 1.071    46
ke2[1]      0.167   0.025   0.128   0.150   0.164   0.181    0.225 1.050    64
ke2[2]      0.101   0.026   0.067   0.085   0.096   0.111    0.164 1.050    91
ke2[3]      0.181   0.042   0.122   0.151   0.173   0.200    0.283 1.076    43
ke2[4]      0.143   0.019   0.110   0.130   0.141   0.155    0.181 1.040    74
ke2[5]      0.112   0.017   0.086   0.100   0.110   0.121    0.151 1.027   130
ke2[6]      0.118   0.038   0.062   0.093   0.114   0.137    0.205 1.051    61
deviance  464.708   7.996 451.264 458.919 464.029 469.608  482.206 1.006   550

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 31.8 and DIC = 496.6
DIC is an estimate of expected predictive error (lower deviance is better).

Model 2

Inference for Bugs model at "C:/Users/Kees/AppData/Local/Temp/RtmpSe5nAn/modele2045997fac.txt", fit using jags,
 4 chains, each with 10000 iterations (first 5000 discarded), n.thin = 10
 n.sims = 2000 iterations saved
          mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
C[1]       23.273  10.942   8.344  16.933  21.737  27.259  48.399 1.004   710
C[2]      159.545  32.361 106.916 141.343 157.165 172.618 230.957 1.004  1200
Ke[1]       0.010   0.001   0.009   0.010   0.010   0.011   0.012 1.010   270
Ke[2]       0.040   0.005   0.032   0.036   0.039   0.042   0.050 1.006   660
c1star[1]  14.141   3.164   8.481  11.970  13.923  16.049  20.826 1.012   230
c1star[2]  13.177   4.777   6.178   9.622  12.276  16.064  24.394 1.020   210
c1star[3]  34.030   6.876  20.963  29.102  33.778  38.458  47.854 1.006   450
c1star[4]  12.399   4.038   6.525   9.482  11.712  14.524  22.656 1.019   130
c1star[5]  40.359   8.756  24.398  34.248  40.002  45.836  58.609 1.007   360
c1star[6]  38.415   9.194  21.683  32.043  37.885  44.595  57.469 1.006   440
c2star[1] 123.292  17.872  93.250 110.418 121.625 134.114 164.930 1.000  2000
c2star[2] 134.174  18.465 103.312 121.677 132.580 145.038 176.803 1.004   680
c2star[3] 138.541  24.539  97.522 122.254 135.654 151.958 195.143 1.000  2000
c2star[4] 184.399  20.868 148.465 170.310 183.027 197.048 229.782 1.002  1500
c2star[5] 247.359  38.113 178.403 222.054 244.317 269.412 330.096 1.003  2000
c2star[6] 150.684  22.359 111.345 135.290 149.545 164.626 198.568 1.002  2000
ke1[1]      0.010   0.001   0.008   0.010   0.010   0.011   0.012 1.013   210
ke1[2]      0.012   0.001   0.009   0.011   0.011   0.012   0.014 1.022   180
ke1[3]      0.010   0.001   0.008   0.009   0.010   0.010   0.011 1.005   540
ke1[4]      0.011   0.001   0.009   0.010   0.010   0.011   0.013 1.020   130
ke1[5]      0.010   0.001   0.008   0.010   0.010   0.011   0.012 1.007   350
ke1[6]      0.010   0.001   0.009   0.010   0.010   0.011   0.012 1.005   540
ke2[1]      0.042   0.005   0.034   0.038   0.041   0.044   0.055 1.004   690
ke2[2]      0.041   0.006   0.032   0.037   0.040   0.044   0.056 1.013   230
ke2[3]      0.042   0.008   0.032   0.037   0.040   0.045   0.062 1.006   440
ke2[4]      0.037   0.003   0.031   0.034   0.036   0.038   0.044 1.008   320
ke2[5]      0.040   0.005   0.032   0.037   0.040   0.043   0.053 1.003  1100
ke2[6]      0.036   0.006   0.027   0.033   0.036   0.040   0.048 1.003  1200
deviance  372.564   7.382 359.811 367.354 372.140 376.777 388.833 1.001  2000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 27.3 and DIC = 399.8
DIC is an estimate of expected predictive error (lower deviance is better).

Code used

library(MEMSS)
library(ggplot2)
library(R2jags)
png('Cefamandole.png')
qplot(y=conc,x=Time,col=Subject,data=Cefamandole) +
    scale_y_log10('cefamandole (mcg/ml)')+
    geom_line()+
    theme(legend.position='bottom')
dev.off()

library(R2jags)
datain <-  list(
    time=Cefamandole$Time,
    conc=Cefamandole$conc,
    n=nrow(Cefamandole),
    subject =c(1:nlevels(Cefamandole$Subject))[Cefamandole$Subject],
    nsubject = nlevels(Cefamandole$Subject)
)

model1 <- function() {
  tau <- 1/pow(sigma,2)
  sigma ~ dunif(0,100)
  for (i in 1:n) {
    pred[i] <- (ke1[subject[i]]<ke2[subject[i]])*
        (c1star[subject[i]]*exp(-ke1[subject[i]]*time[i])+
          c2star[subject[i]]*exp(-ke2[subject[i]]*time[i]))
    conc[i] ~ dnorm(pred[i],tau)
  }
  
  for(i in 1:nsubject) {
    ke1[i] ~ dlnorm(ke[1],kemtau[1])
    ke2[i] ~ dlnorm(ke[2],kemtau[2]) 
    c1star[i] ~ dlnorm(cm[1],ctau[1])
    c2star[i] ~ dlnorm(cm[2],ctau[2])
  }
  for (i in 1:2) {
    kem[i] ~ dunif(-10,10) 
    kemtau[i] <- 1/pow(kesd[i],2) 
    kesd[i] ~ dunif(0,10)
    Ke[i] <- exp(ke[i])
    cm[i] ~ dunif(-10,20)
    ctau[i] <- 1/pow(csd[i],2)
    csd[i] ~ dunif(0,100)
    C[i] <- exp(cm[i])    
  }
  ke <- sort(kem)


parameters <- c('Ke','ke1','ke2','c1star','c2star','C')
inits <- function() {
  list(ke1 = rnorm(6,0.13,.03),
      ke2=  rnorm(6,0.0180,.005), 
      kem=  log(c(rnorm(1,0.13,.03),rnorm(1,0.0180,.005))),
      kesd =runif(2,.1,.4),
      cm = log(rnorm(2,25,5)),
      c1star=rnorm(6,25,3),
      c2star=rnorm(6,25,3)
  )}
jagsfit1 <- jags(datain, model=model1, 
    inits=inits,
    parameters=parameters,progress.bar="gui",
    n.iter=10000,
    n.chains=4,n.thin=10)
jagsfit1
#plot(jagsfit1)
#traceplot(jagsfit1)

Time <- seq(0,max(Cefamandole$Time))
la1 <- sapply(1:nrow(jagsfit1$BUGSoutput$sims.matrix),
    function(i) {
      C1 <- jagsfit1$BUGSoutput$sims.matrix[i,'C[1]']
      C2 <- jagsfit1$BUGSoutput$sims.matrix[i,'C[2]']
      k1 <- jagsfit1$BUGSoutput$sims.matrix[i,'Ke[1]']
      k2 <- jagsfit1$BUGSoutput$sims.matrix[i,'Ke[2]']
      y=C1*exp(-k1*Time)+C2*exp(-k2*Time)
    })
res1 <- data.frame(
    Time=Time,
    Conc=apply(la1,1,mean),
    C05=apply(la1,1,FUN=function(x) quantile(x,0.05)),
    C95=apply(la1,1,FUN=function(x) quantile(x,0.95)))
png('cef1.png')
ggplot(res1, aes(x=Time)) +
    geom_line(aes(y=Conc))+
    scale_y_log10('cefamandole (mcg/ml)')+
    geom_line(aes(y=conc,x=Time,col=Subject),alpha=1,data=Cefamandole)+
    geom_ribbon(aes(ymin=C05, ymax=C95),alpha=.2)+
    theme(legend.position='bottom')
dev.off()
#########
#
# model 2: proportional error
#
#########
model2 <- function() {
  tau <- 1/pow(sigma,2)
  sigma ~ dunif(0,100)
  for (i in 1:n) {
    pred[i] <- log(
        (ke1[subject[i]]<ke2[subject[i]])*
            (c1star[subject[i]]*exp(-ke1[subject[i]]*time[i])+
              c2star[subject[i]]*exp(-ke2[subject[i]]*time[i]))
            +.001*(ke1[subject[i]]>ke2[subject[i]]))
    conc[i] ~ dlnorm(pred[i],tau)
  }
  
  
  for(i in 1:nsubject) {
    ke1[i] ~ dlnorm(ke[1],kemtau[1])
    ke2[i] ~ dlnorm(ke[2],kemtau[2]) 
    c1star[i] ~ dlnorm(cm[1],ctau[1])
    c2star[i] ~ dlnorm(cm[2],ctau[2])
  }
  for (i in 1:2) {
    kem[i] ~ dunif(-10,10) 
    kemtau[i] <- 1/pow(kesd[i],2) 
    kesd[i] ~ dunif(0,10)
    Ke[i] <- exp(ke[i])
    cm[i] ~ dunif(-10,20)
    ctau[i] <- 1/pow(csd[i],2)
    csd[i] ~ dunif(0,100)
    C[i] <- exp(cm[i])    
  }
  ke <- sort(kem)
  


parameters <- c('Ke','ke1','ke2','c1star','c2star','C')
inits <- function() {
  list(ke1 = rnorm(6,0.13,.03),
      ke2=  rnorm(6,0.0180,.005), 
      kem=  log(c(rnorm(1,0.13,.03),rnorm(1,0.0180,.005))),
      kesd =runif(2,.1,.4),
      cm = log(rnorm(2,25,5)),
      c1star=rnorm(6,25,3),
      c2star=rnorm(6,25,3)
  )}
jagsfit2 <- jags(datain, model=model2, 
    inits=inits,
    parameters=parameters,progress.bar="gui",
    n.iter=10000,
    n.chains=4,n.thin=10)
jagsfit2
la2 <- sapply(1:nrow(jagsfit2$BUGSoutput$sims.matrix),
    function(i) {
      C1 <- jagsfit2$BUGSoutput$sims.matrix[i,'C[1]']
      C2 <- jagsfit2$BUGSoutput$sims.matrix[i,'C[2]']
      k1 <- jagsfit2$BUGSoutput$sims.matrix[i,'Ke[1]']
      k2 <- jagsfit2$BUGSoutput$sims.matrix[i,'Ke[2]']
      y=C1*exp(-k1*Time)+C2*exp(-k2*Time)
    })


res2 <- data.frame(
    Time=Time,
    Conc=apply(la2,1,mean),
    C05=apply(la2,1,FUN=function(x) quantile(x,0.05)),
    C95=apply(la2,1,FUN=function(x) quantile(x,0.95)))

png('cef2.png')
ggplot(res2, aes(x=Time)) +
    geom_line(aes(y=Conc))+
    scale_y_log10('cefamandole (mcg/ml)')+
    geom_line(aes(y=conc,x=Time,col=Subject),alpha=1,data=Cefamandole)+
    geom_ribbon(aes(ymin=C05, ymax=C95),alpha=.2)+
    theme(legend.position='bottom')
dev.off()

Sunday, April 5, 2015

Hierarchical Two Compartimental PK Model

In this post I am running the Theoph dataset from MEMSS (Data sets and sample analyses from Pinheiro and Bates, "Mixed-effects Models in S and S-PLUS" (Springer, 2000)) in a JAGS model. To quote the MEMSS manual:
'Boeckmann, Sheiner and Beal (1994) report data from a study by Dr. Robert Upton of the kinetics of the anti-asthmatic drug theophylline. Twelve subjects were given oral doses of theophylline then serum concentrations were measured at 11 time points over the next 25 hours.
These data are analyzed in Davidian and Giltinan (1995) and Pinheiro and Bates (2000) using a two-compartment open pharmacokinetic model, for which a self-starting model function, SSfol, is available.'

Model

The model which is used in the example for the data is SSfol. Its value is:
Dose * exp(lKe+lKa-lCl) * (exp(-exp(lKe)*input) - exp(-exp(lKa)*input))
    / (exp(lKa) - exp(lKe))
With input the time variable. This looks very complex, but there seems to be a trick here. What is estimated is not Ke (elimination rate) but its natural logarithm. The same is true for Ka and CL. This is probably done to keep the parameters positive. So in terms of ordinary parameters:
Dose * (Ke+Ka)/CL * (exp(-Ke*input) - exp(-Ka*input)) / (Ka - Ke)
This looks much more simple.

Estimates using nls

The example code uses nls() to fit the model. Below the model is wrapped in an apply. In addition, the exponent is taken.
library(MEMSS)
la <- lapply(levels(Theoph$Subject), function(iSubject)
    {Theoph.i <- subset(Theoph, Subject == iSubject)
      fm1 <- nls(conc ~ SSfol(Dose, Time, lKe, lKa, lCl),
          data = Theoph.i)
      ee <- exp(coef(fm1))
      dd <- data.frame(Ke=ee[1],Ka=ee[2],CL=ee[3])
      row.names(dd) <- iSubject
      dd
    }
)
nlsres <- do.call(rbind,la)
nlsres
          Ke        Ka         CL
A 0.05395450 1.7774170 0.01992348
B 0.07396616 0.6955019 0.03244300
C 0.09812331 3.8490404 0.05724601
D 0.10557584 0.8328979 0.04199695
E 0.10166133 1.9426573 0.04476553
F 0.08142497 2.4535654 0.03955890
G 0.08746697 1.1714752 0.03739991
H 0.08843514 1.4715045 0.04360427
I 0.09952647 1.1637219 0.05113727
J 0.10224639 0.6797358 0.05159475
K 0.09195675 1.3755229 0.04646244

Model in JAGS

In JAGS there are more direct mechanisms to keeping parameters positive. However, there is one problem, an alternative set of parameters for subject A, Ka=0.053, Ke=1.77 does equally well. Combining MCMC samples from these two solutions just results in nonsense. To avoid this, I have placed the whole scaling in a parameter C0star, which is positive. In subsequent code individual CL and its hyper parameter are estimated from C0star. To keep this as a directed acyclic graph (DAG) additional variable z and parameters CLr are introduced. In addition, dose has been made a subject level variable, rather than associated with time points.

model1 <- function() {
  tau <- 1/pow(sigma,2)
  sigma ~ dunif(0,100)
  for (i in 1:n) {
    pred[i] <- c0star[subject[i]]*
        (exp(-ke[subject[i]]*time[i])-exp(-ka[subject[i]]*time[i]))
    conc[i] ~ dnorm(pred[i],tau)
  }
  for(i in 1:nsubject) {
    ka[i] ~ dlnorm(kam,katau) 
    ke[i] ~ dlnorm(kem,ketau) 
    c0star[i] ~ dunif(0,50)
# c0star=dose*((ke*ka)/CL)/(ka-ke)    
    CL[i] <- Dose[i]*ka[i]*ke[i]/((ka[i]-ke[i])*c0star[i])
    CLr[i] <- CL[i]-CLm
    z[i] ~ dnorm(CLr[i],CLtau)
  }
  kam ~ dunif(-10,10) 
  katau <- 1/pow(kasd,2) 
  kasd ~ dunif(0,100)
  Kam <- exp(kam)
#  ta0_5 <- log(2)/Kam
  
  kem ~ dunif(-10,10)
  ketau <- 1/pow(kesd,2)
  kesd ~ dunif(0,100)
  Kem <- exp(kem)
#  te0_5 <- log(2)/Kem
  
  CLm ~ dunif(0,100)
  CLtau <- 1/pow(CLsd,2)
  CLsd ~dunif(0,100)

The usual code for running the model. I have noticed quite some sensitivity to inits while writing this post. 
library(R2jags)
datain <-  list(
    time=Theoph$Time,
    conc=Theoph$conc,
    n=nrow(Theoph),
    subject =c(1:nlevels(Theoph$Subject))[Theoph$Subject],
    nsubject = nlevels(Theoph$Subject),
    z=rep(0, nlevels(Theoph$Subject)),
    Dose=unique(subset(Theoph,, c(Subject,Dose)))$Dose
)
parameters <- c('Kem','Kam','ke','ka','CL','CLm')
inits <- function() {
  list(kam = rnorm(1,log(1),.3),
      ka=   rnorm(12,1,.2), 
      kem=   rnorm(1,log(.08),.01),
      ke=    rnorm(12,.08,.01),
      c0star = runif(12,9,11)
  )}

jagsfit <- jags(datain, model=model1, 
    inits=inits,
    parameters=parameters,progress.bar="gui",
    n.iter=10000,
    n.chains=4,n.thin=2)

Results

It seems the estimated parameters are close to the parameters from nls(). This is for this blog post sufficient. The data has quite some shrinkage, almost all subjects have Ke close 0.86. CL and Ka also have shrinkage, but much less than Ke.
Inference for Bugs model at "C:/Users/.../modeld8059bf37f0.txt", fit using jags,
 4 chains, each with 10000 iterations (first 5000 discarded), n.thin = 2
 n.sims = 10000 iterations saved
         mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
CL[1]      0.026   0.002   0.022   0.025   0.026   0.028   0.030 1.003  1100
CL[2]      0.035   0.002   0.031   0.034   0.035   0.037   0.039 1.001  4200
CL[3]      0.051   0.004   0.044   0.048   0.051   0.053   0.058 1.002  2300
CL[4]      0.038   0.002   0.034   0.037   0.038   0.040   0.044 1.006   460
CL[5]      0.041   0.003   0.036   0.039   0.041   0.043   0.047 1.005   620
CL[6]      0.041   0.003   0.035   0.039   0.041   0.042   0.046 1.002  2200
CL[7]      0.037   0.003   0.033   0.036   0.037   0.039   0.043 1.003   990
CL[8]      0.043   0.003   0.038   0.041   0.043   0.044   0.048 1.004   960
CL[9]      0.046   0.004   0.039   0.044   0.046   0.048   0.053 1.002  1600
CL[10]     0.046   0.003   0.040   0.044   0.046   0.049   0.053 1.002  1600
CL[11]     0.044   0.003   0.038   0.042   0.044   0.046   0.051 1.002  2000
CL[12]     0.033   0.002   0.029   0.031   0.033   0.034   0.038 1.007   430
CLm        0.040   0.003   0.035   0.038   0.040   0.042   0.046 1.004  1200
Kam        1.673   0.450   0.998   1.386   1.616   1.885   2.700 1.001  6400
Kem        0.086   0.005   0.076   0.082   0.085   0.089   0.096 1.010   420
ka[1]      1.482   0.205   1.127   1.338   1.466   1.607   1.934 1.001  9300
ka[2]      0.656   0.089   0.507   0.593   0.649   0.710   0.854 1.006   510
ka[3]      3.964   0.860   2.657   3.372   3.836   4.407   5.973 1.001  5700
ka[4]      0.954   0.125   0.730   0.869   0.948   1.032   1.219 1.009   310
ka[5]      2.110   0.325   1.555   1.882   2.080   2.302   2.827 1.003  1100
ka[6]      2.385   0.441   1.671   2.073   2.336   2.637   3.411 1.002  2200
ka[7]      1.208   0.171   0.910   1.092   1.195   1.309   1.584 1.002  2000
ka[8]      1.502   0.183   1.171   1.376   1.493   1.615   1.890 1.003  1100
ka[9]      1.287   0.245   0.878   1.112   1.261   1.429   1.834 1.002  1600
ka[10]     0.787   0.139   0.551   0.690   0.774   0.871   1.094 1.004   910
ka[11]     1.446   0.248   1.021   1.274   1.422   1.593   2.006 1.002  2400
ka[12]     8.409   5.846   4.277   5.818   7.078   8.998  20.996 1.001  7100
ke[1]      0.080   0.008   0.061   0.075   0.081   0.085   0.093 1.004   990
ke[2]      0.084   0.007   0.069   0.080   0.084   0.089   0.099 1.003  1100
ke[3]      0.085   0.007   0.072   0.081   0.085   0.089   0.101 1.002  2500
ke[4]      0.089   0.009   0.076   0.083   0.088   0.094   0.110 1.017   210
ke[5]      0.089   0.008   0.076   0.083   0.087   0.093   0.109 1.007   440
ke[6]      0.085   0.007   0.071   0.081   0.085   0.089   0.100 1.003  1300
ke[7]      0.086   0.007   0.073   0.082   0.086   0.090   0.104 1.006   530
ke[8]      0.086   0.007   0.073   0.081   0.085   0.090   0.101 1.008   420
ke[9]      0.086   0.008   0.071   0.081   0.085   0.090   0.103 1.005   970
ke[10]     0.086   0.008   0.071   0.081   0.085   0.090   0.104 1.009   480
ke[11]     0.086   0.008   0.072   0.081   0.085   0.090   0.103 1.005   630
ke[12]     0.088   0.008   0.075   0.083   0.087   0.092   0.106 1.008   340
deviance 200.433   9.015 184.713 193.957 199.825 206.049 220.328 1.002  2200

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 40.6 and DIC = 241.0
DIC is an estimate of expected predictive error (lower deviance is better).

Sunday, March 29, 2015

Space Launch Sites over Time

Continuing from last weeks post, I am now looking at space launch sites.

Data

Data are from the main table. In addition, this sites table was manually browsed for interpretation of abbreviations.

List of most important sites

Just by running counts the most important locations (more that 100 counts) are visible. As can be seen, each site has two parts. LC is an abbreviation for launch complex.
                                Site Freq
1  NIIP-53  LC41/1                    310
2  NIIP-5   LC31                      308
3  NIIP-5   LC1                       264
4  NIIP-53  LC43/4                    243
5  NIIP-53  LC43/3                    193
6  GIK-5    LC1                       191
7  NIIP-53  LC132/2                   172
8  NIIP-53  LC132/1                   168
9  NIIP-53  LC133/3                   123
10 CSG      ELA2                      119
11 CC       LC17A                     118
12 CC       LC17B                     107
13 GIK-5    LC200/39                  106
14 NIIP-53  LC16/2                    103
For ease of analysis, the Site is reduced to the first part. In addition, some sites have been renamed. The most important sites:
#NIIP-5=GIK-5=Baykonur
#NIIP-53=GNIIP=GIK-1=Plesetsk
#CC=Cape Canaveral=KSC=John F. Kennedy Space Center, Florida
#V=Vandenberg
#CSG=Centre Spatial Guyanais, Kourou, Guyane Francaise
In addition, for each site the country is determined. This led me to some confusion. Some locations seemed to be on the moon. However, I could not find the launches of the Apollo astronauts back from the moon. Finally, to remove this confusion and for ease of depiction I have only taken the locations with more than 10 counts. Two locations are not countries. A sea platform (Odyssee) and plane launch (Lockheed L-1011). 
The plot shows the renames in Russia, but also the parallel usage of the names KSC and CC by the USA. VandenBerg gets used less over time. China and India got more active in the recent years. 

Code

library(dplyr)
library(ggplot2)
r1 <- readLines('launchlog.txt')
colwidth  <- gregexpr('#',r1[2])[[1]] %>%
    c(.,max(nchar(r1))) %>%
    diff(.)

cols <- read.fwf(textConnection(r1[1]),
        widths=colwidth,
        header=FALSE,
        comment.char='@') %>%
    unlist(.) %>%
    make.names(.) %>%
    gsub('\\.+$','',.) 

r2 <- read.fwf('launchlog.txt',
    widths=colwidth,
    col.names=cols) 
r3 <- filter(r2,!is.na(Suc))
Sys.setlocale(category = "LC_TIME", locale = "C")
r3$Launch.Date..UTC[1:3]
r3$Date <- as.Date(r3$Launch.Date..UTC,format='%Y %b %d')

xtabs(~ Site , r3) %>%
  as.data.frame(.) %>%
  arrange(.,-Freq) %>%
  filter(.,Freq>100)
#LC=launch complex
ll <- levels(r3$Site)
r3$loc1 <- factor(gsub('( |,)[[:print:]]+$','',ll)[r3$Site])
xtabs(~ loc1 , r3) %>%
    as.data.frame(.) %>%
   arrange(.,-Freq) %>%
   filter(.,Freq>100)
#http://planet4589.org/space/log/sites.txt
#NIIP-5=GIK-5=Baykonur
#NIIP-53=GNIIP=GIK-1=Plesetsk
#CC=Cape Canaveral=KSC=John F. Kennedy Space Center, Florida
#V=Vandenberg
#CSG=Centre Spatial Guyanais, Kourou, Guyane Francaise
#JQ=Jiuquan Space Center, Nei Monggol Zizhiqu, China
#XSC=Xichang Space Center, Sichuan, China
#GTsP-4, Kapustin Yar, Volgograd, Rossiya
#L1011 = modified Lockheed L-1011 TriStar aircraft 
#Odyssey = Sea launch from platform

locbycountry <- read.csv(text='
CC    , USA  
CSG   , France
GIK-1 , Russia
GIK-5 , Russia
GNIIPV ,Russia
GTsP-4 ,Russia
JQ     ,China
KASC   ,Japan
KSC    ,USA
L-1011 ,Air
MARS   ,USA
NIIP-5 ,Russia
NIIP-53 ,Russia
ODYSSEY ,Sea platform
PA      ,USA
SHAR    ,India
TNSC    ,Japan
TYSC    ,China
V       ,USA
WI      ,USA
XSC  , China',
skip=1,header=FALSE,
  col.names=c('loc1','Geography'),
  stringsAsFactors=FALSE,
  strip.white=TRUE) %>%
  arrange(.,Geography) 

r4 <- xtabs(~ loc1 , r3) %>%
    as.data.frame(.) %>%
#   arrange(.,-Freq) %>%
    filter(.,Freq>10) %>%
    select(.,loc1) %>%
    merge(.,r3) %>%
    mutate(.,loc1=as.character(loc1)) %>%
    merge(.,locbycountry) %>%
    mutate(.,loc1=factor(loc1,levels=rev(locbycountry$loc1)))

ggplot(r4, aes(loc1, Date,col=Geography)) +
    geom_point()+
    coord_flip()+
    geom_jitter(position = position_jitter(width = .25)) +
    xlab('Geography') +
    theme(legend.position="bottom")+
    guides(col=guide_legend(nrow=2))