Friday, December 27, 2013

Sinterklaas and Santa Claus gave presents

In December my nice little netbook acquired a shiny new openSUSE 13.1, including all the goodies I might want to use in 2014: R, Stan, Julia and Jags. So here is what you might get when you set up a computer for statistics today.

Jags

I have been using Jags for quite some time, because I had the (incorrect) impression that openBugs development had stopped. I never noticed, but since September the most current version of Jags is 3.4. What did I miss since 3.3? This is how the NEWS file starts.

New features

  • You can now set a monitor of type "mean" for any node, which records the running mean.
  • The order function in the bugs module returns a permutation that sorts  its argument in ascending order (the inverse of the permutation provided by the rank function).
  • The Windows installer now offers a choice of personal or global installation for users with Admin access.
  • The negative binomial distribution is extended to allow parameters size=0 and probability=1.
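The order/rank relationship in the second bullet is easy to try out, since R's own order() and rank() functions behave the same way (this is plain R, not JAGS code):

```r
# order() returns the permutation that sorts x ascending;
# rank() returns the position of each element in that sorted order.
x <- c(30, 10, 20)
order(x)                # 2 3 1: x[order(x)] is sorted
rank(x)                 # 3 1 2
# For distinct values the two are each other's inverse permutation:
stopifnot(identical(order(order(x)), as.integer(rank(x))))
```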

Bug fixes

  • LogicalNode::isClosed could throw a logic_error due to a bug in the checkLinear function (Linear.cc)
  • Setting a monitor with thinning interval 0 prints an error message but no longer throws a logic_error exception.
  • The command line interface could print a spurious warning that ".RNG.seed" or ".RNG.state" was unused.
  • A function or distribution with zero arguments caused a segmentation fault. This now gives a compilation error.
  • The "dround" function (bugs module) was incorrectly calling fprec (round to n significant figures) when it should have been calling fround (round to n decimal places).
  • The sd() function divided by n (sample size) instead of (n-1). 
  • The Windows uninstaller did not correctly uninstall personal installations for users with administrative access.
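The sd() fix is the one that might silently change results. R's sd() has always used the unbiased n-1 divisor; a divide-by-n helper (sd_n is my own name, written to mimic what pre-3.4 JAGS effectively computed) shows the size of the difference:

```r
# R's sd() uses the n-1 divisor; a divide-by-n version for comparison:
sd_n <- function(x) sqrt(sum((x - mean(x))^2) / length(x))

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
sd_n(x)                           # 2
sd(x)                             # 2.13809
# They differ by a factor sqrt((n-1)/n), which vanishes for large samples:
all.equal(sd_n(x), sd(x) * sqrt(7 / 8))
```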

R

Nothing new there compared to my Windows machine (version 3.0.2 (2013-09-25) -- "Frisbee Sailing"). It was built on the netbook rather than taken from the repository. Getting it running under Eclipse and StatET did take some effort, but that is the environment I prefer.

Julia

In my mind Julia is the response to R's lack of speed. It is a fresh start: a new language based on a modern understanding of how an interpreter should work, while keeping some of the good things of R.
Julia code promises to be as fast as C code. The last time I looked was August 2012. Seeing changes was no surprise; the amount of change was. Version 0.2 is out, including Windows installers (32 and 64 bit). There are packages to add to it. And there is an IPython interface which runs in the browser. This gives IJulia, see screen shot. While IPython came out of the box (repository), getting Julia running in it took some fiddling around. I played around a bit further than 4+5=9, but still wonder how it will work when running real computations. I'll have to try some Julia pretty soon.

Stan

Stan is an alternative to Jags/Bugs. It is not compatible: Stan code is converted to C++ and then compiled. This may be the solution for bigger or more complex models. Stan was one of the reasons for moving to Linux after a Windows period; I never could get a compiler to work with R on my Windows 7. Looking at the Stan website, I must say there is a load of documentation (a 400+ page manual). Examples? All the Bugs examples plus all examples from the ARM book. I'll have to convert some of my Jags code to Stan soon. The one example I ran to prove the software was working showed that the compilation gives a fair bit of overhead, but that is only to be expected.

Others

Engauge digitizer and plotdigitizer, for converting plots back to data. I saw them when browsing the repository. You never know when you will need to extract data from a plot, so these are installed too. Not tested yet, though.
NumPy, SciPy and Octave. I pulled them from the repository just in case; they might sit there unused for a long time.
Openbugs. It seems development continued after all. It can be built under Linux too, but on 64-bit Linux the 32-bit C development packages are required, which is the one thing stopping me. Updating my Windows computer seems to be in order, though.

Friday, December 20, 2013

Merry Christmas

Merry Christmas for all readers. Please enjoy my tree. 

code

part <- list(x0=0,y0=0,x1=0,y1=1,
    branch1=NULL,branch2=NULL,extend=NULL,
    lwd=1,depth=0,col='springgreen')

par(mfrow=c(1,1),mar=c(5, 4, 4, 2) + 0.1)
segplot <- function(tree) {
  if (is.null(tree)) return()
  segments(tree$x0,tree$y0,tree$x1,tree$y1,
      col=tree$col,
      lwd=tree$lwd)
  segplot(tree$branch1)
  segplot(tree$branch2)
  segplot(tree$extend)
}
#segplot(part)

grow <- function(tree) {
  if (is.null(tree) ) return(NULL)
  
  tree$lwd=tree$lwd*1.2
  
  if (tree$lwd>2.5) tree$col <- 'brown'
  if (is.null(tree$extend)) {
    tree$extend <- list(
        x0=tree$x1,
        y0=tree$y1,
        x1=rnorm(1,1,.03)*(2*tree$x1-tree$x0),
        y1=(rnorm(1,.98,.02)+.02*(tree$x1==tree$x0))*(2*tree$y1-tree$y0),
        branch1=NULL,
        branch2=NULL,
        extend=NULL,
        lwd=1,
        depth=tree$depth,
        col=tree$col
    )
    length=sqrt((tree$x1-tree$x0)^2 + (tree$y1-tree$y0)^2)
    angle <- asin((tree$x1-tree$x0)/length)
    branch <- list(
        x0=(tree$x1+tree$x0)/2,
        y0=(tree$y1+tree$y0)/2,
        branch1=NULL,
        branch2=NULL,
        extend=NULL,
        lwd=1,
        depth=tree$depth,
        col=tree$col
    )
    shift <- rnorm(2,.5,.1)
    branch$x0 <- shift[1]*tree$x1+(1-shift[1])*tree$x0
    branch$y0 <- shift[1]*tree$y1+(1-shift[1])*tree$y0
    length=length*rnorm(1,.5,.05)
    co <- runif(1,.35,.45)
    branch$x1 <- branch$x0+sin(angle+co)*length
    branch$y1 <- branch$y0+cos(angle+co)*length
    tree$branch1 <- branch
    branch$x0 <- shift[2]*tree$x1+(1-shift[2])*tree$x0
    branch$y0 <- shift[2]*tree$y1+(1-shift[2])*tree$y0
    co <- runif(1,.35,.45)
    branch$x1 <- branch$x0+sin(angle-co)*length
    branch$y1 <- branch$y0+cos(angle-co)*length
    tree$branch2 <- branch    
  } else {
    tree$branch1 <- grow(tree$branch1)
    tree$branch2 <- grow(tree$branch2)
    tree$extend <- grow(tree$extend)
  }
  tree$depth <- tree$depth+1
  if (tree$depth>2)  tree$col <- 'green'
  if (tree$depth>4)  tree$col <- 'darkgreen'
  if (tree$depth>6)  tree$col <- 'brown'
  
  tree
}
tree <- part
for (i in 1:9) tree <- grow(tree) 
par(mar=c(0,0,0,0))
plot(x=c(-3,3),y=c(0,9),type='n',axes=FALSE,xlab='',ylab='')
segplot(tree)

Sunday, December 15, 2013

plotting y and log(y) in one figure

Sometimes I want to plot data both on the linear and on the log scale. Simply stacking two separate figures wastes space: I want to reuse the x-axis, legend and title. This post examines possibilities to do so with standard plot tools, lattice and ggplot2.

Data

Data is completely artificial.
library(ggplot2)
library(lattice)

datastart <- data.frame(x=rep(1:5,2),
    y=c(1,2,10,50,1, .1,9,8,20,19),
    type=rep(c('a','b'),each=5))
datastart
   x    y type
1  1  1.0    a
2  2  2.0    a
3  3 10.0    a
4  4 50.0    a
5  5  1.0    a
6  1  0.1    b
7  2  9.0    b
8  3  8.0    b
9  4 20.0    b
10 5 19.0    b

standard plot tools

The trick here is to make two plots. The top plot has no x-axis, the bottom one no title. To make the two plot surfaces equal in size, the room reserved for the x-axis and the title is kept the same.

par(mfrow=c(2,1),mar=c(0,4.1,4,2))
plot(y~x,
    data=datastart,
    axes=FALSE,
    frame.plot=TRUE,
    xlab='',
    main='bla bla',
    col=c('red','green')[datastart$type])
legend(x='topleft',
    legend=c('a','b'),
    col=c('red','green'),
    pch=1)
axis(side=2,las=1)
par(mar=c(4,4.1,0,2))
plot(y~x,
    data=datastart,
    axes=FALSE,
    frame.plot=TRUE,
    xlab='x',
    log='y',
    ylab='log(y)',
    col=c('red','green')[datastart$type]
)
axis(side=2,las=1)
axis(side=1)

lattice

As far as I understand, lattice does not have the tools for this kind of extreme fiddling with axes. The trick then is to add a copy of the data on the logarithmic scale and control the labels manually. Lattice does not understand that the x-axes are the same, but some suppression is possible.
data1=datastart
data2=datastart
data1$lab='linear'
data2$lab='log'
data2$y <- log10(data2$y)
dataplot <- rbind(data1,data2)

at2 <- axisTicks(range(data2$y),log=TRUE,nint=4)
at1 <- axisTicks(range(data1$y),log=FALSE,nint=5)
atx <- axisTicks(range(data1$x),log=FALSE,nint=5)
dataplot$lab <- factor(dataplot$lab,levels=c('linear','log'))
xyplot(y  ~ x | lab,groups=type,data=dataplot,layout=c(1,2),
    main='bla bla',
    as.table=TRUE,
    auto.key=TRUE,
    scales=list(relation='free',
        x=list(at=list(NULL,atx)),
        y=list(at=list(at1,log10(at2)),
            labels=list(format(at1),format(at2))
        )
    ))

ggplot2

The trick here is that ggplot2 can have a free y-axis, but you cannot set the labels per panel. Instead there is one common y-axis with adapted labels. ggplot chooses the range for the y-axis itself, so you have to make sure that the labels you feed it match that range. To make that fit, in the end I just shifted the whole log part to a different range. Some of the lattice code is re-used.

dataplot2 <- dataplot
offset <- -10
breaks=c(-11,-10,-9)
dataplot2$y[dataplot2$lab=='log'] <- dataplot2$y[dataplot2$lab=='log'] +offset 
p <- ggplot(dataplot2, aes(x, y,colour=type)) + geom_point()
p + facet_grid(lab ~ .,
        scales='free_y') +
    labs(title = 'bla bla') +
    scale_y_continuous(
        breaks = c(breaks,at1),
        labels=c(format(10^(breaks-offset)),format(at1)))

Sunday, December 8, 2013

Fukushima region radiation decrease

Fukushima is still a topic which gets headlines, and somewhere in a comment was a link to current and historical radiation data: the Fukushima prefecture radioactivity measurement map. It takes a bit of time to load, but then you have a map of the Fukushima region with current radiation levels at measurement stations. Somewhere on that website there is data from the past year; this post examines the decrease at four of the stations.

Data

If you look at the map, there are loads of blue dots, indicating low radiation, and a line of yellow, orange and red dots striking north-east from the reactor. It is an interactive map: you can click on any dot and see some details for that region, and at the bottom of that is a link to a detail page. On that page there are, among other things, graphs such as this one.
直近90日間の空間線量の推移 (transition of air dose rates over the most recent 90 days)
One of these plots shows the 'transit of air dose rates in the most recent 365 days'. It is a dynamic plot; the link to it contains the data, just like the plot above, and it is pretty easy to extract the data from the link. Since getting the links was manual work, I decided to restrict my data to four dots. To discriminate between locations I copied some header text and a number, for example 【相双地方】室原公民館の測定値 (Google Translate: Measurements [phase] of bi-district community center Murohara) with number 704. The complete data with location information are at the bottom of the page.
There may be a simpler way to get the data, but since I don't speak Japanese, I cannot say.
Edit 9 December 2013: Ishigami, a Japanese reader, sent me better translations, so this post has been edited.

Analysis

Since the data look pretty choppy, I wanted a robust model. To quote the Robust task view: "robust depends on robustbase, the former providing convenient routines for the casual user where the latter will contain the underlying functionality, and provide the more advanced statistician with a large range of options for robust modeling." Regarding robust modeling I am certainly the casual user. Estimating a proportional decrease, via a linear model on log-transformed data, seemed an appropriate modelling strategy.
The plot below contains the data and the predictions. To me, the predictions look just like the ones I would make by hand, ignoring some of the crazy jumps.
The data from the different stations show a clear correlation: the same drop mid-January and a jump in September, though I do not know the explanation for either.

Half-lives can be estimated from the models. Two of them are around 2 years. Cesium-134 has a half-life of about 2.06 years and has apparently been released, so it is easy to speculate that cesium-134 is part of the source of this radiation.
                                           Lower CI     mean Upper CI
Maenori meeting place in Soso district     1.922496 2.074042 2.251524
Murohara community center in Soso district 3.171938 4.463582 7.529780
Sugiuchi meeting place in Soso district    1.764994 1.817172 1.872530
township village center                    2.602491 2.854096 3.159558
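The half-life arithmetic behind this table is just an inversion of the fitted log10 slope. A sketch on simulated data, using plain lm() instead of lmrob() for brevity (all numbers here are made up):

```r
set.seed(42)
# Simulated series with a true half-life of 2 years, one reading per day
relday <- -(365:1)                       # days relative to today, as in the data
slope.true <- log10(0.5) / (2 * 365)     # log10 decrease per day
ldose <- 1 + slope.true * relday + rnorm(365, 0, 0.01)
fit <- lm(ldose ~ relday)                # the post fits lmrob() from package robust
halflife <- log10(0.5) / coef(fit)['relday'] / 365   # back to years
unname(halflife)                         # close to the true 2 years
```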

code

#【相双地方】室原公民館の測定値       
#704 
# Measurements [phase] of bi-district community center Murohara
# Measurements of Murohara community center in Soso district.
ys704 <- "http://chart.apis.google.com/chart?chd=t:433,432,433,410,431,432,426,430,441,438,435,432,435,416,422,431,422,422,407,425,405,421,419,424,421,418,415,428,412,402,414,404,413,412,401,402,406,413,424,276,310,334,352,357,370,389,396,398,402,405,404,396,397,405,404,404,407,417,419,411,413,411,332,409,398,400,409,388,402,402,403,394,387,400,408,394,394,389,405,384,391,387,399,404,407,416,407,403,409,409,412,423,428,415,411,408,416,431,403,411,415,420,436,426,420,407,416,414,411,411,404,402,415,406,405,393,420,403,389,401,403,390,397,403,410,388,393,388,395,412,403,408,413,391,395,383,377,392,401,399,402,386,394,392,405,403,387,392,394,404,406,411,391,406,401,408,387,389,392,401,399,391,390,405,402,411,408,409,408,407,404,411,413,417,405,390,399,393,392,401,402,405,411,410,405,414,414,411,405,404,417,395,396,401,403,396,406,398,401,403,407,408,388,390,393,396,398,402,405,392,404,409,406,410,405,419,423,420,417,400,402,398,400,387,395,389,397,399,386,395,386,396,396,386,392,383,392,390,380,379,391,392,385,390,400,405,408,416,417,415,415,415,417,417,419,419,420,405,411,405,401,406,408,411,403,403,408,417,402,401,395,405,395,389,394,395,381,389,392,395,399,390,396,388,370,380,383,383,405,391,387,384,389,382,369,374,377,384,379,382,375,374,372,365,366,374,379,388,378,378,373,366,368,343,355,353,358,361,347,351,334,346,340,334,327,332,334,343,331,337,335,338,344,331,333,336,341,331,336,338,322,320,322,331,330,330,331,335,326,331,325,325,327,328,331,325,331,324,324,320,323,322,327,324,323&chxl=0:|365|330|300|270|240|210|180|150|120|90|60|30|1|2:|%CE%BCSv%2Fh&chxp=0|2,0&chxr=0,0,100|1,0,50&chxs=0,676767,9,0,lt,676767|1,676767,9,0,l,676767&chxtc=0,3&chxt=x,y,t&chs=300x150&cht=lc&chco=0067B7&chds=0,5000&chls=1&chma=1,1,1,1&chg=8.333,10,1,1"
#【相双地方】杉内集会所の測定値       
#628 
# Measured value of the bi-phase region] Sugiuchi meeting place
# Measured value of Sugiuchi meeting place in Soso district.
ys628 <- "http://chart.apis.google.com/chart?chd=t:280,281,281,269,278,279,277,279,282,278,281,273,275,275,274,276,267,267,269,270,271,272,261,267,269,269,270,272,270,266,269,270,270,268,270,269,268,270,268,183,183,197,201,203,208,216,223,234,243,246,253,250,251,254,255,255,256,262,265,266,258,263,201,242,245,250,255,256,255,258,259,252,253,256,258,260,259,258,259,256,257,258,261,261,262,258,263,262,263,264,264,268,269,265,265,265,265,268,265,265,267,268,267,266,265,265,266,265,263,261,262,247,253,254,253,247,253,244,235,246,249,232,242,247,250,239,245,240,246,251,250,251,254,246,244,234,234,240,243,233,242,230,239,242,244,243,234,241,244,247,248,248,246,247,248,248,227,231,236,238,240,236,239,243,241,242,242,246,245,245,245,245,247,247,239,221,231,230,227,233,235,237,238,238,227,236,237,238,226,224,232,218,223,226,225,220,224,216,217,224,227,230,217,218,220,220,224,226,230,218,224,227,227,228,222,230,231,235,236,222,223,222,224,225,217,218,222,223,214,218,211,216,215,210,214,209,214,214,207,209,214,215,210,211,217,221,223,228,229,231,232,232,235,235,237,238,238,226,227,227,225,228,230,233,221,225,226,231,222,223,223,228,215,214,215,216,215,216,217,219,221,213,218,210,204,209,211,212,215,217,215,214,216,203,205,207,209,211,208,206,201,205,202,197,197,201,204,207,206,205,206,206,206,190,194,196,197,198,187,190,185,188,183,185,186,190,191,193,194,195,195,195,196,192,193,195,194,193,194,194,188,190,191,192,192,191,193,193,194,194,194,194,194,194,195,189,190,191,190,190,191,190,191,188,188&chxl=0:|365|330|300|270|240|210|180|150|120|90|60|30|1|2:|%CE%BCSv%2Fh&chxp=0|2,0&chxr=0,0,100|1,0,50&chxs=0,676767,9,0,lt,676767|1,676767,9,0,l,676767&chxtc=0,3&chxt=x,y,t&chs=300x150&cht=lc&chco=0067B7&chds=0,5000&chls=1&chma=1,1,1,1&chg=8.333,10,1,1"
# 【相双地方】前乗集会所の測定値   
#741 
# Measurement of the meeting place to the power [bi-phase region] before
# Measurement of Maenori meeting place in Soso district.
ys741 <- 'http://chart.apis.google.com/chart?chd=t:114,113,112,110,98,101,102,105,109,112,115,114,114,114,105,105,104,105,102,92,91,93,92,95,99,100,100,103,99,98,97,100,99,98,99,98,98,99,100,77,56,56,56,51,48,49,50,48,48,49,49,46,43,45,45,46,46,49,66,67,68,70,59,68,64,65,68,66,67,64,66,63,61,62,65,64,65,63,66,64,57,58,59,64,68,75,77,78,81,84,90,96,99,98,98,97,100,101,99,100,101,103,104,103,105,101,103,102,102,102,100,100,103,102,101,99,102,100,94,97,99,96,96,98,99,98,98,95,97,101,100,101,102,97,98,92,64,90,96,96,97,94,97,98,98,99,95,96,97,98,99,100,98,100,101,102,95,98,99,100,100,97,98,100,98,99,99,100,100,100,100,101,101,101,96,93,96,96,95,97,98,99,100,99,93,96,98,97,97,96,97,90,91,93,93,90,92,92,93,93,94,94,87,90,90,90,91,92,92,88,90,90,91,91,88,91,92,92,92,87,89,87,90,85,86,88,89,90,87,90,87,88,88,85,86,85,85,86,84,84,87,88,85,84,88,89,89,91,91,92,92,93,94,94,94,94,95,88,90,91,89,90,87,87,87,89,90,91,87,86,87,87,87,84,86,86,83,85,86,86,87,87,88,84,83,85,85,86,88,88,87,86,87,86,83,84,85,86,85,85,83,83,83,81,81,82,84,85,85,86,84,85,85,79,80,80,81,82,77,79,77,78,78,78,78,79,79,80,79,79,80,80,81,79,79,80,79,79,79,78,76,76,76,78,78,77,78,79,78,79,78,78,78,78,75,77,77,76,76,76,76,76,77,76,76&chxl=0:|365|330|300|270|240|210|180|150|120|90|60|30|1|2:|%CE%BCSv%2Fh&chxp=0|2,0&chxr=0,0,100|1,0,50&chxs=0,676767,9,0,lt,676767|1,676767,9,0,l,676767&chxtc=0,3&chxt=x,y,t&chs=300x150&cht=lc&chco=0067B7&chds=0,5000&chls=1&chma=1,1,1,1&chg=8.333,10,1,1'
# 【相双地方】町区集落センターの測定値  
#1376 
# Measured value of the bi-phase region] township village center
ys1376 <- 'http://chart.apis.google.com/chart?chd=t:1023,1010,991,987,953,1027,1007,1008,1006,957,963,980,1000,899,1003,994,920,995,886,1008,884,1003,910,915,918,904,977,1006,881,993,888,984,908,976,986,861,869,995,986,612,651,683,711,709,728,733,770,813,834,853,849,851,981,969,994,968,963,986,978,988,967,965,742,967,833,855,946,973,978,853,970,836,821,840,975,835,841,977,854,971,970,824,982,849,984,885,971,861,976,985,883,902,985,878,865,983,986,992,846,867,987,901,909,987,887,984,966,875,983,957,849,841,953,855,849,814,875,956,827,844,849,825,830,849,865,826,838,825,834,869,853,872,949,854,834,1003,1004,1041,1042,1012,1024,865,989,1009,1015,1003,981,997,1024,902,1029,1027,1005,899,894,908,975,986,1000,1009,1014,996,985,998,1003,999,991,1008,1020,988,1003,1016,919,921,984,847,872,981,980,966,996,991,898,890,977,988,984,979,875,874,989,861,862,873,965,858,867,846,844,860,960,972,845,847,854,855,869,869,879,850,865,867,872,877,859,881,873,884,885,849,853,850,849,828,837,829,841,849,826,840,823,833,839,818,834,814,829,830,816,813,827,831,825,822,837,845,856,866,930,860,861,862,869,873,872,878,881,847,851,855,847,850,854,861,846,847,855,870,862,858,860,870,849,840,837,841,818,833,839,844,851,834,848,828,803,814,820,821,854,834,836,826,837,811,796,799,809,825,817,813,801,803,797,784,790,808,813,820,787,795,780,772,774,743,739,743,750,754,729,817,791,793,795,797,769,782,784,808,793,805,790,799,812,783,779,789,805,779,793,800,747,752,748,771,768,769,772,785,766,777,782,779,774,767,777,763,761,758,759,747,764,758,763,746,745&chxl=0:|365|330|300|270|240|210|180|150|120|90|60|30|1|2:|%CE%BCSv%2Fh&chxp=0|2,0&chxr=0,0,100|1,0,50&chxs=0,676767,9,0,lt,676767|1,676767,9,0,l,676767&chxtc=0,3&chxt=x,y,t&chs=300x150&cht=lc&chco=0067B7&chds=0,5000&chls=1&chma=1,1,1,1&chg=8.333,10,1,1'
linktodata <- function(ys,i) {
  ys <- gsub('.*t:','',ys)
  ys <- gsub('&chxl.*','',ys)
  dose <- scan(textConnection(ys),sep=',')
  data.frame(dose=dose,station=i,relday = -(length(dose):1))
}

locations <-data.frame(station=c(704,628,741,1376),
    loc=factor(c('Murohara community center in Soso district','Sugiuchi meeting place in Soso district',
            'Maenori meeting place in Soso district', 'township village center')))

r1 <- rbind(linktodata(ys704,704),
    linktodata(ys628,628),
    linktodata(ys1376,1376),
    linktodata(ys741,741))
r1$doseus <- r1$dose/100
r1$day <- as.Date("05-12-2013",format='%d-%m-%Y')+r1$relday
Sys.setlocale("LC_TIME",'C')

r1 <- merge(r1,locations)
r1$status <- 'Observed'
library(robust)

models <- lapply(levels(r1$loc),function(st) {
      r2 <- r1[r1$loc==st,]
      lmrob(log10(doseus) ~ relday,data=r2)
    })

topred <- r1[r1$station==r1$station[1],c('relday','day')]

preds <- lapply(1:nlevels(r1$loc),function(i) {
      preds <- topred
      preds$doseus <- 10^predict(models[[i]],preds)
      preds$status <- 'Predicted'
      preds$loc <- levels(r1$loc)[i]
      preds
    }
)
preds <- do.call(rbind,preds)
preds <- merge(preds,locations)
all <- rbind(subset(r1,select=-dose),preds)

library(lattice)    
png('fuk2.png')
xyplot(doseus ~ day | loc,groups= status,data=all,type='l',
    scales=list(y=list(log=10)),
        par.strip.text=list(cex=.7),
    ylab=expression(mu * "Sv per hour" ))
dev.off()    

halflives <-     t(sa <- sapply(models,function(x) {
          point <- coef(x)[2]
          sd <- sqrt(vcov(x)[2,2])
          yy <- c(log10(.5)/c(`0.025`=point-1.96*sd,mean=point,`0.975`=point+1.96*sd))/365
          names(yy) <- c('Lower CI','mean','Upper CI')
          yy
        }))
rownames(halflives) <- levels(r1$loc)
halflives

Sunday, December 1, 2013

JAGS model Fe concentration in rainwater including values below detection level

In my previous post I ignored the fact that some data are below the detection level. I would not know how to handle those in a mixed model from lme4 or nlme; however, JAGS can handle these values. Apart from that I kept the usual independent variables, such as random effects for location, machine etc.

Data

Data have been acquired and processed similarly to last week. The full script is at the bottom of the page.

detection level

In the spreadsheet a '4' is used to flag a value below the detection limit (* een " 4 "-teken in kolom C geeft aan dat de analyseresultaten beneden de detectielimiet liggen; that is, a '4' in column C indicates that the analysis results are below the detection limit). There is something strange there: the detection limit seems to be 0.4, but there is unflagged data under 0.4. In addition, some of the data is below 0, which seems impossible for a concentration. In the end I flagged all data under 0.4 as NA.
Fe <- rf2[rf2$Compound==levels(rf2$Compound)[9],]
library(lattice)
densityplot( ~ Amount |  machine + Status,
    groups=Location,data=Fe[!is.na(Fe$Status) 
            & Fe$Amount<.8,],
    panel=function(...) {
      panel.densityplot(...)
      panel.abline(v=c(0,0.4))
    },adjust=.5
)

Analysis

It is a reasonably standard JAGS model. The only tricky thing is initialization: when the starting values are too far off, an error is thrown. Hence 'intercept' is initialized at -1. The model and inits are given here in the text; the full code is at the end.
FeModel <- function() {
  for (i in 1:nobs) {
    BLZ[i] ~ dinterval( LAmount[i] , -0.4034029 )
    ExpLAmount[i] <- intercept + 
        rate*time[i] + 
        rateL[Location[i]]*time[i]+
        FMach[Machine[i]] + 
        FLoc[Location[i]]
    LAmount[i] ~ dnorm(ExpLAmount[i],tau[Machine[i]])
  }
  for (i in 1:2) {
    tau[i] <- pow(sd[i],-2)
    sd[i] ~ dunif(0,1000)
  }
  intercept ~ dnorm(0,.0001)
  FMach[1] ~dnorm(0,tauMachShift)
  FMach[2] <- -FMach[1]
  tauMachShift <- pow(sdMachShift,-2)
  sdMachShift ~ dunif(0,10)
  for (iL in 1:nloc) {
    FLoc[iL] ~ dnorm(0,tauLoc)
    rateL[iL] ~dnorm(0,tauRate)
    yearlyL[iL] <- pow(pow(10,rateL[iL]+rate),365)
  }
  tauLoc <- pow(sdLoc,-2)
  sdLoc ~ dunif(0,100)
  tauRate <- pow(sdRate,-2)
  sdRate ~ dunif(0,100)
  rate ~ dnorm(0,.0001)
  dailydec <- pow(10,rate)
  yearlydec <- pow(dailydec,365)
}

inits <- function() {
  list(intercept = -1,#rnorm(1,.45,.04),
      rate=rnorm(1, 0, .01),
      sd=runif(2,0,50),
      sdMachShift=runif(1,0,1),
      sdLoc=runif(1,0,1))
}

Results

The model shows the expected decrease of 5% per year.
Machine 1 (Eigenbrodt NSA 181 KHT) is slightly less accurate (as was also observed last week) and gives slightly lower values than machine 2 (Van Essen/ECN vanger). Variation between locations is less than variation between machines.
Inference for Bugs model at "C:/.../AppData/Local/Temp/Rtmpma5UEx/modelc40761211ba.txt", fit using jags,
 4 chains, each with 5000 iterations (first 2500 discarded), n.thin = 2
 n.sims = 5000 iterations saved
             mu.vect sd.vect     2.5%      25%      50%      75%    97.5%
FLoc[1]        0.023   0.070   -0.110   -0.022    0.021    0.065    0.169
FLoc[2]        0.139   0.081   -0.038    0.091    0.147    0.195    0.280
FLoc[3]       -0.041   0.069   -0.180   -0.083   -0.041    0.002    0.098
FLoc[4]       -0.103   0.080   -0.289   -0.147   -0.094   -0.048    0.032
FLoc[5]       -0.072   0.072   -0.209   -0.117   -0.075   -0.027    0.078
FLoc[6]       -0.036   0.070   -0.173   -0.079   -0.036    0.008    0.102
FLoc[7]       -0.016   0.069   -0.154   -0.058   -0.016    0.027    0.117
FLoc[8]        0.278   0.079    0.127    0.229    0.274    0.323    0.453
FLoc[9]       -0.061   0.089   -0.243   -0.118   -0.060    0.000    0.109
FLoc[10]       0.052   0.069   -0.091    0.009    0.052    0.095    0.186
FLoc[11]       0.040   0.079   -0.131   -0.008    0.048    0.096    0.172
FLoc[12]      -0.102   0.077   -0.245   -0.153   -0.106   -0.054    0.064
FLoc[13]      -0.054   0.082   -0.210   -0.107   -0.056   -0.005    0.117
FLoc[14]       0.027   0.067   -0.112   -0.013    0.028    0.070    0.155
FLoc[15]      -0.031   0.070   -0.164   -0.076   -0.033    0.011    0.116
FLoc[16]       0.088   0.070   -0.050    0.045    0.087    0.131    0.227
FLoc[17]      -0.154   0.076   -0.293   -0.203   -0.157   -0.106    0.006
FMach[1]      -0.053   0.017   -0.087   -0.065   -0.053   -0.041   -0.018
FMach[2]       0.053   0.017    0.018    0.041    0.053    0.065    0.087
intercept      0.401   0.081    0.240    0.345    0.402    0.455    0.559
sd[1]          0.436   0.018    0.402    0.423    0.436    0.448    0.471
sd[2]          0.386   0.007    0.372    0.381    0.386    0.391    0.401
sdLoc          0.135   0.036    0.077    0.110    0.130    0.154    0.216
sdMachShift    1.854   2.456    0.033    0.170    0.675    2.587    8.820
yearlyL[1]     0.944   0.007    0.931    0.940    0.944    0.948    0.957
yearlyL[2]     0.950   0.007    0.938    0.945    0.949    0.954    0.965
yearlyL[3]     0.945   0.006    0.932    0.941    0.945    0.949    0.958
yearlyL[4]     0.949   0.007    0.936    0.944    0.948    0.953    0.963
yearlyL[5]     0.943   0.007    0.929    0.939    0.944    0.948    0.956
yearlyL[6]     0.945   0.006    0.932    0.941    0.945    0.949    0.957
yearlyL[7]     0.946   0.006    0.933    0.941    0.946    0.950    0.958
yearlyL[8]     0.945   0.006    0.932    0.941    0.945    0.949    0.957
yearlyL[9]     0.944   0.007    0.929    0.940    0.945    0.949    0.958
yearlyL[10]    0.946   0.006    0.933    0.942    0.946    0.950    0.958
yearlyL[11]    0.950   0.006    0.938    0.946    0.950    0.954    0.964
yearlyL[12]    0.942   0.007    0.928    0.938    0.943    0.947    0.955
yearlyL[13]    0.944   0.007    0.930    0.939    0.944    0.948    0.956
yearlyL[14]    0.947   0.006    0.935    0.943    0.947    0.951    0.959
yearlyL[15]    0.944   0.006    0.931    0.940    0.944    0.948    0.956
yearlyL[16]    0.946   0.006    0.934    0.942    0.946    0.950    0.958
yearlyL[17]    0.942   0.007    0.926    0.938    0.943    0.947    0.956
yearlydec      0.945   0.005    0.936    0.942    0.945    0.949    0.955
deviance    1729.603  52.705 1627.020 1693.551 1728.751 1765.143 1831.422
             Rhat n.eff
FLoc[1]     1.002  2200
FLoc[2]     1.001  4000
FLoc[3]     1.001  5000
FLoc[4]     1.001  5000
FLoc[5]     1.001  5000
FLoc[6]     1.002  1500
FLoc[7]     1.001  5000
FLoc[8]     1.001  4400
FLoc[9]     1.002  1600
FLoc[10]    1.001  5000
FLoc[11]    1.001  2800
FLoc[12]    1.002  2200
FLoc[13]    1.001  5000
FLoc[14]    1.001  5000
FLoc[15]    1.001  2800
FLoc[16]    1.001  4800
FLoc[17]    1.004   740
FMach[1]    1.004   760
FMach[2]    1.004   760
intercept   1.001  3100
sd[1]       1.007   380
sd[2]       1.002  1500
sdLoc       1.003  1600
sdMachShift 1.001  5000
yearlyL[1]  1.001  3000
yearlyL[2]  1.001  4100
yearlyL[3]  1.002  1600
yearlyL[4]  1.001  2800
yearlyL[5]  1.002  1400
yearlyL[6]  1.002  2300
yearlyL[7]  1.001  5000
yearlyL[8]  1.001  5000
yearlyL[9]  1.001  4000
yearlyL[10] 1.001  3500
yearlyL[11] 1.001  5000
yearlyL[12] 1.002  1400
yearlyL[13] 1.001  5000
yearlyL[14] 1.002  1400
yearlyL[15] 1.001  3100
yearlyL[16] 1.002  1500
yearlyL[17] 1.004   850
yearlydec   1.002  1800
deviance    1.003   900

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 1385.1 and DIC = 3114.7
DIC is an estimate of expected predictive error (lower deviance is better).

Code

library(R2jags)

# http://www.lml.rivm.nl/data/gevalideerd/data/lmre_1992-2005.xls
# http://www.r-bloggers.com/read-excel-files-from-r/
library(XLConnect)

readsheet <- function(wb,cursheet) {
  df = readWorksheet(wb, sheet = cursheet , header = TRUE)
  topline <- 8
  test <- which(df[6,]=='C')+1
  numin <- df[topline:nrow(df),test]
  names(numin) <- make.names( gsub(' ','',df[6,test]))
  for(i in 1:ncol(numin)) numin[,i] <- as.numeric(gsub(',','.',numin[,i]))
  status = df[topline:nrow(df),test-1]
  names(status) <- paste('C',make.names( gsub(' ','',df[6,test])),sep='')
  
  numin$StartDate <- as.Date(substr(df[topline:nrow(df),1],1,10))
  numin$EndDate <-  as.Date(substr(df[topline:nrow(df),2],1,10))
  numin <- cbind(numin,status)
  tall <- reshape(numin,
      varying=list(make.names( gsub(' ','',df[6,test])),paste('C',make.names( gsub(' ','',df[6,test])),sep='')),
      v.names=c('Amount','Status'),
      timevar='Compound',
      idvar=c('StartDate','EndDate'),
      times=paste(df[6,test],'(',df[5,test],')',sep=''),
      direction='long')
  tall$Compound <- factor(gsub(' )',')',gsub(' +',' ',tall$Compound)))
  row.names(tall)<- NULL
  tall$Location <- cursheet
  tall
}

readfile <- function(fname) {
  wb <- loadWorkbook(fname)
  print(wb)
  sheets <- getSheets(wb)[-1]
  tt <- lapply(sheets,readsheet,wb=wb)
  tt2 <- do.call(rbind,tt) 
  tt2$Location <- factor(tt2$Location)
  tt2$fname <- fname
  tt2
}

fnames <- dir(pattern='\\.xls$')
rf <- lapply(fnames,readfile)
rf2 <- do.call(rbind,rf)

rf2$machine <- factor(ifelse(rf2$fname=="lmre_1992-2005.xls",
        'Van Essen/ECN vanger',
        ' Eigenbrodt NSA 181 KHT'))

Fe <- rf2[rf2$Compound==levels(rf2$Compound)[9],]
library(lattice)
densityplot( ~ Amount |  machine + Status,
    groups=Location,data=Fe[!is.na(Fe$Status) 
            & Fe$Amount<.8,],
    panel=function(...) {
      panel.densityplot(...)
      panel.abline(v=c(0,0.4))
    },adjust=.5
)

Fe$LAmount <- log10(Fe$Amount)
Fe$LAmount[Fe$Amount<0.4] <- NA
Fe2 <- Fe[!is.na(Fe$Status),]

datain <- list(
    LAmount = Fe2$LAmount,
    BLZ = !is.na(Fe2$LAmount),
    Machine=(Fe2$machine=='Van Essen/ECN vanger')+1,
    Location=(1:nlevels(Fe2$Location))[Fe2$Location],
    time=as.numeric(Fe2$StartDate),
    nobs =nrow(Fe2),
    nloc=nlevels(Fe2$Location)
)

FeModel <- function() {
  for (i in 1:nobs) {
    BLZ[i] ~ dinterval( LAmount[i] , -0.4034029 )
    ExpLAmount[i] <- intercept + 
        rate*time[i] + 
        rateL[Location[i]]*time[i]+
        FMach[Machine[i]] + 
        FLoc[Location[i]]
    LAmount[i] ~ dnorm(ExpLAmount[i],tau[Machine[i]])
  }
  for (i in 1:2) {
    tau[i] <- pow(sd[i],-2)
    sd[i] ~ dunif(0,1000)
  }
  intercept ~ dnorm(0,.0001)
  FMach[1] ~dnorm(0,tauMachShift)
  FMach[2] <- -FMach[1]
  tauMachShift <- pow(sdMachShift,-2)
  sdMachShift ~ dunif(0,10)
  for (iL in 1:nloc) {
    FLoc[iL] ~ dnorm(0,tauLoc)
    rateL[iL] ~dnorm(0,tauRate)
    yearlyL[iL] <- pow(pow(10,rateL[iL]+rate),365)
  }
  tauLoc <- pow(sdLoc,-2)
  sdLoc ~ dunif(0,100)
  tauRate <- pow(sdRate,-2)
  sdRate ~ dunif(0,100)
  rate ~ dnorm(0,.0001)
  dailydec <- pow(10,rate)
  yearlydec <- pow(dailydec,365)
}

parameters <- c('intercept','yearlydec','sd',
    'sdMachShift','FMach',
    'sdLoc','FLoc',
    'yearlyL'    )
inits <- function() {
  list(intercept = -1,#rnorm(1,.45,.04),
      rate=rnorm(1,0,.01),
      sd=runif(2,0,50),
      sdMachShift=runif(1,0,1),
      sdLoc=runif(1,0,1))
}

# estimate        
jj <- jags(data=datain, model.file=FeModel,
    parameters.to.save=parameters, inits=inits, n.iter=5000,
    progress.bar="gui", n.chains=4)
jj
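
As an aside, the dinterval construct in the model handles the left-censored observations. A minimal sketch (with made-up values, not data from the post) of how datain encodes this:

```r
# dinterval(x, cut) evaluates to 0 when x <= cut and to 1 when x > cut.
# So BLZ = 1 flags an observed log10 amount, while BLZ = 0 flags a value
# below the detection limit; the censored LAmount is passed as NA and
# imputed by the sampler.
LAmount <- c(NA, -0.2, -0.1)        # hypothetical log10 amounts; first censored
BLZ <- as.numeric(!is.na(LAmount))  # 0 = censored, 1 = observed
BLZ
# [1] 0 1 1
```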

Sunday, November 24, 2013

Dutch Rainwater Composition 1992-2011

Last week I examined rainwater composition from 1992 to 2005. There is additional data, but the National Institute for Public Health changed equipment in 2005. This week I will add those data.

Data

The data are in a number of spreadsheets. The script is a variation on last week's, with an extra wrapper to loop over the spreadsheets. The final parts merge locations (stations Leiduin and Witteveen have been replaced by De Zilk and Valthermond respectively) and add the equipment variable. The code is at the bottom of the post.

pH

It seems pH is still slightly increasing. It also seems there are two distinct pH levels. For instance, in Gilze-Rijen the pH is either around 5 or around 6, while 5.5 is less probable.

Iron

For iron there are just four locations with complete data, so only these are displayed. It would seem the downward trend continues.

Iron model

Since I wanted to give each of the two machines a separate variance, I opted to use nlme for this particular calculation. In addition, unlike the plot, all locations are used and the replacement stations are not merged. The machines seem to give similar variation, which probably means the variation in iron concentration is much larger than the machine accuracy. The decrease estimate is still around 5% per year.
Linear mixed-effects model fit by REML
 Data: Fe2 
       AIC      BIC    logLik
  3135.969 3184.729 -1559.984

Random effects:
 Formula: ~StartDate | Location
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev       Corr  
(Intercept) 1.018836e-05 (Intr)
StartDate   6.680830e-06 -0.05 

 Formula: ~1 | machine %in% Location
        (Intercept)  Residual
StdDev:  0.09556604 0.3791318

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | machine 
 Parameter estimates:
   Van Essen/ECN vanger  Eigenbrodt NSA 181 KHT 
               1.000000                1.049395 
Fixed effects: LAmount ~ StartDate 
                 Value  Std.Error   DF    t-value p-value
(Intercept)  0.4373752 0.06188798 3258   7.067208       0
StartDate   -0.0000640 0.00000549 3258 -11.645303       0
 Correlation: 
          (Intr)
StartDate -0.887

Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max 
-3.752718189 -0.655477969 -0.003628072  0.645709629  4.226135302 

Number of Observations: 3280
Number of Groups: 
             Location machine %in% Location 
                   17                    21 
#decrease estimate
> (10^fixef(lme1)['StartDate'])^365
StartDate 
0.9476403 
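
The arithmetic behind this: the StartDate coefficient is a slope on the log10 scale per day, so the yearly multiplicative change is 10 raised to 365 times the slope. A quick check with the rounded coefficient from the summary above:

```r
# Yearly change implied by a daily slope on the log10 scale.
slope <- -0.0000640        # StartDate fixed effect from summary(lme1)
yearly <- 10^(slope * 365) # multiplicative change per year
round(yearly, 4)
# [1] 0.9476
```

This matches the roughly 5% yearly decrease quoted in the text.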

Code

Import data

library(XLConnect)

readsheet <- function(wb,cursheet) {
  df = readWorksheet(wb, sheet = cursheet , header = TRUE)
  topline <- 8
  test <- which(df[6,]=='C')+1
  numin <- df[topline:nrow(df),test]
  names(numin) <- make.names( gsub(' ','',df[6,test]))
  for(i in 1:ncol(numin)) numin[,i] <- as.numeric(gsub(',','.',numin[,i]))
  status = df[topline:nrow(df),test-1]
  names(status) <- paste('C',make.names( gsub(' ','',df[6,test])),sep='')
  
  numin$StartDate <- as.Date(substr(df[topline:nrow(df),1],1,10))
  numin$EndDate <-  as.Date(substr(df[topline:nrow(df),2],1,10))
  numin <- cbind(numin,status)
  tall <- reshape(numin,
      varying=list(make.names( gsub(' ','',df[6,test])),paste('C',make.names( gsub(' ','',df[6,test])),sep='')),
      v.names=c('Amount','Status'),
      timevar='Compound',
      idvar=c('StartDate','EndDate'),
      times=paste(df[6,test],'(',df[5,test],')',sep=''),
      direction='long')
  tall$Compound <- factor(gsub(' )',')',gsub(' +',' ',tall$Compound)))
  row.names(tall)<- NULL
  tall$Location <- cursheet
  tall
}

readfile <- function(fname) {
  wb <- loadWorkbook(fname)
  print(wb)
  sheets <- getSheets(wb)[-1]
  tt <- lapply(sheets,readsheet,wb=wb)
  tt2 <- do.call(rbind,tt) 
  tt2$Location <- factor(tt2$Location)
  tt2$fname <- fname
  tt2
}

fnames <- dir(pattern='\\.xls$') # dir() takes a regular expression, not a glob
rf <- lapply(fnames,readfile)
rf2 <- do.call(rbind,rf)

(ll <- levels(rf2$Location))
ll[c(13,17)] <- 'V&W'
ll[c(9,4)] <- 'L&DZ'
rf2$Location2 <- factor(ll[rf2$Location])
head(rf2)
rf2$machine <- factor(ifelse(rf2$fname=="lmre_1992-2005.xls",
        'Van Essen/ECN vanger',
        ' Eigenbrodt NSA 181 KHT'))

pH plot


library(ggplot2)
library(scales)
levels(rf2$Compound)
(cc <- levels(rf2$Compound)[22])
rf2$Amount2 <- rf2$Amount
rf2$Amount2[rf2$Amount<1 & rf2$Compound==cc] <- NA
rf2[rf2$Amount < 1 & rf2$Compound==cc & !is.na(rf2$Amount),]
pp <- ggplot(rf2[rf2$Compound==cc & !(rf2$Location2 %in% ll[c(2,5,7,15)]),], 
    aes( StartDate,Amount2,col=machine))
pp + geom_point() +
    facet_wrap(~Location2) +
    ylab(cc) + xlab('Year') + ylim(c(4,7)) +
    scale_x_date(breaks=as.Date(paste(c('1992','2000','2010'),'-01-01',sep=''),
            format=('%Y-%m-%d')),
        labels = date_format("%Y")) +
     theme(legend.position = "bottom")


Fe Plot


(cc <- levels(rf2$Compound)[9])
#levels(rf2$Location2)
pp <- ggplot(rf2[rf2$Compound==cc & (rf2$Location2 %in% 
            levels(rf2$Location2)[c(7,8,10,13)]),], 
  aes( StartDate,Amount,color=machine))
pp + geom_point() +
    scale_y_log10() +
    facet_wrap(~Location2) +
    ylab(cc) + xlab('Year') +
    scale_x_date(breaks=as.Date(paste(c('1992','2000','2010'),'-01-01',sep=''),
            format=('%Y-%m-%d')),
        labels = date_format("%Y")) +
        theme(legend.position = "bottom")

Fe model


Fe <- rf2[rf2$Compound==levels(rf2$Compound)[9],]
# plot(density(Fe$Amount[!is.na(Fe$Status) & Fe$Status!=0],adjust=.5))
Fe$LAmount <- log10(Fe$Amount)
Fe$LAmount[Fe$Amount<=0] <- NA
Fe$Location <- factor(Fe$Location)
#nlme
library(nlme)
Fe2 <- Fe[complete.cases(Fe),]
lme1 <- lme(LAmount ~  StartDate,
 #   random= ~ 1|machine, # machine = groups
 #   random=~StartDate | Location, # location=groups
     random=list(Location=~StartDate, machine=~1),   
    weights=varIdent(form=~1 |  machine),
    data=Fe2 )    
summary(lme1)