Wiekvoet, a blog by Wingfeet (http://www.blogger.com/profile/01585623097384646816)<br />
<h2>Happy PI day</h2>
Posted 2016-03-13<br />
I have never done a post for Pi Day. This year I want to do so.<br />
<br />
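We all know the simple estimate of pi based on random numbers: draw points uniformly in the unit square and count the fraction that falls inside the quarter circle of radius 1. That fraction approximates pi/4. The code used here is written for speed in R.<br />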
<div>
<div>
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">pi2d <- function(N=1000) {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> 4*sum(rowSums(matrix(runif(N*2)^2,ncol=2))<1)/N</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">}</span></div>
</div>
<div>
What irritates me is the low efficiency of this estimate. What do you get from 10 000 simulations? Probably, but not even certainly, the first two digits.</div>
<div>
<div>
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">summary(sapply(1:1000,function(x) pi2d(10000)))</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> Min. 1st Qu. Median Mean 3rd Qu. Max. </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> 3.080 3.130 3.141 3.141 3.153 3.189 </span></div>
</div>
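Why only two digits? Each point is a hit with probability p = pi/4, so the estimate 4*hits/N has standard error 4*sqrt(p(1-p)/N). A minimal sketch of that calculation (the helper names se2d and n_for_se are hypothetical, not part of the original code):

```r
# Standard error of the 2D hit-or-miss estimate 4 * hits / N,
# where each point lands inside the quarter circle with probability pi/4
se2d <- function(N) {
  p <- pi / 4
  4 * sqrt(p * (1 - p) / N)
}

# Number of simulations needed to bring the standard error down to 'se'
n_for_se <- function(se) {
  p <- pi / 4
  16 * p * (1 - p) / se^2
}

se2d(10000)       # about 0.016, so only "3.1" is reasonably safe
n_for_se(0.0005)  # roughly 1.1e7 simulations for a standard error of 0.0005
```

This agrees with the spread in the summary above: for a roughly normal distribution the interquartile range is about 1.35 standard errors, here roughly 0.022.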
<div>
Over the past years I have been thinking about how to make this more efficient, but that is not obvious. For instance, it is possible to use the three-dimensional equivalent, a ball. One octant of the unit ball has volume pi/6, hence the factor 6:</div>
<div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">pi3d <- function(N=1000) {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> 6*sum(rowSums(matrix(runif(N*3)^2,ncol=3))<1)/N</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">}</span></div>
</div>
<div>
<div style="font-size: small;">
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">summary(sapply(1:1000,function(x) pi3d(10000)))</span><br />
<span style="background-color: white;"><span style="font-family: Courier New, Courier, monospace;"> Min. 1st Qu. Median Mean 3rd Qu. Max. </span></span></div>
<div>
<div>
<span style="background-color: white; font-family: Courier New, Courier, monospace;"> 3.052 3.121 3.140 3.142 3.161 3.243 </span></div>
</div>
<span style="font-family: "courier new" , "courier" , monospace;">
</span>This is even worse: the variation is higher.<br />
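The same standard-error reasoning explains why the ball does worse. In 3D a point is a hit with probability pi/6 and the estimate is 6*hits/N, so the larger multiplier inflates the binomial noise. A sketch comparing both at N = 10 000 (se_hitmiss is a hypothetical helper):

```r
# Standard error of k * hits / N when each draw is a hit with probability p
se_hitmiss <- function(p, k, N) k * sqrt(p * (1 - p) / N)

N <- 10000
se_2d <- se_hitmiss(pi / 4, 4, N)  # quarter circle: about 0.016
se_3d <- se_hitmiss(pi / 6, 6, N)  # octant of the ball: about 0.030
se_3d / se_2d                      # about 1.8 times larger
```

This even ignores that a 3D point costs three random numbers rather than two.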
<br />
At some point I thought this is due to the limited information in such a calculation: each simulation is a binomial draw, a hit or a miss, so it gives only one bit of information. And it could be simpler. If the first random number is known, say <i>y</i>, then all second random numbers above sqrt(1-<i>y</i><sup>2</sup>) give a distance larger than 1, while the remainder give a distance less than 1. So, given <i>y</i>, the probability of a hit is exactly sqrt(1-<i>y</i><sup>2</sup>), and pi should equal four times the mean of uniform random numbers transformed as sqrt(1-<i>y</i><sup>2</sup>).<br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">pin <- function(N=1000) {</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> 4*sum(sqrt(1-runif(N)^2))/N</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">}</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">summary(sapply(1:1000,function(x) pin(10000)))</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> Min. 1st Qu. Median Mean 3rd Qu. Max. </span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> 3.113 3.135 3.142 3.141 3.147 3.171 </span><br />
These numbers are closer together, but the work per simulation differs: pin() needs a square root, but only one random number instead of two. Hence the number of simulations should be adapted to reflect the work done. Luckily we have microbenchmark() from the microbenchmark package to calibrate this. After a bit of experimenting, these are the numbers of simulations that give roughly equal computation times.<br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace;"><span style="font-size: x-small;">microbenchmark(pi2d(10000),pi3d(6666),pin(22000))</span></span></div>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">Unit: milliseconds</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> expr min lq mean median uq max neval</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> pi2d(10000) 2.419106 2.436333 2.630764 2.458325 2.500477 5.253860 100</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> pi3d(6666) 2.361928 2.382820 2.557150 2.418006 2.467855 4.970898 100</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> pin(22000) 2.448429 2.468954 2.555823 2.485815 2.517703 5.023678 100</span></div>
</div>
As can be seen, in the same time the third calculation actually runs more simulations (22 000 against 10 000 for pi2d()). Hence it is a much more efficient way to obtain the estimate.<br />
<div>
<div style="font-size: small;">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span style="background-color: #f3f3f3;">summary(sapply(1:100,function(x) pi2d(10000)))</span></span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace;"> Min. 1st Qu. Median Mean 3rd Qu. Max. </span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace;"> 3.111 3.132 3.141 3.142 3.152 3.175 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span style="background-color: #f3f3f3;">summary(sapply(1:100,function(x) pi3d(6666)))</span></span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace;"> Min. 1st Qu. Median Mean 3rd Qu. Max. </span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace;"> 3.046 3.116 3.142 3.139 3.165 3.230 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span style="background-color: #f3f3f3;">summary(sapply(1:100,function(x) pin(22000)))</span></span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace;"> Min. 1st Qu. Median Mean 3rd Qu. Max. </span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace;"> 3.130 3.137 3.141 3.142 3.146 3.161 </span></div>
<span style="font-family: "courier new" , "courier" , monospace;">
</span>Of course, one could argue that the random numbers are not needed at all: pi/4 is the integral of sqrt(1-<i>x</i><sup>2</sup>) from 0 to 1, so a numerical integration can be done. But that is much less fun.<br />
<div>
<span style="font-family: "courier new" , "courier" , monospace;"><span style="background-color: #f3f3f3; font-size: x-small;">integrate(function(x) 4*sqrt(1-x^2),0,1)</span></span></div>
<span style="font-family: "courier new" , "courier" , monospace;">
<div>
<div>
3.141593 with absolute error < 0.00016</div>
</div>
<div>
<br /></div>
</span></div>
<h2>Confidence intervals for Proportions</h2>
Posted 2016-02-14<br />
Since I have come across Clopper-Pearson intervals in documents a number of times over the last weeks, I thought it a good idea to play around a bit with confidence intervals for proportions, to examine how the intervals differ between approaches. From the frequentist side: Clopper-Pearson, which is described as the frequentist's gold standard, and the easy normal approximation. From the Bayesian side: the binomial with a Beta prior. Obviously, the intervals have a completely different interpretation in the frequentist and Bayesian frameworks, but that is a different discussion. There are no data in this analysis; I am just computing intervals for possible results.<br />
<h3>
Code</h3>
There are many ways to set this up. I wanted some plots. My first question: given an observed proportion 'correct', how does the total number of trials change the intervals? The second: given a fixed number of trials, how do the intervals change as the number correct changes?<br />
<br />
Since I want to repeat many of these calculations, I first made some supporting functions. I am trying to write clearer code, in which as little code as possible is repeated and the work is delegated to functions instead. That may not result in the shortest or fastest code, but at this point neither is required.<br />
<h4>
Intervals</h4>
<div>
The first functions create the intervals from n (number observed correct) and N (total number of trials). Clopper-Pearson is extracted from binom.test(). The normal approximation is based on an internet example (linked in the code). Beta-Binomial has three functions: one for the actual work, and two to set up the desired priors, Beta(1,1) (uniform) and Beta(0.5,0.5) (Jeffreys), and adapt the naming. A final function calls all of these.</div>
<div>
<div>
<br /></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">clopper.pearson <- function(n,N,conf.level=0.95) {</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> limits <- as.numeric(binom.test(n,N,conf.level=conf.level)$conf.int)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> names(limits) <- c('cp_low','cp_high')</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> limits</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">}</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">binom.norm.app <- function(n,N,conf.level=0.95) {</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> # based on http://www.r-tutor.com/elementary-statistics/interval-estimation/interval-estimate-population-proportion</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> phat <- n/N</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> shat <- sqrt(phat*(1-phat)/N)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> limit <- (1-conf.level)/2</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> zlim <- qnorm( c(limit,1-limit))*shat</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> limits <- phat+zlim</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> names(limits) <- c('na_low','na_high')</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> limits </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">}</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">beta.binomial <- function(n,N,conf.level=0.95,prior=c(1,1)){</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> limit <- (1-conf.level)/2</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> limits <- qbeta(c(limit,1-limit),n+prior[1],N-n+prior[2])</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> names(limits) <- c('bb_low','bb_high')</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> limits</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">}</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">beta.binomial11 <- function(n,N,conf.level=0.95) {</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> limits <- beta.binomial(n,N,conf.level=conf.level,prior=c(1,1))</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> names(limits) <- c('bb11_low','bb11_high')</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> limits</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">}</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">beta.binomial.5.5 <- function(n,N,conf.level=0.95) {</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> limits <- beta.binomial(n,N,conf.level=conf.level,prior=c(0.5,0.5))</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> names(limits) <- c('bb.5_low','bb.5_high')</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> limits</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">}</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">all.intervals <- function(n,N,conf.level=0.95) {</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> c(n=n,N=N,conf.level=conf.level,</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> clopper.pearson(n,N,conf.level=conf.level),</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> binom.norm.app(n,N,conf.level=conf.level),</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> beta.binomial11(n,N,conf.level=conf.level),</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> beta.binomial.5.5(n,N,conf.level=conf.level))</span></div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">}</span></div>
<h4>
Post processing</h4>
<div>
Just doing an sapply() over all.intervals() gives a matrix with one column per scenario. It needs some processing to become the long data.frame that ggplot likes. Hence a function in which the matrix is transposed, reshaped to long format, and the interval names are split into method and direction (low/high). The method names are adapted for display.</div>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">postprocessing <- function(have1outN) {</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> have1outN <- as.data.frame(t(have1outN))</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> have1outN <- reshape(have1outN,</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> varying=list(names(have1outN)[-1:-3]),</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> idvar=c('n','N','conf.level'),</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> timevar='statistic',</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> times=names(have1outN)[-1:-3],</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> v.names='limit',</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> direction='long')</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> have1outN$direction <- </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> sub('^.+_','',have1outN$statistic)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> have1outN$Method <- </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> sub('_.+$','',have1outN$statistic)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> have1outN$Method[have1outN$Method=='cp']<- 'Clopper Pearson'</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> have1outN$Method[have1outN$Method=='na']<-'Normal Approximation'</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> have1outN$Method[have1outN$Method=='bb.5']<- 'Beta Bionomial prior 0.5 0.5'</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> have1outN$Method[have1outN$Method=='bb11']<- 'Beta Bionomial prior 1 1'</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> have1outN</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">}</span></div>
</div>
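To see what postprocessing() does, here is a toy version of the sapply() output with two scenarios (1 of 2 and 2 of 4 correct) and only the Clopper-Pearson columns. The reduced column set is a simplification for illustration; the reshaping and name splitting mirror the function above:

```r
# Two scenarios, Clopper-Pearson limits only
wide <- sapply(1:2, function(x) {
  ci <- as.numeric(binom.test(x, 2 * x)$conf.int)
  c(n = x, N = 2 * x, conf.level = 0.95, cp_low = ci[1], cp_high = ci[2])
})
wide <- as.data.frame(t(wide))  # one row per scenario

# wide to long: one row per scenario and limit
long <- reshape(wide,
                varying = list(names(wide)[-1:-3]),
                idvar = c("n", "N", "conf.level"),
                timevar = "statistic",
                times = names(wide)[-1:-3],
                v.names = "limit",
                direction = "long")
long$direction <- sub("^.+_", "", long$statistic)  # "low" or "high"
long$Method    <- sub("_.+$", "", long$statistic)  # "cp"
long[, c("n", "N", "Method", "direction", "limit")]
```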
<h3>
Results</h3>
<h4>
Results for a proportion correct</h4>
<div>
The code for each plot is a variation on this example for 50% correct (n = x out of N = 2x trials, for x = 1 to 20). As most of the work is done in the supporting functions, there is no need to repeat the code:</div>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">have1outN <- sapply(1:20,function(x) all.intervals(1*x,2*x))</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">have1outN <- postprocessing(have1outN)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">ggplot(have1outN,aes(x=limit,y=N,col=Method,l=direction)) + </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> geom_path() +</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> xlim(c(min(0,have1outN$limit),max(1,have1outN$limit))) +</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> ggtitle('Interval at 1/2 correct') +</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> theme(legend.position="bottom") +</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> guides(col=guide_legend(ncol=2))</span></div>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgntz5BTXRfJRjHrAMZmHpILEUKXYbQpS7UQne6OXE-xO__h7vIQ6VtYGGx9IIpwC-K0tCoL4dJes7xWuKsFcDejofe0GYaTaOPW6Scz3KWDY23tNBqOtdTiLsvnjgoxR7dVhA587c-ftQ/s1600/prop12.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgntz5BTXRfJRjHrAMZmHpILEUKXYbQpS7UQne6OXE-xO__h7vIQ6VtYGGx9IIpwC-K0tCoL4dJes7xWuKsFcDejofe0GYaTaOPW6Scz3KWDY23tNBqOtdTiLsvnjgoxR7dVhA587c-ftQ/s1600/prop12.png" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgivRcFQzztofECKkRn7X8dyErSGLiR_CoIa5F7cREnI7Zma8nu8Iegz9jxUNzB5Gpg7bXrqSXtwTaXNLZlzNyHGRXUIxWQfCpNJ7iUMmzm2Awxd89EsktwxatiW_UO-4gud7FgigSsBjM/s1600/prop13.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgivRcFQzztofECKkRn7X8dyErSGLiR_CoIa5F7cREnI7Zma8nu8Iegz9jxUNzB5Gpg7bXrqSXtwTaXNLZlzNyHGRXUIxWQfCpNJ7iUMmzm2Awxd89EsktwxatiW_UO-4gud7FgigSsBjM/s1600/prop13.png" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgiFH_NxjtUM_jSnl28Zu0b9ZU4l5JLJMZF1do6wx44mUcmjk9qw-NuRQLIKYLOmzZLYANqqnhnnzxqtPEB1dQu-SdthQEK22tO6S9dofh38AVh8nQfxzv6CtaHO9k_DUDm4q5oefkiFJY/s1600/prop15.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgiFH_NxjtUM_jSnl28Zu0b9ZU4l5JLJMZF1do6wx44mUcmjk9qw-NuRQLIKYLOmzZLYANqqnhnnzxqtPEB1dQu-SdthQEK22tO6S9dofh38AVh8nQfxzv6CtaHO9k_DUDm4q5oefkiFJY/s1600/prop15.png" /></a></div>
<div>
<br /></div>
<div>
It seems that, especially at lower N, the normal approximation is not advisable. An interval sticking out of the range 0-1 is a dead giveaway that something is not correct. But even when that does not happen, its lines are pretty far off from those of the other methods. The difference between the two Beta Binomials is surprisingly small and only visible when very few observations are made. Clopper-Pearson gives slightly wider intervals than either Beta Binomial.</div>
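The width comparison can be checked numerically. A sketch for 50% correct (n = x, N = 2x), computing widths with binom.test() and qbeta() directly rather than through the functions above (width_at_half is a hypothetical helper):

```r
# Interval widths at 50% correct for increasing numbers of trials
width_at_half <- function(x, conf.level = 0.95) {
  n <- x
  N <- 2 * x
  a <- (1 - conf.level) / 2
  cp   <- binom.test(n, N, conf.level = conf.level)$conf.int
  bb11 <- qbeta(c(a, 1 - a), n + 1,   N - n + 1)    # Beta(1, 1) prior
  bb.5 <- qbeta(c(a, 1 - a), n + 0.5, N - n + 0.5)  # Beta(0.5, 0.5) prior
  c(N = N, cp = diff(cp), bb11 = diff(bb11), bb.5 = diff(bb.5))
}
widths <- t(sapply(1:20, width_at_half))
head(round(widths, 3))
```

For every N in this range the Clopper-Pearson interval is the widest, while the two Beta Binomial widths stay close to each other.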
<h4>
Results for a fixed N</h4>
<div>
Again, the code is a variation on a theme, with the work being done by the supporting functions.</div>
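As a hypothetical stand-alone illustration of the fixed-N variation (the function name fixed_N and the choice N = 20 are assumptions, not the original code), the loop runs over n = 0 to N instead of over N:

```r
# Intervals for every possible number correct out of a fixed number of trials
fixed_N <- function(N, conf.level = 0.95) {
  a <- (1 - conf.level) / 2
  do.call(rbind, lapply(0:N, function(n) {
    cp <- as.numeric(binom.test(n, N, conf.level = conf.level)$conf.int)
    se <- sqrt((n / N) * (1 - n / N) / N)       # normal approximation
    na <- n / N + qnorm(c(a, 1 - a)) * se
    bb <- qbeta(c(a, 1 - a), n + 1, N - n + 1)  # Beta(1, 1) prior
    data.frame(n = n, N = N,
               cp_low = cp[1], cp_high = cp[2],
               na_low = na[1], na_high = na[2],
               bb11_low = bb[1], bb11_high = bb[2])
  }))
}
res <- fixed_N(20)
# rows where the normal approximation leaves the range 0-1
res[res$na_low < 0 | res$na_high > 1, c("n", "na_low", "na_high")]
```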
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEju8t6uUhC60GZ5mUVJWs8qltVZhGV0UcCSXFN5whSq6mtxfnUH59ku2vUzpb40znIeg5CxTYVIiUnJBXtHNEXr_hbkYxPR2yWknXZmAXpUdWVFdq4FTWFb679EheLJIk83ZrCxBRnZo-g/s1600/prop20.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEju8t6uUhC60GZ5mUVJWs8qltVZhGV0UcCSXFN5whSq6mtxfnUH59ku2vUzpb40znIeg5CxTYVIiUnJBXtHNEXr_hbkYxPR2yWknXZmAXpUdWVFdq4FTWFb679EheLJIk83ZrCxBRnZo-g/s1600/prop20.png" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXLI3wKnH66P-VGCWA4-GmIgRSquYuSqBtdm7PKKFg2Lr_Emy3IY8fPBTvt57JWvKdKErdFrF82MG1NiRPWQUUydmTdH8ar5LZJ2m0iROH6Isac40hG-QDW8s0_FSHCIdIWJ7Yr6TJIGo/s1600/prop80.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXLI3wKnH66P-VGCWA4-GmIgRSquYuSqBtdm7PKKFg2Lr_Emy3IY8fPBTvt57JWvKdKErdFrF82MG1NiRPWQUUydmTdH8ar5LZJ2m0iROH6Isac40hG-QDW8s0_FSHCIdIWJ7Yr6TJIGo/s1600/prop80.png" /></a></div>
<div>
Again the normal approximation is the odd one out. It also degenerates at n=0 and n=N. Other than that, the choice of prior in the Beta Binomial is more pronounced than in the previous plots.</div>
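The degeneration has a simple cause: at n = 0 (or n = N) the estimated proportion is 0 (or 1), so the estimated standard error sqrt(p(1-p)/N) is zero and the normal-approximation interval collapses to a single point. Clopper-Pearson still gives an interval of positive width there. A quick check for N = 20 (the variable names are illustrative):

```r
# Interval width at the extremes n = 0 and n = N
N <- 20
z <- qnorm(0.975)
extreme <- sapply(c(0, N), function(n) {
  phat <- n / N
  shat <- sqrt(phat * (1 - phat) / N)  # estimated standard error: zero here
  cp <- as.numeric(binom.test(n, N)$conf.int)
  c(n = n, normal_width = 2 * z * shat, cp_width = diff(cp))
})
t(extreme)
```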
<h2>Unemployment in Europe</h2>
Posted 2016-02-02<br />
For a couple of years now I have made plots of unemployment and its change over the years. At first this was a big and complex piece of code. As things have progressed, the code has become pretty concise: there are now plenty of packages to do the heavy lifting. So this year I tried to make the code easy to read and reasonably documented.<br />
<h3>
Data</h3>
The data are from Eurostat. Since we have the joy of the eurostat package, suffice it to say this is dataset une_rt_m. Since the get_eurostat() function returns codes for things such as country and sex, the first step is to decode them with a dictionary. Subsequently, the country names are sanitized a bit and the data are selected.<br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">library(eurostat)</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">library(ggplot2)</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">library(KernSmooth)</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">library(plyr)</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">library(dplyr)</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">library(scales) # to access breaks/formatting functions</span><br />
<div>
<br /></div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">r1 <- get_eurostat('une_rt_m')%>%</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> mutate(.,geo=as.character(geo)) # character preferred for merge</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">r2 <- get_eurostat_dic('geo') %>%</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> rename(.,geo=V1) %>%</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> mutate(.,</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"># part of country name within braces removed </span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> country=gsub('\\(.*$','',V2),</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> country=gsub(' $','',country),</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> country=ifelse(geo=='EA19',paste(country,'(19)'),country)) %>%</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> select(.,geo,country) %>%</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> right_join(.,r1) %>%</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"># keep only total, drop sexes</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> filter(.,sex=='T') %>%</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"># filter out old Euro area and keep only EU28 , EA19 </span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> filter(.,!grepl('EA..',geo)| geo=='EA19') %>% </span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> filter(.,!(geo %in% c('EU15','EU25','EU27')) ) %>% </span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"># SA is seasonably adjusted </span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> filter(.,s_adj=='SA') %>% </span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> mutate(.,country=factor(country)) %>%</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> select(.,-sex,-s_adj)</span><br />
<h3>
Plots</h3>
<div>
To make the plots I want smoothed data. ggplot can do this, but I prefer to have the same smoothing for all curves, hence it is done before entering ggplot. There are rather many countries; the number is reduced to 36, displayed in three plots of 3*4 panels for countries with low, middle and high maximum unemployment respectively. Two smoothers are applied: one for the smoothed data and a second for its first derivative. The derivative is smoothed more strongly (a bandwidth of 365/2 rather than 90 days) to avoid extreme fluctuations.</div>
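To see what the two locpoly() calls from KernSmooth estimate, here is a sketch on synthetic monthly data with a known linear trend; the series, its slope and the noise level are assumptions for illustration only. With drv = 0 locpoly() estimates the curve itself, with drv = 1 its first derivative. Since dates enter as numeric days, the bandwidths are in days and the derivative is per day:

```r
library(KernSmooth)  # provides locpoly()

set.seed(1)
# Synthetic monthly series; dates as numeric days, as in as.numeric(piece$time)
days   <- as.numeric(seq(as.Date("2005-01-01"), as.Date("2015-12-01"), by = "month"))
values <- 8 + 0.002 * (days - days[1]) + rnorm(length(days), sd = 0.2)

level <- locpoly(x = days, y = values, drv = 0, bandwidth = 90)     # smoothed level
slope <- locpoly(x = days, y = values, drv = 1, bandwidth = 365/2)  # first derivative

# The true slope is 0.002 per day, about 0.73 percentage points per year
median(slope$y, na.rm = TRUE) * 365
```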
<div>
<div>
<br /></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"># add 3 categories for the 3 3*4 displays</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">r3 <- aggregate(r2$values,by=list(geo=r2$geo),FUN=max,na.rm=TRUE) %>%</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> mutate(.,class=cut(x,quantile(x,seq(0,3)/3),</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> include.lowest=TRUE,</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> labels=c('low','middle','high'))) %>%</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> select(.,-x) %>% # maxima not needed any more</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> right_join(.,r2)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">#locpoly to make smooth same for all countries</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">Perc <- ddply(.data=r3,.variables=.(age,geo), </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> function(piece,...) {</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> piece <- piece[!is.na(piece$values),]</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> lp <- locpoly(x=as.numeric(piece$time),y=piece$values,</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> drv=0,bandwidth=90)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> sdf <- data.frame(Date=as.Date(lp$x,origin='1970-01-01'),</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> sPerc=lp$y,</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> age=piece$age[1],</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> geo=piece$geo[1],</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> country=piece$country[1],</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> class=piece$class[1])}</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> ,.inform=FALSE</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">)</span></div>
</div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"></span><br />
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"># locpoly for deriviative too</span></div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">
<div>
dPerc <- ddply(.data=r3,.variables=.(age,geo), </div>
<div>
function(piece,...) {</div>
<div>
piece <- piece[!is.na(piece$values),]</div>
<div>
lp <- locpoly(x=as.numeric(piece$time),y=piece$values,</div>
<div>
drv=1,bandwidth=365/2)</div>
<div>
sdf <- data.frame(Date=as.Date(lp$x,origin='1970-01-01'),</div>
<div>
dPerc=lp$y, </div>
<div>
age=piece$age[1],</div>
<div>
geo=piece$geo[1],</div>
<div>
country=piece$country[1],</div>
<div>
class=piece$class[1])}</div>
<div>
,.inform=FALSE</div>
<div>
)</div>
</span>The plots are made separately for each class (low, middle and high).</div>
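<div>
The two ddply calls above differ only in drv (0 for the smoothed level, 1 for its first derivative) and in the bandwidth. To show what is being estimated, here is a minimal base-R sketch of a local polynomial fit with a Gaussian kernel. It is not the KernSmooth implementation (locpoly uses a fast binned approximation and by default fits a polynomial of degree drv + 1), and the function name local_poly is hypothetical. At each grid point a weighted least-squares polynomial in (x - x0) is fitted: the intercept estimates the curve and the linear coefficient estimates its first derivative.</div>

```r
# Minimal local polynomial regression with a Gaussian kernel (sketch, not KernSmooth)
# x, y: data; grid: points where the curve is estimated;
# bandwidth: standard deviation of the Gaussian kernel, in units of x
local_poly <- function(x, y, grid, bandwidth, degree = 1) {
  stopifnot(degree >= 1)  # need at least a linear term to get a derivative
  est <- sapply(grid, function(x0) {
    w <- dnorm((x - x0) / bandwidth)               # kernel weight of each observation
    X <- outer(x - x0, 0:degree, `^`)              # design: 1, (x-x0), (x-x0)^2, ...
    beta <- qr.coef(qr(X * sqrt(w)), y * sqrt(w))  # weighted least squares
    c(fit = unname(beta[1]), deriv = unname(beta[2]))  # level and slope at x0
  })
  data.frame(x = grid, fit = est["fit", ], deriv = est["deriv", ])
}

# Usage on a noise-free straight line: the level is reproduced and the slope is 2
x <- seq(0, 10, by = 0.5)
res <- local_poly(x, y = 3 + 2 * x, grid = c(2, 5, 8), bandwidth = 1)
res
```

<div>
With real, noisy series the bandwidth sets the trade-off between smoothness and bias; derivative estimates are noisier than level estimates, which is presumably why the derivative gets the wider window (365/2 days versus 90 days).</div>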
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">for (i in c('low','middle','high')) {</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> png(paste(i,'.png',sep=''))</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> g <- filter(Perc,class==i) %>%</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> ggplot(.,</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> aes(x=Date,y=sPerc,colour=age)) +</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> facet_wrap( ~ country, drop=TRUE) +</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> geom_line() +</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> theme(legend.position = "bottom")+</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> ylab('% Unemployment') + xlab('Year') +</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> scale_x_date(breaks = date_breaks("5 years"),</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> labels = date_format("%y")) </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> print(g)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> dev.off()</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">}</span></div>
</div>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">for (i in c('low','middle','high')) {</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> png(paste('d',i,'.png',sep=''))</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> g <- filter(dPerc,class==i) %>%</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> ggplot(.,</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> aes(x=Date,y=dPerc,colour=age)) +</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> facet_wrap( ~ country, drop=TRUE) +</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> geom_line() +</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> theme(legend.position = "bottom")+</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> ylab('Change in % Unemployment') + xlab('Year')+</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> scale_x_date(breaks = date_breaks("5 years"),</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> labels = date_format("%y"))</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> print(g)</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> dev.off()</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">}</span></div>
</div>
<h3>
Results</h3>
<div>
In general, things are improving, which is good news, though there is still a way to go. As always, Eurostat has a nice <a href="http://ec.europa.eu/eurostat/statistics-explained/index.php/Unemployment_statistics">document</a>; they are certainly more knowledgeable than me on this topic. </div>
<h4>
Average unemployment</h4>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzmhptRdSqDXLRTs5NVwJOdv7mXYFiIPbkExWoQd1ndz_34sS9Cr-YHD7qVIfGSMiLlCDkTM8O-FiCECxB-1UGPHX4rb7yXU5NMpBEAeMdqztpbGPzqRCh-KmnHivkXqnMhv2vggijkd4/s1600/low.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzmhptRdSqDXLRTs5NVwJOdv7mXYFiIPbkExWoQd1ndz_34sS9Cr-YHD7qVIfGSMiLlCDkTM8O-FiCECxB-1UGPHX4rb7yXU5NMpBEAeMdqztpbGPzqRCh-KmnHivkXqnMhv2vggijkd4/s1600/low.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0kcBzjzeeA3R5J4F1Jz7s4ilxuLCv_4UR-XUUaoKJ7JT6p3uKGJTEBnbXPw7Vot3OQzgr5s85N8Dzco8SoE9lZQ9j2WXrmBZD8kRSv7Dpuxw-vty2zjT72FXTKwbQD7VumyNzYlK5IZc/s1600/middle.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0kcBzjzeeA3R5J4F1Jz7s4ilxuLCv_4UR-XUUaoKJ7JT6p3uKGJTEBnbXPw7Vot3OQzgr5s85N8Dzco8SoE9lZQ9j2WXrmBZD8kRSv7Dpuxw-vty2zjT72FXTKwbQD7VumyNzYlK5IZc/s1600/middle.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSDhK_PRcJAcGnohWW2_xZvgJJL19vbFi4ZfPF9U2gQy_A8TyM-tlWB3jMZ63fUHU_ADv3HwlSq9svJSJhJ0nKHM96Wo0QfpdrQ_Xnb3i-kRsjYyz-9dzyEwpek9MWzSP4dSi8NnXKeAU/s1600/high.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSDhK_PRcJAcGnohWW2_xZvgJJL19vbFi4ZfPF9U2gQy_A8TyM-tlWB3jMZ63fUHU_ADv3HwlSq9svJSJhJ0nKHM96Wo0QfpdrQ_Xnb3i-kRsjYyz-9dzyEwpek9MWzSP4dSi8NnXKeAU/s1600/high.png" /></a></div>
<h4>
First derivative</h4>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLiIsg7nONpg3HHRZhAFT-eSmgUCNL6aCu81P0bI7mvKEQv7Kn4ruXLnBqf_7Wmov-d0d_RFKE4Iw2wPWYkQd9YcsL1y-3D-mMBFolOvxeIFF3ucoN-onev80K5Gp2JArfqH8L-TO9rdQ/s1600/dlow.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLiIsg7nONpg3HHRZhAFT-eSmgUCNL6aCu81P0bI7mvKEQv7Kn4ruXLnBqf_7Wmov-d0d_RFKE4Iw2wPWYkQd9YcsL1y-3D-mMBFolOvxeIFF3ucoN-onev80K5Gp2JArfqH8L-TO9rdQ/s1600/dlow.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiZkilRkmqhW3U52k-iqr8f80afA5OYb_rYkZ5dAdnLKhy4Xyry17Hx8hX9y_xNwGpAYmbFh7vqO__PbhRWUyriuc3CDSFelUdLY2Qs9v_21RLRKrJ_DdFFocgYk3O7TPvWuzDwyiLBz0/s1600/dmiddle.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiZkilRkmqhW3U52k-iqr8f80afA5OYb_rYkZ5dAdnLKhy4Xyry17Hx8hX9y_xNwGpAYmbFh7vqO__PbhRWUyriuc3CDSFelUdLY2Qs9v_21RLRKrJ_DdFFocgYk3O7TPvWuzDwyiLBz0/s1600/dmiddle.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiW_0gaRDz8Z9VMv0bSz_0HDwKB4fR-nxEH6kC0cVZ-X1y0Hru-PSyopc3ZJ_UHFEQJSWO3qBs4-lRa3vVM50KyiCOxmEmLFbvkfwLzg90b2cjp3g2htWvyM2khReVN3NBOdYMSPGq8-zM/s1600/dhigh.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiW_0gaRDz8Z9VMv0bSz_0HDwKB4fR-nxEH6kC0cVZ-X1y0Hru-PSyopc3ZJ_UHFEQJSWO3qBs4-lRa3vVM50KyiCOxmEmLFbvkfwLzg90b2cjp3g2htWvyM2khReVN3NBOdYMSPGq8-zM/s1600/dhigh.png" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div>
<br /></div>
Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com4tag:blogger.com,1999:blog-3524617892004055830.post-18506905792198400122016-01-17T17:38:00.000+01:002016-01-17T17:38:16.234+01:00A simple ANOVAI was browsing Davies' <i>Design and Analysis of Industrial Experiments</i> (second edition, 1967), published for ICI in times when industry did that kind of thing. It is quite an applied book. On page 107 there is an example where the variance of a process is estimated.<br />
<h3>
Data</h3>
The data come from nine batches; from each batch three samples (A, B and C) were taken, and each sample was measured in duplicate. I am not sure about the copyright of these data, so I will not reprint them here. The problem is to determine the measurement and sampling error in a chemical process.<br />
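<div>
The design itself can be written down without the measurements. A small base-R sketch, assuming factor columns named batch and Sample as in the r4 data frame used below (the replicate column is hypothetical): it builds the 9 × 3 × 2 layout and derives the degrees of freedom that appear in the ANOVA table.</div>

```r
# Nested layout: 9 batches, 3 samples (A, B, C) per batch, duplicate measurements
design <- expand.grid(replicate = 1:2,
                      Sample    = factor(c("A", "B", "C")),
                      batch     = factor(1:9))
n_obs   <- nrow(design)                                       # number of measurements
n_cells <- nlevels(interaction(design$batch, design$Sample))  # batch:Sample cells

# Degrees of freedom of a balanced nested ANOVA
n_batch <- 9; n_sample <- 3; n_rep <- 2
df <- c(batch        = n_batch - 1,
        batch_Sample = n_batch * (n_sample - 1),
        residual     = n_batch * n_sample * (n_rep - 1))
df
```

<div>
Sample labels only have meaning within a batch, which is why the batch:Sample term has 9 × (3 - 1) degrees of freedom rather than a separate Sample main effect.</div>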
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">ggplot(r4,aes(x=Sample,y=x))+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_point()+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> facet_wrap(~ batch )</span><br />
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrZ1GwnrnvdtsoNOgQNlWIbjDf0887Zksm7QBLS7B1g7LgJIe69qTaMctMT2S3EEOGUEhGGFvUw6FvSs2213LMSmZFo95pHmKRKDw261tNcFUkZ9pGtDSKwljHBO7UY6xrhfL2_tU9cC8/s1600/p1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrZ1GwnrnvdtsoNOgQNlWIbjDf0887Zksm7QBLS7B1g7LgJIe69qTaMctMT2S3EEOGUEhGGFvUw6FvSs2213LMSmZFo95pHmKRKDw261tNcFUkZ9pGtDSKwljHBO7UY6xrhfL2_tU9cC8/s1600/p1.png" /></a></div>
<br />
<br />
<h3>
Analysis</h3>
At the time the book was written, the only practical approach was a classical ANOVA, with the variance estimates calculated from the mean squares.<br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace;">aov(x~ batch + batch:Sample,data=r4) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace;"> anova</span><br />
<span style="font-family: Courier New, Courier, monospace;">Analysis of Variance Table</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">Response: x</span><br />
<span style="font-family: Courier New, Courier, monospace;"> Df Sum Sq Mean Sq F value Pr(>F) </span><br />
<span style="font-family: Courier New, Courier, monospace;">batch 8 792.88 99.110 132.6710 < 2e-16 ***</span><br />
<span style="font-family: Courier New, Courier, monospace;">batch:Sample 18 25.30 1.406 1.8818 0.06675 . </span><br />
<span style="font-family: Courier New, Courier, monospace;">Residuals 27 20.17 0.747 </span><br />
<span style="font-family: Courier New, Courier, monospace;">---</span><br />
<span style="font-family: Courier New, Courier, monospace;">Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1</span><br />
<span style="font-family: inherit;">In this case the residual variation is 0.75. The batch:Sample variation estimates is, due to the design, twice the sapling variation plus residual variation. Hence it is estimated as 0.33. How lucky we are to have tools (lme4) which can do this estimate directly. In this case, as it was a well designed experiment, these estimates are the same as from the ANOVA. </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">l1 <- lmer(x ~1+ (1 | batch) + (1|batch:Sample) ,data=r4 )</span><br />
<br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">summary(l1)</span><br />
<span style="font-family: Courier New, Courier, monospace;">Linear mixed model fit by REML ['lmerMod']</span><br />
<span style="font-family: Courier New, Courier, monospace;">Formula: x ~ 1 + (1 | batch) + (1 | batch:Sample)</span><br />
<span style="font-family: Courier New, Courier, monospace;"> Data: r4</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">REML criterion at convergence: 189.4</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">Scaled residuals: </span><br />
<span style="font-family: Courier New, Courier, monospace;"> Min 1Q Median 3Q Max </span><br />
<span style="font-family: Courier New, Courier, monospace;">-1.64833 -0.50283 -0.06649 0.55039 1.57801 </span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">Random effects:</span><br />
<span style="font-family: Courier New, Courier, monospace;"> Groups Name Variance Std.Dev.</span><br />
<span style="font-family: Courier New, Courier, monospace;"> batch:Sample (Intercept) 0.3294 0.5739 </span><br />
<span style="font-family: Courier New, Courier, monospace;"> batch (Intercept) 16.2841 4.0354 </span><br />
<span style="font-family: Courier New, Courier, monospace;"> Residual 0.7470 0.8643 </span><br />
<span style="font-family: Courier New, Courier, monospace;">Number of obs: 54, groups: batch:Sample, 27; batch, 9</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">Fixed effects:</span><br />
<span style="font-family: Courier New, Courier, monospace;"> Estimate Std. Error t value</span><br />
<span style="font-family: Courier New, Courier, monospace;"></span><br />
<span style="font-family: Courier New, Courier, monospace;">(Intercept) 47.148 1.355 34.8</span><br />
<span style="font-family: inherit;">A next step is confidence intervals around the estimates. Davies uses limits from a Chi-squared distribution for the residual variation, leading to a 90% interval 0.505 to 1.25. In contrast lme4 has two estimators, profile (</span><i>computing a likelihood profile and finding the appropriate cutoffs based on the likelihood ratio test;</i>) <span style="font-family: inherit;">and bootstrap (</span><i>perform parametric bootstrapping with confidence intervals computed from the bootstrap distribution according to boot.type</i><span style="font-family: inherit;">). Each of these takes one or few second on my laptop, not feasible for the pre computer age. The estimates are different, to my surprise more narrow:</span><br />
<span style="font-family: Courier New, Courier, monospace;">Computing profile confidence intervals ...</span><br />
<span style="font-family: Courier New, Courier, monospace;"> 5 % 95 %</span><br />
<span style="font-family: Courier New, Courier, monospace;">.sig01 0.0000000 0.9623748</span><br />
<span style="font-family: Courier New, Courier, monospace;">.sig02 2.6742109 5.9597328</span><br />
<span style="font-family: Courier New, Courier, monospace;">.sigma 0.7017849 1.1007261</span><br />
<span style="font-family: Courier New, Courier, monospace;">(Intercept) 44.8789739 49.4173227</span><br />
<br />
<span style="font-family: Courier New, Courier, monospace;">Computing bootstrap confidence intervals ...</span><br />
<span style="font-family: Courier New, Courier, monospace;"> 5 % 95 %</span><br />
<span style="font-family: Courier New, Courier, monospace;">sd_(Intercept)|batch:Sample 0.000000 0.8880414</span><br />
<span style="font-family: Courier New, Courier, monospace;">sd_(Intercept)|batch 2.203608 5.7998348</span><br />
<span style="font-family: Courier New, Courier, monospace;">sigma 0.664149 1.0430984</span><br />
<span style="font-family: Courier New, Courier, monospace;">(Intercept) 45.140652 49.4931109</span><br />
<span style="font-family: inherit;">Davies continues to estimate the ratio to residual for sampling variation, which was the best available for that time. This I won't repeat.</span>Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com0tag:blogger.com,1999:blog-3524617892004055830.post-10311199559270446402016-01-03T10:42:00.001+01:002016-01-03T10:42:47.738+01:00A plot of 'Who works at home'I ran across <a href="https://public.tableau.com/profile/shanwowww#!/vizhome/Whoworksathome/Whoworksathome">this post</a> containing displays on who works from home. I must say it looks great and is interactive but it did not help me understand the data. So I created this post to display the same data with a boring plot which might help me. For those really interested in this topic, census.gov created a <a href="http://www.census.gov/hhes/commuting/files/2010/P70-132.pdf">.pdf</a> which contains a full report with much more information than here.<br />
<h3>
Data</h3>
Data is from <a href="http://www.census.gov/hhes/commuting/data/workathome.html">census.gov</a>. I have taken the first spreadsheet. It is one of those spreadsheets with counts and percentages and empty lines to display categories. Very nice to check some numbers, horrible to process. So, a bit of code to extract the numbers.<br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(gdata)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r1 <- read.xls('2010_Table_1.xls',stringsAsFactors=FALSE)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># throw out percentages</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r2 <- r1[,r1[4,]!='Percent']</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># put all column names in one row</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r2$X.6[2] <- r2$X.6[3]</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r2$X.8[2] <- r2$X.8[3]</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># select part with data</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r3 <- r2[2:61,c(1,3,5,6)]</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">names(r3)[1] <- r3[1,1]</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r4 <-r3[c(-1:-3),]</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#eliminate one row with mean income. </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r4 <- r4[-grep('$',r4[,2],fixed=TRUE),]</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#reshape in long form</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r5 <- reshape(r4,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> varying=list(names(r4)[-1]),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> v.names='count',</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> direction='long',</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> idvar='Characteristic',</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> timevar='class',</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> times=r3[1,2:4])</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">row.names(r5) <- 1:nrow(r5)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># remove ',' from numbers and make numerical values. </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># units are in 1000, so update that too</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r5$count <- as.numeric(gsub(',','',r5$count))*1000</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># clean up numbers used for footnotes</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r5$class <- gsub('(1|2|3)','',r5$class)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#some upfront '.' removed.</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r5$Characteristic <- gsub('^\\.+','',r5$Characteristic)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># create a factor</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r5$Characteristic <- factor(r5$Characteristic,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> levels=rev(r5$Characteristic[r5$class=='Home Workers']))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># and create a higher level factor</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r5$Mchar=r5$Characteristic</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">for (i in 1:nrow(r5)) r5$Mchar[i] <- </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> if(is.na(r5$count[i]) | r5$Mchar[i]=='Total') r5$Mchar[i] </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> else r5$Mchar[i-1]</span><br />
<h3>
Plot</h3>
The plot is made with base graphics. I could not get either ggplot2 or lattice to produce the plot I wanted.<br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># prepare for axis labels</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">index <- subset(r5,r5$class=='Home Workers',c(Characteristic,Mchar))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">index$y=56:1</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">index2 <- index[index$Characteristic!=index$Mchar | index$Characteristic=='Total',]</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">index3 <- index[index$Characteristic==index$Mchar & index$Characteristic!='Total',]</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r6 <- merge(r5,index)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r6$class <- factor(r6$class)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">par(mar=c(5,18,4,2)+.1,cex=.7)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">plot(x=r6$count,y=r6$y,axes=FALSE,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> xlab='Count',</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ylab='',</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> col=c('red','green','blue')[r6$class],</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> frame.plot=TRUE,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> # log='x',</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ylim=c(2,58))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">axis(1)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">axis(2,at=index2$y,labels=index2$Characteristic,las=1)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">text(y=index3$y-.1,x=30000,labels=index3$Characteristic,adj=0)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">legend('topleft',legend=levels(r6$class),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ncol=3,col=c('red','green','blue'),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> border=NULL,pch=1,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> yjust=0)</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh72eIJXur8qUGXU8rJpmttEjvkplUKasdluPpxmdbumABDoHtM01IYiw4WljimGP2dd1AoFWREzTxFL10gh2GeQUh3xQ9QqS1LFjoFZ9MMYDk2Mv8qJKFCIMsa2j3sJp6azLEJhwnOSBI/s1600/plot1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh72eIJXur8qUGXU8rJpmttEjvkplUKasdluPpxmdbumABDoHtM01IYiw4WljimGP2dd1AoFWREzTxFL10gh2GeQUh3xQ9QqS1LFjoFZ9MMYDk2Mv8qJKFCIMsa2j3sJp6azLEJhwnOSBI/s1600/plot1.png" /></a></div>
<h4>
Why I did not use ggplot2?</h4>
The ideal solution for ggplot2 might look something like this:<br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r7 <- r5[!is.na(r5$count),]</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r7$Mchar <- factor(r7$Mchar,levels=unique(r7$Mchar))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">ggplot(data=r7,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> aes(x=Characteristic,y=count,col=class)) +</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_point()+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> coord_flip()+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> xlab('')+ylab('')+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ylim(0,max(r5$count))+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> facet_wrap(~Mchar,scales='free_x',ncol=2)+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> theme(legend.position="bottom")</span><br />
However, this throws an error:<br />
<span style="background-color: white; color: red; font-family: Courier New, Courier, monospace; font-size: x-small;">Error in facet_render.wrap(plot$facet, panel, plot$coordinates, theme, : </span><br />
<span style="background-color: white; color: red; font-family: Courier New, Courier, monospace; font-size: x-small;"> ggplot2 does not currently support free scales with a non-cartesian coord or coord_flip.</span><br />
<div>
I also tried the approach described at <a href="http://wresch.github.io/2014/05/22/aligning-ggplot2-graphs.html">http://wresch.github.io/2014/05/22/aligning-ggplot2-graphs.html</a>, but I think the content of the widths element has changed since that post was written, and I could not get it to work satisfactorily.</div>
<div>
<div>
<br /></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(gtable)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(gridExtra)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt <- as.data.frame(table(r7$Mchar))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Var1</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Freq[12] <- tt$Freq[12] +15</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">la <- lapply(tt$Var1,function(x) {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> r8 <- r5[r5$Mchar==as.character(x) ,]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> r8 <- r8[ !is.na(r8$count),]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ggplot(data=r8,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> aes(x=Characteristic,y=count,col=class)) +</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_point()+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> coord_flip()+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> xlab('')+ylab('')+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ylim(0,max(r5$count,na.rm=TRUE)) # same count axis for all panels</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> })</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># Hide legend and x axis on all but the last panel</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">la[[12]] <- la[[12]] + </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> theme(legend.position="bottom", </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> plot.margin = unit(c(0.01, 0.1, 0.02, 0.1), "null"))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">for (i in 1:11) la[[i]] <- la[[i]] +</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> theme(legend.position="none",</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> axis.text.x = element_blank(),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> axis.title.x = element_blank(), </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> axis.ticks.x = element_blank(),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> plot.margin = unit(c(0.01, 0.1, 0.02, 0.1), "null"))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">lag <- lapply(la,ggplotGrob)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># Align the panels, following http://wresch.github.io/2014/05/22/aligning-ggplot2-graphs.html;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># the widths live in the grobs, not in the ggplot objects</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">maxwidths <- do.call(grid::unit.pmax,lapply(lag,function(g) g$widths))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">for(i in seq_along(lag)) lag[[i]]$widths <- maxwidths</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">g <- gtable_matrix(name = "demo",</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> grobs = matrix(lag, nrow = 12), </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> widths = unit(9, "null"),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> heights = unit(tt$Freq, "null"))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">grid.newpage()</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">grid.draw(g)</span></div>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiZls-mUSzbxJuSM78R4JRHKT6RprtuhGl-607dvO7TfLgNZfzn2hiwA2cwNjHRCmYvEWtW8Cchy6fPhTKnmTaOf2CVbIoT0umD2Q4YI7GXA33iapvtag9hyphenhyphenlxUQHG9roWhsUqSwxi5eE/s1600/plot2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiZls-mUSzbxJuSM78R4JRHKT6RprtuhGl-607dvO7TfLgNZfzn2hiwA2cwNjHRCmYvEWtW8Cchy6fPhTKnmTaOf2CVbIoT0umD2Q4YI7GXA33iapvtag9hyphenhyphenlxUQHG9roWhsUqSwxi5eE/s1600/plot2.png" /></a></div>
<div>
<br /></div>
Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com1tag:blogger.com,1999:blog-3524617892004055830.post-74666045035352353392015-12-12T18:00:00.000+01:002015-12-12T18:00:33.302+01:00Vacancies in the NetherlandsOver the last couple of years I have been registering, each weekend, how many vacancies various websites claim to have. This post shows some of the observations that can be drawn from the plots.<br />
<h3>
Data</h3>
The data come from general and more specialized websites. The first observations date from 2010; since then there have been a number of changes. In general, the most useful sites are uitzendbureau.nl, nuwerk.nl, nationalevacaturebank.nl, indeed.nl and jobbird.nl. These all cover the whole market.<br />
Uitzendbureau (employment agency) caters mostly for short-term solutions. An employer who wants an employee quickly, without much hassle, and who can let that employee go just as easily, goes to an uitzendbureau first. The website covers many of these agencies; from what I have seen, it includes at least the bigger ones. This makes its number of vacancies a marker for the general labour market. The other three cover mostly jobs offered directly by employers. I seem to remember that nationalevacaturebank (national vacancy bank) did a lot of advertising at the time. Nuwerk.nl is a subsite of nu.nl, a popular news site in the Netherlands. The others I added to the collection later.<br />
In addition, there are VK banen, Intermediair and Stepstone. VK banen (Volkskrant) and Intermediair are the traditional sites for highly educated employees. Back when the internet was not yet important, Intermediair was delivered free to highly educated people, from just before graduation until age 40 or 45. Thus it was traditionally one of the best places for vacancies. Volkskrant had somewhat fewer vacancies, many of them in education. At some point within the observation period they merged their vacancy activities. Stepstone is an international company trying to gain a foothold in the Dutch market.<br />
Finally, BCFJobs is a website for Bio (life sciences), Chemistry and Food jobs, from medium level upwards. It is on this list because it covers my personal background.<br />
<h4>
Data import</h4>
Over the years I have used several spreadsheet programs to store the data. Currently it sits in LibreOffice, so I extract the data from an ODS file. read.ods() returns a list with one element per sheet, and each element holds the sheet's columns as text. I did not notice an option to use the column names in row 1 directly, hence a bit of processing is needed to obtain a data.frame.<br />
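The bit of processing can be sketched as a small generic helper. This is a minimal sketch, not the code used for the plots: `sheet_to_df()` is a hypothetical name, it assumes the sheet arrives as a data.frame of text columns with the names in row 1, dates in the first column and numbers everywhere else, and the toy values are made up.

```r
# Hypothetical helper (not part of readODS): turn a sheet whose cells are all
# text, with the column names in row 1, into a typed data.frame.
# Assumes column 1 holds dates and all other columns hold numbers.
sheet_to_df <- function(sheet, date_format = '%Y-%m-%d') {
  hdr <- as.character(unlist(sheet[1, ]))      # row 1 = column names
  body <- sheet[-1, , drop = FALSE]             # the actual observations
  out <- lapply(seq_along(hdr), function(i) {
    v <- as.character(body[[i]])
    if (i == 1) as.Date(v, format = date_format)
    else suppressWarnings(as.numeric(v))        # empty cells become NA
  })
  names(out) <- hdr
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Toy sheet with made-up numbers, only to show the shape of the result
toy <- data.frame(V1 = c('Date', '2015-01-03', '2015-01-10'),
                  V2 = c('uitzendbureaus', '100', '120'),
                  V3 = c('VK', '', '7'),
                  stringsAsFactors = FALSE)
str(sheet_to_df(toy))
```

For dates stored like "Jan 3, 2015", the `date_format` argument would become '%b %e, %Y', as in the code at the end of this post.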
<h3>
Plots</h3>
<h4>
General observations </h4>
The first figure shows uitzendbureaus, nuwerk and nationalevacaturebank. The nuwerk line stops at the point where they switched to vacancies supplied by Monsterboard; the numbers were totally different, and I stopped registering them.<br />
The uitzendbureaus numbers fluctuate considerably. Especially in 2014, but to a lesser extent in other years, there is a spike just before the summer holidays. Much of the work that needs to be done while people are on vacation is supplied via uitzendbureaus. A valley at the end of each year is caused by the Christmas vacation. Many businesses close between Christmas and New Year, depending somewhat on the weekdays on which these fall. As a result there is a dip in both uitzendbureaus and nationalevacaturebank.nl. A final observation, which I find difficult to explain, is that the beginning of almost every year shows some optimism: the number of vacancies increases. Then, after the summer vacation, things are more pessimistic. In the crisis years the second half of the year showed an actual decrease; post crisis it is a flat line. One possible explanation may be the end of the school year just before summer, which means that in summer and autumn there is a fresh batch of school leavers. Many of them may register at an uitzendbureau: one is required to search for a job, and this is one of the ways to go about that. By the end of the year these will all have been placed. But I can think of other reasons too (e.g. keeping head count low at the end of the year, optimism/pessimism due to the seasons).<br />
Regarding the crisis, from mid 2011 until the end of 2012 there are fewer and fewer vacancies. 2013 is the year things stabilized, while in 2014 things started to improve. In terms of vacancies, we are getting back to the level at the beginning of the series, before the crisis hit hard.<br />
Nuwerk was actually able to weather the crisis reasonably well. However, they subsequently lost some market share and did not capture the 2014 increase in vacancies. I imagine this is part of the reason they started using Monsterboard.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheC8Wl6QfoT6gvlmrNNbm3GPTrfCf1u7lhJle-jRPw-qIQYeZukWfw2beNGe697G4sFi8pdQcOEox-F3Xm-3LUxD4epd756-nwl3zHkjxp_QgCW24vJ88N00hQaZ81YCYQ4h7Hc4iew34/s1600/fig1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheC8Wl6QfoT6gvlmrNNbm3GPTrfCf1u7lhJle-jRPw-qIQYeZukWfw2beNGe697G4sFi8pdQcOEox-F3Xm-3LUxD4epd756-nwl3zHkjxp_QgCW24vJ88N00hQaZ81YCYQ4h7Hc4iew34/s1600/fig1.png" /></a></div>
<h4>
Extending the web sites used</h4>
This extended plot has a few extra sources, added for completeness' sake. Since the numbers of vacancies span a larger range, they are plotted on a logarithmic scale. Werk is the government website. On this website, people receiving benefits are required to register which vacancies they have applied for; insufficient job-seeking activity results in cuts in benefits. It also lists vacancies. Unfortunately it is not a very good website, and quite often it is down for maintenance at the weekend. I gave up retrying later in the weekend; fortunately I have never been in the situation of having to use this website.<br />
Jobbird did quite some advertising at some point. However, there are some odd spikes. I am not sure what causes them, but for me this is reason to doubt whether their number of vacancies is a good indicator of what is happening on the job market.<br />
Indeed reports the number of new vacancies, which therefore fluctuates even more. Every school vacation shows a bigger or smaller dip. It is also one of the last websites I added. The large fluctuations were the main reason not to plot these data in the first figure.<br />
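Why a logarithmic axis suits sites of very different size can be checked in a couple of lines of R: on a log10 axis the distance between two values depends only on their ratio, so a 10% rise looks the same for a small and a large site. The counts below are illustrative, not taken from the data.

```r
# Illustrative counts only: a 10% rise at a small and at a large site
small <- c(1000, 1100)
large <- c(100000, 110000)
diff(log10(small))   # about 0.0414, i.e. log10(1.1)
diff(log10(large))   # the same vertical distance on a log axis
```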
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNaqZzA3MVnuV5KShj3vB3jr2eAgpXuVNC5dynpMRNcjDqHE5b-o9G_2PFR-yVGFyavaIRIZGadDA0jcI_p29Thi8K40roEanqJDXpWHgUKH957CoAvbm0MHivwhcZl2sbG0C-__wamzQ/s1600/fig2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNaqZzA3MVnuV5KShj3vB3jr2eAgpXuVNC5dynpMRNcjDqHE5b-o9G_2PFR-yVGFyavaIRIZGadDA0jcI_p29Thi8K40roEanqJDXpWHgUKH957CoAvbm0MHivwhcZl2sbG0C-__wamzQ/s1600/fig2.png" /></a></div>
<h4>
Higher education</h4>
VK (Volkskrant) was traditionally the website where school teachers were recruited. As a consequence, around the end of April or beginning of May a large number of vacancies appeared for the next school year. These are the peaks visible in the three years for which VK has data. A similar but much less pronounced peak is visible in 2013 and 2014. In the section above, I stated that 2013 was the year things stabilized. That does not hold for this plot: Intermediair showed a decrease in 2013, and Stepstone was able to gain some market share, which they subsequently lost in 2014. Here, 2014 was the year things stabilized and 2015 saw an increase in jobs. What is happening at the end of 2015 is something new. Intermediair made a big jump in one week (2125 to 2645, from 12 to 19 September), and Stepstone followed a few weeks later with some jumps. This really feels like a change in the market.<br />
In contrast, BCFJobs did fairly well through the crisis. However, during and after the summer vacation it did lose vacancies. It is almost as if vacancies have been pulled from BCFJobs and placed on Intermediair and Stepstone. As explained above, Intermediair is probably the best-known place for highly educated people to start their job search; likewise it may be the best place for employers to go when it becomes harder to find employees. As more and more people find employment again, this may be forcing the change. But that is speculation.<br />
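The Intermediair jump quoted above amounts to about a quarter in a single week. A minimal sketch for computing week-on-week relative changes, with `week_change()` as a hypothetical helper; run over a whole series it could help spot such jumps (or the odd Jobbird spikes), though any threshold for calling something a jump would be an assumption of my own.

```r
# Hypothetical helper: relative change between consecutive weekly counts.
week_change <- function(counts) {
  counts <- as.numeric(counts)
  # the first week has no predecessor, hence NA
  c(NA, diff(counts) / head(counts, -1))
}

# Intermediair on 12 and 19 September 2015, as quoted in the text
intermediair <- c(2125, 2645)
round(100 * week_change(intermediair), 1)   # NA 24.5 (percent)
```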
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQ8XfONfukykX_RRfxLZBn5o0fNbzWKBAU5lqUP0sX_fNtmeJCUquh_woEWYmZv3f3LnzOMGZlYOSq70UrdWmcOsDGjDw7rYCU6m-7nVpGL0vNSfbiCM8npXLIKakIWxdHsG0cDIkgmkY/s1600/fig3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQ8XfONfukykX_RRfxLZBn5o0fNbzWKBAU5lqUP0sX_fNtmeJCUquh_woEWYmZv3f3LnzOMGZlYOSq70UrdWmcOsDGjDw7rYCU6m-7nVpGL0vNSfbiCM8npXLIKakIWxdHsG0cDIkgmkY/s1600/fig3.png" /></a></div>
<h3>
Conclusion</h3>
The number of vacancies fluctuates, depending on vacations, season and progress through the crisis. Regarding the crisis, 2013 was the year things stabilized and 2014 saw an increase in vacancies. For highly educated personnel, this change happened about a year later.<br />
<h3>
Code</h3>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">library(readODS)</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">library(ggplot2)</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">library(dplyr)</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">r1 <- read.ods('banen.aantal.clean.ods')</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">mynames <- sapply(1:11,function(i) r1[[1]][1,i])</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">mycols <- lapply(1:11,function(i) {</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> if (i==1) as.Date(r1[[1]][-1,1],format='%b %e, %Y')</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> else as.numeric(r1[[1]][-1,i])})</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">r2 <- as.data.frame(mycols)</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">mynames</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">names(r2) <- mynames</span><br />
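<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"># Split Intermediair: Intermediair_VK where no separate VK count exists, Intermediair where it does</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">r3 <- mutate(r2,</span><br />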
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> Intermediair_VK=ifelse(!is.na(VK),NA,Intermediair),</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> Intermediair=ifelse(is.na(VK),NA,Intermediair))</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
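<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"># Long format: one row per date and source</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">l1 <- reshape(r3,</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> idvar='Date',</span><br />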
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> direction='long',</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> varying=list(names(r3)[-1]),</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> timevar='source',</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> v.names='count',</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> times=names(r3)[-1])</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">l1 %>% </span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> filter(.,source %in% c('nationalevacaturebank.nl','uitzendbureaus',</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> 'nuwerk')) %>%</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> ggplot(.,aes(y=count,x=Date,col=source)) +</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> geom_line()+</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> ylim(0,NA)+</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> theme(legend.position="bottom")</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">l1 %>% </span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> filter(.,source %in% c('nationalevacaturebank.nl','uitzendbureaus',</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> 'indeed','werk','jobbird','nuwerk'),</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> Date>as.Date('2012-01-01')) %>%</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> ggplot(.,aes(y=count,x=Date,col=source)) +</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> geom_line()+</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> scale_y_log10()+</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> theme(legend.position="bottom")+</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> labs(y='Count (log scale)')</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">l1 %>% </span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> filter(.,source %in% c('VK','Intermediair','stepstone',</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> 'Intermediair_VK','BCFJobs')) %>%</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> ggplot(.,aes(y=count,x=Date,col=source)) +</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> geom_line()+</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> ylim(0,NA)+</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> theme(legend.position="bottom")</span><br />
<br />
<br />Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com1tag:blogger.com,1999:blog-3524617892004055830.post-9098257483850759752015-11-29T09:41:00.000+01:002015-11-29T09:41:49.297+01:00Wind in Netherlands II<a href="http://wiekvoet.blogspot.nl/2015/11/wind-in-netherlands.html">Two weeks ago</a> I plotted how wind measurements on the edge of the North Sea changed in the past century. This week the same dataset is used for hypothesis testing.<br />
<h3>
Data</h3>
The most important thing to reiterate from the previous post is that the data are from <a href="http://www.knmi.nl/home">KNMI</a> and come with a comment: "<i>These time series are inhomogeneous because of station relocations and changes in observation techniques. As a result, these series are not suitable for trend analysis. For climate change studies we refer to the </i><a href="http://www.knmi.nl/kennis-en-datacentrum/achtergrond/gehomogeniseerde-reeks-maandtemperaturen-de-bilt" style="font-style: italic;">homogenized series of monthly temperatures of De Bilt</a><i> or the </i><a href="http://www.knmi.nl/kennis-en-datacentrum/achtergrond/centraal-nederland-temperatuur-cnt" style="font-style: italic;">Central Netherlands Temperature</a>"<i>. </i><br />
Data reading has changed slightly, mostly because I needed different variables. In addition, for testing I wanted some categorical variables: month and year. For year I chose five chunks of 22 years; 22 seemed large enough and results in chunks of approximately equal size. Finally, for display purposes, wind direction was categorized into 8 directions according to the compass rose (North, North-East, East etc.).<br />
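As an alternative to cut(), the compass-rose binning can be written with modular arithmetic. This is a sketch under the assumption that DDVEC is in whole degrees with 360 for north and 0 for calm/variable wind, as described in the KNMI comment in the code below; `compass8()` is a hypothetical helper whose sectors are 45 degrees wide and centred on each direction.

```r
# Hypothetical helper: map compass degrees (1-360, 360 = north) to the
# 8 compass-rose sectors, each 45 degrees wide and centred on its direction.
# 0 (calm/variable in the KNMI data) and NA are returned as NA.
compass8 <- function(deg) {
  labs <- c('N', 'NE', 'E', 'SE', 'S', 'SW', 'W', 'NW')
  # shift by half a sector so that e.g. 338..22 all land in 'N'
  idx <- ((deg + 22.5) %/% 45) %% 8 + 1
  out <- labs[idx]
  out[is.na(deg) | deg == 0] <- NA
  factor(out, levels = labs)
}

compass8(c(360, 10, 30, 90, 200, 340, 0))
```

Compared with cut(), the modulo makes the wrap-around at north explicit, so no separate "N2" label has to be merged back into "N".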
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">library(circular)</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">library(dplyr)</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">library(ggplot2)</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">library(WRS2)</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">r1 <- readLines('etmgeg_235.txt')</span><br />
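<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">hdr <- grep('^#',r1)[1] # first line starting with '#' holds the column names</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">r2 <- r1[hdr:length(r1)]</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">explain <- r1[1:(hdr-1)]</span><br />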
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">explain</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">r2 <- gsub('#','',r2)</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">r3 <- read.csv(text=r2)</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">r4 <- mutate(r3,</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> Date=as.Date(format(YYYYMMDD),format='%Y%m%d'),</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> year=floor(YYYYMMDD/1e4),</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> rDDVEC=as.circular(DDVEC,units='degrees',template='geographics'),</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> # Vector mean wind direction in degrees </span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> # (360=north, 90=east, 180=south, 270=west, 0=calm/variable)</span><br />
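<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> # 45-degree sectors centred on N, NE, E, ...; DDVEC=0 (calm/variable) becomes NA</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> DDVECf=as.character(cut(DDVEC,breaks=c(0,seq(22.5,337.5,45),361),</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> labels=c('N','NE','E','SE','S','SW','W','NW','N2'))),</span><br />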
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> DDVECf=ifelse(DDVECf=='N2','N',DDVECf),</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> DDVECf=factor(DDVECf,levels=c('N','NE','E','SE','S','SW','W','NW')),</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> rFHVEC=FHVEC/10, # FHVEC: vector mean windspeed in 0.1 m/s, so rFHVEC is in m/s</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> yearf=cut(year,seq(1905,2015,22),labels=c('05','27','49','71','93')),</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> month=factor(month.name[as.integer(format(Date,'%m'))],levels=month.name), # locale independent</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> tcat=interaction(month,yearf)</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> ) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> select(.,YYYYMMDD,Date,year,month,DDVEC,rDDVEC,DDVECf,rFHVEC,yearf,tcat)</span><br />
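The directions are turned into a circular object because an ordinary mean of angles is meaningless around north. A base-R sketch of the vector (circular) mean, with `circ_mean_deg()` as a hypothetical helper; the circular package provides this properly via mean.circular(). Using compass degrees directly as angles is fine for a mean, because the averaged vector is converted back with the same convention.

```r
# Hypothetical helper: circular (vector) mean of directions in degrees.
# Each direction becomes a unit vector; the vectors are averaged and the
# angle of the mean vector is returned, in [0, 360).
circ_mean_deg <- function(deg) {
  rad <- deg * pi / 180
  m <- atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi
  m %% 360
}

dirs <- c(350, 30)   # one wind just west of north, one north-north-east
mean(dirs)           # 190: roughly south, which is wrong
circ_mean_deg(dirs)  # about 10: just east of north
```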
<h3>
Analysis</h3>
The circular package comes with an aov.circular() function, which can do a one-way analysis for circular data. Since I firmly believe that direction varies with the seasons, the presence of a time effect (the five year categories) was examined per month. To keep the result compact, only p-values are displayed; they are all significant at the 5% level.<br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">sapply(month.name,function(x) {</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> aa <- filter(r4,month==x)</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> bb <- aov.circular(aa$rDDVEC,aa$yearf,method='F.test')</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> format.pval(bb$p.value,digits=4,eps=1e-5)</span><br />
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> }) %>% as.data.frame</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: medium;">January 4.633e-05</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: medium;">February < 1e-05</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: medium;">March < 1e-05</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: medium;">April < 1e-05</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: medium;">May < 1e-05</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: medium;">June 0.00121</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: medium;">July 0.000726</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: medium;">August 0.0001453</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: medium;">September 0.02316</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: medium;">October < 1e-05</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: medium;">November 0.0001511</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: medium;">December 0.003236</span><span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"></span><br />
The associated plot shows the frequency of each direction by year category and month. The advantage is that time is on the x-axis, so changes are more easily visible. <br />
<div>
<div>
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">ggplot(r4[complete.cases(r4),], aes(x=yearf))+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> geom_bar()+ # counts per year category; geom_histogram() needs a continuous x</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> facet_grid(DDVECf ~ month)+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> ggtitle('Frequency of Wind Direction')</span></div>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgd7RxXB1MohodLMK0XF4K0c5EripkgL77sKdyk3ekG06Z0s6gyGCZpcOiIffDUBrtqtZypVx6Wvnbcus16gpUMWd1x5kC1MqsfTRbyLX4qCxoNGneg30Ia50iObRnauNiiuv9I1-WZWyc/s1600/winddir.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgd7RxXB1MohodLMK0XF4K0c5EripkgL77sKdyk3ekG06Z0s6gyGCZpcOiIffDUBrtqtZypVx6Wvnbcus16gpUMWd1x5kC1MqsfTRbyLX4qCxoNGneg30Ia50iObRnauNiiuv9I1-WZWyc/s1600/winddir.png" /></a></div>
<div>
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
The other aspect of wind is strength. Two weeks ago I saw clear differences. However, these may also be an effect of instrument or location changes. The test I am interested in here is therefore not the main effect of the year categories but rather the Month*Year interaction; an instrument or location change would presumably shift all months alike, and so show up mainly as a main effect. For robustness I wanted to go nonparametric. However, since I did not find anything regarding two-factor interactions in my second edition of <a href="http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470387378.html">Hollander and Wolfe</a>, I googled for robust interaction. This gave a hit on <a href="http://rcompanion.org/rcompanion/d_08a.html">rcompanion</a> for the WRS2 package.<br />
<div>
<div>
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;">t2way(rFHVEC ~ yearf + month + yearf:month, </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> data = r4)</span></div>
</div>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: medium;"> value p.value</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: medium;">yearf 1063.0473 0.001</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: medium;">month 767.5687 0.001</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: medium;">yearf:month 169.4807 0.001</span></div>
</div>
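<div>
As far as I understand WRS2, t2way() compares 20% trimmed means (its default is tr = 0.2) rather than ordinary means. The base R sketch below uses a small hypothetical data set, not the De Kooy data, and only illustrates the quantity behind the interaction: how much the old-to-new change in cell trimmed means differs between months. It is not the t2way test statistic itself, which additionally takes the variability within the cells into account.</div>

```r
# Hypothetical toy data: daily wind speed (m/s) for two year classes in two months.
# The 30 in old/Jan mimics one stormy day.
toy <- data.frame(
  yearf = rep(c("old", "new"), each = 10),
  month = rep(rep(c("Jan", "Jul"), each = 5), times = 2),
  speed = c(6, 7, 7, 8, 30,   # old, Jan
            4, 4, 5, 5, 5,    # old, Jul
            7, 8, 8, 9, 9,    # new, Jan
            4, 5, 5, 5, 6)    # new, Jul
)

# Cell means per year class (rows) and month (columns)
m_plain <- tapply(toy$speed, list(toy$yearf, toy$month), mean)
# 20% trimmed means: with 5 values per cell, the lowest and highest are dropped
m_trim <- tapply(toy$speed, list(toy$yearf, toy$month), mean, trim = 0.2)

# Interaction contrast: does the old-to-new change differ between the months?
contrast <- function(m) {
  (m["new", "Jan"] - m["old", "Jan"]) - (m["new", "Jul"] - m["old", "Jul"])
}
contrast(m_plain)  # -3.8: the stormy day dominates
contrast(m_trim)   # about 0.667 (2/3)
```

<div>
With ordinary means the single outlier flips the sign of the interaction; the trimmed version does not depend on how extreme that day is, which is the kind of robustness that led me to WRS2.</div>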
<h3>
Conclusion</h3>
<div>
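The data seem to show a change in wind measurements over these 110 years. This can be due to changes in the wind, in the measurement instrument or in the instrument location. The statistical test was chosen to counter some effects of these changes: a change of instrument or location would be expected to shift all months alike, and hence show up mainly in the main effect of year rather than in the Month*Year interaction. It is therefore plausible that the change is due to changes in the wind itself.</div>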
<div>
<br /></div>
Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com1tag:blogger.com,1999:blog-3524617892004055830.post-56806844260630296532015-11-15T15:20:00.000+01:002015-11-15T15:20:30.192+01:00Wind in NetherlandsIn climate change discussions, everybody talks about temperature. But weather is much more than that. There are at least rain and wind as directly experienced qualities, and air pressure as a measurable quantity. In the Netherlands, some observation stations have more than a century of daily data on these. The series may be broken, in the sense that equipment and location may have changed. To quote: "<i>These time series are inhomogeneous because of station relocations and changes in observation techniques. As a result, these series are not suitable for trend analysis. For climate change studies we refer to the homogenized series of monthly temperatures of De Bilt <a href="http://www.knmi.nl/kennis-en-datacentrum/achtergrond/gehomogeniseerde-reeks-maandtemperaturen-de-bilt">link</a> or the Central Netherlands Temperature <a href="http://www.knmi.nl/kennis-en-datacentrum/achtergrond/centraal-nederland-temperatuur-cnt">link</a>.</i>" Since I am looking at wind rather than temperature, I will stick with this station's data.<br />
<div>
<h3>
Data</h3>
</div>
<div>
Data are from <a href="https://www.knmi.nl/nederland-nu/klimatologie/daggegevens">daily observations from KNMI</a>. I have chosen station De Kooy. For those less familiar with Dutch geography, this is close to Den Helder, at the north-western tip of the Netherlands. This puts it close to the North Sea, the Wadden Sea and Lake IJssel, so wind should be relatively unhindered there. The data themselves are daily observations. For wind there are:</div>
<div>
DDVEC Vector mean wind direction in degrees<br />
(360=north, 90=east, 180=south, 270=west, 0=calm/variable)<br />
FHVEC Vector mean windspeed (in 0.1 m/s)<br />
FG Daily mean windspeed (in 0.1 m/s)<br />
FHX Maximum hourly mean windspeed (in 0.1 m/s)<br />
FHXH Hourly division in which FHX was measured<br />
FHN Minimum hourly mean windspeed (in 0.1 m/s)<br />
FHNH Hourly division in which FHN was measured<br />
FXX Maximum wind gust (in 0.1 m/s)<br />
FXXH Hourly division in which FXX was measured<br />
The header of the downloaded data contains this and much more information. I am sure there are good reasons to record speed in 0.1 m/s, but personally I find m/s easier.<br />
The first two variables are 'vector means'. Obviously one cannot simply average directions: the arithmetic mean of 350 and 10 degrees is 180, i.e. south, while both directions are close to north. Luckily there is the circular package, which does understand direction.<br />
Thus the data reading script becomes:<br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">r1 <- readLines('etmgeg_235.txt')</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">r2 <- r1[grep('^#',r1):length(r1)]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">explain <- r1[1:(grep('^#',r1)-1)]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># explain</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">r2 <- gsub('#','',r2)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">r3 <- read.csv(text=r2)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">library(dplyr)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">library(circular)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">library(ggplot2) # used for the plots below</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">r4 <- mutate(r3,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Date=as.Date(format(YYYYMMDD),format='%Y%m%d'),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> year=floor(YYYYMMDD/1e4),</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"> month=factor(format(Date,'%B'),levels=month.name),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> rDDVEC=as.circular(ifelse(DDVEC==0,NA,DDVEC),units='degrees',template='geographics'),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> # Vector mean wind direction in degrees </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> # (360=north, 90=east, 180=south, 270=west, 0=calm/variable)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> rFHVEC=FHVEC/10, # Vector mean windspeed (m/s; raw data in 0.1 m/s)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> rFG=FG/10, # Daily mean windspeed (m/s; raw data in 0.1 m/s) </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> rFHX=FHX/10, # Maximum hourly mean windspeed (m/s; raw data in 0.1 m/s)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> rFHN=FHN/10, # Minimum hourly mean windspeed (m/s; raw data in 0.1 m/s)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> rFXX=FXX/10 # Maximum wind gust (m/s; raw data in 0.1 m/s)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ) %>%</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> select(.,YYYYMMDD,Date,year,month,rDDVEC,rFHVEC,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> rFG,rFHX,rFHN,rFXX)</span><br />
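<div>
Why directions need a vector mean can be sketched in a few lines of base R: turn each direction into a vector (optionally scaled by wind speed), average the components, and convert back with atan2(). This is a minimal illustration, not how KNMI computes DDVEC; the speed weighting is an assumption on my side and vector_mean_dir is a hypothetical helper. Calm days (coded 0) are dropped first, as they have no direction.</div>

```r
# Hypothetical helper: vector mean of compass directions (degrees, 0/360 = north).
# Each direction becomes a vector (east = sin, north = cos), optionally scaled
# by speed; the components are averaged and turned back into a direction.
vector_mean_dir <- function(deg, speed = 1) {
  speed <- rep_len(speed, length(deg))
  keep <- !is.na(deg) & deg != 0     # KNMI codes calm/variable as 0
  rad <- deg[keep] * pi / 180
  east  <- mean(speed[keep] * sin(rad))
  north <- mean(speed[keep] * cos(rad))
  (atan2(east, north) * 180 / pi) %% 360
}

mean(c(350, 30))                       # 190: the arithmetic mean points south
vector_mean_dir(c(350, 30))            # 10: just east of north
vector_mean_dir(c(350, 30, 0, NA))     # 10: calm and missing days are ignored
vector_mean_dir(c(90, 180), c(3, 1))   # about 108.4: pulled towards the stronger east wind
```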
<div>
<br /></div>
<h3>
Plots</h3>
The plot of mean wind speed shows several effects. There is an equipment change just before 2000. At the beginning of the curve the values are lowest, while in the sixties there is a bit more wind, as there was in the nineties. I wonder about that. Is that equipment? I can imagine that a hundred years ago less accurate equipment could cause such a change, but fifty or twenty years ago? Finally, close to the end of the war there are missing data.<br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">ggplot(data=r4,aes(y=rFG,x=Date))+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_smooth()+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_point(alpha=.03) +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ylab('Mean wind speed (m/s)')+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> xlab('Year')</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBjfF1dkNaQE5INYOp8iM_sLliUwdnvvRxqI-iiZnedBJhJ-j_FTV6GsMjoup3W6cSYLz6C-hmF033KB5ypHYpTmA5Tna0cxDaGRCyqPoZB_VktK5XWeUJvi5VetHxOtjx6d56pG4S4Zo/s1600/meansmooth.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBjfF1dkNaQE5INYOp8iM_sLliUwdnvvRxqI-iiZnedBJhJ-j_FTV6GsMjoup3W6cSYLz6C-hmF033KB5ypHYpTmA5Tna0cxDaGRCyqPoZB_VktK5XWeUJvi5VetHxOtjx6d56pG4S4Zo/s1600/meansmooth.png" /></a></div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
A second plot, by month, shows somewhat different patterns. There is still most wind in the middle of the last century. However, September and October have the most wind just before 1950, while November and December have the most wind after 1950. Such a month-specific pattern is hard to attribute to changes in equipment, which would affect all months alike. It would seem that wind speeds themselves changed around then.</div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">r5 <- group_by(r4,month,year) %>%</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> summarise(.,mFG=mean(rFG),mFHX=max(rFHX),mFXX=max(rFXX))</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">ggplot(data=r5,aes(y=mFG,x=year)) +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_smooth(method='loess') +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_point(alpha=.5)+</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> facet_wrap(~ month)</span></div>
<div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8pt4ox4EIStKdV02IypqBgsoJ6WsCPK4-zFR-Y8rTrrrQ75GcAu_-q8hscyM1W5zJM6_X0i1hN8rASWpVF2vRy0bUZ2QDQVo7EXmLAarGXrJJ8S1LWfBw8725CHz9tF9WBwgEiJfjNUs/s1600/mymonth.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8pt4ox4EIStKdV02IypqBgsoJ6WsCPK4-zFR-Y8rTrrrQ75GcAu_-q8hscyM1W5zJM6_X0i1hN8rASWpVF2vRy0bUZ2QDQVo7EXmLAarGXrJJ8S1LWfBw8725CHz9tF9WBwgEiJfjNUs/s1600/mymonth.png" /></a></div>
</div>
</div>
<h4>
Wind direction</h4>
<div>
In the Netherlands there is a clear connection between wind and the rest of the weather. Most of the wind is from the SW (south west; from here on I will use N, E, S, W to abbreviate directions). N, NW, W and SW winds take up moisture from the North Sea and the Atlantic Ocean, which in turn brings rain. In winter, the SW wind also brings warmth: there is no frost with W and SW winds. In contrast, N, NE and E winds bring cold, and a winter wind from Siberia brings skating fever. In summer, nice and sunny weather is associated with S to E winds, and in May the E wind is associated with nice spring weather. SE is by far the least common direction. </div>
<div>
The circular package has both density and plot functions. Combining these gives the following directions for the oldest part of the data. </div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">par(mfrow=c(3,4),mar=c(0,0,3,0))</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">lapply(month.name,function(x) {</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> xx <- r4$rDDVEC[r4$year<1921 & r4$month==x]</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> xx <- xx[!is.na(xx)]</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> density(xx,bw=50) %>% </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> plot(main=x,xlab='',ylab='',shrink=1.2)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 1</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> })</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">title("1906-1920", line = -1, outer = TRUE)</span></div>
</div>
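<div>
What density() does for circular data can be sketched in base R. As far as I know, density.circular() uses a von Mises kernel by default and interprets bw as its concentration parameter kappa, so bw=50 gives a rather narrow kernel (roughly 8 degrees). The sketch below, with the hypothetical helper vm_density, averages von Mises kernels over a grid of compass directions; it illustrates that assumption and is not the package's implementation.</div>

```r
# Hypothetical helper: von Mises kernel density of compass directions (degrees).
# kappa is the kernel concentration; larger kappa means a narrower kernel.
vm_density <- function(obs_deg, kappa = 50, grid_deg = seq(0, 359.9, by = 0.1)) {
  obs  <- obs_deg[!is.na(obs_deg)] * pi / 180
  grid <- grid_deg * pi / 180
  # Kernel exp(kappa * cos(d)) / (2 * pi * I0(kappa)), rewritten with the
  # exponentially scaled Bessel function so exp(kappa) cannot overflow
  norm <- 2 * pi * besselI(kappa, nu = 0, expon.scaled = TRUE)
  dens <- vapply(grid, function(g) mean(exp(kappa * (cos(g - obs) - 1))),
                 numeric(1)) / norm
  data.frame(direction = grid_deg, density = dens)   # density per radian
}

d <- vm_density(c(220, 225, 225, 230))
d$direction[which.max(d$density)]   # about 225
sum(d$density) * (0.1 * pi / 180)   # about 1: the density integrates to one
```

<div>
Because the kernel is itself periodic, no special treatment is needed where 359.9 wraps around to 0, which is exactly what a linear density() would get wrong.</div>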
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIKelY3EnVqmqT4NtuGEvdea8TuErXST21TSOOqqS05X3XrUXvqoYNJMCzUBFCPhKHMuqGomERoH6kISDm1pbbTNNBtx53C4jzW1HfSZ9C02S_5tu9uf-lMJX9vFVUafJBIitTlhhEAdg/s1600/direction1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIKelY3EnVqmqT4NtuGEvdea8TuErXST21TSOOqqS05X3XrUXvqoYNJMCzUBFCPhKHMuqGomERoH6kISDm1pbbTNNBtx53C4jzW1HfSZ9C02S_5tu9uf-lMJX9vFVUafJBIitTlhhEAdg/s1600/direction1.png" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
I would be hard pressed to see significant differences between the old and the recent data. The densities are slightly different, but not really impressive. Note the lack of E wind in summer in the recent data, indicating that recent summers have not been very spectacular. </div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">par(mfrow=c(3,4),mar=c(0,0,2,0))</span></div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">lapply(month.name,function(x) {</span></div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> xx <- r4$rDDVEC[r4$year>=2000 & r4$month==x]</span></div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> xx <- xx[!is.na(xx)]</span></div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> density(xx,bw=50) %>% </span></div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> plot(main=x,xlab='',ylab='',shrink=1.2)</span></div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 1</span></div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> })</span></div>
<div class="separator" style="clear: both;">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">title("2000-now", line = -1, outer = TRUE)</span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjm1hplOrAQEpmXhi0kJmgnsNx5nQr3qjrz-NfY8v8-1unFV6vIqqHdg2BV_1lV3v6kOhzjjFluUcW2-tDlDePbFTelzTd85BqL7QMb1nqIZxpHbF43tL2a1MYgaz06_c62qpmN7obDa2I/s1600/direction2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjm1hplOrAQEpmXhi0kJmgnsNx5nQr3qjrz-NfY8v8-1unFV6vIqqHdg2BV_1lV3v6kOhzjjFluUcW2-tDlDePbFTelzTd85BqL7QMb1nqIZxpHbF43tL2a1MYgaz06_c62qpmN7obDa2I/s1600/direction2.png" /></a></div>
<div>
<br /></div>
Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com1tag:blogger.com,1999:blog-3524617892004055830.post-7495482612560490772015-11-01T10:01:00.000+01:002015-11-01T10:01:30.624+01:00Vacancies in EuropeI like playing around with data from <a href="http://ec.europa.eu/eurostat/">Eurostat</a>. These days the tools to do so are very easy to use. The eurostat package pulls the data directly from the database into R. Process it a bit using dplyr and before you know it, ggplot makes a plot.<br />
<h3>
Data</h3>
My starting point for examining data is the <a href="http://ec.europa.eu/eurostat/data/database">database page</a>. From there I can browse for the correct table and view its contents. Having done that, I can take the code of the table and pull it into R. The vacancy table I chose, <span class="leaf">Job vacancy statistics - quarterly data (from 2001 onwards), NACE Rev. 2</span>, has code jvs_q_nace2, hence with<br />
<span style="font-family: Courier New, Courier, monospace;">library(eurostat)</span><br />
<span style="font-family: Courier New, Courier, monospace;">library(dplyr)</span><br />
<span style="font-family: Courier New, Courier, monospace;">library(ggplot2)</span><br />
<span style="font-family: Courier New, Courier, monospace;">library(scales)</span><br />
<span style="font-family: Courier New, Courier, monospace;">r1 <- get_eurostat('jvs_q_nace2')</span><br />
<div>
I now have all the packages needed and the data in R. In these data everything is coded, hence the next step is to merge in the meaning of the codes. The following code pulls the country codes and does a bit of post-processing to make the names nicer. Subsequently, the various country aggregates, reflecting the EU and the euro area at different stages of expansion, are removed: they add rows on top of the individual countries, so they have to go. Finally, seasonally adjusted data are selected and all company sizes are used.</div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace;"># add country names</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">r2 <- get_eurostat_dic('geo') %>%</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> mutate(.,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> geo=V1,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> country=V2,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> country=gsub('\\(.*$','',country),</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> country=gsub(' $','',country)) %>%</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> select(.,geo,country) %>%</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> merge(.,r1) %>%</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"># filter countries</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> filter(.,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> !grepl('EA.*',geo),</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> !grepl('EU.*',geo),</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> s_adj=='SA' ,# seas. adj.</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> sizeclas!='Total') %>% # all company sizes</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> mutate(.,country=factor(country)) %>%</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> select(.,-geo,-s_adj,-sizeclas)</span></div>
</div>
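<div>
The decode-and-merge step can be sketched in base R on a hypothetical miniature of a coded table and its 'geo' dictionary. The codes and labels below are made up in Eurostat's style; the column names V1 and V2 mirror what get_eurostat_dic() returned when this was written, which may differ in newer versions of the package. The aggregate filter is anchored with ^, a slightly stricter variant of the grepl('EA.*',geo) used above.</div>

```r
# Hypothetical miniature of a coded Eurostat table and its 'geo' dictionary
dat <- data.frame(geo = c("DE", "NL", "EU28", "EA19"),
                  values = c(2.4, 2.1, 1.9, 1.8))
dic <- data.frame(V1 = c("DE", "NL", "EU28", "EA19"),
                  V2 = c("Germany (until 1990 former territory of the FRG)",
                         "Netherlands",
                         "European Union (28 countries)",
                         "Euro area (19 countries)"))

# Tidy the labels: drop everything from the first '(' and then a trailing space
dic$country <- sub(" $", "", sub("\\(.*$", "", dic$V2))

# Join codes to labels, then drop the EU and euro-area aggregates
lab <- merge(dic[, c("V1", "country")], dat, by.x = "V1", by.y = "geo")
lab <- lab[!grepl("^(EA|EU)", lab$V1), ]
lab
```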
<div>
For the other variables it is more or less the same: get_eurostat_dic() pulls the coding, which can then be merged. The labels in <i>nace </i>are a bit long, so I shortened them to 110 characters.</div>
<span style="font-family: Courier New, Courier, monospace;">r3 <- get_eurostat_dic('nace_r2') %>%</span><br />
<span style="font-family: Courier New, Courier, monospace;"> rename(.,</span><br />
<span style="font-family: Courier New, Courier, monospace;"> nace_r2=V1, # add NACE </span><br />
<span style="font-family: Courier New, Courier, monospace;"> nace=V2) %>%</span><br />
<span style="font-family: Courier New, Courier, monospace;"> mutate(.,</span><br />
<span style="font-family: Courier New, Courier, monospace;"> nace=substr(nace,1,110)) %>%</span><br />
<span style="font-family: Courier New, Courier, monospace;"> merge(.,r2) %>%</span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> mutate(nace=factor(nace))</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">r4 <- get_eurostat_dic('indic_em') %>%</span><br />
<span style="font-family: Courier New, Courier, monospace;"> rename(.,</span><br />
<span style="font-family: Courier New, Courier, monospace;"> indic_em=V1) %>%</span><br />
<span style="font-family: Courier New, Courier, monospace;"> merge(.,r3) %>%</span><br />
<span style="font-family: Courier New, Courier, monospace;"> mutate(.,</span><br />
<span style="font-family: Courier New, Courier, monospace;"> property=factor(V2)) %>%</span><br />
<span style="font-family: Courier New, Courier, monospace;"> select(.,-V2)</span><br />
<h3>
Plots</h3>
Since the data are now prepared, the next step is to plot. There are far too many categories in <i>nace</i>, so a selection has to be made for display. If you want to know what the different categories are, use<br />
<div>
<span style="font-family: Courier New, Courier, monospace;">nace <- select(r4,nace_r2,nace) %>% unique()</span> </div>
<div>
to display what each category represents. I selected a number of industry-related categories. In addition, some countries have very limited data; these are eliminated. <br />
<div>
<div>
<span style="font-family: Courier New, Courier, monospace;">filter(r4,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> property=='Job vacancy rate',</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> nace_r2 %in% c('A-S','B-E','B-S','B-F'),</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> !(country %in% </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> c('Croatia', 'Greece','Portugal',# limited years</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> 'Switzerland')), # limited classes</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> time>as.Date('01-01-2006',format='%d-%m-%Y'),</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> !is.na(values)) %>%</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> mutate(.,country=factor(country)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> ,nace_r2=factor(nace_r2)) %>%</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> ggplot(.,aes(x=time,y=values,color=nace)) +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> geom_line() +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> facet_wrap( ~ country )+</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> ylab('Job vacancy rate')+</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> guides(color=guide_legend(ncol=1))+</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> scale_x_date(labels=date_format("%y"))+</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> xlab('Year')+</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> theme(legend.position="bottom", legend.title=element_blank())</span></div>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGbeq1IeGKt5ic_Hk5uSLZvhyphenhyphenc_rT2iNgiFLn6JHp-4Kog2KtS-sHOCEKDkJWypkNGvDw3ZVpys07yZW1kG0cV1OkFQRkC6NsYfogrL-pi-KqUfaFrZP0U0VArcaPntElYyzpiuZ5F_3M/s1600/jvra.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGbeq1IeGKt5ic_Hk5uSLZvhyphenhyphenc_rT2iNgiFLn6JHp-4Kog2KtS-sHOCEKDkJWypkNGvDw3ZVpys07yZW1kG0cV1OkFQRkC6NsYfogrL-pi-KqUfaFrZP0U0VArcaPntElYyzpiuZ5F_3M/s1600/jvra.png" /></a></div>
In the plot, the enormous drops for Cyprus, the Czech Republic and Estonia are clearly visible. The Czech Republic is also rebounding quite steeply. The UK had a smaller drop in 2008 but is now back at pre-crisis job vacancy rates. In fact, many countries show increases in job vacancy rate.<br />
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div>
Getting a different display is very easy. Below is the call to get the number of vacancies in education, information and communication, and research. Since the number of vacancies depends strongly on country size, a logarithmic scale is used. The countries displayed are slightly different; apparently not all countries have data for all categories. The trends, however, are similar to those in the previous plot.</div>
</div>
<div>
<span style="font-family: Courier New, Courier, monospace;"></span><br />
<div>
<span style="font-family: Courier New, Courier, monospace;">filter(r4,</span></div>
<span style="font-family: Courier New, Courier, monospace;">
<div>
property=='Number of job vacancies',</div>
<div>
nace_r2 %in% c('J','M','M_N','P'),</div>
<div>
!(country %in% # limited years</div>
<div>
c('Croatia', 'Greece','Portugal','Sweden')),</div>
<div>
!is.na(values)) %>%</div>
<div>
mutate(.,country=factor(country)</div>
<div>
,nace_r2=factor(nace_r2)) %>%</div>
<div>
</div>
<div>
ggplot(.,aes(x=time,y=values,color=nace )) +</div>
<div>
geom_line() +</div>
<div>
facet_wrap( ~ country )+</div>
<div>
ylab('Number of job vacancies')+</div>
<div>
guides(color=guide_legend(ncol=1))+</div>
<div>
scale_x_date(labels=date_format("%y"))+</div>
<div>
xlab('Year')+</div>
<div>
scale_y_log10()+</div>
<div>
theme(legend.position="bottom", legend.title=element_blank())</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3-sLBbwBA1pvCgW7ssvJ2CAR0asdt1HMtCkjIt73BWbbDJTbsIvKW1OolTtSCefFsaLYj71zobmPY4uR-rzBfnYyasEq7X0SHxQ-usCmDHVhbwcEtL8IILdWFmLjC3LkdwtYlyEQoE7E/s1600/numind2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3-sLBbwBA1pvCgW7ssvJ2CAR0asdt1HMtCkjIt73BWbbDJTbsIvKW1OolTtSCefFsaLYj71zobmPY4uR-rzBfnYyasEq7X0SHxQ-usCmDHVhbwcEtL8IILdWFmLjC3LkdwtYlyEQoE7E/s1600/numind2.png" /></a></div>
<div>
<br /></div>
</span></div>
Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com1tag:blogger.com,1999:blog-3524617892004055830.post-6410782473214315072015-10-18T11:36:00.002+02:002015-10-18T11:36:20.144+02:00Trying to optimizeI wanted to try some more machine learning. On Kaggle there is a competition <a href="https://www.kaggle.com/c/how-much-did-it-rain-ii">How Much Did It Rain? II</a>. This is a much bigger data set than the Titanic one. To quote from Kaggle:<br />
<i>Rainfall is highly variable across space and time, making it notoriously
tricky to measure. Rain gauges can be an effective measurement tool for
a specific location, but it is impossible to have them everywhere. In
order to have widespread coverage, data from weather radars is used to
estimate rainfall nationwide. Unfortunately, these predictions never
exactly match the measurements taken using rain gauges.</i><br />
<h3>
Data</h3>
On the data themselves:<br />
<i>To understand the data, you have to realize that there are multiple
radar observations over the course of an hour, and only one gauge
observation (the 'Expected'). That is why there are multiple rows with
the same 'Id'.</i><br />
I have downloaded the data and at this point am just experimenting with them. It is quite a big data set: there are 9125329 rows in the training set. My idea was to do 'something' per record and then combine the records of one hour into a prediction. The 'something' is as yet undefined; the idea of combining by Id will be retained.<br />
What became clear pretty quickly is that everything is slow with this amount of data. Hence, for now, I will use only 10% of the training data. For ease of access the data are stored in an R data file.<br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">load('aaa3.RData')</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">###</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># take 10% of data</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">rawdata <- rawdata[rawdata$Id < quantile(rawdata$Id,.1),]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># extract keys per hour . Id & Expected</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># rawdata$Id!=c(0,rawdata$Id[1:(nrow(rawdata)-1)])</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># is the R way to write the SAS code: by Id; If first.Id;</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">r1b <- rawdata[rawdata$Id!=c(0,rawdata$Id[1:(nrow(rawdata)-1)]),c(1,24)]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Id <- factor(rawdata$Id)</span><br />
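<div>
The "first row per Id" idiom can be checked on a small hypothetical Id vector. The sketch compares the lag comparison used above with base R's !duplicated(); both give the same answer only when the rows are sorted by Id, as the code above assumes.</div>

```r
# Hypothetical sorted Id vector: several radar rows per gauge hour
ids <- c(1, 1, 1, 2, 2, 5, 5, 5, 5, 7)

# Lag comparison as in the post: TRUE where the Id differs from the previous row
first_lag <- ids != c(0, ids[-length(ids)])

# Base R alternative: TRUE at the first occurrence of each Id
first_dup <- !duplicated(ids)

which(first_lag)                  # 1 4 6 10
identical(first_lag, first_dup)   # TRUE
```

<div>
!duplicated() needs no 0 sentinel, and x[-length(x)] avoids a trap in 1:(n-1), which is c(1, 0) rather than empty when n is 1.</div>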
<h3>
Model</h3>
To get an idea of the process I started with linear regression, but that is just a temporary approach. For linear regression there are 22 parameters: one for each of the 21 observed variables plus an intercept. The prediction per row follows from a simple matrix multiplication. The model, including the estimation of the error in the fit, sits in a small R function. As preparation, a column of ones is added to the x data. The summary per Id can be done quickly and easily via the group_by() and summarise() functions from dplyr.<br />
Based on the current results I have decided that such a function will have to be moved to C++ or similar in order to get a decent computation time. But that is for a future time: it has been quite some years since I programmed in C or Fortran, so I'll need a refresher first. Luckily <a href="https://www.edx.org/">edX</a> has a course 'Introduction to C++' running right now.<br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">r1m <- as.matrix(rawdata[,c(-1,-2,-24)])</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">rm(rawdata) # control memory usage</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">r1m <- cbind(rep(1,nrow(r1m)),r1m) # add column of 1</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">r1m[is.na(r1m) ] <- 0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">betas <- rep(1,ncol(r1m))</span><br />
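<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">library(dplyr) # group_by(), summarise() and %>%</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># myerr calculates the mean prediction per Id and sums the absolute differences with Expected</span><br />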
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">myerr <- function(betas) {</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> pred <- data.frame(Id=Id,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> pred=as.numeric(tcrossprod(betas,r1m))) %>%</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> group_by(.,Id) %>%</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> summarise(.,m=mean(pred)) </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> sum(abs(pred$m-r1b$Expected))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">}</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">#mmyerr is the negated myerr, for maximizers such as ga()</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">mmyerr <- function(betas) -myerr(betas) </span><br />
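For readers without the competition data, the same objective can be tried on a small synthetic example. This is a minimal sketch under assumptions: the toy <code>Id</code>, <code>X</code> and <code>expected</code> objects are invented here, and base R <code>rowsum()</code> replaces dplyr for averaging the predictions per Id. Like <code>group_by()</code>, <code>rowsum()</code> returns the groups in sorted order, so the expected values must also be sorted by Id.<br />

```r
# Toy stand-in for the real data (hypothetical values, not the competition data)
set.seed(1)
Id       <- rep(1:5, times = c(3, 4, 2, 5, 6))                    # several rows per Id
X        <- cbind(1, matrix(rnorm(length(Id) * 3), ncol = 3))     # intercept + 3 features
expected <- c(1.2, 0.0, 3.4, 0.5, 2.1)                            # one target per Id, sorted by Id

# Sum of absolute errors between the mean prediction per Id and the target
myerr_base <- function(betas, X, Id, expected) {
  pred   <- as.numeric(X %*% betas)           # one prediction per row
  n_rows <- as.vector(table(Id))              # number of rows per Id (sorted by Id)
  m      <- rowsum(pred, Id)[, 1] / n_rows    # mean prediction per Id
  sum(abs(m - expected))
}

betas <- rep(1, ncol(X))                      # same starting values as in the post
myerr_base(betas, X, Id, expected)
```

Passing the data as arguments, rather than relying on the globals <code>Id</code>, <code>r1m</code> and <code>r1b</code>, is only a design choice that makes the function easier to test on toy data.<br />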
<h3>
Parameter estimation & optimization</h3>
The problem has now been reduced to finding the parameters which give the lowest prediction error. Just throwing this into optim() did not lead to satisfactory results, so this post describes some experiments with alternative approaches. I played with several of these, and for this post ran them all to get comparable data. The table gives a quick summary.<br />
<br />
<table border="0" cellspacing="0">
<colgroup width="85"></colgroup>
<colgroup width="97"></colgroup>
<colgroup width="55"></colgroup>
<colgroup width="99"></colgroup>
<colgroup width="85"></colgroup>
<tbody>
<tr>
<td align="left" height="17"><b>package</b></td>
<td align="left"><b>function</b></td>
<td align="right"><b>time (s)</b></td>
<td align="right"><b>result (sum abs. error)</b></td>
<td align="right"><b>converged</b></td>
</tr>
<tr>
<td align="left" height="17">stats</td>
<td align="left">optim (Nelder-Mead)</td>
<td align="right" sdnum="1033;" sdval="771">771</td>
<td align="right" sdnum="1033;" sdval="1805459">1805459</td>
<td align="right">No</td>
</tr>
<tr>
<td align="left" height="17"><br /></td>
<td align="left">optim (BFGS)</td>
<td align="right" sdnum="1033;" sdval="729">729</td>
<td align="right" sdnum="1033;" sdval="1678736">1678736</td>
<td align="right">Yes</td>
</tr>
<tr>
<td align="left" height="17"><br /></td>
<td align="left">optim (CG)</td>
<td align="right" sdnum="1033;" sdval="5527">5527</td>
<td align="right" sdnum="1033;" sdval="1678775">1678775</td>
<td align="right">Yes</td>
</tr>
<tr>
<td align="left" height="17">adagio</td>
<td align="left">simpleEA</td>
<td align="right" sdnum="1033;" sdval="623">623</td>
<td align="right" sdnum="1033;" sdval="1722928">1722928</td>
<td align="right">No</td>
</tr>
<tr>
<td align="left" height="17">dfoptim</td>
<td align="left">hjk</td>
<td align="right" sdnum="1033;" sdval="4289">4289</td>
<td align="right" sdnum="1033;" sdval="1678734">1678734</td>
<td align="right">Yes</td>
</tr>
<tr>
<td align="left" height="17">GA</td>
<td align="left">ga</td>
<td align="right" sdnum="1033;" sdval="589">589</td>
<td align="right" sdnum="1033;" sdval="1775910">1775910</td>
<td align="right">NA</td>
</tr>
</tbody></table>
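Numbers like those in the table can be collected in one go. The sketch below is a hypothetical helper, not the code used for this post: it uses the Rosenbrock test function from the optim() help page as a cheap stand-in for myerr(), and records the elapsed time, the final value and whether optim() reported convergence (convergence code 0).<br />

```r
# Rosenbrock function (from the optim() help page), a cheap stand-in for myerr()
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2

methods <- c("Nelder-Mead", "BFGS", "CG")
res <- do.call(rbind, lapply(methods, function(m) {
  # system.time() evaluates the assignment, so 'o' holds the optim() result
  tm <- system.time(o <- optim(c(-1.2, 1), fr, method = m,
                               control = list(maxit = 5000)))
  data.frame(method    = m,
             time      = unname(tm["elapsed"]),   # seconds
             result    = o$value,                 # final objective value
             converged = o$convergence == 0)      # 1 would mean maxit was reached
}))
print(res)
```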
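It appears that optim() with its default Nelder-Mead method did not work at all: it stopped at the 5000-iteration limit (convergence code 1) with the worst result of the lot. In contrast, the BFGS option gave quite an improvement. Without a supplied gradient, optim() approximates the gradient by finite differences, so a fast way to compute the gradient could improve things quite a bit. CG reached almost the same value, but took more than seven times as long. The fairly decent result for simpleEA() is actually a bit of a disappointment: it started with a box and converged to the center of that box. Whatever I changed in the options, it always ended in the same center. hjk() works, but it is quite slow. Finally, ga() progresses too slowly: after 100 iterations its best value is still well above the optimum found by BFGS and hjk().<br />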
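One observation that may help with both speed and gradients (a sketch, not tested on the full data): because the prediction is linear in the betas, the mean prediction per Id equals the prediction for the per-Id mean of the rows of <code>r1m</code> (with the NAs already set to zero). Those means can be computed once, after which each evaluation is a small matrix product. The objective then also has a simple subgradient that can be passed to optim() with method='BFGS' instead of finite differences. The data are synthetic and the names <code>Xbar</code>, <code>err_agg</code> and <code>grad_agg</code> are hypothetical. The sum of absolute errors is not differentiable everywhere, so using BFGS on it remains a heuristic.<br />

```r
set.seed(2)
Id     <- rep(1:50, times = sample(2:8, 50, replace = TRUE))   # varying rows per Id
X      <- cbind(1, matrix(rnorm(length(Id) * 4), ncol = 4))    # intercept + 4 features
true_b <- c(0.5, 1, -2, 0.3, 0)

# Per-Id mean of the rows of X: computed once, reused in every evaluation
Xbar     <- rowsum(X, Id) / as.vector(table(Id))
expected <- as.numeric(Xbar %*% true_b) + rnorm(nrow(Xbar), sd = 0.1)

# Same objective as myerr(), but on the aggregated design matrix
err_agg  <- function(b) sum(abs(Xbar %*% b - expected))
# Subgradient: sum over Ids of sign(residual) times the mean row of that Id
grad_agg <- function(b) as.numeric(crossprod(Xbar, sign(Xbar %*% b - expected)))

fit <- optim(rep(1, ncol(X)), err_agg, grad_agg, method = "BFGS",
             control = list(maxit = 5000))
fit$convergence   # 0 means optim() reports successful completion
round(fit$par, 2)
```

The aggregation step only holds as long as the model stays linear; for a nonlinear model the predictions would still have to be averaged per row.<br />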
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">> system.time(</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ ooptim <- optim(betas,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ myerr,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ control=list(maxit=5000)))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> user system elapsed </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">771.204 0.216 770.743 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">> ooptim</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$par</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> [1] 5.91962946 -0.26798071 -1.19797656 -0.13297181 0.51015756 1.49454289</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> [7] 0.75377772 0.47774932 -0.54917591 -1.36237144 9.32248883 -4.58509259</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[13] 1.53973413 -1.28367152 1.57070010 2.80960763 -1.53639162 0.24043815</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[19] 0.55385066 0.65754031 2.25820673 -0.09389256</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$value</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1] 1805459</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$counts</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">function gradient </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 5001 NA </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$convergence</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1] 1</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$message</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">NULL</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">> system.time(</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ ooptimBFGS <- optim(betas,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ myerr,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ control=list(maxit=5000),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ method='BFGS'))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> user system elapsed </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">730.253 0.226 729.741 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">> ooptimBFGS</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$par</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> [1] -0.193572528 0.036277278 0.018915001 0.055201253 -0.005343143</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> [6] 0.048512761 0.018770974 -0.067885397 0.006576984 0.009960527</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[11] 0.354908920 -0.417610915 0.004300646 -0.313622718 0.013003956</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[16] 0.030448388 0.004622354 0.026076960 0.022466472 0.015661149</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[21] 0.093649287 -0.056203122</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$value</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1] 1678736</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$counts</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">function gradient </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 626 94 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$convergence</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1] 0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$message</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">NULL</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">> system.time(</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ ooptimCG <- optim(betas,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ myerr,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ control=list(maxit=5000),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ method='CG'))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> user system elapsed </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">5531.910 1.397 5527.053 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">> ooptimCG</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$par</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> [1] -0.112575961 0.030875504 0.019111803 0.060134187 -0.009593463</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> [6] 0.051076794 0.019737360 -0.077191287 0.016508372 0.003658111</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[11] -0.071932138 -0.036570553 -0.051100594 -0.179869515 -0.010209693</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[16] 0.066388807 0.005459227 0.039691325 0.020881573 0.020044547</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[21] 0.068064112 -0.051825924</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$value</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1] 1678775</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$counts</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">function gradient </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 2283 772 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$convergence</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1] 0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$message</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">NULL</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">> library(adagio)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">> system.time(</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ osimpleEA <- simpleEA(myerr,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ lower=rep(-5,ncol(r1m)),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ upper=rep(5,ncol(r1m))))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> user system elapsed </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">624.432 0.176 623.857 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">> osimpleEA</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$par</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$val</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1] 1722928</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$fun.calls</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1] 4060</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$rel.scl</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1] 0.625</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$rel.tol</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1] 0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">> library(dfoptim)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">> system.time(</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ ohjk <- hjk(betas,myerr)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ )</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> user system elapsed </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">4291.935 2.036 4289.022 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">> ohjk</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$par</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> [1] -0.188190460 0.035865784 0.020156860 0.056446075 -0.005027771</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> [6] 0.046699524 0.018516541 -0.069412231 0.006889343 0.010589600</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[11] 0.257308960 -0.412174225 0.065429688 -0.279411316 0.014663696</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[16] 0.027637482 0.011859894 0.020298004 0.023769379 0.013179779</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[21] 0.092247009 -0.056926727</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$value</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1] 1678734</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$convergence</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1] 0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$feval</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1] 28245</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">$niter</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1] 19</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">> library(GA)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Loading required package: foreach</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Loading required package: iterators</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Package 'GA' version 2.2</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Type 'citation("GA")' for citing this R package in publications.</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">> system.time(</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ oga <- ga(type='real-valued',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ fitness=mmyerr,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ min=rep(-5,ncol(r1m)),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+ max=rep(5,ncol(r1m))))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 1 | Mean = -10979924 | Best = -2576076 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 2 | Mean = -7162806 | Best = -2576076 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 3 | Mean = -4932565 | Best = -2318262 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 4 | Mean = -4639204 | Best = -2237286 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 5 | Mean = -3638851 | Best = -2237286 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 6 | Mean = -3689674 | Best = -2068472 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 7 | Mean = -3226292 | Best = -1990599 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 8 | Mean = -2977074 | Best = -1876791 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 9 | Mean = -2775401 | Best = -1876791 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 10 | Mean = -2332140 | Best = -1865152 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 11 | Mean = -2437267 | Best = -1817242 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 12 | Mean = -2471635 | Best = -1799720 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 13 | Mean = -2168155 | Best = -1799720 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 14 | Mean = -2096314 | Best = -1799720 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 15 | Mean = -2253981 | Best = -1799720 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 16 | Mean = -2049947 | Best = -1799720 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 17 | Mean = -2089075 | Best = -1789813 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 18 | Mean = -2093853 | Best = -1789813 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 19 | Mean = -1976973 | Best = -1789813 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 20 | Mean = -1933447 | Best = -1789813 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 21 | Mean = -2000920 | Best = -1789813 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 22 | Mean = -1950218 | Best = -1788022 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 23 | Mean = -1886263 | Best = -1788022 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 24 | Mean = -1914422 | Best = -1788022 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 25 | Mean = -2024395 | Best = -1788022 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 26 | Mean = -2016097 | Best = -1788022 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 27 | Mean = -1947956 | Best = -1788022 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 28 | Mean = -1900507 | Best = -1787331 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 29 | Mean = -2066703 | Best = -1787331 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 30 | Mean = -1835251 | Best = -1787331 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 31 | Mean = -1942032 | Best = -1787331 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 32 | Mean = -2019920 | Best = -1787331 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 33 | Mean = -1891563 | Best = -1787331 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 34 | Mean = -1838287 | Best = -1787331 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 35 | Mean = -1925757 | Best = -1787331 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 36 | Mean = -1972783 | Best = -1787331 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 37 | Mean = -2053707 | Best = -1787331 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 38 | Mean = -1999298 | Best = -1787331 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 39 | Mean = -2057161 | Best = -1787331 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 40 | Mean = -2016452 | Best = -1787331 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 41 | Mean = -2106619 | Best = -1787331 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 42 | Mean = -1848483 | Best = -1787331 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 43 | Mean = -1952123 | Best = -1784611 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 44 | Mean = -1911939 | Best = -1784611 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 45 | Mean = -1867947 | Best = -1784611 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 46 | Mean = -1961794 | Best = -1784611 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 47 | Mean = -1878957 | Best = -1784611 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 48 | Mean = -2004529 | Best = -1784611 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 49 | Mean = -1819421 | Best = -1784611 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 50 | Mean = -1904183 | Best = -1784611 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 51 | Mean = -1870062 | Best = -1783617 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 52 | Mean = -1948733 | Best = -1782194 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 53 | Mean = -2023928 | Best = -1782194 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 54 | Mean = -2008359 | Best = -1782194 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 55 | Mean = -2022090 | Best = -1782194 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 56 | Mean = -2086763 | Best = -1782194 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 57 | Mean = -1959158 | Best = -1782194 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 58 | Mean = -1849578 | Best = -1782194 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 59 | Mean = -2119746 | Best = -1782194 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 60 | Mean = -2218175 | Best = -1780252 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 61 | Mean = -2096596 | Best = -1780252 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 62 | Mean = -1979876 | Best = -1779764 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 63 | Mean = -2172198 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 64 | Mean = -1876837 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 65 | Mean = -1880262 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 66 | Mean = -1830083 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 67 | Mean = -1908915 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 68 | Mean = -1994229 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 69 | Mean = -2090335 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 70 | Mean = -2314393 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 71 | Mean = -2072167 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 72 | Mean = -1901014 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 73 | Mean = -1805347 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 74 | Mean = -1868138 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 75 | Mean = -2054748 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 76 | Mean = -1894803 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 77 | Mean = -1789384 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 78 | Mean = -1919701 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 79 | Mean = -1885417 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 80 | Mean = -1911716 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 81 | Mean = -1951938 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 82 | Mean = -1968106 | Best = -1779187 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 83 | Mean = -2118552 | Best = -1778650 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 84 | Mean = -1891108 | Best = -1778650 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 85 | Mean = -1967131 | Best = -1778650 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 86 | Mean = -1971338 | Best = -1778650 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 87 | Mean = -1798790 | Best = -1778508 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 88 | Mean = -1990970 | Best = -1778440 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 89 | Mean = -2103851 | Best = -1778440 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 90 | Mean = -2176462 | Best = -1778440 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 91 | Mean = -1852161 | Best = -1778440 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 92 | Mean = -1913977 | Best = -1778440 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 93 | Mean = -2110709 | Best = -1778440 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 94 | Mean = -1948281 | Best = -1778440 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 95 | Mean = -2221610 | Best = -1777905 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 96 | Mean = -2276691 | Best = -1775910 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 97 | Mean = -2141834 | Best = -1775910 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 98 | Mean = -1871241 | Best = -1775910 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 99 | Mean = -1829631 | Best = -1775910 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iter = 100 | Mean = -1914998 | Best = -1775910 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> user system elapsed </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">590.256 0.418 589.855 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">> summary(oga)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+-----------------------------------+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">| Genetic Algorithm |</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">+-----------------------------------+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">GA settings: </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Type = real-valued </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Population size = 50 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Number of generations = 100 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Elitism = 2 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Crossover probability = 0.8 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Mutation probability = 0.1 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Search domain </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 x20 x21</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Min -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Max 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> x22</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Min -5</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Max 5</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">GA results: </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Iterations = 100 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Fitness function value = -1775910 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Solution = </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> x1 x2 x3 x4 x5 x6 x7</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1,] 0.2099863 0.3481724 0.3910016 -1.416855 -0.3929375 0.5143907 0.319798</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> x8 x9 x10 x11 x12 x13 x14</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1,] 1.795368 -0.445552 -0.8136325 -0.01177531 -0.823179 0.744795 -0.7005772</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> x15 x16 x17 x18 x19 x20 x21</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1,] 0.3549757 0.4885278 0.3535035 -0.04773755 -0.5155096 0.3308792 0.1921899</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> x22</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1,] 0.3446834</span><br />
<div>
<br /></div>
Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com3tag:blogger.com,1999:blog-3524617892004055830.post-21018100756028742962015-10-04T11:39:00.000+02:002015-10-04T11:39:07.076+02:00Predicting Titanic deaths on Kaggle VII: More Stan<a href="http://wiekvoet.blogspot.nl/2015/09/predicting-titanic-deaths-on-kaggle-vi.html">Two weeks ago</a> I used Stan to create predictions after simply throwing in all independent variables. This week I aim to refine the Stan model. For this, the loo package (Efficient Leave-One-Out Cross-Validation and WAIC for Bayesian Models) is convenient. See also the <a href="http://arxiv.org/abs/1507.04544">paper</a> by Aki Vehtari, Andrew Gelman and Jonah Gabry.<br />
Since the package does the heavy lifting, all that remains is to wrap it in a function so I can quickly compare models. A possible next step is to automate more of the process, but I have held off on that pending the current results.<br />
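For readers who want to see roughly what loo estimates, here is a minimal R sketch of plain importance-sampling LOO computed from an S x N log-likelihood matrix (S posterior draws, N observations), like the one the Stan model stores in log_lik. It is a simplification under stated assumptions: the loo package additionally smooths the importance weights with a fitted Pareto distribution (hence its Pareto k diagnostics), which this sketch does not do, and the function name is_loo is made up for illustration.

```r
# Plain importance-sampling LOO (NO Pareto smoothing, unlike loo::loo()).
# log_lik: S x N matrix, rows = posterior draws, columns = observations.
is_loo <- function(log_lik) {
  # elpd_i = -log( mean_s exp(-log_lik[s, i]) ), computed with log-sum-exp
  neg <- -log_lik
  m <- apply(neg, 2, max)                                 # column maxima
  log_mean_exp <- m + log(colMeans(exp(sweep(neg, 2, m)))) # stable log mean
  elpd_i <- -log_mean_exp
  n <- length(elpd_i)
  elpd <- sum(elpd_i)
  se <- sqrt(n * var(elpd_i))                             # SE of the sum
  c(elpd_loo = elpd, se_elpd_loo = se, looic = -2 * elpd, se_looic = 2 * se)
}
```

The last line follows the convention loo prints: looic is -2 times elpd_loo, which is why the Title model's elpd_loo of -445.5 in the Results section comes with a looic of 891.0.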
<h3>
Data</h3>
Same as last time.<br />
<h3>
Model</h3>
To keep the model similar to last time, I need a full design matrix (all dummy columns, no reference level dropped) for each independent variable in the model. So I wrote a mechanism that takes a model formula and creates both the design matrix and a set of indices that track which column belongs to which part of the model. Specifically, element j of <i>terms</i> is the index of the model term that column j of the design matrix tx comes from: 1 for the intercept, up to <i>nterm</i> for the last term. This sits in the function des.matrix.<br />
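To make the terms bookkeeping concrete, here is a small sketch that mirrors what des.matrix does, applied to a made-up four-row data frame with hypothetical values. The helper term_index is illustrative only and not part of the original code.

```r
# Mirror of the column-to-term bookkeeping in des.matrix(), on toy data.
term_index <- function(cols_per_term) {
  nterm <- c(1, cols_per_term)        # the intercept occupies one column
  rep(seq_along(nterm), nterm)        # term id repeated once per column
}

# Hypothetical data: one row per Title level, Pclass as a factor
d <- data.frame(Title  = factor(c("Mr", "Mrs", "Miss", "Master")),
                Pclass = factor(c(1, 2, 3, 3)))
# '~ x - 1' gives a full set of dummy columns for each single factor
la <- lapply(c("Title", "Pclass"), function(x)
  model.matrix(as.formula(paste("~", x, "-1")), d))
tx <- cbind(1, do.call(cbind, la))    # intercept + dummy columns
terms_idx <- term_index(sapply(la, ncol))
```

With Title + Pclass, the intercept gets index 1, the four Title dummies index 2 and the three Pclass dummies index 3, so in the Stan model all Title coefficients share the prior scale std[2].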
The generated quantities block exists purely to compute the pointwise log-likelihood (log_lik) needed for the LOO statistic.<br />
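As a check on what log_lik contains, the following sketch recomputes the Bernoulli-logit log-likelihood in R from a matrix of posterior draws of f (rows are draws). The function name and argument layout are assumptions for illustration; in practice loo's extract_log_lik reads these values straight from the fit.

```r
# Pointwise Bernoulli-logit log-likelihood, S draws x N observations.
#   draws: S x np matrix of coefficient draws (rows = draws)
#   tx:    N x np design matrix
#   y:     0/1 outcomes of length N
pointwise_log_lik <- function(draws, tx, y) {
  eta <- draws %*% t(tx)                           # S x N linear predictors
  Y <- matrix(y, nrow(draws), length(y), byrow = TRUE)
  # log p(y=1) = log plogis(eta); log p(y=0) = log(1 - plogis(eta))
  Y * plogis(eta, log.p = TRUE) +
    (1 - Y) * plogis(eta, lower.tail = FALSE, log.p = TRUE)
}
```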
Since it is preferable to compile the model only once, fit1 is calculated beforehand. With that preparation done, mySmodel is a function that fits the model, computes the LOO statistic and prints a one-line summary, all in one step. I can just drop in a formula and get usable output, so I can easily examine a number of models. Forward selection seemed a suitable way to explore the model space. I know it is not ideal, but at this point I mainly want to know whether this approach works at all.<br />
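Forward selection here means greedily adding whichever candidate term improves the score most. A minimal sketch, assuming a placeholder score_fn that returns elpd_loo for a formula (hypothetical wiring could be something like function(f) mySmodel(f, train)$loo1$elpd_loo). The stopping rule (stop when no candidate improves the score) is an assumption of this sketch, not necessarily the rule used for the results below.

```r
# Greedy forward selection. score_fn(formula) must return a number where
# higher is better (e.g. elpd_loo); it is a placeholder, not original code.
forward_select <- function(response, candidates, score_fn,
                           max_terms = length(candidates)) {
  chosen <- character(0)
  best <- -Inf
  path <- list()
  while (length(chosen) < max_terms) {
    remaining <- setdiff(candidates, chosen)
    if (length(remaining) == 0) break
    # score every one-term extension of the current model
    scores <- sapply(remaining, function(v)
      score_fn(reformulate(c(chosen, v), response = response)))
    if (max(scores) <= best) break              # nothing improves: stop
    best <- max(scores)
    chosen <- c(chosen, remaining[which.max(scores)])
    path[[length(path) + 1]] <- list(terms = chosen, score = best)
  }
  path
}
```

Because elpd_loo estimates vary a little from run to run, a real search would probably also want to look at the standard errors rather than only the point estimates.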
<h3>
Results</h3>
To my surprise, Title was the single variable that gave the best predictions. I had expected Sex to play that role.<br />
<span style="font-family: Courier New, Courier, monospace;">Survived ~ Title -445.4972 16.46314 </span><br />
<span style="font-family: Courier New, Courier, monospace;">Computed from 4000 by 891 log-likelihood matrix</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"> Estimate SE</span><br />
<span style="font-family: Courier New, Courier, monospace;">elpd_loo -445.5 16.5</span><br />
<span style="font-family: Courier New, Courier, monospace;">p_loo 4.1 0.2</span><br />
<span style="font-family: Courier New, Courier, monospace;">looic 891.0 32.9</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">All Pareto k estimates OK (k < 0.5)</span><br />
<span style="font-family: inherit;">The next variable added was passenger class (Pclass):</span><br />
<span style="font-family: inherit;"></span><br />
<span style="font-family: Courier New, Courier, monospace;">Survived ~ Title + Pclass -395.5926 17.42705 </span><br />
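Going from Title alone to Title + Pclass raises elpd_loo from about -445.5 to -395.6. Whether a step like this is meaningful depends on the standard error of the paired difference, which cannot be read off the two separate SEs. Here is a minimal sketch of that calculation from two vectors of pointwise elpd values (for example the elpd_loo column of a loo object's pointwise matrix). The loo package provides model comparison functions that compute this kind of quantity; elpd_diff here is a hypothetical helper.

```r
# Difference in elpd between two models fitted to the same N observations,
# with the SE computed from the paired pointwise differences.
elpd_diff <- function(pw_a, pw_b) {
  stopifnot(length(pw_a) == length(pw_b))
  d <- pw_b - pw_a                        # positive: model b predicts better
  c(diff = sum(d), se = sqrt(length(d) * var(d)))
}
```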
<div>
Unfortunately, after the first few independent variables, further terms gave only minor improvements. This is not because of anything faulty in the LOO approach: I also wrote a classical check that holds out 10% of the data and predicts it from a model fitted on the rest. Those results were similar, but took more time and showed more run-to-run variation. The only real advantage was that they were on the same scale as my previous cross-validations. </div>
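The classical check mentioned above can be sketched as a repeated 10% hold-out. The post does not show that code, so this is an illustrative version that uses glm() as a stand-in for the Stan model and reports the misclassification rate on the held-out rows. The function name and defaults are assumptions.

```r
# Repeated 10% hold-out: fit on 90%, predict the held-out 10%.
# glm() is a stand-in for the Stan model used in the post.
holdout_error <- function(formula, data, reps = 20, frac = 0.1) {
  yname <- all.vars(formula)[1]            # response variable name
  sapply(seq_len(reps), function(r) {
    test_idx <- sample(nrow(data), round(frac * nrow(data)))
    fit <- glm(formula, family = binomial, data = data[-test_idx, ])
    p <- predict(fit, newdata = data[test_idx, ], type = "response")
    # misclassification rate at a 0.5 threshold
    mean((p > 0.5) != (data[[yname]][test_idx] == 1))
  })
}
```

The spread of the returned vector shows the run-to-run variation the text mentions.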
<div>
I expanded the model formula to about 10 terms. At that point the expected prediction error decreased so slowly that I settled on an eight-term model (<span style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace; font-size: x-small;">Title + Pclass + sibsp + Title:Pclass + Embarked + oe + Title:sibsp + parch</span>). The functions myPmodel and mySpred refit the model and perform the actual predictions. The result was a disappointing 0.78 on Kaggle: a minor improvement on the previous Stan result, but boosting is still better.</div>
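Whatever the internals of myPmodel and mySpred, one common way to turn a posterior into Kaggle-style 0/1 predictions is to average the survival probability over draws and threshold it. The sketch below shows only that general idea. predict_survival is hypothetical, the draws matrix could for instance come from as.matrix(fitx, pars = "f"), and newx must have the same columns as the training design matrix.

```r
# Posterior-mean survival probability per passenger, thresholded to 0/1.
#   draws: S x np matrix of coefficient draws (rows = draws)
#   newx:  N x np design matrix for the test set (same columns as training)
predict_survival <- function(draws, newx, threshold = 0.5) {
  p <- colMeans(plogis(draws %*% t(newx)))   # average over posterior draws
  as.integer(p > threshold)
}
```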
<h3>
Code</h3>
<div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(rstan)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">rstan_options(auto_write = TRUE)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">options(mc.cores = parallel::detectCores())</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(loo)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># read and combine</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">train <- read.csv('train.csv')</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">train$status <- 'train'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">test <- read.csv('test.csv')</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">test$status <- 'test'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">test$Survived <- NA</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt <- rbind(test,train)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># generate variables</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Embarked[tt$Embarked==''] <- 'S'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Embarked <- factor(tt$Embarked)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Pclass <- factor(tt$Pclass)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Survived <- factor(tt$Survived)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$age <- tt$Age</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$age[is.na(tt$age)] <- 999</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$age <- cut(tt$age,c(0,2,5,9,12,15,21,55,65,100,1000))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title <- sapply(tt$Name,function(x) strsplit(as.character(x),'[.,]')[[1]][2])</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title <- gsub(' ','',tt$Title)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title[tt$Title=='Dr' & tt$Sex=='female'] <- 'Miss'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title[tt$Title %in% c('Capt','Col','Don','Sir','Jonkheer','Major','Rev','Dr')] <- 'Mr'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title[tt$Title %in% c('Lady','Ms','theCountess','Mlle','Mme','Dona')] <- 'Miss'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title <- factor(tt$Title)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># changed cabin character</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$cabchar <- substr(tt$Cabin,1,1)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$cabchar[tt$cabchar %in% c('F','G','T')] <- 'X'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$cabchar <- factor(tt$cabchar)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ncabin <- nchar(as.character(tt$Cabin))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$cn <- as.numeric(gsub('[[:space:][:alpha:]]','',tt$Cabin))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$oe <- factor(ifelse(!is.na(tt$cn),tt$cn%%2,-1))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Fare[is.na(tt$Fare)]<- median(tt$Fare,na.rm=TRUE)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket <- sub('[[:digit:]]+$','',tt$Ticket)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket <- toupper(gsub('(\\.)|( )|(/)','',tt$ticket))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('A2','A4','AQ3','AQ4','AS')] <- 'An'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('SCA3','SCA4','SCAH','SC','SCAHBASLE','SCOW')] <- 'SC'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('CASOTON','SOTONO2','SOTONOQ')] <- 'SOTON'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('STONO2','STONOQ')] <- 'STON'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('C')] <- 'CA'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('SOC','SOP','SOPP')] <- 'SOP'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('SWPP','WC','WEP')] <- 'W'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('FA','FC','FCC')] <- 'F'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('PP','PPP','LINE','LP','SP')] <- 'PPPP'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket <- factor(tt$ticket)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$fare <- cut(tt$Fare,breaks=c(min(tt$Fare)-1,quantile(tt$Fare,seq(.2,.8,.2)),max(tt$Fare)+1))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$sibsp <- factor(c(1:4,rep(4,6))[tt$SibSp+1])</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$parch <- factor(c(1:4,rep(4,6))[tt$Parch+1])</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">train <- tt[tt$status=='train',]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">test <- tt[tt$status=='test',]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#end of preparation and data reading</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">options(width=90)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#################4444444444444444444##############################</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">des.matrix <- function(formula,data) {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> form2 <- strsplit(as.character(formula),'~',fixed=TRUE)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> resp <- form2[[length(form2)]]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> form3 <- strsplit(resp,'+',fixed=TRUE)[[1]]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> la <- lapply(form3,function(x) </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> model.matrix(as.formula(paste('~' , x, '-1' )),data) )</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> nterm <- c(1,sapply(la,ncol))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> terms <- rep(1:length(nterm),nterm)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ntrain <- nrow(data)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mat <- do.call(cbind,la)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mat <- cbind(rep(1,ntrain),mat)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> np <- ncol(mat)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> list( </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> survived = c(0,1)[data$Survived],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> np=np,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ntrain=nrow(data),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> terms=terms,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> nterm=max(terms),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> tx=mat)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">}</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">datain <- des.matrix(~ Sex+Pclass,data=train)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">parameters=c('std','f','log_lik')</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">my_code <- ' </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int<lower=0> ntrain;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> array[ntrain] int survived;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int<lower=1> np;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int<lower=1> nterm;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> array[np] int terms;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> matrix <lower=0,upper=1> [ntrain,np] tx;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> parameters {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> vector[np] f;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> array[nterm] real<lower=0> std;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real <lower=0> stdhyp;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> model { </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> stdhyp ~ normal(0,2);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> std ~ normal(0,stdhyp);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> for (i in 1:np) {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> f[i] ~ normal(0,std[terms[i]]);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> survived ~ bernoulli_logit(tx*f);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> generated quantities {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> vector [ntrain] log_lik;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> for (i in 1:ntrain) {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> log_lik[i] = bernoulli_logit_lpmf(survived[i] | tx[i]*f);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> '</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">fit1 <- stan(model_code = my_code, </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data = datain, </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> pars=parameters,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> iter = 1000, </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> chains = 4,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> open_progress=FALSE)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#fit1</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#log_lik1 <- extract_log_lik(fit1)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#loo1 <- loo(log_lik1)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#print(loo1, digits = 3)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">print.mySmodel <- function(x, ...) {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> print(x$loo1)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cat('\n')</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> invisible(x)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">}</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">mySmodel <- function(formula,data) {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> datain <- des.matrix(formula,data)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fitx <- </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> stan(model_code = my_code, </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data = datain, </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> pars=parameters,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fit=fit1,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> iter = 2000, </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> chains = parallel::detectCores(),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> open_progress=FALSE)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> log_lik1 <- extract_log_lik(fitx)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> loo1 <- loo(log_lik1)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ll <- list(myform=formula,fitx=fitx,loo1=loo1)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> class(ll) <- 'mySmodel'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cat(format(formula),loo1$elpd_loo,loo1$se_elpd_loo,'\n')</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ll</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">}</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">mySmodel(Survived ~ </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Title ,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data=train)</span></div>
</div>
<div>
</div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"></span><br />
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span style="background-color: #f3f3f3;">################# prediction functions</span></span></div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">
<div>
<span style="background-color: #f3f3f3;"> </span></div>
<div>
<span style="background-color: #f3f3f3;">myPmodel <- function(formula,data) {</span></div>
<div>
<span style="background-color: #f3f3f3;"> datain <- des.matrix(formula,data)</span></div>
<div>
<span style="background-color: #f3f3f3;"> </span></div>
<div>
<span style="background-color: #f3f3f3;"> fitx <- </span></div>
<div>
<span style="background-color: #f3f3f3;"> stan(model_code = my_code, </span></div>
<div>
<span style="background-color: #f3f3f3;"> data = datain, </span></div>
<div>
<span style="background-color: #f3f3f3;"> pars=parameters,</span></div>
<div>
<span style="background-color: #f3f3f3;"> fit=fit1,</span></div>
<div>
<span style="background-color: #f3f3f3;"> iter = 2000, </span></div>
<div>
<span style="background-color: #f3f3f3;"> chains = parallel::detectCores(),</span></div>
<div>
<span style="background-color: #f3f3f3;"> open_progress=FALSE)</span></div>
<div>
<span style="background-color: #f3f3f3;"> list(myform=formula,fitx=fitx)</span></div>
<div>
<span style="background-color: #f3f3f3;"> } </span></div>
<div>
<span style="background-color: #f3f3f3;"> </span></div>
<div>
<span style="background-color: #f3f3f3;">PredM <- myPmodel(~ Title + Pclass + sibsp + Title:Pclass + Embarked + oe + Title:sibsp + parch</span></div>
<div>
<span style="background-color: #f3f3f3;"> ,data=train)</span></div>
<div>
<br /></div>
<div>
<span style="background-color: #f3f3f3;">mySpred <- function(mymodel,newdata) {</span></div>
<div>
<span style="background-color: #f3f3f3;"> pfit <- as.matrix(mymodel$fitx)</span></div>
<div>
<span style="background-color: #f3f3f3;"> fmat <- pfit[,grep('^f\\[',colnames(pfit))]</span></div>
<div>
<span style="background-color: #f3f3f3;"> px <- des.matrix(mymodel$myform,data=newdata)$tx</span></div>
<div>
<span style="background-color: #f3f3f3;"> mylpred <- tcrossprod(fmat,px)</span></div>
<div>
<span style="background-color: #f3f3f3;"> mpred <- colMeans(mylpred)</span></div>
<div>
<span style="background-color: #f3f3f3;"> pred <- as.numeric(gtools::inv.logit(mpred)>.5)</span></div>
<div>
<span style="background-color: #f3f3f3;"> factor(pred)</span></div>
<div>
<span style="background-color: #f3f3f3;">}</span></div>
<div>
<span style="background-color: #f3f3f3;">preds <- mySpred(PredM,test)</span></div>
<div>
<span style="background-color: #f3f3f3;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3;">out <- data.frame(</span></div>
<div>
<span style="background-color: #f3f3f3;"> PassengerId=test$PassengerId,</span></div>
<div>
<span style="background-color: #f3f3f3;"> Survived=as.numeric(as.character(preds)),</span></div>
<div>
<span style="background-color: #f3f3f3;"> row.names=NULL)</span></div>
<div>
<span style="background-color: #f3f3f3;">write.csv(x=out,</span></div>
<div>
<span style="background-color: #f3f3f3;"> file='stanqua.csv',</span></div>
<div>
<span style="background-color: #f3f3f3;"> row.names=FALSE,</span></div>
<div>
<span style="background-color: #f3f3f3;"> quote=FALSE)</span></div>
</span></div>
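<div>
The core of mySpred is plain matrix algebra. tcrossprod(fmat, px) gives one linear predictor per posterior draw (rows) and per passenger (columns), and averaging over draws and applying the inverse logit gives the classification. Below is a minimal sketch with made-up numbers. The matrices fmat and px are hypothetical stand-ins for the posterior draws and for the design matrix returned by des.matrix, and base R's plogis() is used in place of gtools::inv.logit().</div>

```r
# Hypothetical posterior draws: 4 draws (rows) x 3 coefficients (columns)
fmat <- matrix(c(0.5, 1.2, -0.8,
                 0.3, 1.0, -1.1,
                 0.6, 1.4, -0.9,
                 0.4, 1.1, -1.0), nrow = 4, byrow = TRUE)
# Hypothetical design matrix: 2 passengers (rows) x 3 indicator columns
px <- matrix(c(1, 1, 0,
               1, 0, 1), nrow = 2, byrow = TRUE)

# Linear predictor for every draw (rows) and passenger (columns);
# tcrossprod(fmat, px) equals fmat %*% t(px), here a 4 x 2 matrix
mylpred <- tcrossprod(fmat, px)
# Average the linear predictor over the draws
mpred <- colMeans(mylpred)
# plogis() is the inverse logit, the same function as gtools::inv.logit()
pred <- as.numeric(plogis(mpred) > 0.5)
print(pred)  # [1] 1 0
```

<div>
Because plogis(x) > 0.5 holds exactly when x > 0, the 0.5 cut-off here is the same as checking the sign of the averaged linear predictor.</div>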
Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com1tag:blogger.com,1999:blog-3524617892004055830.post-40087600663739738922015-09-19T10:32:00.001+02:002015-09-19T10:32:09.676+02:00Predicting Titanic deaths on Kaggle VI: StanIt is a bit of a contradiction: Kaggle runs data science competitions, while <a href="http://mc-stan.org/">Stan</a> clearly belongs to (Bayesian) statistics. Yet after using random forests, boosting and bagging, I think this problem is a suitable size for Stan, which I understand can handle larger problems than older Bayesian software such as JAGS.<br />
<div>
What I aim to do is enter a large number of variables into the Stan model. Aliasing will be ignored; I hope the hierarchical model will provide suitable shrinkage for terms that are not relevant. </div>
<h3>
Data</h3>
<div>
The data preparation has been mildly adapted from the previous posts. The biggest change is that age is now a factor, binned on the value when present, with a special level for missing ages. The alternative in the context of a Bayesian model would be to impute the missing ages within the model, but that is more complex than I am willing to go at this point.</div>
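<div>
As a small illustration of that binning, the breaks used in the data preparation code (missing ages first recoded to 999, upper break 1000) can be applied to a few example ages. The ages below are made up.</div>

```r
# Example ages; NA stands for a missing age
age <- c(0.42, 4, 30, NA, 70)
# Recode missing ages to 999 so they land in their own, highest, interval
age[is.na(age)] <- 999
breaks <- c(0, 2, 5, 9, 12, 15, 21, 55, 65, 100, 1000)
agef <- cut(age, breaks)
# Integer codes of the intervals: 1 2 7 10 9
print(as.integer(agef))
# 10 levels, matching fage[1] to fage[10] in the Model 2 output
print(nlevels(agef))
```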
<h3>
Model</h3>
<div>
The idea of the model is pretty simple. It is a logistic regression with only factors as independent variables, and all levels of every factor are used. So when Sex is entered in the model, it adds two parameters, one for male and one for female (variable fsex in the code below). When passenger class is entered, it adds three (variable fpclass). The parameters have a normal prior distribution centred on zero, with standard deviation sdsex and sdpclass respectively. These standard deviations in turn share a common prior with scale sd1. For sd1 itself I have tried both a half-normal prior with standard deviation 3, as below, and a uniform (0,3) prior.</div>
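<div>
To get a feel for what this stack of half-normal priors implies, one can simulate from it in R. This is a sketch of a prior simulation; the number of draws and the seed are arbitrary choices. abs(rnorm()) gives half-normal draws because the normal distribution is symmetric around zero.</div>

```r
set.seed(1)
ndraw <- 10000
# sd1 ~ half-normal(0, 3)
sd1 <- abs(rnorm(ndraw, 0, 3))
# sdsex, sdpclass ~ half-normal(0, sd1), one sd1 value per draw
sdsex <- abs(rnorm(ndraw, 0, sd1))
sdpclass <- abs(rnorm(ndraw, 0, sd1))
# intercept ~ normal(0, 1); one sex effect and one class effect per draw
intercept <- rnorm(ndraw, 0, 1)
fsex1 <- rnorm(ndraw, 0, sdsex)
fpclass3 <- rnorm(ndraw, 0, sdpclass)
# Implied prior survival probability for one sex level in third class
p <- plogis(intercept + fsex1 + fpclass3)
print(quantile(p, c(0.05, 0.5, 0.95)))
```

<div>
The quantiles show how much prior mass sits near 0 and 1. Replacing the first sampling line with sd1 <- runif(ndraw, 0, 3) gives the uniform (0,3) variant.</div>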
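<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> data { // assumed declarations, matching how the variables are used below</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> int<lower=0> ntrain;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> int<lower=1,upper=2> sex[ntrain];</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> int<lower=1,upper=3> pclass[ntrain];</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> int<lower=0,upper=1> survived[ntrain];</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"> parameters {</span></div>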
<div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> real fsex[2];</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> real intercept;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> real fpclass[3];</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> real <lower=0> sdsex;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> real <lower=0> sdpclass;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> real <lower=0> sd1;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> transformed parameters {</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> real expect[ntrain];</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> for (i in 1:ntrain) {</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> expect[i] <- inv_logit(</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> intercept+</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> fsex[sex[i]]+</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> fpclass[pclass[i]]</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> );</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span><span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"> </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> model { </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> fsex ~ normal(0,sdsex);</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> fpclass ~ normal(0,sdpclass);</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> sdsex ~ normal(0,sd1);</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> sdpclass ~ normal(0,sd1);</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> sd1 ~ normal(0,3);</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> intercept ~ normal(0,1);</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> survived ~ bernoulli(expect);</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
</div>
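<div>
The model block can be mirrored in plain R, which may help when checking what the sampling statements add up to. The sketch below computes the log posterior of the two-factor model up to a constant. The function name log_post and the list layout of its arguments are hypothetical. It will not match Stan's lp__ exactly: ~ statements drop constant terms, and lp__ also includes the Jacobian adjustment for the lower-bounded standard deviations.</div>

```r
# Log posterior of the sex + class model, mirroring the Stan model block
# (hypothetical helper; includes normalising constants that Stan drops)
log_post <- function(par, dat) {
  # Linear predictor, as in the transformed parameters block
  eta <- par$intercept + par$fsex[dat$sex] + par$fpclass[dat$pclass]
  expect <- plogis(eta)
  sum(dnorm(par$fsex, 0, par$sdsex, log = TRUE)) +
    sum(dnorm(par$fpclass, 0, par$sdpclass, log = TRUE)) +
    dnorm(par$sdsex, 0, par$sd1, log = TRUE) +
    dnorm(par$sdpclass, 0, par$sd1, log = TRUE) +
    dnorm(par$sd1, 0, 3, log = TRUE) +
    dnorm(par$intercept, 0, 1, log = TRUE) +
    # survived ~ bernoulli(expect)
    sum(dbinom(dat$survived, 1, expect, log = TRUE))
}

# Made-up data for three passengers and one set of parameter values
dat <- list(sex = c(1, 2, 2), pclass = c(1, 3, 2), survived = c(1, 0, 1))
par <- list(intercept = 0, fsex = c(1, -1), fpclass = c(0.5, 0, -0.5),
            sdsex = 1, sdpclass = 1, sd1 = 2)
lp <- log_post(par, dat)
print(lp)
```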
<h4>
Predictions</h4>
<div>
In this phase I have decided to make the predictions within the Stan program. The approach that seemed to work is to duplicate all independent variables, with a second set for the passengers to be predicted, and to compute the predictions in the generated quantities block of the program. These predictions are in fact survival probabilities for each passenger in each MCMC sample, so a bit of post-processing is needed: the mean probability per passenger is calculated, and a cut-off of 0.5 decides the final classification.</div>
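<div>
The post-processing step amounts to a column mean and a threshold. This is a sketch assuming the generated quantities have already been extracted into a draws-by-passengers matrix of survival probabilities (with rstan, for instance via as.matrix() on the fit, keeping the relevant columns). The matrix prob below is made up, and the third passenger is chosen to show that averaging probabilities is not the same as averaging log-odds.</div>

```r
# Hypothetical survival probabilities: 5 MCMC draws (rows) x 3 passengers
prob <- matrix(c(0.90, 0.40, 0.900,
                 0.80, 0.45, 0.001,
                 0.95, 0.35, 0.900,
                 0.85, 0.50, 0.900,
                 0.90, 0.30, 0.001), nrow = 5, byrow = TRUE)
# Posterior mean survival probability per passenger
pmean <- colMeans(prob)
# Classify with a cut-off of 0.5
survived <- as.integer(pmean > 0.5)
print(survived)  # [1] 1 0 1

# For comparison: averaging on the log-odds scale first, which is what
# mySpred does with the linear predictor, flips the third passenger
survived_lodds <- as.integer(apply(prob, 2, function(p) mean(qlogis(p))) > 0)
print(survived_lodds)  # [1] 1 0 0
```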
<div>
Since the Stan code runs quickly once it has been compiled, it is feasible to use a two-stage process: a first run to examine the model parameters, and a second run just to obtain the predictions.</div>
<h3>
Model 1</h3>
<div>
This is the parameter output of a model with just sex and passenger class. It is included mostly to show what a simple, fully coded example looks like. Since sdsex is larger than sdpclass, sex had a bigger influence than class on the chance of survival.</div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">Inference for Stan model: 8fb625a6ccf29aab919e1dcd494247aa.</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">4 chains, each with iter=1000; warmup=500; thin=1; </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">post-warmup draws per chain=500, total post-warmup draws=2000.</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">intercept -0.07 0.03 0.83 -1.68 -0.65 -0.07 0.49 1.61 670 1.00</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fsex[1] 1.40 0.04 0.89 -0.46 0.83 1.40 1.98 3.07 533 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fsex[2] -1.24 0.04 0.89 -3.11 -1.81 -1.23 -0.65 0.45 529 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fpclass[1] 0.94 0.03 0.70 -0.40 0.50 0.93 1.33 2.34 536 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fpclass[2] 0.12 0.03 0.69 -1.24 -0.31 0.11 0.52 1.55 527 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fpclass[3] -0.93 0.03 0.69 -2.25 -1.35 -0.94 -0.50 0.47 499 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">sdsex 2.06 0.05 1.20 0.76 1.23 1.74 2.49 5.36 648 1.00</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">sdpclass 1.40 0.04 0.90 0.50 0.84 1.15 1.67 3.86 661 1.00</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">sd1 2.41 0.04 1.27 0.71 1.45 2.18 3.11 5.48 1071 1.00</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">lp__ -421.36 0.09 2.12 -426.25 -422.56 -421.00 -419.82 -418.19 616 1.00</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">Samples were drawn using NUTS(diag_e) at Sun Sep 13 12:28:49 2015.</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">For each parameter, n_eff is a crude measure of effective sample size,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">and Rhat is the potential scale reduction factor on split chains (at </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">convergence, Rhat=1).</span><br />
<br />
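<div>
A quick way to read the Model 1 table is to combine posterior means on the logit scale. This is only a rough plug-in approximation, since the posterior mean of a probability is not the inverse logit of the posterior mean of the log-odds. It also assumes that fsex[1] is female and fsex[2] male, which is what R's alphabetical factor coding would give.</div>

```r
# Posterior means copied from the Model 1 output above
intercept <- -0.07
fsex <- c(female = 1.40, male = -1.24)          # assumes levels female, male
fpclass <- c("1" = 0.94, "2" = 0.12, "3" = -0.93)

# Plug-in log-odds and survival probability for each sex/class combination
eta <- outer(fsex, fpclass, "+") + intercept
print(round(plogis(eta), 2))

# Differences between levels are well determined; the individual terms are
# aliased with the intercept, which is why their intervals are so wide
sex_contrast <- unname(fsex["female"] - fsex["male"])     # 2.64
class_contrast <- unname(fpclass["1"] - fpclass["3"])     # 1.87
```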
<h3>
Model 2</h3>
Model 2 contains all main effects. I submitted predictions from this model; they scored 0.78947, which was not an improvement over my previous scores. One submission using bagging did better, boosting gave the same result and random forests did worse. </div>
</div>
<div>
Note that while this model looks pretty complex, this score was obtained without any interactions. </div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">Inference for Stan model: 6278b3ebade9802ad9544b1242bada20.</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">4 chains, each with iter=1000; warmup=500; thin=1; </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">post-warmup draws per chain=500, total post-warmup draws=2000.</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">intercept 0.28 0.05 0.74 -1.20 -0.16 0.28 0.77 1.68 229 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">sd1 0.80 0.03 0.23 0.43 0.63 0.77 0.97 1.31 63 1.05</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fsex[1] 0.24 0.03 0.56 -0.64 -0.01 0.13 0.39 1.85 353 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fsex[2] -0.05 0.04 0.50 -1.29 -0.24 0.00 0.22 0.89 131 1.02</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fpclass[1] 0.54 0.07 0.57 -0.39 0.15 0.53 0.90 1.77 60 1.03</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fpclass[2] 0.15 0.06 0.50 -0.68 -0.27 0.16 0.50 1.15 76 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fpclass[3] -0.78 0.05 0.48 -1.66 -1.10 -0.79 -0.47 0.18 102 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fembarked[1] 0.26 0.02 0.33 -0.29 0.06 0.22 0.43 1.02 298 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fembarked[2] 0.08 0.02 0.32 -0.55 -0.07 0.06 0.20 0.78 456 1.00</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fembarked[3] -0.18 0.02 0.31 -0.80 -0.33 -0.18 -0.01 0.45 378 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">foe[1] -0.32 0.08 0.53 -1.63 -0.65 -0.28 0.04 0.54 41 1.07</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">foe[2] 0.04 0.02 0.43 -0.73 -0.11 0.01 0.19 1.02 433 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">foe[3] 0.34 0.03 0.48 -0.41 0.04 0.26 0.57 1.54 227 1.03</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fcabchar[1] -0.25 0.18 0.46 -1.16 -0.49 -0.13 0.03 0.45 7 1.17</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fcabchar[2] -0.04 0.03 0.35 -0.86 -0.16 -0.04 0.10 0.62 143 1.03</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fcabchar[3] 0.07 0.01 0.27 -0.47 -0.06 0.05 0.18 0.75 325 1.03</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fcabchar[4] -0.09 0.04 0.31 -0.85 -0.23 -0.03 0.08 0.48 55 1.05</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fcabchar[5] 0.22 0.02 0.33 -0.32 0.00 0.17 0.40 1.03 219 1.02</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fcabchar[6] 0.31 0.04 0.36 -0.25 0.02 0.25 0.59 1.12 76 1.03</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fcabchar[7] -0.21 0.06 0.40 -1.03 -0.43 -0.11 0.04 0.45 44 1.05</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fage[1] 0.14 0.06 0.32 -0.38 -0.04 0.04 0.28 0.78 27 1.10</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fage[2] 0.45 0.14 0.59 -0.12 0.02 0.17 0.77 1.68 19 1.17</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fage[3] 0.10 0.15 0.53 -0.74 -0.15 -0.01 0.12 1.37 13 1.24</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fage[4] -0.10 0.01 0.31 -0.86 -0.25 -0.05 0.03 0.48 558 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fage[5] 0.17 0.10 0.43 -0.52 -0.04 0.04 0.25 1.07 20 1.14</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fage[6] 0.00 0.01 0.21 -0.42 -0.09 -0.01 0.09 0.47 250 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fage[7] 0.09 0.03 0.20 -0.32 -0.01 0.05 0.22 0.47 50 1.05</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fage[8] -0.22 0.03 0.31 -1.06 -0.34 -0.15 -0.01 0.17 129 1.03</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fage[9] -0.02 0.07 0.39 -0.98 -0.16 -0.01 0.12 0.60 27 1.09</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fage[10] 0.00 0.01 0.19 -0.43 -0.06 0.01 0.06 0.40 813 1.00</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fticket[1] 0.07 0.03 0.21 -0.33 -0.06 0.04 0.22 0.51 54 1.05</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fticket[2] -0.04 0.03 0.34 -0.87 -0.17 -0.01 0.17 0.59 108 1.02</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fticket[3] -0.17 0.05 0.43 -1.23 -0.38 -0.09 0.06 0.60 69 1.04</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fticket[4] 0.01 0.05 0.31 -0.48 -0.19 0.01 0.19 0.69 40 1.06</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fticket[5] 0.02 0.02 0.36 -0.64 -0.13 -0.02 0.15 0.95 210 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fticket[6] 0.03 0.05 0.32 -0.63 -0.15 0.02 0.21 0.55 37 1.07</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fticket[7] 0.14 0.02 0.38 -0.40 -0.05 0.05 0.26 1.18 245 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fticket[8] 0.01 0.01 0.35 -0.84 -0.12 0.02 0.20 0.73 929 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fticket[9] 0.01 0.01 0.32 -0.69 -0.12 0.03 0.14 0.65 766 1.00</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fticket[10] -0.23 0.04 0.46 -1.53 -0.39 -0.08 0.03 0.38 149 1.02</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fticket[11] -0.02 0.01 0.32 -0.66 -0.16 -0.04 0.11 0.70 1001 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fticket[12] 0.37 0.05 0.47 -0.16 0.03 0.20 0.57 1.58 82 1.04</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fticket[13] -0.20 0.02 0.34 -1.05 -0.36 -0.14 0.01 0.36 221 1.03</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">ftitle[1] 1.26 0.08 0.73 -0.03 0.75 1.20 1.74 2.85 79 1.03</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">ftitle[2] 0.47 0.04 0.70 -1.05 0.13 0.45 0.87 1.79 372 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">ftitle[3] -2.27 0.03 0.66 -3.59 -2.59 -2.34 -1.88 -0.86 486 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">ftitle[4] 0.78 0.06 0.73 -0.86 0.35 0.89 1.25 2.10 144 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fsibsp[1] 0.70 0.06 0.51 -0.33 0.36 0.69 1.09 1.69 77 1.04</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fsibsp[2] 0.48 0.08 0.53 -0.62 0.14 0.49 0.93 1.46 49 1.06</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fsibsp[3] 0.67 0.11 0.65 -0.60 0.23 0.61 1.16 1.85 35 1.07</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fsibsp[4] -1.58 0.04 0.64 -2.89 -2.03 -1.52 -1.16 -0.38 292 1.02</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fparch[1] 0.25 0.02 0.33 -0.25 0.03 0.22 0.39 1.02 450 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fparch[2] 0.04 0.01 0.33 -0.53 -0.13 0.00 0.17 0.83 521 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fparch[3] -0.03 0.06 0.38 -0.64 -0.22 -0.01 0.14 0.78 35 1.07</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">fparch[4] -0.47 0.18 0.54 -1.48 -0.81 -0.31 -0.02 0.22 9 1.12</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">ffare[1] -0.01 0.01 0.15 -0.39 -0.04 0.00 0.03 0.29 421 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">ffare[2] -0.02 0.01 0.15 -0.41 -0.06 0.00 0.02 0.26 419 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">ffare[3] -0.02 0.01 0.15 -0.35 -0.06 0.00 0.02 0.26 671 1.00</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">ffare[4] 0.00 0.01 0.16 -0.31 -0.05 0.00 0.04 0.36 586 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">ffare[5] 0.08 0.01 0.20 -0.15 -0.01 0.01 0.12 0.66 247 1.02</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">sdsex 0.51 0.04 0.44 0.03 0.21 0.35 0.68 1.64 123 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">sdpclass 0.75 0.04 0.36 0.32 0.48 0.68 0.94 1.59 64 1.04</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">sdembarked 0.43 0.02 0.32 0.06 0.21 0.39 0.53 1.24 328 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">sdoe 0.56 0.05 0.38 0.06 0.32 0.47 0.73 1.51 48 1.05</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">sdcabchar 0.39 0.03 0.26 0.03 0.18 0.36 0.59 0.97 61 1.04</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">sdage 0.33 0.06 0.29 0.02 0.09 0.23 0.53 0.89 23 1.15</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">sdticket 0.35 0.03 0.24 0.04 0.19 0.30 0.45 0.99 68 1.05</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">sdtitle 1.29 0.02 0.35 0.75 1.07 1.19 1.47 2.15 320 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">sdsibsp 1.00 0.02 0.37 0.49 0.74 0.95 1.16 1.95 222 1.01</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">sdparch 0.47 0.12 0.37 0.01 0.16 0.38 0.73 1.25 10 1.12</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">sdfare 0.14 0.02 0.17 0.00 0.02 0.09 0.19 0.60 92 1.03</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">lp__ -342.28 2.74 15.87 -372.33 -353.23 -344.26 -330.95 -310.12 34 1.09</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">Samples were drawn using NUTS(diag_e) at Sun Aug 30 13:00:19 2015.</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">For each parameter, n_eff is a crude measure of effective sample size,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">and Rhat is the potential scale reduction factor on split chains (at </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">convergence, Rhat=1).</span></div>
</div>
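<div>
Several parameters in the Model 2 output have an n_eff in single or low double digits and an Rhat above 1.1, a commonly used rule-of-thumb threshold, which suggests the chains for these parameters have not mixed well yet. A sketch of how such parameters can be picked out: with rstan the full table is available as summary(fit)$summary, but here a few rows are copied by hand from the output above.</div>

```r
# A few rows copied from the Model 2 output above
stan_sum <- data.frame(
  par   = c("intercept", "sd1", "fcabchar[1]", "fage[3]", "fparch[4]",
            "sdage", "sdtitle", "lp__"),
  n_eff = c(229, 63, 7, 13, 9, 23, 320, 34),
  Rhat  = c(1.01, 1.05, 1.17, 1.24, 1.12, 1.15, 1.01, 1.09),
  stringsAsFactors = FALSE
)
# Flag parameters above the Rhat > 1.1 rule of thumb
flagged <- stan_sum$par[stan_sum$Rhat > 1.1]
print(flagged)
# With an rstan fit the same idea would be:
# s <- summary(fit)$summary; rownames(s)[s[, "Rhat"] > 1.1]
```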
<h3>
Code</h3>
<h4>
Data reading</h4>
<div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;"># preparation and data reading section</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">library(rstan)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">rstan_options(auto_write = TRUE)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">options(mc.cores = parallel::detectCores())</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;"># read and combine</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">train <- read.csv('train.csv')</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">train$status <- 'train'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">test <- read.csv('test.csv')</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">test$status <- 'test'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">test$Survived <- NA</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt <- rbind(test,train)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;"># generate variables</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$Embarked[tt$Embarked==''] <- 'S'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$Embarked <- factor(tt$Embarked)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$Pclass <- factor(tt$Pclass)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$Survived <- factor(tt$Survived)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$age <- tt$Age</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$age[is.na(tt$age)] <- 999</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$age <- cut(tt$age,c(0,2,5,9,12,15,21,55,65,100,1000))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$Title <- sapply(tt$Name,function(x) strsplit(as.character(x),'[.,]')[[1]][2])</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$Title <- gsub(' ','',tt$Title)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$Title[tt$Title=='Dr' & tt$Sex=='female'] <- 'Miss'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$Title[tt$Title %in% c('Capt','Col','Don','Sir','Jonkheer','Major','Rev','Dr')] <- 'Mr'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$Title[tt$Title %in% c('Lady','Ms','theCountess','Mlle','Mme','Dona')] <- 'Miss'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$Title <- factor(tt$Title)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;"># cabin deck letter (first character of Cabin); rare decks F, G and T pooled as 'X'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$cabchar <- substr(tt$Cabin,1,1)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$cabchar[tt$cabchar %in% c('F','G','T')] <- 'X'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$cabchar <- factor(tt$cabchar)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$ncabin <- nchar(as.character(tt$Cabin))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$cn <- as.numeric(gsub('[[:space:][:alpha:]]','',tt$Cabin))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$oe <- factor(ifelse(!is.na(tt$cn),tt$cn%%2,-1))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$Fare[is.na(tt$Fare)]<- median(tt$Fare,na.rm=TRUE)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$ticket <- sub('[[:digit:]]+$','',tt$Ticket)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$ticket <- toupper(gsub('(\\.)|( )|(/)','',tt$ticket))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$ticket[tt$ticket %in% c('A2','A4','AQ3','AQ4','AS')] <- 'An'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$ticket[tt$ticket %in% c('SCA3','SCA4','SCAH','SC','SCAHBASLE','SCOW')] <- 'SC'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$ticket[tt$ticket %in% c('CASOTON','SOTONO2','SOTONOQ')] <- 'SOTON'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$ticket[tt$ticket %in% c('STONO2','STONOQ')] <- 'STON'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$ticket[tt$ticket %in% c('C')] <- 'CA'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$ticket[tt$ticket %in% c('SOC','SOP','SOPP')] <- 'SOP'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$ticket[tt$ticket %in% c('SWPP','WC','WEP')] <- 'W'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$ticket[tt$ticket %in% c('FA','FC','FCC')] <- 'F'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$ticket[tt$ticket %in% c('PP','PPP','LINE','LP','SP')] <- 'PPPP'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$ticket <- factor(tt$ticket)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">tt$fare <- cut(tt$Fare,breaks=c(min(tt$Fare)-1,quantile(tt$Fare,seq(.2,.8,.2)),max(tt$Fare)+1))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">train <- tt[tt$status=='train',]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">test <- tt[tt$status=='test',]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">#end of preparation and data reading</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">options(width=90)</span></div>
</div>
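<div>
Some of the derived variables above are easier to follow on a few concrete strings. The sketch below runs the same title, ticket prefix and cabin parity expressions on a handful of illustrative values written in the format of the Kaggle file (they are examples, not the full data). It shows, for instance, why the title list contains 'theCountess', and that a passenger with several cabins gets one concatenated cabin number, so its parity comes from the last cabin.</div>

```r
# Illustrative values in the format of the Kaggle Titanic file (not the full data)
pnames  <- c("Braund, Mr. Owen Harris",
             "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
             "Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)")
tickets <- c("A/5 21171", "PC 17599", "113803", "STON/O2. 3101282")
cabins  <- c("C85", "E46 E48", "", "F G73")

# Title: second piece after splitting on '.' or ',', with spaces removed
title <- sapply(pnames, function(x) strsplit(x, '[.,]')[[1]][2], USE.NAMES = FALSE)
title <- gsub(' ', '', title)
print(title)

# Ticket prefix: drop the trailing number, then remove dots, spaces and slashes
ticket <- sub('[[:digit:]]+$', '', tickets)
ticket <- toupper(gsub('(\\.)|( )|(/)', '', ticket))
print(ticket)

# Cabin number parity: letters and spaces are stripped, so several cabins are
# concatenated into one number; an empty cabin gives NA and is coded -1
cn <- as.numeric(gsub('[[:space:][:alpha:]]', '', cabins))
oe <- factor(ifelse(!is.na(cn), cn %% 2, -1))
print(cn)
print(oe)
```

A purely numeric ticket ends up with an empty prefix, which becomes its own level when tt$ticket is turned into a factor.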
<h4>
First model</h4>
<div>
<span style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace; font-size: xx-small;">datain <- list(</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;">
</span>
<br />
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: xx-small;"> survived = c(0,1)[train$Survived],</span></div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><span style="background-color: #f3f3f3;">
</span></span>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><span style="background-color: #f3f3f3;"> ntrain = nrow(train),</span></span></div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">
<div>
<span style="background-color: #f3f3f3;"> ntest=nrow(test),</span></div>
<div>
<span style="background-color: #f3f3f3;"> sex=c(1:2)[train$Sex],</span></div>
<div>
<span style="background-color: #f3f3f3;"> psex=c(1:2)[test$Sex],</span></div>
<div>
<span style="background-color: #f3f3f3;"> pclass=c(1:3)[train$Pclass],</span></div>
<div>
<span style="background-color: #f3f3f3;"> ppclass=c(1:3)[test$Pclass] </span></div>
<div>
<span style="background-color: #f3f3f3;">)</span></div>
<div>
<span style="background-color: #f3f3f3;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3;">parameters=c('intercept','fsex','fpclass','sdsex','sdpclass','sd1')</span></div>
<div>
<span style="background-color: #f3f3f3;">my_code <- ' </span></div>
<div>
<span style="background-color: #f3f3f3;"> data {</span></div>
<div>
<span style="background-color: #f3f3f3;"> int<lower=0> ntrain;</span></div>
<div>
<span style="background-color: #f3f3f3;"> int<lower=0> ntest;</span></div>
<div>
<span style="background-color: #f3f3f3;"> int survived[ntrain];</span></div>
<div>
<span style="background-color: #f3f3f3;"> int <lower=1,upper=2> sex[ntrain];</span></div>
<div>
<span style="background-color: #f3f3f3;"> int <lower=1,upper=2> psex[ntest];</span></div>
<div>
<span style="background-color: #f3f3f3;"> int <lower=1,upper=3> pclass[ntrain];</span></div>
<div>
<span style="background-color: #f3f3f3;"> int <lower=1,upper=3> ppclass[ntest];</span></div>
<div>
<span style="background-color: #f3f3f3;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3;"> }</span></div>
<div>
<span style="background-color: #f3f3f3;"> parameters {</span></div>
<div>
<span style="background-color: #f3f3f3;"> real fsex[2];</span></div>
<div>
<span style="background-color: #f3f3f3;"> real intercept;</span></div>
<div>
<span style="background-color: #f3f3f3;"> real fpclass[3];</span></div>
<div>
<span style="background-color: #f3f3f3;"> real <lower=0> sdsex;</span></div>
<div>
<span style="background-color: #f3f3f3;"> real <lower=0> sdpclass;</span></div>
<div>
<span style="background-color: #f3f3f3;"> real <lower=0> sd1;</span></div>
<div>
<span style="background-color: #f3f3f3;"> }</span></div>
<div>
<span style="background-color: #f3f3f3;"> transformed parameters {</span></div>
<div>
<span style="background-color: #f3f3f3;"> real expect[ntrain];</span></div>
<div>
<span style="background-color: #f3f3f3;"> for (i in 1:ntrain) {</span></div>
<div>
<span style="background-color: #f3f3f3;"> expect[i] = inv_logit(</span></div>
<div>
<span style="background-color: #f3f3f3;"> intercept+</span></div>
<div>
<span style="background-color: #f3f3f3;"> fsex[sex[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3;"> fpclass[pclass[i]]</span></div>
<div>
<span style="background-color: #f3f3f3;"> );</span></div>
<div>
<span style="background-color: #f3f3f3;"> }</span></div>
<div>
<span style="background-color: #f3f3f3;"> </span></div>
<div>
<span style="background-color: #f3f3f3;"> }</span></div>
<div>
<span style="background-color: #f3f3f3;"> model { </span></div>
<div>
<span style="background-color: #f3f3f3;"> fsex ~ normal(0,sdsex);</span></div>
<div>
<span style="background-color: #f3f3f3;"> fpclass ~ normal(0,sdpclass);</span></div>
<div>
<span style="background-color: #f3f3f3;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3;"> sdsex ~ normal(0,sd1);</span></div>
<div>
<span style="background-color: #f3f3f3;"> sdpclass ~ normal(0,sd1);</span></div>
<div>
<span style="background-color: #f3f3f3;"> sd1 ~ normal(0,3);</span></div>
<div>
<span style="background-color: #f3f3f3;"> intercept ~ normal(0,1);</span></div>
<div>
<span style="background-color: #f3f3f3;"> survived ~ bernoulli(expect);</span></div>
<div>
<span style="background-color: #f3f3f3;"> }</span></div>
<div>
<span style="background-color: #f3f3f3;"> generated quantities {</span></div>
<div>
<span style="background-color: #f3f3f3;"> real pred[ntest];</span></div>
<div>
<span style="background-color: #f3f3f3;"> for (i in 1:ntest) {</span></div>
<div>
<span style="background-color: #f3f3f3;"> pred[i] = inv_logit(</span></div>
<div>
<span style="background-color: #f3f3f3;"> intercept+</span></div>
<div>
<span style="background-color: #f3f3f3;"> fsex[psex[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3;"> fpclass[ppclass[i]]</span></div>
<div>
<span style="background-color: #f3f3f3;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3;"> );</span></div>
<div>
<span style="background-color: #f3f3f3;"> }</span></div>
<div>
<span style="background-color: #f3f3f3;"> }</span></div>
<div>
<span style="background-color: #f3f3f3;"> '</span></div>
<div>
<span style="background-color: #f3f3f3;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3;">fit1 <- stan(model_code = my_code, </span></div>
<div>
<span style="background-color: #f3f3f3;"> data = datain, </span></div>
<div>
<span style="background-color: #f3f3f3;"> pars=parameters,</span></div>
<div>
<span style="background-color: #f3f3f3;"> iter = 1000, </span></div>
<div>
<span style="background-color: #f3f3f3;"> chains = 4,</span></div>
<div>
<span style="background-color: #f3f3f3;"> open_progress=FALSE)</span></div>
<div>
<span style="background-color: #f3f3f3;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3;">fit1</span></div>
</span></div>
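<div>
The generated quantities block turns each test passenger's category codes into a survival probability. Below is a plain R sketch of that calculation for three hypothetical passengers, using made-up parameter values in place of posterior draws (they are not estimates from fit1). It also shows why c(1:2)[test$Sex] works: indexing a vector with a factor uses the factor's integer codes, so it gives the same result as as.integer(). Base R's plogis() plays the role of Stan's inv_logit(). Note that, as far as I understand the pars argument of rstan's stan(), pred is only kept in the fit if it is listed in parameters, which it is not in the call above.</div>

```r
# Hypothetical test passengers standing in for test$Sex and test$Pclass;
# read.csv gives Sex the alphabetical levels "female", "male"
sex    <- factor(c("male", "female", "female"), levels = c("female", "male"))
pclass <- factor(c(3, 1, 2), levels = 1:3)

# Indexing with a factor uses its integer codes, so this equals as.integer()
psex    <- c(1:2)[sex]
ppclass <- c(1:3)[pclass]
stopifnot(identical(psex, as.integer(sex)))

# Made-up parameter values, NOT estimates from fit1
intercept <- -0.5
fsex      <- c(1.3, -1.3)        # female, male
fpclass   <- c(0.9, 0.1, -0.8)   # 1st, 2nd, 3rd class

# Same linear predictor as the Stan model, mapped to a probability
pred <- plogis(intercept + fsex[psex] + fpclass[ppclass])
print(round(pred, 3))
```

In the Stan model this is done once per posterior draw, so each pred[i] ends up with a full posterior distribution rather than a single value.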
<h4>
Second model</h4>
<div>
<div>
<br /></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">datain <- list(</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> survived = c(0,1)[train$Survived],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ntrain = nrow(train),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ntest=nrow(test),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> sex=c(1:2)[train$Sex],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> psex=c(1:2)[test$Sex],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> pclass=c(1:3)[train$Pclass],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ppclass=c(1:3)[test$Pclass],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> embarked=c(1:3)[train$Embarked],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> pembarked=c(1:3)[test$Embarked],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> oe=c(1:3)[train$oe],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> poe=c(1:3)[test$oe],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cabchar=c(1:7)[train$cabchar],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> pcabchar=c(1:7)[test$cabchar],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> age=c(1:10)[train$age],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> page=c(1:10)[test$age],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ticket=c(1:13)[train$ticket],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> pticket=c(1:13)[test$ticket],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> title=c(1:4)[train$Title],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ptitle=c(1:4)[test$Title],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> sibsp=c(1:4,rep(4,6))[train$SibSp+1],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> psibsp=c(1:4,rep(4,6))[test$SibSp+1],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> parch=c(1:4,rep(4,6))[train$Parch+1],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> pparch=c(1:4,rep(4,6))[test$Parch+1],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fare=c(1:5)[train$fare],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> pfare=c(1:5)[test$fare]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">parameters=c('intercept','sd1',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> 'fsex','fpclass','fembarked',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> 'foe','fcabchar','fage',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> 'fticket','ftitle',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> 'fsibsp','fparch',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> 'ffare',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> 'sdsex','sdpclass','sdembarked',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> 'sdoe','sdcabchar','sdage',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> 'sdticket','sdtitle',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> 'sdsibsp','sdparch',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> 'sdfare')</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">my_code <- ' </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int<lower=0> ntrain;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int<lower=0> ntest;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int survived[ntrain];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=2> sex[ntrain];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=2> psex[ntest];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=3> pclass[ntrain];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=3> ppclass[ntest];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=3> embarked[ntrain];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=3> pembarked[ntest];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=3> oe[ntrain];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=3> poe[ntest];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=7> cabchar[ntrain];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=7> pcabchar[ntest];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=10> age[ntrain];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=10> page[ntest];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=13> ticket[ntrain];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=13> pticket[ntest];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=4> title[ntrain];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=4> ptitle[ntest];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=4> sibsp[ntrain];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=4> psibsp[ntest];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=4> parch[ntrain];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=4> pparch[ntest];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=5> fare[ntrain];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> int <lower=1,upper=5> pfare[ntest];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> parameters {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real fsex[2];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real intercept;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real fpclass[3];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real fembarked[3];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real foe[3];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real fcabchar[7];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real fage[10];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real fticket[13];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real ftitle[4];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real fparch[4];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real fsibsp[4];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real ffare[5];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real <lower=0> sdsex;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real <lower=0> sdpclass;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real <lower=0> sdembarked;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real <lower=0> sdoe;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real <lower=0> sdcabchar;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real <lower=0> sdage;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real <lower=0> sdticket;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real <lower=0> sdtitle;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real <lower=0> sdparch;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real <lower=0> sdsibsp;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real <lower=0> sdfare;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real <lower=0> sd1;</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> transformed parameters {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real expect[ntrain];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> for (i in 1:ntrain) {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> expect[i] = inv_logit(</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> intercept+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fsex[sex[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fpclass[pclass[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fembarked[embarked[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> foe[oe[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fcabchar[cabchar[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fage[age[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fticket[ticket[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ftitle[title[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fsibsp[sibsp[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fparch[parch[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ffare[fare[i]]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> );</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> model { </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fsex ~ normal(0,sdsex);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fpclass ~ normal(0,sdpclass);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fembarked ~ normal(0,sdembarked);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> foe ~ normal(0,sdoe);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fcabchar ~ normal(0,sdcabchar);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fage ~ normal(0,sdage);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fticket ~ normal(0,sdticket);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ftitle ~ normal(0,sdtitle);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fsibsp ~ normal(0,sdsibsp);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fparch ~ normal(0,sdparch);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ffare ~ normal(0,sdfare);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> sdsex ~ normal(0,sd1);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> sdpclass ~ normal(0,sd1);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> sdembarked ~ normal(0,sd1);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> sdoe ~ normal(0,sd1);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> sdcabchar ~ normal(0,sd1);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> sdage ~ normal(0,sd1);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> sdticket ~ normal(0,sd1);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> sdtitle ~ normal(0,sd1);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> sdsibsp ~ normal(0,sd1);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> sdparch ~ normal(0,sd1);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> sdfare ~ normal(0,sd1);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> sd1 ~ normal(0,1);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> intercept ~ normal(0,1);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> survived ~ bernoulli(expect);</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> generated quantities {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> real pred[ntest];</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> for (i in 1:ntest) {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> pred[i] <- inv_logit(</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> intercept+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fsex[psex[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fpclass[ppclass[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fembarked[pembarked[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> foe[poe[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fcabchar[pcabchar[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fage[page[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fticket[pticket[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ftitle[ptitle[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fsibsp[psibsp[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fparch[pparch[i]]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ffare[pfare[i]]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> );</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> '</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">fit1 <- stan(model_code = my_code, </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data = datain, </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> pars=parameters,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> iter = 1000, </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> chains = 4,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> open_progress=FALSE)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">fit1</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#plot(fit1,ask=TRUE)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#traceplot(fit1,ask=TRUE)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">fit2 <- stan(model_code = my_code, </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data = datain, </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> fit=fit1,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> pars=c('pred'),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> iter = 2000, </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> warmup =200,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> chains = 4,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> open_progress=FALSE)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">fit3 <- as.matrix(fit2)[,-419] # drop column 419 (lp__), keep the 418 pred columns</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#plots of individual passengers</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#plot(density(fit3[,1]))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#plot(density(fit3[,18]))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#plot(density(as.numeric(fit3),adjust=.3))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">decide1 <- apply(fit3,2,function(x) mean(x)>.5)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">decide2 <- apply(fit3,2,function(x) median(x)>.5)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#table(decide1,decide2)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">out <- data.frame(</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PassengerId=test$PassengerId,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Survived=as.numeric(decide1),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> row.names=NULL)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">write.csv(x=out,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> file='stanlin.csv',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> row.names=FALSE,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> quote=FALSE)</span></div>
</div>
<div>
<br /></div>
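The submission thresholds each passenger's posterior mean survival probability at 0.5 (decide1). For comparison, the code also applies the same rule to the posterior median (decide2). Below is a minimal sketch with made-up draws, not output from the model above. It shows why the two rules can disagree when a posterior is skewed:

```r
# Toy matrix of posterior draws (rows = draws, columns = passengers).
# The values are made up to show when the mean and median rules disagree.
draws <- cbind(c(0.10, 0.20, 0.30, 0.95, 0.99),
               c(0.60, 0.70, 0.80, 0.90, 0.20))

# Same decision rules as decide1 and decide2 in the code above
decide_mean   <- apply(draws, 2, function(x) mean(x) > .5)
decide_median <- apply(draws, 2, function(x) median(x) > .5)
print(table(decide_mean, decide_median))
```

Here the first passenger has a posterior mean just above 0.5 but a median of 0.3, so only the mean rule predicts survival.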
Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com3tag:blogger.com,1999:blog-3524617892004055830.post-43066302911160109792015-09-06T10:56:00.002+02:002015-09-06T10:56:32.369+02:00Predicting Titanic deaths on Kaggle V: RangerIn two previous posts (<a href="http://wiekvoet.blogspot.nl/2015/08/predicting-titanic-deaths-on-kaggle-iv.html">Predicting Titanic deaths on Kaggle IV: random forest revisited</a>, <a href="http://wiekvoet.blogspot.nl/2015/07/predicting-titanic-deaths-on-kaggle.html">Predicting Titanic deaths on Kaggle</a>) I was unable to make random forest predict as well as boosting. Hence, when I read about an alternative implementation, <a href="http://nuit-blanche.blogspot.nl/2015/08/ranger-fast-implementation-of-random.html">ranger</a>, I took the opportunity to check whether it could improve my predictions. Ranger claims to be faster than randomForest.<br />
Meanwhile, I have also been reading that randomForest is not the best implementation. I did not bookmark where, but Google turned up <a href="http://www.wise.io/tech/benchmarking-random-forest-part-1">Benchmarking Random Forest Classification</a> on <a href="http://wise.io/">wise.io</a> and <a href="http://datascience.la/benchmarking-random-forest-implementations/">Benchmarking Random Forest Implementations</a> on <a href="http://datascience.la/">DataScience.LA</a>.<br />
<h3>
Data</h3>
A slightly adapted version of the pre-processing was used. I created a simpler version of the cabin character. Instead of separate variables for decks A to F, it is now one multilevel factor with values A, B, C, D, E and X, where decks F, G and T are pooled into X. In addition, ticket has been made into a factor with more levels.<br />
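The cabin and ticket recoding can be shown on a few strings. This is a minimal sketch: the strings are hypothetical examples in the style of the Kaggle data. The recoding steps mirror the preparation code further down.

```r
# Hypothetical cabin and ticket strings in the style of the Kaggle data
cabin  <- c('C85', 'E46', 'F G73', 'G6', 'T', 'B57 B59', '')
ticket <- c('A/5 21171', 'PC 17599', 'STON/O2. 3101282', '113803')

# Deck letter: first character, with the rare decks F, G and T pooled into X
cabchar <- substr(cabin, 1, 1)
cabchar[cabchar %in% c('F', 'G', 'T')] <- 'X'
cabchar <- factor(cabchar)   # passengers without a cabin keep an empty level

# Ticket prefix: drop the trailing number, then dots, spaces and slashes
prefix <- sub('[[:digit:]]+$', '', ticket)
prefix <- toupper(gsub('(\\.)|( )|(/)', '', prefix))
prefix[prefix %in% c('STONO2', 'STONOQ')] <- 'STON'   # one of the merges used in the post
print(levels(cabchar))
print(prefix)
```

Purely numeric tickets end up with an empty prefix. The post's code merges several more prefixes in the same way.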
<h3>
Age</h3>
Ranger is supposed to be fast, so I took the opportunity to do a replicated cross validation for predicting missing values of age. The cross validation is replicated because in previous experiments I found that the difference between two settings might be smaller than the variation within a setting. For this, errorest() from the ipred package was used, and a wrapper around ranger's predict was written to make it work. The plot below shows the cross validation error for 10 different runs of each setting, so it also gives an indication of the variation of the cross validation error itself. Based on the plot, mtry=2 and nodesize=7 (min.node.size in ranger) were chosen. As can be seen, mtry=1 gives the worst predictions, while nodesize 7 and higher gives reasonable predictions. In hindsight, bigger nodesizes might be even better, but this was not investigated at the time.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqz56mWxjavyi07PYpwRx39eUdY21Nyk5Y1BO5FrEWL-iyEPvlnaIdSrJYW9QJlhitYfYogyzrtB3GZiVusMm8ErIiK86fsIW3_lTSJlopfEodX46_kexNWJQJEyWp1cGtk4dEzIk02c4/s1600/agedens.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqz56mWxjavyi07PYpwRx39eUdY21Nyk5Y1BO5FrEWL-iyEPvlnaIdSrJYW9QJlhitYfYogyzrtB3GZiVusMm8ErIiK86fsIW3_lTSJlopfEodX46_kexNWJQJEyWp1cGtk4dEzIk02c4/s1600/agedens.png" /></a></div>
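Replicated cross validation repeats k-fold cross validation, each time with a new random fold assignment. The spread of the replicates then shows how noisy a single cross validation estimate is. The sketch below is a base R stand-in, not ipred's errorest(). It makes these assumptions:

* lm() on the built-in mtcars data replaces ranger on the Titanic ages.
* cv_rmse() and lm_predict() are hypothetical helper names.
* The predict_fun argument plays the role of the predict wrapper around ranger in the post.

```r
# Replicated k-fold cross validation in base R (sketch).
# model and predict_fun mimic the model/predict arguments of errorest().
cv_rmse <- function(formula, data, model, predict_fun, k = 10) {
  response <- all.vars(formula)[1]
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))  # random fold labels
  sq_err <- numeric(nrow(data))
  for (f in seq_len(k)) {
    held_out <- folds == f
    fit <- model(formula, data = data[!held_out, ])
    pred <- predict_fun(fit, data[held_out, ])
    sq_err[held_out] <- (data[[response]][held_out] - pred)^2
  }
  sqrt(mean(sq_err))  # root mean squared error over all held-out rows
}

# Hypothetical wrapper; for lm() predict() already returns a plain vector
lm_predict <- function(object, newdata) predict(object, newdata = newdata)

set.seed(2015)
reps <- replicate(10, cv_rmse(mpg ~ wt + hp, mtcars, lm, lm_predict))
print(summary(reps))  # spread across replicates = noise in the CV estimate
```

For ranger, the post's wrapper is function(object, newdata) predict(object, data = newdata)$predictions. It is needed because ranger's predict() returns an object rather than a plain vector.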
<h3>
Survival</h3>
<div>
Again using replicated cross validation, the following prediction errors were found. mtry=1 is not shown in the plot because it did not perform well. The plot does not show huge differences, but the combination of small nodesize and larger mtry does not seem to pay off. Since these prediction errors are in the same range as previous results, it was decided not to make a Kaggle submission on these data.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhRuBg1Nv-js9YFHIf33Tl_E9enM50wQnXiXg9BfqCdjGx5v4CNqxRvwbequ_TzXFGrU9W8_zoHuzmqECK_78qSgXwpVbrG8CQtDeLLBAtXqPdQvcnUTPX1rcDkqN7BP7FzXy8JxN5ceZY/s1600/rangercheck.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhRuBg1Nv-js9YFHIf33Tl_E9enM50wQnXiXg9BfqCdjGx5v4CNqxRvwbequ_TzXFGrU9W8_zoHuzmqECK_78qSgXwpVbrG8CQtDeLLBAtXqPdQvcnUTPX1rcDkqN7BP7FzXy8JxN5ceZY/s1600/rangercheck.png" /></a></div>
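Besides the plot, replicated cross validation results can be summarised per setting. This is a minimal sketch, assuming a results data frame shaped like sla in the age code: columns mtry, min.node.size and error, with one row per replicate. The error values below are arbitrary simulated placeholders, not the post's results.

```r
# Placeholder results: same columns as sla, errors simulated for illustration only
set.seed(1)
cv_res <- expand.grid(mtry = 2:5, min.node.size = c(1, 5, 10), rep = 1:10)
cv_res$error <- rnorm(nrow(cv_res), mean = 0.2, sd = 0.01)

# Mean and standard deviation of the CV error per setting, over the replicates
stats <- aggregate(error ~ mtry + min.node.size, data = cv_res,
                   FUN = function(e) c(mean = mean(e), sd = sd(e)))
stats <- do.call(data.frame, stats)   # flatten the matrix column into error.mean, error.sd
print(stats[order(stats$error.mean), ])
```

If the differences between the best settings are small compared with error.sd, the ranking is mostly noise. This matches the reason the post replicates its cross validation.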
<h4>
Number of trees</h4>
<div>
Since the influence of the number of trees was unclear to me, I did a small experiment with 50, 500 and 5000 trees, again with 10 replicated cross validations. In this plot, 50 trees gives a surprisingly good prediction error for such a simple model, but 5000 is a bit better. Rather than investigating what number of trees is sufficient, I cut the corner and chose a large number: 200000. It should be noted that this fit into memory and took only a few minutes. However, it did not improve on the Kaggle score of my previous random forest attempt.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFmU10Qo1jRBplhkkfgKAZYEfU5iKBSYEJ5vaWpxGLb3IxPhUwS5D1ZUQURxBLfeZ2iZNedlo3vAAHZ_sEC6htvlKOK4mnYxOrS6xgitwALjX7uCh3OJRQzOua-jJB7n0iCGaggwKvKF8/s1600/rangersize.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFmU10Qo1jRBplhkkfgKAZYEfU5iKBSYEJ5vaWpxGLb3IxPhUwS5D1ZUQURxBLfeZ2iZNedlo3vAAHZ_sEC6htvlKOK4mnYxOrS6xgitwALjX7uCh3OJRQzOua-jJB7n0iCGaggwKvKF8/s1600/rangersize.png" /></a></div>
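A quick check of how the number of trees affects a forest is ranger's out-of-bag (OOB) error, which ranger stores in prediction.error. This sketch makes two substitutions:

* The built-in iris data replaces the Titanic data.
* OOB error replaces the replicated cross validation used in the post.

Treat it as an illustration, not a reproduction of the plot above.

```r
library(ranger)

# OOB misclassification error for increasing numbers of trees (iris as stand-in data)
set.seed(42)
ntrees <- c(50, 500, 5000)
oob <- sapply(ntrees, function(nt)
  ranger(Species ~ ., data = iris, num.trees = nt)$prediction.error)
names(oob) <- ntrees
print(oob)
```

With more trees the OOB estimate tends to stabilise. Rerunning with a few different seeds shows how much of any difference is noise.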
<h3>
Conclusion</h3>
<div>
Ranger is indeed a fast and memory-efficient random forest implementation. However, it was not able to improve my prediction error.</div>
<h3>
Code</h3>
<div>
Please note that the code has been reformatted after pasting into Blogger to improve the layout. Some intermediate data saving and restoring code has also been removed.</div>
<div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># preparation and data reading section</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(ranger)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#https://www.reddit.com/r/MachineLearning/comments/3hvy7v/ranger_a_fast_implementation_of_random_forests/</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(lattice)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(latticeExtra)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># has cross validation</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(ipred)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># read and combine</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">train <- read.csv('train.csv')</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">train$status <- 'train'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">test <- read.csv('test.csv')</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">test$status <- 'test'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">test$Survived <- NA</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt <- rbind(test,train)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># generate variables</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Embarked[tt$Embarked==''] <- 'S'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Embarked <- factor(tt$Embarked)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Pclass <- factor(tt$Pclass)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Survived <- factor(tt$Survived)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title <- sapply(tt$Name,function(x) strsplit(as.character(x),'[.,]')[[1]][2])</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title <- gsub(' ','',tt$Title)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title[tt$Title=='Dr' & tt$Sex=='female'] <- 'Miss'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title[tt$Title %in% c('Capt','Col','Don','Sir','Jonkheer',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> 'Major','Rev','Dr')] <- 'Mr'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title[tt$Title %in% c('Lady','Ms',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> 'theCountess','Mlle','Mme','Ms','Dona')] <- 'Miss'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title <- factor(tt$Title)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># changed cabin character</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$cabchar <- substr(tt$Cabin,1,1)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$cabchar[tt$cabchar %in% c('F','G','T')] <- 'X';</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$cabchar <- factor(tt$cabchar)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ncabin <- nchar(as.character(tt$Cabin))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$cn <- as.numeric(gsub('[[:space:][:alpha:]]','',tt$Cabin))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$oe <- factor(ifelse(!is.na(tt$cn),tt$cn%%2,-1))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Fare[is.na(tt$Fare)]<- median(tt$Fare,na.rm=TRUE)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket <- sub('[[:digit:]]+$','',tt$Ticket)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket <- toupper(gsub('(\\.)|( )|(/)','',tt$ticket))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('A2','A4','AQ3','AQ4','AS')] <- 'An'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('SCA3','SCA4','SCAH','SC','SCAHBASLE','SCOW')] <- 'SC'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('CASOTON','SOTONO2','SOTONOQ')] <- 'SOTON'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('STONO2','STONOQ')] <- 'STON'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('C')] <- 'CA'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('SOC','SOP','SOPP')] <- 'SOP'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('SWPP','WC','WEP')] <- 'W'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('FA','FC','FCC')] <- 'F'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket[tt$ticket %in% c('PP','PPP','LINE','LP','SP')] <- 'PPPP'</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ticket <- factor(tt$ticket)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#end of preparation and data reading</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># age section</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># get an age without missings</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">forage <- tt[!is.na(tt$Age) & tt$status=='train',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> names(tt) %in% c('Age','Sex','Pclass','SibSp',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> 'Parch','Fare','Title','Embarked','cabchar','ncabin','ticket')]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># oe is side of vessel, not relevant for age?</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">totest <- expand.grid(mtry=1:4,min.node.size=1:11,rep=1:10)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">la <- lapply(1:nrow(totest),function(ii) {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ee <- errorest(Age ~ .,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mtry=totest$mtry[ii],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> min.node.size=totest$min.node.size[ii],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> model=ranger,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> predict=function(object,newdata) </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> predict(object,data=newdata)$predictions,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> write.forest=TRUE,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data=forage)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cc <- c(mtry=totest$mtry[ii],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> min.node.size=totest$min.node.size[ii],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> error=ee$error)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> # print(cc)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cc</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> })</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">sla <- do.call(rbind,la)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">sla <- as.data.frame(sla)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">useOuterStrips(</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> densityplot(~ error | factor(mtry)+factor(min.node.size), </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data=sla))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># 2,7?</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">rfa1 <- ranger(Age ~ .,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data=forage,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mtry=2,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> write.forest=TRUE,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> min.node.size=7)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace; font-size: x-small;">tt$AGE <- tt$Age</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$AGE[is.na(tt$AGE)] <- predict(rfa1,tt[is.na(tt$AGE),])$predictions</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">summary(tt$AGE) # check the imputed ages; no NAs should remain</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># end of age section</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#final data section</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace; font-size: x-small;">train <- tt[tt$status=='train',]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">test <- tt[tt$status=='test',]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#end of final data section</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#model selection 1</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">forSurf <- train[,names(train) %in% c('Survived','AGE','Sex','Pclass','SibSp',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> 'Parch','Fare','Title','Embarked','ncabin','ticket','oe')]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">totest <- expand.grid(mtry=2:5,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> min.node.size=c(1:4,seq(6,12,2),15,20,25),rep=1:10)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">la2 <- lapply(1:nrow(totest),function(ii) {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ee <- errorest(Survived ~.,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mtry=totest$mtry[ii],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> min.node.size=totest$min.node.size[ii],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> model=ranger,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> predict=function(object,newdata) </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> predict(object,data=newdata)$predictions,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> write.forest=TRUE,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data=forSurf</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> )</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cc <- c(mtry=totest$mtry[ii],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> min.node.size=totest$min.node.size[ii],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> error=ee$error)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cat('.')</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> if (totest$mtry[ii]==max(totest$mtry) & </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> totest$min.node.size[ii]==max(totest$min.node.size)) </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cat('\n')</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cc</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> })</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">sla2 <- do.call(rbind,la2)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">sla2 <- as.data.frame(sla2)</span></div>
<div>
<span style="background-color: #f3f3f3;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">useOuterStrips(</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> densityplot(~ error | factor(mtry)+factor(min.node.size), </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data=sla2))</span></div>
<div>
<span style="background-color: #f3f3f3;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace; font-size: x-small;">############</span></div>
<div>
<span style="background-color: #f3f3f3;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">totest <- expand.grid(num.trees=c(50,500,5000),rep=1:10)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">la3 <- lapply(1:nrow(totest),function(ii) {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ee <- errorest(Survived ~.,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mtry=4,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> min.node.size=12,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> num.trees=totest$num.trees[ii],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> model=ranger,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> predict=function(object,newdata) </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> predict(object,data=newdata)$predictions,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> write.forest=TRUE,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data=forSurf</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> )</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cc <- c(num.trees=totest$num.trees[ii],error=ee$error)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cat('.')</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> if (totest$num.trees[ii]==max(totest$num.trees)) cat('\n')</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cc</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> })</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">sla3 <- do.call(rbind,la3)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">sla3 <- as.data.frame(sla3)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace; font-size: x-small;">densityplot(~ error | factor(num.trees), data=sla3)</span></div>
<div>
<span style="background-color: #f3f3f3;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#########</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace; font-size: x-small;">rang3 <- ranger(Survived ~ .,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mtry=4,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> min.node.size=12,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> write.forest=TRUE,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data=forSurf,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> num.trees=200000) </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">pp <- predict(rang3,test)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">out <- data.frame(</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PassengerId=test$PassengerId,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Survived=pp$predictions,row.names=NULL)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">write.csv(x=out,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> file='rf.1.sep.csv',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> row.names=FALSE,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> quote=FALSE)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># Your submission scored 0.75598, which is not an improvement of your best score.</span></div>
</div>
Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com0tag:blogger.com,1999:blog-3524617892004055830.post-90437549855929337322015-08-23T16:13:00.000+02:002015-08-23T16:13:08.311+02:00Predicting Titanic deaths on Kaggle IV: random forest revisitedOn <a href="http://wiekvoet.blogspot.nl/2015/07/predicting-titanic-deaths-on-kaggle.html">July 19th</a> I used randomForest to predict the deaths on the Titanic in the Kaggle competition. Subsequently, I found that both bagging and boosting gave better predictions than randomForest. I found this somewhat unsatisfactory, so I am now revisiting randomForest. To my disappointment, this does not result in predictions as good as those from bagging and boosting.<br />
Note that all code is at the bottom of the post.<br />
<h3>
Data</h3>
The data preparation has not changed much compared to the previous posts.<br />
<h3>
Age</h3>
Since the ipred package has a convenient function, errorest, for estimating prediction error by cross-validation, the first adaptation is to get better predictions of Age where it is missing from the data. The model parameters to be optimized are mtry and nodesize. The plot shows that mtry=5 and nodesize=4 should give the best predictions.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiv29mOHg40dx6ApS6FH73Jfy_Ps_5yBjRMPlUlHsvkM-Csv_WKykHr4500nfjFlKIWWHHfTzmvMB-DHZqyxNu4gfXyix0c9waqniLkmzw1U16b3ZB8QoGQ2BAZXvkF1QrQ5AX_73dv6ow/s1600/ageplot1.20150822.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiv29mOHg40dx6ApS6FH73Jfy_Ps_5yBjRMPlUlHsvkM-Csv_WKykHr4500nfjFlKIWWHHfTzmvMB-DHZqyxNu4gfXyix0c9waqniLkmzw1U16b3ZB8QoGQ2BAZXvkF1QrQ5AX_73dv6ow/s1600/ageplot1.20150822.png" /></a></div>
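For readers unfamiliar with errorest: it estimates prediction error by cross-validation. The R sketch below spells out what a k-fold cross-validation estimate for a numeric response like Age involves. It is an illustration, not ipred's implementation; the function cv_rmse and its arguments fit_fun and pred_fun are hypothetical names, and root mean squared error is assumed as the error measure.<br />

```r
# A sketch of k-fold cross-validation for a numeric response such as Age.
# Not ipred's implementation: cv_rmse, fit_fun and pred_fun are hypothetical.
cv_rmse <- function(formula, data, fit_fun, pred_fun, k = 10, seed = 1) {
  set.seed(seed)  # fixed seed: the same folds are used on every call
  # randomly assign each row to one of k folds of (nearly) equal size
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  sq_err <- numeric(nrow(data))
  for (f in seq_len(k)) {
    held_out <- folds == f
    # fit on the other k-1 folds, predict the held-out fold
    model <- fit_fun(formula, data[!held_out, , drop = FALSE])
    pred <- pred_fun(model, data[held_out, , drop = FALSE])
    sq_err[held_out] <- (data[[response]][held_out] - pred)^2
  }
  sqrt(mean(sq_err))  # root mean squared prediction error
}

# usage with a linear model on a built-in data set
cv_rmse(mpg ~ wt + hp, mtcars,
        fit_fun = function(f, d) lm(f, data = d),
        pred_fun = function(m, d) predict(m, newdata = d),
        k = 5)
```

To tune the age model one could, for instance, pass fit_fun = function(f, d) randomForest(f, data = d, mtry = 5, nodesize = 4) and pred_fun = function(m, d) predict(m, d), then compare the result across settings.<br />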
<br />
Using these settings, the following predicted vs observed ages are obtained. I am not really impressed.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8S8366GcyG49lodlv2QFpJNEGj5lSWOjR6YTgbv3BBu6PF6fs7SUAcgv4bCfko7dVUwoRGT_EzakNbOXcQs8hE6C3b0SBNME1K71dkWQcHmZzzec0T6Akc5MypTioREELWpCG_YHV35A/s1600/ageplot2.20150822.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8S8366GcyG49lodlv2QFpJNEGj5lSWOjR6YTgbv3BBu6PF6fs7SUAcgv4bCfko7dVUwoRGT_EzakNbOXcQs8hE6C3b0SBNME1K71dkWQcHmZzzec0T6Akc5MypTioREELWpCG_YHV35A/s1600/ageplot2.20150822.png" /></a></div>
<h3>
Survival model 1</h3>
<h4>
Model building 1</h4>
With complete data, the next step is to use cross-validation to select nodesize and mtry for the survival model. The plot below shows the resulting cross-validated error. Note that the error of these models is a bit larger than observed previously with bagging and boosting. However, that observation does not suggest a remedy. I chose nodesize=3 and mtry=7.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWsESE5Tk0zvZG3ZLzwc0LDIlPlkqvSyJFOzgFNY6TDg4PECx51U46Ljstj1DbiU8DwGYbfK2cK4OmX3P4cToflKv62IrLs-fA-kAwzMAvAKD1FZ6lk4laYH8OpCG9Itlg6Fd1xtQ39G0/s1600/plot1.20150822.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWsESE5Tk0zvZG3ZLzwc0LDIlPlkqvSyJFOzgFNY6TDg4PECx51U46Ljstj1DbiU8DwGYbfK2cK4OmX3P4cToflKv62IrLs-fA-kAwzMAvAKD1FZ6lk4laYH8OpCG9Itlg6Fd1xtQ39G0/s1600/plot1.20150822.png" /></a></div>
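The selection step amounts to a grid search. A minimal sketch, assuming a hypothetical error_fun(mtry, nodesize) that returns one cross-validated error, for example function(m, n) errorest(Survived ~ ., data = forSurf, model = randomForest, mtry = m, nodesize = n)$error. The toy error surface below only demonstrates the mechanics.<br />

```r
# A sketch of the grid search over (mtry, nodesize). error_fun is a
# hypothetical argument: any function(mtry, nodesize) returning one error.
grid_search <- function(error_fun, mtry, nodesize) {
  grid <- expand.grid(mtry = mtry, nodesize = nodesize)
  # evaluate the error for every combination in the grid
  grid$error <- mapply(error_fun, grid$mtry, grid$nodesize)
  list(results = grid, best = grid[which.min(grid$error), ])
}

# toy error surface with its minimum at mtry = 7, nodesize = 3
toy <- grid_search(function(m, n) (m - 7)^2 + (n - 3)^2, 6:9, 3:7)
toy$best
```

Because each cross-validation run is itself random, picking which.min from a single pass can be misleading; repeating each setting gives a better idea of the variability.<br />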
<h4>
Evaluation Model 1</h4>
There are several ways to get predictions from randomForest: one can ask for the predicted categories, or for the probability of each category. Here I look at those probabilities, since I think the model might be improved, and to improve it I need to understand what is happening. The model gives the following out-of-bag probabilities per category (pp[,1] is the probability of category 0). This is not ideal. Ideally, most probabilities would be close to 0 or 1, but here quite a number are not. In particular, category 1 is not easily recognized, and quite a few passengers in category 1 are classified as category 0. Hence the question becomes whether it is possible to get better separated categories. As a first step, I will try to optimize the point where the cut between the two categories is made.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJbz836BlTavqxsh1HDhQCOVxEoXl4nCzPDmaCEAqP9cTe8OXOTgOS2VI5Uet0x5C-sYlFBTa5ev3PfaJbkT1lD9xnm9JQrPkTkobBgR4AZN7nP7mwtXecRiXSEWOhy8G-Do1HeRxUydY/s1600/prob120150822.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJbz836BlTavqxsh1HDhQCOVxEoXl4nCzPDmaCEAqP9cTe8OXOTgOS2VI5Uet0x5C-sYlFBTa5ev3PfaJbkT1lD9xnm9JQrPkTkobBgR4AZN7nP7mwtXecRiXSEWOhy8G-Do1HeRxUydY/s1600/prob120150822.png" /></a></div>
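Turning the out-of-bag probabilities into categories is a matter of choosing a cut. A small sketch (the name classify_at_cut is hypothetical): a passenger is assigned to category 1 when the probability of category 0 is below the cut. Fixing the factor levels keeps the result comparable with Survived even when all predictions fall into one category.<br />

```r
# A sketch: classify using the probability of category 0 (pp[,1] in the post).
# A passenger goes to category 1 when that probability is below the cut.
classify_at_cut <- function(p0, cut = 0.5) {
  # fixed levels, so the result always has both "0" and "1" as levels
  factor(as.numeric(p0 < cut), levels = c(0, 1))
}

classify_at_cut(c(0.10, 0.52, 0.90), cut = 0.55)
```

With cut = 0.5 this roughly corresponds to a majority vote of the trees.<br />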
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
The plot below shows the number of correct predictions as a function of the cut-off point. It shows that the whole central region is a possible cut-off, except near 0.4. The default value of 0.5 is not optimal. </div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiqv4nm3tpPgpF1di7b_pSO-P4zw8NjY8UN5D-dJ4ktK-ikyvmkbpbRgxx5cBpEd4hRzG8ggWu1BEHEKf9s7vcx9pDBVCNGfVleu5Z3JsXLQ_NkT9BTTqIssdofsJRgjYL64DmjXLs6XQA/s1600/cuts20150822.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiqv4nm3tpPgpF1di7b_pSO-P4zw8NjY8UN5D-dJ4ktK-ikyvmkbpbRgxx5cBpEd4hRzG8ggWu1BEHEKf9s7vcx9pDBVCNGfVleu5Z3JsXLQ_NkT9BTTqIssdofsJRgjYL64DmjXLs6XQA/s1600/cuts20150822.png" /></a></div>
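A curve like this can be computed along the following lines: for each candidate cut, count how many training passengers are classified correctly. A sketch with the hypothetical name correct_by_cut; in the post, p0 would be pp[,1] and observed would be forSurf$Survived. The toy vectors are made up for illustration.<br />

```r
# A sketch of the cut-off scan: number of correct classifications per cut.
# p0: probability of category 0; observed: factor with levels "0" and "1".
correct_by_cut <- function(p0, observed, cuts = seq(0.2, 0.7, 0.001)) {
  sapply(cuts, function(cc) {
    decide <- factor(as.numeric(p0 < cc), levels = c(0, 1))
    sum(as.character(decide) == as.character(observed))
  })
}

p0 <- c(0.1, 0.3, 0.6, 0.8)
observed <- factor(c(1, 1, 0, 0))
cuts <- c(0.2, 0.5, 0.9)
counts <- correct_by_cut(p0, observed, cuts)
cuts[which.max(counts)]
```

The best cut is then cuts[which.max(counts)], although a flat or noisy curve means several cuts are practically equivalent.<br />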
<h4>
Examination of cut-off point</h4>
After making this plot, I wondered whether this shape would be the same for other settings of nodesize and mtry. Since I have a distinct feeling that it all depends on the luck of the draw, the calculation is repeated a number of times for each setting. Based on this, I chose a cut-off of 0.55 as appropriate for a wider range of settings. The best out-of-bag predictions seem to occur with a higher value of mtry and a low value of nodesize. Thinking back to the density plot, it would seem that high nodesize and low mtry give few probabilities in the center region. However, the price for that is quite a few errors in the out-of-bag predictions.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhO7b3QcL8bxs1LQMXpKC8v2FD1ByGkTW9UgQDx0HfXDFrczXVwJT49F0BHKx0NwQl0Npm8YINafKQDe8nK30QmO2-30FkG2p7e7wzXMp__DXff7Hjg9bWzu3w3u7lg6FDrT2ilv1bFM9M/s1600/cuts2.20150822.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhO7b3QcL8bxs1LQMXpKC8v2FD1ByGkTW9UgQDx0HfXDFrczXVwJT49F0BHKx0NwQl0Npm8YINafKQDe8nK30QmO2-30FkG2p7e7wzXMp__DXff7Hjg9bWzu3w3u7lg6FDrT2ilv1bFM9M/s1600/cuts2.20150822.png" /></a></div>
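To check whether the shape of the curve is just the luck of the draw, the count per cut can be averaged over repeated fits. A sketch, assuming a hypothetical matrix counts with one row per refitted forest and one column per candidate cut, each row holding correct-prediction counts like those plotted above; robust_cut is also a hypothetical name, and the numbers in the example are made up.<br />

```r
# A sketch: average the correct-count curve over repeated fits before
# picking a cut. counts: one row per repeated fit, one column per cut.
robust_cut <- function(counts, cuts) {
  stopifnot(ncol(counts) == length(cuts))
  tab <- data.frame(cut = cuts,
                    mean_correct = colMeans(counts),
                    sd_correct = apply(counts, 2, sd))
  list(table = tab, best = cuts[which.max(tab$mean_correct)])
}

# made-up counts for three repeats and three candidate cuts
counts <- rbind(c(700, 710, 705),
                c(690, 712, 715),
                c(701, 708, 690))
rc <- robust_cut(counts, cuts = c(0.45, 0.55, 0.65))
rc$best
```

The sd_correct column shows how much a single run can move around, which is the reason for not trusting one curve.<br />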
<h3>
Survival Model 2</h3>
<h4>
Model Building 2</h4>
Using the cut-off of 0.55, cross-validation is again used to select the model parameters mtry and nodesize. Again, each setting is tried a few times to get an idea of the variability in prediction quality. Based on these results, I chose nodesize=6 and mtry=6.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjALHPVlLR4uAzP0W0mCWxsCWpa0Q3MMsCUjpJIizXPfa0CZLufqN0FH8qLFGFVhyI-otljoHml00hYpohE-NvsnOVmd2pUr_n6kWZBwLs0IGbCDQy8H0vXP7TDYLvK5EJcv8IZ6HJDtkw/s1600/cv2.20150822.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjALHPVlLR4uAzP0W0mCWxsCWpa0Q3MMsCUjpJIizXPfa0CZLufqN0FH8qLFGFVhyI-otljoHml00hYpohE-NvsnOVmd2pUr_n6kWZBwLs0IGbCDQy8H0vXP7TDYLvK5EJcv8IZ6HJDtkw/s1600/cv2.20150822.png" /></a></div>
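Cross-validating with a fixed cut-off requires a prediction function that applies the cut; errorest accepts such a function through its predict argument, which takes object and newdata. A sketch; make_cut_predict and prob_fun are hypothetical names, and prob_fun must return a probability matrix whose first column is category 0.<br />

```r
# A sketch: a predict function applying a fixed cut-off, in the form
# errorest expects (arguments object and newdata). prob_fun is hypothetical:
# it must return a probability matrix whose first column is category 0.
make_cut_predict <- function(cut, prob_fun) {
  force(cut)
  force(prob_fun)
  function(object, newdata) {
    p0 <- prob_fun(object, newdata)[, 1]
    factor(as.numeric(p0 < cut), levels = c(0, 1))
  }
}

# toy check with a fake probability function that reads a column p0
fake_prob <- function(object, newdata) cbind(newdata$p0, 1 - newdata$p0)
pred55 <- make_cut_predict(0.55, fake_prob)
pred55(NULL, data.frame(p0 = c(0.50, 0.60)))
```

For randomForest, prob_fun could be function(object, newdata) predict(object, newdata, type = 'prob'), passed as errorest(Survived ~ ., data = forSurf, model = randomForest, predict = make_cut_predict(0.55, prob_fun), mtry = 6, nodesize = 6).<br />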
<h4>
Submission</h4>
Your submission scored 0.75. Not really as much as I had hoped for.<br />
<h3>
Code</h3>
Note that the code was reformatted and cleaned after pasting it into the blogging application. This should not have introduced any coding errors.<br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># preparation and data reading section</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">library(randomForest)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">library(lattice)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># has cross validation</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">library(ipred)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># read and combine</span><br />
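<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># stringsAsFactors=TRUE: text columns become factors, as was the default before R 4.0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">train <- read.csv('train.csv', stringsAsFactors=TRUE)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">train$status <- 'train'</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">test <- read.csv('test.csv', stringsAsFactors=TRUE)</span><br />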
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">test$status <- 'test'</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">test$Survived <- NA</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt <- rbind(test,train)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># generate variables</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Pclass <- factor(tt$Pclass)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Survived <- factor(tt$Survived)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$age <- tt$Age</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$age[is.na(tt$age)] <- 35</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$age <- cut(tt$age,c(0,2,5,9,12,15,21,55,65,100))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title <- sapply(tt$Name,function(x) strsplit(as.character(x),'[.,]')[[1]][2])</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title <- gsub(' ','',tt$Title)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title[tt$Title %in% c('Capt','Col','Don','Sir','Jonkheer','Major')] <- 'Mr'</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title[tt$Title %in% c('Lady','Ms','theCountess','Mlle','Mme','Dona')] <- 'Miss'</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title <- factor(tt$Title)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$A <- factor(grepl('A',tt$Cabin))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$B <- factor(grepl('B',tt$Cabin))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$C <- factor(grepl('C',tt$Cabin))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$D <- factor(grepl('D',tt$Cabin))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$E <- factor(grepl('E',tt$Cabin))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$F <- factor(grepl('F',tt$Cabin))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ncabin <- nchar(as.character(tt$Cabin))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$PC <- factor(grepl('PC',tt$Ticket))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$STON <- factor(grepl('STON',tt$Ticket))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$cn <- as.numeric(gsub('[[:space:][:alpha:]]','',tt$Cabin))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$oe <- factor(ifelse(!is.na(tt$cn),tt$cn%%2,-1))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Fare[is.na(tt$Fare)]<- median(tt$Fare,na.rm=TRUE)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">#end of preparation and data reading</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># age section</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># get an age without missings</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">forage <- tt[!is.na(tt$Age) & tt$status=='train',names(tt) %in% </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> c('Age','Sex','Pclass','SibSp',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 'Parch','Fare','Title','Embarked','A','B','C','D','E','F',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 'ncabin','PC','STON','oe')]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">totest <- expand.grid(mtry=4:7,nodesize=3:6)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">la <- lapply(1:nrow(totest),function(ii) {</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ee <- errorest(Age ~.,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> mtry=totest$mtry[ii],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> nodesize=totest$nodesize[ii],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> model=randomForest,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> data=forage)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> cc <- c(mtry=totest$mtry[ii],nodesize=totest$nodesize[ii],error=ee$error)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> print(cc)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> cc</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> })</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sla <- do.call(rbind,la)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sla <- as.data.frame(sla)</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">xyplot(error ~ mtry, groups= nodesize, data=sla,auto.key=TRUE,type='l')</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"># chosen: mtry=5, nodesize=4</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">rfa1 <- randomForest(Age ~ .,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> data=forage,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ntree=1000,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> mtry=5,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> nodesize=4)</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">plot(tt$Age,predict(rfa1,tt))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">abline(a=0,b=1,col='red')</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">tt$AGE <- tt$Age</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$AGE[is.na(tt$AGE)] <- predict(rfa1,tt[is.na(tt$AGE),])</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">tt$age <- cut(tt$AGE,c(0,2,5,9,12,15,21,55,65,100))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># end of age section</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">#final data section</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">train <- tt[tt$status=='train',]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">test <- tt[tt$status=='test',]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">#end of final data section</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">#model selection 1</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">forSurf <- train[,names(train) %in% </span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"> c('Survived','age','AGE','Sex','Pclass','SibSp',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 'Parch','Fare','Title','Embarked','A','B','C','D','E','F',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 'ncabin','PC','STON','oe')]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># rfx <- randomForest(Survived ~.,data=forSurf)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">totest <- expand.grid(mtry=6:9,nodesize=3:7)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">la <- lapply(1:nrow(totest),function(ii) {</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ee <- errorest(Survived ~.,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> mtry=totest$mtry[ii],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> nodesize=totest$nodesize[ii],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> model=randomForest,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> data=forSurf,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ntree=1000,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> est.para=control.errorest(k=20)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> )</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> cc <- c(mtry=totest$mtry[ii],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> nodesize=totest$nodesize[ii],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> error=ee$error)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> print(cc)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> cc</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> })</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sla <- do.call(rbind,la)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sla <- as.data.frame(sla)</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">xyplot(error ~ mtry, groups= nodesize, data=sla,auto.key=TRUE,type='l')</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">#end of model selection 1</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">#model evaluation section 1a</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">rfx <- randomForest(Survived ~.,data=forSurf,nodesize=3,mtry=7,ntree=1000)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">pp <- predict(rfx,type='prob')</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">densityplot(~ pp[,1] | forSurf$Survived,adj=.3)</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">cuts <- seq(.20,.7,.001) </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">plot(y=sapply(cuts,function(cc){</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> decide=factor(as.numeric(pp[,1]<cc),levels=c('0','1'))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> sum(decide==forSurf$Survived)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> }),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> x=cuts)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">#end of model evaluation section 1a</span><br />
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">cuts <- seq(.25,.65,.001) </span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"># model evaluation 1b</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">eval2 <- expand.grid(nodesize=seq(4,100,8),mtry=seq(2,8,2),count=1:10)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sach <- lapply( 1:nrow(eval2),function(i) {</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> rfx <- randomForest(Survived ~.,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> data=forSurf,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> nodesize=eval2$nodesize[i],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> mtry=eval2$mtry[i],</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"> ntree=1000)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> pp <- predict(rfx,type='prob')</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> nerr=sapply(cuts,function(cc){</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> decide=factor(as.numeric(pp[,1]<cc),levels=c('0','1'))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> sum(decide==forSurf$Survived)})</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> data.frame(</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> nerr=nerr,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> cuts=cuts,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> mtry=eval2$mtry[i],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> nodesize=eval2$nodesize[i],</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;"> i=rep(i,length(cuts)))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> })</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sach <- do.call(rbind,sach)</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">xyplot(nerr ~ cuts | nodesize + mtry, groups=i, data=sach,auto.key=FALSE,type='l')</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">##############</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># chosen cut-off: 0.55</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">##############</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">#biased prediction</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">twpred <- function(object,newdata=NULL) {</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> preds <- predict(object,newdata,type='prob')</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> factor(as.numeric(preds[,1]<0.55),levels=c('0','1'))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">}</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">totest2 <- expand.grid(mtry=seq(2,8,2),nodesize=seq(2,30,4),count=1:10)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">la2 <- lapply(1:nrow(totest2),function(ii) {</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ee <- errorest(Survived ~.,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> mtry=totest2$mtry[ii],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> nodesize=totest2$nodesize[ii],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> model=randomForest,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> data=forSurf,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ntree=500,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> predict=twpred,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> est.para=control.errorest(k=10)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> )</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> cc <- c(mtry=totest2$mtry[ii],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> nodesize=totest2$nodesize[ii],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> i=totest2$count[ii],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> error=ee$error)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> print(cc)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> cc</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> })</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sla2 <- do.call(rbind,la2)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sla2 <- as.data.frame(sla2)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">xyplot(error ~ factor(mtry) | factor(nodesize),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> groups= i, data=sla2,auto.key=FALSE,type='l')</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">##</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># select mtry=6, nodesize=6</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">rf2 <-randomForest(Survived ~ .,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> data=forSurf,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> replace=TRUE,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ntree=2000,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> nodesize=6,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> mtry=6) </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">pp <- predict(rf2,test)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">out <- data.frame(</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PassengerId=test$PassengerId,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Survived=pp,row.names=NULL)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">write.csv(x=out,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> file='rf.16.aug.csv',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> row.names=FALSE,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> quote=FALSE)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># get a result</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># Your submission scored 0.75598</span>Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com2tag:blogger.com,1999:blog-3524617892004055830.post-89046176247976808172015-08-09T11:02:00.000+02:002015-08-09T11:02:58.462+02:00Predicting Titanic deaths on Kaggle III: BaggingThis is the third post on predicting the Titanic deaths. The <a href="http://wiekvoet.blogspot.nl/2015/07/predicting-titanic-deaths-on-kaggle.html">first one</a> used randomForest, the <a href="http://wiekvoet.blogspot.nl/2015/07/predicting-titanic-deaths-on-kaggle-ii.html">second one</a> boosting (gbm). This third post uses bagging. In contrast to the former posts, I abandoned dplyr here, because it gave some now-you-see-them, now-you-don't errors.<br />
<h3>
Data</h3>
The data preparation is intended to be the same as before, now written without dplyr.<br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(ipred)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(rpart)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(lattice)</span><br />
<span style="background-color: #f3f3f3;"><br /></span>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># read and combine</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">train <- read.csv('train.csv')</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">train$status <- 'train'</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">test <- read.csv('test.csv')</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">test$status <- 'test'</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">test$Survived <- NA</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt <- rbind(test,train)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># generate variables</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Pclass <- factor(tt$Pclass)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Survived <- factor(tt$Survived)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$age <- tt$Age</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$age[is.na(tt$age)] <- 35</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$age <- cut(tt$age,c(0,2,5,9,12,15,21,55,65,100))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title <- sapply(tt$Name,function(x) strsplit(as.character(x),'[.,]')[[1]][2])</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title <- gsub(' ','',tt$Title)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title[tt$Title %in% c('Capt','Col','Don','Sir','Jonkheer','Major')] <- 'Mr'</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title[tt$Title %in% c('Lady','Ms','theCountess','Mlle','Mme','Dona')] <- 'Miss'</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Title <- factor(tt$Title)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$A <- factor(grepl('A',tt$Cabin))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$B <- factor(grepl('B',tt$Cabin))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$C <- factor(grepl('C',tt$Cabin))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$D <- factor(grepl('D',tt$Cabin))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$E <- factor(grepl('E',tt$Cabin))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$F <- factor(grepl('F',tt$Cabin))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$ncabin <- nchar(as.character(tt$Cabin))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$PC <- factor(grepl('PC',tt$Ticket))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$STON <- factor(grepl('STON',tt$Ticket))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$cn <- as.numeric(gsub('[[:space:][:alpha:]]','',tt$Cabin))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$oe <- factor(ifelse(!is.na(tt$cn),tt$cn%%2,-1))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$Fare[is.na(tt$Fare)]<- median(tt$Fare,na.rm=TRUE)</span><br />
<h3>
Age</h3>
<div>
The first step is again to predict the missing ages. Even though I have all data available in one data.frame, I still think the correct approach is to build the age model using only the training data. Note that I am not too impressed with the age model; perhaps it should be optimized as well.</div>
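A minimal sketch of the idea of fitting the imputation model on the training rows only and then filling the missing values in both sets. Assumptions: a combined data frame with a `status` column as above, small simulated data, and `lm()` standing in for the bagged regression trees used in the post; the helper name `impute_age` is hypothetical.

```r
# Hypothetical helper: fit the age model on training rows with a known Age,
# then predict Age for every row (train or test) where it is missing.
# lm() is a stand-in for the bagged trees used in the post.
impute_age <- function(combined, formula = Age ~ Pclass + Fare) {
  fit_rows <- combined$status == "train" & !is.na(combined$Age)
  fit <- lm(formula, data = combined[fit_rows, ])
  filled <- combined$Age
  miss <- is.na(filled)
  filled[miss] <- predict(fit, newdata = combined[miss, ])
  filled
}

# Simulated data in the same shape: first half 'train', second half 'test'
set.seed(1)
n <- 40
toy <- data.frame(
  status = rep(c("train", "test"), each = n / 2),
  Pclass = factor(rep(1:3, length.out = n)),
  Fare = runif(n, 5, 100),
  stringsAsFactors = FALSE
)
toy$Age <- round(20 + 0.2 * toy$Fare + rnorm(n, sd = 5))
toy$Age[c(3, 7, 25, 30)] <- NA   # two missing in train, two in test
toy$AGE <- impute_age(toy)
```

Because test-set ages never enter `lm()`, changing them does not change any imputed value, which is the point of keeping the test data out of the age model.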
<div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">forage <- tt[!is.na(tt$Age) & tt$status=='train',names(tt) %in% </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> c('Age','Sex','Pclass','SibSp',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> 'Parch','Fare','Title','Embarked','A','B','C','D','E','F',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> 'ncabin','PC','STON','oe')]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">ipbag1 <- bagging(Age ~.,data=forage)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace; font-size: x-small;">ipbag1</span></div>
<div>
<div>
<span style="background-color: white; font-family: Courier New, Courier, monospace;">Bagging regression trees with 25 bootstrap replications </span></div>
<div>
<span style="background-color: white; font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="background-color: white; font-family: Courier New, Courier, monospace;">Call: bagging.data.frame(formula = Age ~ ., data = forage)</span></div>
<div style="font-family: 'Courier New', Courier, monospace; font-size: small;">
<span style="background-color: #f3f3f3;">plot(tt$Age~predict(ipbag1,tt))</span></div>
</div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$AGE <- tt$Age</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tt$AGE[is.na(tt$AGE)] <- predict(ipbag1,tt[is.na(tt$AGE),])</span></div>
</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuzPlSwxa7XYqURd5l79gmw_N0M_TagbPXbAt4OCCt_J4xP_lX2r6GjNoErT6y1sazOV7-LSjdzD-LTHqhx4FxLL_EFo6NpQtd1_4F536KnBpBidNR-hK6c31Afxxsy7B4rIljvwGQkEU/s1600/bagage8aug15.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuzPlSwxa7XYqURd5l79gmw_N0M_TagbPXbAt4OCCt_J4xP_lX2r6GjNoErT6y1sazOV7-LSjdzD-LTHqhx4FxLL_EFo6NpQtd1_4F536KnBpBidNR-hK6c31Afxxsy7B4rIljvwGQkEU/s1600/bagage8aug15.png" /></a></div>
<h3>
Selecting the survival model</h3>
<div>
ipred, the package that provides bagging(), comes with a convenient general-purpose cross-validation utility, errorest(). In the end I decided to optimize two parameters: ns, the number of observations drawn for each bag, and minsplit, the minimum number of observations that must exist in a node in order for a split to be attempted. nbagg, the number of bootstrap replications, just needs to be big enough. Regarding nbagg, I have the feeling that for this particular problem, with relatively few records, a relatively high nbagg may be needed to get reproducible models.</div>
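To make the roles of ns, minsplit and nbagg concrete, here is a hand-written sketch of bagging classification trees with rpart, the tree learner ipred builds on. It follows my reading of the ipred documentation: with ns equal to the number of rows an ordinary bootstrap sample is drawn, with a smaller ns the rows are sampled without replacement (subagging). The functions `bag_trees` and `predict_bag` and the simulated data are hypothetical; this is a sketch of the technique, not ipred's own code.

```r
library(rpart)

# Grow nbagg classification trees, each on ns rows drawn from the data.
# ns == nrow(data): ordinary bootstrap (n out of n, with replacement)
# ns <  nrow(data): subagging (ns out of n, without replacement)
bag_trees <- function(formula, data, ns = nrow(data), nbagg = 25, minsplit = 5) {
  replace <- ns >= nrow(data)
  # fully grown trees (cp = 0); minsplit is the only size control
  ctrl <- rpart.control(minsplit = minsplit, cp = 0, xval = 0, maxsurrogate = 0)
  lapply(seq_len(nbagg), function(b) {
    rows <- sample(nrow(data), ns, replace = replace)
    rpart(formula, data = data[rows, ], method = "class", control = ctrl)
  })
}

# Aggregate by majority vote over the trees' predicted classes
predict_bag <- function(trees, newdata) {
  votes <- sapply(trees, function(tr) as.character(predict(tr, newdata, type = "class")))
  votes <- matrix(votes, nrow = nrow(newdata))   # one column per tree
  apply(votes, 1, function(v) names(which.max(table(v))))
}

set.seed(42)
n <- 300
toy <- data.frame(x1 = runif(n), x2 = runif(n))
toy$y <- factor(ifelse(toy$x1 + toy$x2 + rnorm(n, sd = 0.2) > 1, "1", "0"))
trees <- bag_trees(y ~ x1 + x2, toy, ns = 200, nbagg = 50, minsplit = 5)
pred <- predict_bag(trees, toy)
mean(pred == toy$y)   # resubstitution accuracy, optimistic by construction
```

With few records each tree sees a rather different sample, so the majority vote only settles down once nbagg is large, which fits the feeling above that a high nbagg is needed for reproducible models.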
<div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">di1 <- subset(tt,status=='train',select=c(</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> age,SibSp,Parch,Fare,Sex,Pclass,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Title,Embarked,A,B,C,D,E,F,ncabin,PC,STON,oe,AGE,Survived))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace; font-size: x-small;">dso <- expand.grid(ns=seq(100,300,25),nbagg=c(500),minsplit=1:6)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace; font-size: x-small;">la <- lapply(1:nrow(dso),function(ii) {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ee <- errorest(Survived ~ .,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ns=dso$ns[ii],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> control=rpart.control(minsplit=dso$minsplit[ii], cp=0, </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> xval=0,maxsurrogate=0),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> nbagg=dso$nbagg[ii],</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> model=bagging,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data=di1,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> est.para=control.errorest(k=20)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> )</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cc <- c(ns=dso$ns[ii],minsplit=dso$minsplit[ii],nbagg=dso$nbagg[ii],error=ee$error)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> print(cc)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cc</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> })</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace; font-size: x-small;">las <- do.call(rbind,la) </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">las <- as.data.frame(las)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace; font-size: x-small;">xyplot(error ~ ns, groups= minsplit, data=las,auto.key=TRUE,type='l')</span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhaxdbsE69wQyNGs8UsfMARrcsQHF8f58HTiJwWLq33XYLdswBAW_nEbLDGPXhxk2V35OfGzhSvPHVZ6PaYisVQMUJXJ_4aAhK4DRoJo_RO6DEhLqyi8gvGSYaYslwHPELwaS0-LG_Zztw/s1600/modsel8aug15.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhaxdbsE69wQyNGs8UsfMARrcsQHF8f58HTiJwWLq33XYLdswBAW_nEbLDGPXhxk2V35OfGzhSvPHVZ6PaYisVQMUJXJ_4aAhK4DRoJo_RO6DEhLqyi8gvGSYaYslwHPELwaS0-LG_Zztw/s1600/modsel8aug15.png" /></a></div>
<h4>
Predictions</h4>
<div>
Based on the plot I chose ns=275 and minsplit=5. To be honest, in a previous run I had chosen ns=150 and minsplit=2, which according to this plot is a poor choice. However, given the large variability in this plot between relatively similar parameter settings, and how different the previous run looked, I think there is a lot of noise in the cross-validation. What the plot actually shows is that there is relatively little difference between the settings.</div>
</div>
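The claim that the cross-validation is noisy can be checked directly: rerun the same k-fold estimate several times, changing only the random fold assignment, and look at the spread. This sketch uses simulated data with 891 rows and logistic regression instead of bagged trees, purely to keep it fast and base-R only; `cv_error` is a hypothetical helper.

```r
# k-fold cross-validated misclassification rate for a logistic regression.
# Only the random fold assignment differs between calls.
cv_error <- function(data, k = 10) {
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  wrong <- 0
  for (f in seq_len(k)) {
    fit <- glm(y ~ x1 + x2, family = binomial, data = data[folds != f, ])
    p <- predict(fit, newdata = data[folds == f, ], type = "response")
    pred <- as.integer(p > 0.5)
    wrong <- wrong + sum(pred != data$y[folds == f])
  }
  wrong / nrow(data)
}

set.seed(2015)
n <- 891   # same number of rows as the Kaggle training set
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, 1, plogis(0.8 * sim$x1 - 0.5 * sim$x2))
errs <- replicate(20, cv_error(sim, k = 20))   # 20-fold, as in the post
range(errs)   # spread caused by the fold assignment alone
sd(errs)
```

On the real data, a spread of this kind is the yardstick against which the differences between parameter settings in the model selection plot should be judged.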
<div>
Having said that, these new settings got me a Kaggle score just over 0.8, while the previous settings scored just below it.</div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> bagmod <- bagging(Survived ~.,ns=275,nbagg=500,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> control=rpart.control(minsplit=5, cp=0, xval=0,maxsurrogate=0),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data=di1)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">pp <- predict(bagmod,tt[tt$status=='test',])</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">out <- data.frame(</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PassengerId=test$PassengerId,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Survived=pp,row.names=NULL)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">write.csv(x=out,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> file='bag8aug.csv',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> row.names=FALSE,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> quote=FALSE)</span></div>
Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com2tag:blogger.com,1999:blog-3524617892004055830.post-13977746207670148562015-07-26T09:05:00.002+02:002015-07-26T09:05:55.990+02:00Predicting Titanic deaths on Kaggle II: gbmFollowing my <a href="http://wiekvoet.blogspot.nl/2015/07/predicting-titanic-deaths-on-kaggle.html">previous post</a> I decided to try a different method: generalized boosted regression models (gbm). I have read the background in The Elements of Statistical Learning and Arthur Charpentier's <a href="http://freakonometrics.hypotheses.org/19874">nice post</a> on it. This data is a nice occasion to get my hands dirty.<br />
<h4>
Data </h4>
Data as before, but I have added some more variables. In addition, during the analysis it appeared that gbm does not like logical variables among the predictors, so the indicator variables are made into factors. The one missing value of Fare in the test set is replaced by the median, so that there are no missing values in the data. I must say I like using dplyr for this data handling: it allows me to use the same code for training and test with minimal effort.<br />
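The benefit of one shared preprocessing step can also be sketched in base R: a single function applied to both training and test data, with factor levels fixed up front so that an indicator that happens to be all FALSE in the test set still gets both levels. Fare is filled from a median computed on the training rows; that choice is my assumption, since the text only says "the median". The function `prep` and the toy rows are hypothetical.

```r
# Hypothetical preprocessing applied identically to training and test data
prep <- function(d, fare_median, embarked_levels) {
  d$Fare[is.na(d$Fare)] <- fare_median                       # fill missing Fare
  d$Embarked <- factor(d$Embarked, levels = embarked_levels)  # shared levels
  # indicator stored as a factor with both levels instead of a logical
  d$PC <- factor(grepl("PC", d$Ticket), levels = c("FALSE", "TRUE"))
  d
}

# A few toy rows in the shape of the Kaggle files
train <- data.frame(Ticket = c("PC 17599", "A/5 21171", "113803"),
                    Fare = c(71.28, 7.25, NA),
                    Embarked = c("C", "S", "S"),
                    stringsAsFactors = FALSE)
test <- data.frame(Ticket = c("330911", "363272"),
                   Fare = c(NA, 7.00),
                   Embarked = c("Q", "S"),
                   stringsAsFactors = FALSE)

fare_med <- median(train$Fare, na.rm = TRUE)
emb_lev <- sort(unique(c(train$Embarked, test$Embarked)))
train2 <- prep(train, fare_med, emb_lev)
test2 <- prep(test, fare_med, emb_lev)
str(test2)
```

Passing the median and the levels in as arguments, rather than computing them inside `prep`, is what keeps the test set from silently getting different statistics or factor levels than the training set.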
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(dplyr)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(gbm)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">set.seed(4321)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">titanic <- read.csv('train.csv') %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,Pclass=factor(Pclass),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Survived=factor(Survived),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> age=ifelse(is.na(Age),35,Age),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> age = cut(age,c(0,2,5,9,12,15,21,55,65,100)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Title=sapply(Name,function(x) strsplit(as.character(x),'[.,]')[[1]][2]),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Title=gsub(' ','',Title),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Title =ifelse(Title %in% c('Capt','Col','Don','Sir','Jonkheer','Major'),'Mr',Title),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Title =ifelse(Title %in% c('Lady','Ms','theCountess','Mlle','Mme','Dona'),'Miss',Title),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Title = factor(Title),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> A=factor(grepl('A',Cabin)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> B=factor(grepl('B',Cabin)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> C=factor(grepl('C',Cabin)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> D=factor(grepl('D',Cabin)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> E=factor(grepl('E',Cabin)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> F=factor(grepl('F',Cabin)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ncabin=nchar(as.character(Cabin)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PC=factor(grepl('PC',Ticket)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> STON=factor(grepl('STON',Ticket)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cn = as.numeric(gsub('[[:space:][:alpha:]]','',Cabin)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> oe=factor(ifelse(!is.na(cn),cn%%2,-1)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> train = sample(c(TRUE,FALSE),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> size=891,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> replace=TRUE, </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> prob=c(.9,.1) ) )</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">test <- read.csv('test.csv') %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Embarked=factor(Embarked,levels=levels(titanic$Embarked)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Pclass=factor(Pclass),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> # Survived=factor(Survived),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> age=ifelse(is.na(Age),35,Age),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> age = cut(age,c(0,2,5,9,12,15,21,55,65,100)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Title=sapply(Name,function(x) strsplit(as.character(x),'[.,]')[[1]][2]),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Title=gsub(' ','',Title),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Title =ifelse(Title %in% c('Capt','Col','Don','Sir','Jonkheer','Major'),'Mr',Title),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Title =ifelse(Title %in% c('Lady','Ms','theCountess','Mlle','Mme','Dona'),'Miss',Title),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Title = factor(Title),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> A=factor(grepl('A',Cabin)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> B=factor(grepl('B',Cabin)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> C=factor(grepl('C',Cabin)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> D=factor(grepl('D',Cabin)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> E=factor(grepl('E',Cabin)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> F=factor(grepl('F',Cabin)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ncabin=nchar(as.character(Cabin)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PC=factor(grepl('PC',Ticket)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> STON=factor(grepl('STON',Ticket)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cn = as.numeric(gsub('[[:space:][:alpha:]]','',Cabin)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> oe=factor(ifelse(!is.na(cn),cn%%2,-1))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">test$Fare[is.na(test$Fare)]<- median(titanic$Fare)</span><br />
<h3>
Age</h3>
<div>
Age has missing values, so the first task is to fill those in. Since gbm is the method used for the main analysis, I will use it for age too. This has the added advantage that I get to practise with both a numerical and a categorical variable as response.</div>
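<div>
The imputation pattern used further down (predict for every row, keep the observed ages and overwrite only the missing ones) can be sketched as a small helper. This is a minimal illustration under stated assumptions, not the code of this post: fill_missing is a hypothetical name, the toy data are made up, and lm() stands in for gbm() so that the sketch runs with base R only.</div>

```r
# Hypothetical helper: keep the observed values and replace only the NAs
# with the corresponding model predictions.
fill_missing <- function(x, predicted) {
  stopifnot(length(x) == length(predicted))
  missing <- is.na(x)
  x[missing] <- predicted[missing]
  x
}

# Made-up toy data; lm() is used here as a stand-in for gbm().
df <- data.frame(age  = c(22, NA, 26, 35, NA, 54),
                 fare = c(7.25, 71.28, 7.93, 53.10, 8.46, 51.86))
fit <- lm(age ~ fare, data = df)   # rows with a missing age are dropped when fitting

# Predict for every row, then fill in only where age is missing.
df$AGE <- fill_missing(df$age, predict(fit, newdata = df))
df
```

<div>
Keeping the original column (age) next to the filled one (AGE) mirrors what the post does with Age and AGE, so it stays visible which values were imputed.</div>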
<div>
One of the things with boosting is that it is prone to overfitting. Boosting consists of adding trees, each structured to improve the fit. At some point the trees will just start boosting the noise rather than the structure. Gbm comes with a cross-validation (cv) option, which is the preferred way to assess the predictive quality of models, and cv is used to determine the optimum number of trees. But there is a catch: it throws an error if the data.frame contains variables which are not used in the model. Hence in the code below the data is selected first, and the model is run subsequently.<br />
The model parameters interaction.depth=4 and shrinkage=0.0005 come from a bit of experimenting. n.trees has to be high enough that the optimum number of trees is clearly lower than the number fitted. n.cores=2 seems to work under both Windows and Linux.</div>
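<div>
As far as I understand, gbm.perf() with the cv method picks the iteration with the lowest cross-validated error. That is also why n.trees must be large enough: if the minimum sits at the very last tree, the real optimum may lie beyond it. The base-R sketch below uses a made-up error curve instead of a fitted model; best_n_trees is a hypothetical helper, not part of gbm.</div>

```r
# Hypothetical helper: pick the number of trees with the lowest
# cross-validated error, and warn when that optimum sits at the last tree.
best_n_trees <- function(cv_error) {
  best <- which.min(cv_error)
  if (best == length(cv_error)) {
    warning("optimum is at the last tree; refit with a larger n.trees")
  }
  best
}

# Made-up error curve: falls until tree 60, then rises again (overfitting).
cv_err <- (seq_len(100) - 60)^2 / 100 + 1
best_n_trees(cv_err)   # 60

# A curve that is still falling at the end triggers the warning.
# best_n_trees(100:1)
```
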
<div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">forage <- filter(titanic,!is.na(titanic$Age)) %>%</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> select(.,Age,SibSp,Parch,Fare,Sex,Pclass,Title,Embarked,A,B,C,D,E,F,ncabin,PC,STON,oe)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">rfa1 <- gbm(Age ~ ., </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data=forage,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> interaction.depth=4,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cv.folds=10,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> n.trees=8000,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> shrinkage=0.0005,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> n.cores=2)</span></div>
</div>
<div>
<span style="font-family: Courier New, Courier, monospace;"></span><br />
<div style="font-size: small;">
<span style="font-family: Courier New, Courier, monospace;"><span style="background-color: #f3f3f3;">gbm.perf(rfa1)</span></span></div>
<span style="font-family: Courier New, Courier, monospace;">
<div>
<div>
Using cv method...</div>
<div>
[1] 6824</div>
</div>
<div class="separator" style="clear: both; font-size: small; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhkyuPrFF_VKZiQ3wrrxyZMzb5ZTEaDe2pMxC_75_GjKRv6BywsBFmrk_rEJi4pLqxPEegwh1JEX02uA2devS9RsELF8ZAXnX1dNkXZK7aQAqqY9laF_-_SwC9CQyPg7cbjaSxLBL1fJyM/s1600/gbmage.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhkyuPrFF_VKZiQ3wrrxyZMzb5ZTEaDe2pMxC_75_GjKRv6BywsBFmrk_rEJi4pLqxPEegwh1JEX02uA2devS9RsELF8ZAXnX1dNkXZK7aQAqqY9laF_-_SwC9CQyPg7cbjaSxLBL1fJyM/s1600/gbmage.png" /></a></div>
</span>It is time to confess that I have been working on this over several sessions. When I wrote the code, 7118 trees were optimal, while the output shown here comes from a session that found 6824 trees. That is the way of these methods: unlike traditional statistical methods, they give a different result every time they are run. But, as can be seen from the plot, the difference should be minimal.</div>
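<div>
Part of that run-to-run variation comes from random draws, such as the assignment of observations to cross-validation folds (and, in gbm, the random subsampling controlled by bag.fraction). Fixing the seed with set.seed() makes such draws repeatable, as the base-R sketch below shows for a hypothetical make_folds helper. It is an assumption that this carries over fully to gbm itself: with n.cores=2 the cv folds are fitted in parallel, which may not respect the seed.</div>

```r
# Hypothetical helper: randomly assign n observations to k folds
# of (nearly) equal size.
make_folds <- function(n, k = 10) sample(rep_len(seq_len(k), n))

set.seed(2016)
folds_a <- make_folds(500)
set.seed(2016)
folds_b <- make_folds(500)
identical(folds_a, folds_b)   # TRUE: same seed, same folds

folds_c <- make_folds(500)    # no reset: (almost surely) a different assignment
table(folds_a)                # 50 observations in each of the 10 folds
```
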
<div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">titanic$AGE<- titanic$Age</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">titanic$AGE[is.na(titanic$AGE)] <- predict(rfa1,titanic,n.trees=7118)[is.na(titanic$Age)]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">test$AGE<- test$Age</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">test$AGE[is.na(test$AGE)] <- predict(rfa1,test,n.trees=7118)[is.na(test$Age)]</span></div>
</div>
<h3>
Survival</h3>
<div>
While doing the calculations I learned that the response should be a float containing 0 and 1. With two classes there are various distributions to choose from: bernoulli, huberized and adaboost. On the 10% of the data I had set apart for testing, adaboost seemed to work best. </div>
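<div>
The conversion used in the code below, c(0,1)[Survived], relies on a detail of R: indexing with a factor uses its integer codes, not its labels. For a factor with levels "0" and "1" that maps to 0 and 1, but only because the levels happen to be in that order. A small sketch on made-up values, comparing it with a conversion via the labels:</div>

```r
# Made-up outcome, stored as a factor with levels "0" and "1".
survived <- factor(c(0, 1, 1, 0, 1))

# Indexing by a factor uses its integer codes (1 for level "0", 2 for level "1").
y_index <- c(0, 1)[survived]

# Converting via the labels does not depend on the order of the levels.
y_label <- as.numeric(as.character(survived))

identical(y_index, y_label)   # TRUE
is.double(y_label)            # TRUE: a float containing 0 and 1
```
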
<div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">gb1 <- filter(titanic,train) %>%</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> select(.,age,SibSp,Parch,Fare,Sex,Pclass,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Title,Embarked,A,B,C,D,E,F,ncabin,PC,STON,oe,AGE,Survived)%>%</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(Survived=c(0,1)[Survived]) # not integer or factor but float</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#table(gb1$Survived)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">gb1m <- gbm(Survived ~ .,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cv.folds=11,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> n.cores=2,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> interaction.depth=5,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> shrinkage = 0.0005,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> distribution='adaboost',</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data=gb1,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> n.trees=10000)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">gbm.perf(gb1m)</span></div>
</div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Using cv method...</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">[1] 6355</span></div>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjz93yX7aLD5Va727f1459aNMiEpgm68ibtkUpdbCZoJCxzAmLmchj8FfDXeqPnHuK22wBp8q31f0sKeb-8wOvoayC5u8mosiklMYPZuBbPBE7sI0ElN7sn6d4gW0CWjERn1N1q1ARSTo0/s1600/gb1m.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjz93yX7aLD5Va727f1459aNMiEpgm68ibtkUpdbCZoJCxzAmLmchj8FfDXeqPnHuK22wBp8q31f0sKeb-8wOvoayC5u8mosiklMYPZuBbPBE7sI0ElN7sn6d4gW0CWjERn1N1q1ARSTo0/s1600/gb1m.png" /></a></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
In my code I have used 6000 trees. </div>
<div>
One thing about gbm is that it does not return categories. Instead, with type='response', it returns a proportion, a probability-like score between 0 and 1.</div>
<div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">preds <- predict(gb1m,titanic,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> n.trees=6000,type='response')</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">density(preds) %>% plot</span></div>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg15VXinOzSGkTz4GOFrZCTV7ztVpBcnTKUQsLsyRHPk-hJW_LDtXVzPCYFtYtLaNEMTdDs3EdrOLIjTrvCd71_xyHEkUGM9_tg7lw_6U5mw9CLrc-w6l4-tUpB8b3KKMzyxLKad76vut4/s1600/gbmdesn.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg15VXinOzSGkTz4GOFrZCTV7ztVpBcnTKUQsLsyRHPk-hJW_LDtXVzPCYFtYtLaNEMTdDs3EdrOLIjTrvCd71_xyHEkUGM9_tg7lw_6U5mw9CLrc-w6l4-tUpB8b3KKMzyxLKad76vut4/s1600/gbmdesn.png" /></a></div>
<div>
Thus there is a need, or an opportunity, to choose the cut-off point. For this my held-out test set comes in handy.</div>
<div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">preds2<- preds[!titanic$train]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">target <- c(0,1)[titanic$Survived[!titanic$train]]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">sapply(seq(.3,.7,.01),function(step)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> c(step,sum(ifelse(preds2<step,0,1)!=target)))</span></div>
</div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1,] 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4 0.41</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[2,] 17.0 17.00 17.00 17.00 17.00 17.00 17.00 17.00 18.00 17.00 16.0 16.00</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1,] 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.5 0.51 0.52 0.53</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[2,] 16.00 16.00 15.00 14.00 14.00 14.00 13.00 15.00 15.0 15.00 16.00 16.00</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1,] 0.54 0.55 0.56 0.57 0.58 0.59 0.6 0.61 0.62 0.63 0.64 0.65</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[2,] 16.00 17.00 17.00 17.00 17.00 18.00 18.0 18.00 18.00 19.00 20.00 20.00</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> [,37] [,38] [,39] [,40] [,41]</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[1,] 0.66 0.67 0.68 0.69 0.7</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[2,] 20.00 21.00 21.00 21.00 21.0</span></div>
</div>
The output is a bit messy, but the fewest errors (13) are found at 0.48.<br />
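<div>
The same scan can be written so that it returns a tidy data.frame and picks the cut-off directly with which.min(). A minimal sketch on made-up probabilities; scan_cutoffs is a hypothetical name. Note that which.min() returns the first of tied minima, and with a hold-out set of this size several cut-offs can easily tie.</div>

```r
# Hypothetical helper: count misclassifications for each candidate cut-off
# and return both the full table and the cut-off with the fewest errors.
scan_cutoffs <- function(prob, target, cutoffs = seq(0.3, 0.7, 0.01)) {
  errors <- sapply(cutoffs, function(cut) sum(ifelse(prob < cut, 0, 1) != target))
  tab <- data.frame(cutoff = cutoffs, errors = errors)
  list(table = tab, best = tab$cutoff[which.min(tab$errors)])
}

# Made-up predicted probabilities and true outcomes.
prob   <- c(0.10, 0.333, 0.444, 0.515, 0.626, 0.90)
target <- c(0,    0,     1,     0,     1,     1)

res <- scan_cutoffs(prob, target)
res$best   # 0.34, the first cut-off with a single error
```
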
<h3>
Predictions</h3>
<div>
This is fairly straightforward. I am not unhappy to report an improvement, bringing me from the tail to the middle of the peloton.</div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">pp <- predict(gb1m,test,n.trees=6000,type='response')</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">pp <- ifelse(pp<0.48,0,1)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">out <- data.frame(</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PassengerId=test$PassengerId,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Survived=pp,row.names=NULL)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">write.csv(x=out,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> file='gbm.csv',</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> row.names=FALSE,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> quote=FALSE)</span></div>
<div>
</div>
Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com0tag:blogger.com,1999:blog-3524617892004055830.post-71322830953743289172015-07-19T11:13:00.000+02:002015-07-19T11:13:11.906+02:00Predicting Titanic deaths on Kaggle <a href="http://www.kaggle.com/">Kaggle</a> has a competition, '<b>Machine Learning from Disaster</b>', to predict who died on the famous Titanic. It is listed as a knowledge competition, there just to learn from. I am late to the party: it has been running for 1.5 years and ends at the end of 2015. It is a small data set, hence a good one to learn from. It is also a competition with a number of entries which have perfect predictions.<br />
Just for fun, I have tried to see what I could achieve with a simple attempt using randomForest. For those in the competition: this random forest scored 0.74163, placing me 2781st out of 3064 entries. An acceptable spot for an off-the-shelf approach. An improvement may follow in a subsequent post.<br />
<h4>
Data</h4>
The data were downloaded from Kaggle. It is real-world data, hence it has the odd missing value (in passenger age) and a number of columns with messy data, which might be used to create additional variables. For validation, about 90% of the data is flagged as the training set, and the remaining 10% is held out. test is Kaggle's test set; its predictions are to be passed back to Kaggle.<br />
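<div>
The training flag in the code below is drawn per passenger with sample(c(TRUE,FALSE), size=891, replace=TRUE, prob=c(.9,.1)), so the training share is only about 90% and varies between runs. If an exact split is wanted, one alternative (a sketch, not what this post uses) is to sample the training rows without replacement:</div>

```r
n <- 891   # passengers in train.csv, as in the size= argument of the post

set.seed(1)
# As in the post: an independent TRUE/FALSE draw per row, roughly 90% TRUE.
train_approx <- sample(c(TRUE, FALSE), size = n, replace = TRUE, prob = c(.9, .1))

# Alternative: exactly round(0.9 * n) training rows, chosen without replacement.
train_exact <- seq_len(n) %in% sample(n, round(0.9 * n))

mean(train_approx)   # close to, but usually not exactly, 0.9
sum(train_exact)     # 802
```
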
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PassengerId Survived Pclass Name</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">1 1 0 3 Braund, Mr. Owen Harris</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">2 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">3 3 1 3 Heikkinen, Miss. Laina</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">4 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">5 5 0 3 Allen, Mr. William Henry</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">6 6 0 3 Moran, Mr. James</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Sex Age SibSp Parch Ticket Fare Cabin Embarked</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">1 male 22 1 0 A/5 21171 7.2500 S</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">2 female 38 1 0 PC 17599 71.2833 C85 C</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">3 female 26 0 0 STON/O2. 3101282 7.9250 S</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">4 female 35 1 0 113803 53.1000 C123 S</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">5 male 35 0 0 373450 8.0500 S</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">6 male NA 0 0 330877 8.4583 Q</span><br />
<div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(dplyr)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(randomForest)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(lattice)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">options(width=85)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">head(read.csv('train.csv'))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">titanic <- read.csv('train.csv', stringsAsFactors=TRUE) %>% # R &gt;= 4.0 no longer makes factors by default</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,Pclass=factor(Pclass),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Survived=factor(Survived),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> age=ifelse(is.na(Age),35,Age),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> age = cut(age,c(0,2,5,9,12,15,21,55,65,100)),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> A=grepl('A',Cabin),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> B=grepl('B',Cabin),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> C=grepl('C',Cabin),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> D=grepl('D',Cabin),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cn = as.numeric(gsub('[[:space:][:alpha:]]','',Cabin)),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> oe=factor(ifelse(!is.na(cn),cn%%2,-1)),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> train = sample(c(TRUE,FALSE),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> size=891,</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> replace=TRUE, </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> prob=c(.9,.1) ) )</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">test <- read.csv('test.csv', stringsAsFactors=TRUE) %>% # R &gt;= 4.0 no longer makes factors by default</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,Pclass=factor(Pclass),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> age=ifelse(is.na(Age),35,Age),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> age = cut(age,c(0,2,5,9,12,15,21,55,65,100)),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> A=grepl('A',Cabin),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> B=grepl('B',Cabin),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> C=grepl('C',Cabin),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> D=grepl('D',Cabin),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> cn = as.numeric(gsub('[[:space:][:alpha:]]','',Cabin)),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> oe=factor(ifelse(!is.na(cn),cn%%2,-1)),</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Embarked=factor(Embarked,levels=levels(titanic$Embarked))</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> )</span></div>
</div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">test$Fare[is.na(test$Fare)]<- median(titanic$Fare)</span></div>
</div>
<div>
Age has missing values, hence the first step is to fill those in. In the code above, age is turned into a factor, with missing values imputed as 35, which falls in the largest category, (21,55]. </div>
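<div>
The age recoding above can be checked on a few made-up values: missing ages are set to 35 before cut(), so they land in the (21,55] band. The breaks are the ones used in the post; the input vector is invented for illustration.</div>

```r
breaks <- c(0, 2, 5, 9, 12, 15, 21, 55, 65, 100)
age <- c(22, 38, NA, 4, 70)   # made-up ages, one missing

# Impute missing ages as 35, then bin into the intervals (0,2], (2,5], ..., (65,100].
age_band <- cut(ifelse(is.na(age), 35, age), breaks)

as.character(age_band)   # "(21,55]" "(21,55]" "(21,55]" "(2,5]" "(65,100]"
nlevels(age_band)        # 9
```
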
<h4>
Model building</h4>
<div>
A simple prediction using randomForest.</div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span style="background-color: #f3f3f3;">rf1 <- randomForest(Survived ~ </span></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span style="background-color: #f3f3f3;"> Sex+Pclass + SibSp +</span></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span style="background-color: #f3f3f3;"> Parch + Fare + </span></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span style="background-color: #f3f3f3;"> Embarked + age +</span></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span style="background-color: #f3f3f3;"> A+B+C+D +oe,</span></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span style="background-color: #f3f3f3;"> data=titanic,</span></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span style="background-color: #f3f3f3;"> subset=train,</span></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span style="background-color: #f3f3f3;"> replace=FALSE,</span></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span style="background-color: #f3f3f3;"> ntree=1000)</span></span></div>
</div>
<div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Call:</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> randomForest(formula = Survived ~ Sex + Pclass + SibSp + Parch + Fare + Embarked + age + A + B + C + D + oe, data = titanic, replace = FALSE, ntree = 1000, subset = train) </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Type of random forest: classification</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Number of trees: 1000</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">No. of variables tried at each split: 3</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> OOB estimate of error rate: 16.94%</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Confusion matrix:</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 0 1 class.error</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">0 454 40 0.08097166</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">1 95 208 0.31353135</span></div>
</div>
</div>
<div>
<br /></div>
This shows some bias in the predictions: class 1 (survived) is predicted as class 0 far too often, with a class error of 31% against 8% for class 0. Hence I will tune not only the usual parameters nodesize (minimum size of terminal nodes) and mtry (number of variables randomly sampled as candidates at each split), but also classwt (priors of the classes).<br />
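<div>
The class errors in the output follow directly from the confusion matrix: each row's off-diagonal count divided by the row total, while the OOB error rate is all off-diagonal counts divided by the grand total. A small check with the counts reported above:</div>

```r
# Confusion matrix from the rf1 output: rows = true class, columns = predicted class.
cm <- matrix(c(454, 95, 40, 208), nrow = 2,
             dimnames = list(true = c("0", "1"), predicted = c("0", "1")))

class_error   <- 1 - diag(cm) / rowSums(cm)    # error rate per true class
overall_error <- 1 - sum(diag(cm)) / sum(cm)   # matches the OOB error rate

round(class_error, 4)    # 0.0810 0.3135 (named "0" and "1")
round(overall_error, 4)  # 0.1694
```

<div>
The idea behind tuning classwt is to move some of those 95 misclassified survivors to the right side, at the price of a few more errors in class 0.</div>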
<br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">titanic$pred <- predict(rf1,titanic)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">with(titanic[!titanic$train,],sum(pred!=Survived)/length(pred))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">mygrid <- expand.grid(nodesize=c(2,4,6),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mtry=2:5,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> wt=seq(.5,.7,.05))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">sa <- sapply(1:nrow(mygrid), function(i) {</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> rfx <- randomForest(Survived ~ </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Sex+Pclass + SibSp +</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Parch + Fare + </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Embarked + age +</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> A+B+C+D +oe,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data=titanic,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> subset=train,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> replace=TRUE,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ntree=4000,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> nodesize=mygrid$nodesize[i],</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mtry=mygrid$mtry[i],</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> classwt=c(1-mygrid$wt[i],mygrid$wt[i])) </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> preds <- predict(rfx,titanic[!titanic$train,])</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> nwrong <- sum(preds!=titanic$Survived[!titanic$train])</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> c(nodesize=mygrid$nodesize[i],mtry=mygrid$mtry[i],wt=mygrid$wt[i],pw=nwrong/length(preds))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> })</span><br />
<span style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace; font-size: x-small;">tsa <- as.data.frame(t(sa))</span><br />
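<span style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace; font-size: x-small;">library(lattice) # xyplot() comes from the lattice package</span><br />
<span style="background-color: #f3f3f3; font-family: 'Courier New', Courier, monospace; font-size: x-small;">xyplot(pw ~ wt | mtry,group=factor(nodesize), data=tsa,auto.key=TRUE,type='l')</span><br />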
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjyWrnmcPyAVUbUW7Oo6DwZPns1_rK1HlRu_DkuQRWeutauaBs3tX2S1y1wyZbdIquzsbJs1y8k0OylGOAkbpyjDq7HRUXPZkQoUZl_fuUqaRvXs1kf-kfWjhVrbW_urMr3M7DSGs-jCI/s1600/optim1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjyWrnmcPyAVUbUW7Oo6DwZPns1_rK1HlRu_DkuQRWeutauaBs3tX2S1y1wyZbdIquzsbJs1y8k0OylGOAkbpyjDq7HRUXPZkQoUZl_fuUqaRvXs1kf-kfWjhVrbW_urMr3M7DSGs-jCI/s1600/optim1.png" /></a></div>
What this plot does not show is how much the results varied between runs: a single prediction more or less correct visibly changes the lines in the figure. This is why I increased the number of trees in the model (ntree=4000).<br />
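Besides adding trees, one way to dampen this run-to-run variation could be to repeat each grid setting a few times and compare the mean error per setting. The sketch below assumes a results data frame shaped like `tsa`, but with several rows per setting; the `res` values are made up and `best_setting()` is a hypothetical helper, not part of the original script.

```r
# Hypothetical grid results with two replicate runs per setting
# (made-up error rates, same columns as tsa in the code above)
res <- data.frame(
  nodesize = c(2, 2, 4, 4),
  mtry     = c(3, 3, 3, 3),
  wt       = c(0.6, 0.6, 0.6, 0.6),
  pw       = c(0.20, 0.18, 0.15, 0.17)
)

# Average the error over replicates of each setting, then return the
# setting with the lowest mean error
best_setting <- function(res) {
  agg <- aggregate(pw ~ nodesize + mtry + wt, data = res, FUN = mean)
  agg[which.min(agg$pw), ]
}

best_setting(res)
```

Averaging trades extra computation for a less noisy comparison between neighbouring settings.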
<h4>
Final predictions</h4>
<div>
Using the best settings from above gets you to the bottom of the ranking. The script below refits the model on the full titanic data and writes the predictions in the format required by Kaggle.</div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">rf2 <- randomForest(Survived ~ </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Sex+Pclass + SibSp +</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Parch + Fare + </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Embarked + age +</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> A+B+C+D +oe,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> data=titanic,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> replace=TRUE,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ntree=5000,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> nodesize=4,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mtry=3,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> classwt=c(1-.6,.6)) </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">pp <- predict(rf2,test)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">out <- data.frame(</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PassengerId=test$PassengerId,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Survived=pp,row.names=NULL)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">write.csv(x=out,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> file='rf1.csv',</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> row.names=FALSE,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> quote=FALSE)</span><br />
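A small sanity check before uploading may catch format problems early. The sketch below assumes the Kaggle Titanic submission is a header plus one `PassengerId,Survived` row per test passenger, with Survived coded 0/1; `check_submission()` and the three toy rows are hypothetical.

```r
# Check a Kaggle-style submission data frame: exactly the two columns
# PassengerId and Survived, no missing values, no duplicate ids,
# and Survived coded as 0/1
check_submission <- function(out) {
  stopifnot(identical(names(out), c("PassengerId", "Survived")),
            !anyNA(out),
            !anyDuplicated(out$PassengerId),
            all(as.character(out$Survived) %in% c("0", "1")))
  invisible(TRUE)
}

# Toy example with made-up ids; Survived is a factor, as predict() returns
out <- data.frame(PassengerId = c(892, 893, 894),
                  Survived = factor(c("0", "1", "0")))
check_submission(out)

# Round trip through a temporary file, written as in the script above
f <- tempfile(fileext = ".csv")
write.csv(out, file = f, row.names = FALSE, quote = FALSE)
readLines(f)
```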
<br />Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com4tag:blogger.com,1999:blog-3524617892004055830.post-44135784138818630962015-07-05T09:26:00.001+02:002015-07-05T09:26:31.666+02:00More on causes of death in Netherlands over the yearsLast week I published a post '<a href="http://wiekvoet.blogspot.nl/2015/06/deaths-in-netherlands-by-cause-and-age.html">Deaths in the Netherlands by cause and age</a>'. While creating that post I made one plot which I did not show. It shows something odd: vertical striping. In other words, mortality varies from year to year, and it does so across all ages at once.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgAdS9U0RDqzyDuEls01b1YW86fw94m6PhdAeb9VSnxYnRqSZnoZ1OtQaN3NJzO8mII1RQjb7GH-E3XFNx7LqpoYdJJRoIOgQie6HfRH5hsd9231yIFb5LFtweiveawVn1sqD3TaX_82fo/s1600/deathall.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgAdS9U0RDqzyDuEls01b1YW86fw94m6PhdAeb9VSnxYnRqSZnoZ1OtQaN3NJzO8mII1RQjb7GH-E3XFNx7LqpoYdJJRoIOgQie6HfRH5hsd9231yIFb5LFtweiveawVn1sqD3TaX_82fo/s1600/deathall.png" /></a></div>
To examine this phenomenon further, here is a plot of some of the underlying causes. I would say the striping is present for at least three categories: "Diseases of the circulatory system", "Diseases of the respiratory organs" and "Sympt., Abnormal clinical Observations". This is odd, since these do not seem to be contagious. I therefore suspect that something like harsh weather (heat or cold) makes life more difficult, without itself being registered as the cause of death.<br />
In addition, there is something regarding "Mental and behavioral disorders" which I did not realize before. They are age related, but it also seems that somewhere in the 1990s they became acceptable to register as a cause of death. From then on they suddenly appear, across several age categories.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLS3DiyRQOjJZfp7jnXMrMLA_5oBxckMJT5ZBq5EO_MwC-JnYaB_0rTFuP99Fkp7jAU1lqhdyDg6Aqqubfp_vnPwURuuSlOR8S5qm4UT4sF5dQN5ptND5ioSkQfMNTpWJEq8ywM_b9pfo/s1600/bycause.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLS3DiyRQOjJZfp7jnXMrMLA_5oBxckMJT5ZBq5EO_MwC-JnYaB_0rTFuP99Fkp7jAU1lqhdyDg6Aqqubfp_vnPwURuuSlOR8S5qm4UT4sF5dQN5ptND5ioSkQfMNTpWJEq8ywM_b9pfo/s1600/bycause.png" /></a></div>
This plot shows the same data, organized differently. The high-mortality years line up across these causes, especially "Diseases of the circulatory system" and "Diseases of the respiratory organs".<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjW0Uc31hHtE6Q273vphCjJTxsIKfsh5uKThSf0YeIftHgSkecq6TgcKh5jlND2Y3JV4W39tp5VV4ZeE-4jrTC1yPe6KyAq8CsWEB83rKQsj96e2I-7MJ5i21MbQowiV2SiBR95PEFSfmc/s1600/byage.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjW0Uc31hHtE6Q273vphCjJTxsIKfsh5uKThSf0YeIftHgSkecq6TgcKh5jlND2Y3JV4W39tp5VV4ZeE-4jrTC1yPe6KyAq8CsWEB83rKQsj96e2I-7MJ5i21MbQowiV2SiBR95PEFSfmc/s1600/byage.png" /></a></div>
<h3>
Can it statistically be seen?</h3>
It is nice that this can be seen, but can it be measured? Below is the correlation, for ages 90 to 95 and after detrending, between the two causes of death that look most correlated in the plots.<br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Pearson's product-moment correlation</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">data: xx$`Diseases of the respiratory organs` and xx$`Diseases of the circulatory system`</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">t = 2.4997, df = 62, p-value = 0.01509</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">alternative hypothesis: true correlation is not equal to 0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">95 percent confidence interval:</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 0.06133681 0.51042863</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">sample estimates:</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> cor </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">0.302584 </span><br />
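Why detrend first? Two causes that both decline over the decades will correlate strongly even if their year-to-year fluctuations are unrelated. The simulation below is only an illustration: the series `resp` and `circ`, the trend slope and the noise levels are all made-up assumptions. It subtracts a loess fit, as the post does, and shows that the raw correlation is dominated by the shared trend, while the residual correlation reflects only the shared yearly shock.

```r
set.seed(1)
year  <- 1950:2013                        # 64 years, a hypothetical range
trend <- -0.1 * (year - 1950)             # common downward trend
shock <- rnorm(length(year), sd = 0.3)    # shared year-to-year shock
resp  <- 5 + trend + shock + rnorm(length(year), sd = 0.3)
circ  <- 8 + trend + shock + rnorm(length(year), sd = 0.3)

# Raw correlation: dominated by the shared trend
r_raw <- cor(resp, circ)

# Detrend each series with loess and correlate the residuals
detrend <- function(y, x) y - predict(loess(y ~ x))
r_detr <- cor(detrend(resp, year), detrend(circ, year))

c(raw = r_raw, detrended = r_detr)
```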
<h3>
Code</h3>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(dplyr)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(ggplot2)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">txtlines <- readLines('Overledenen__doodsoo_170615161506.csv')</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">txtlines <- grep('Centraal',txtlines,value=TRUE,invert=TRUE) </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#txtlines[1:5]</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#cat(txtlines[4])</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r1 <- read.csv(sep=';',header=FALSE,stringsAsFactors=TRUE, # levels() below needs factors</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> col.names=c('Causes','Causes2','Age','year','aantal','count'),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> na.strings='-',text=txtlines[3:length(txtlines)]) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> select(.,-aantal,-Causes2)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">transcauses <- c(</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Infectious and parasitic diseases",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Diseases of skin and subcutaneous",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Diseases musculoskeletal system and connective ",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Diseases of the genitourinary system",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Pregnancy, childbirth",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Conditions of perinatal period",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Congenital abnormalities",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Sympt., Abnormal clinical Observations",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "External causes of death",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Neoplasms",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Illness of blood, blood-forming organs",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Endocrine, nutritional, metabolic illness",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Mental and behavioral disorders",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Diseases of the nervous system and sense organs",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Diseases of the circulatory system",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Diseases of the respiratory organs",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Diseases of the digestive organs",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Population",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Total all causes of death")</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">#cc <- </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">cbind(transcauses,levels(r1$Causes))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">options(width=100)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">levels(r1$Causes) <- transcauses</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">levels(r1$Age) <- </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> gsub('jaar','year',levels(r1$Age)) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> gsub('tot','to',.) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> gsub('of ouder','+',.)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r1 <- mutate(r1,age=as.numeric(sub(' .*$','',Age)))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">Deaths <- filter(r1,Causes=='Total all causes of death') %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,Total=count) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> select(.,-count,-Causes) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> merge(.,r1) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> filter(.,Causes %in% transcauses[18]) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,Population=count,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Percentage=100*Total/Population,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> year = as.numeric(gsub('*','',year,fixed=TRUE))) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> select(.,-Causes,-count)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">png('deathall.png')</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">v <- ggplot(Deaths[Deaths$age>60,], aes( year,Age, fill = Percentage))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># print() makes sure the plot is drawn when the script is run with source()</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">print(v + geom_raster() +</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> scale_fill_gradientn (</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> colours=c('white','black'))+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> theme(legend.position="bottom")+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ggtitle('Total all causes of death'))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">dev.off()</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">v3 <- filter(r1,Causes %in% transcauses[18],</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> age>65) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,Population=count) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> select(.,-count,-Causes) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> merge(.,r1) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> filter(.,Causes %in% transcauses[c(8,15,9,10,13,16)]) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,Total=count,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Percentage=100*Total/Population,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> year = as.numeric(gsub('*','',year,fixed=TRUE))) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> select(.,-count)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">png('bycause.png')</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">print(ggplot(v3, aes( year,Age, fill = Percentage))+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_raster() +</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> scale_fill_gradientn (</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> colours=c('white','black'))+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> theme(legend.position="bottom")+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> facet_wrap( ~ Causes,nrow=3))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">dev.off()</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">png('byage.png')</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">print(ggplot(v3[v3$age>75,], aes( year,Causes, fill = Percentage))+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_raster() +</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> scale_fill_gradientn (</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> colours=c('white','black'))+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> theme(legend.position="bottom")+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> facet_wrap( ~ Age,nrow=3))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">dev.off()</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">xx <- filter(r1,Causes %in% transcauses[18],</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> age==90) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,Population=count) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> select(.,-count,-Causes) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> merge(.,r1) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> filter(.,Causes %in% transcauses[c(8,15,9,16)]) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,Total=count,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Percentage=100*Total/Population,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> year = as.numeric(gsub('*','',year,fixed=TRUE)),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Causes=factor(Causes)) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> select(.,-count,-Age,-age,-Population,-Total) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> reshape(.,direction='wide',timevar='Causes',idvar='year')</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">names(xx) <- gsub('Percentage.','',names(xx))</span><br />
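<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># Detrend: subtract a loess fit over the years from each cause, keeping the residuals</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">for (i in 2:ncol(xx)) xx[,i]<- xx[,i] - predict(loess(xx[,i] ~ year,data=xx))</span><br />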
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">cor.test(xx$`Diseases of the respiratory organs`,</span><br />
<span style="background-color: #f3f3f3;"><span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> xx$`Diseases of the circulatory system`)</span></span>Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com1tag:blogger.com,1999:blog-3524617892004055830.post-48457843804402958702015-06-28T09:58:00.001+02:002015-06-28T09:58:57.550+02:00Deaths in the Netherlands by cause and ageI downloaded counts of deaths by age, year and major cause from the <a href="http://statline.cbs.nl/statweb/?LA=en">Dutch statistics site</a>. In this post I make some plots to look at the causes and at the changes over the years.<br />
<h3>
Data </h3>
The data come from CBS (Statistics Netherlands). I downloaded them in Dutch, so the first thing to do was to translate the labels. The coding used seems slightly different from the <a href="http://apps.who.int/classifications/icd10/browse/2015/en#/">ICD-10</a> main categories, and its alphabetical ordering differs from the ICD-10 order. I used Google Translate and ICD-10 to obtain the translations.
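The code for this post translates the labels by assigning a vector of English names to `levels()`, which relies on the level order matching. A name-based lookup is one alternative that does not depend on order and fails loudly when a label is missing. A minimal sketch; the Dutch labels are illustrative and not necessarily the exact CBS strings, and `lookup` is a hypothetical name.

```r
# Illustrative Dutch labels; not necessarily the exact CBS strings
causes_nl <- factor(c("Nieuwvormingen", "Uitwendige doodsoorzaken",
                      "Nieuwvormingen"))

# Named lookup: Dutch label -> English label
lookup <- c("Nieuwvormingen"           = "Neoplasms",
            "Uitwendige doodsoorzaken" = "External causes of death")

# Translate by name, so the result does not depend on level order;
# stop if any level has no translation
new_levels <- lookup[levels(causes_nl)]
stopifnot(!anyNA(new_levels))
causes_en <- causes_nl
levels(causes_en) <- unname(new_levels)
table(causes_en)
```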
<h3>
Plots</h3>
<h4>
Preparation</h4>
In the following I use both the percentage of the population and the percentage of deaths, by age cohort. The percentage of deaths is needed because in some cohorts mortality is much higher, which would hide anything happening in the other cohorts. In addition, for visual purposes only the eight most important causes are shown in the plots.<br />
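The difference between the two percentages can be seen on a toy example. All counts below are made up, and the data frame `d` and its column names are hypothetical; the post's own code works on the CBS data instead.

```r
# Made-up counts for a single year: deaths per cause in two age cohorts,
# plus each cohort's population
d <- data.frame(
  age        = c("0 to 1", "0 to 1", "90 to 95", "90 to 95"),
  cause      = c("Perinatal", "Congenital", "Circulatory", "Neoplasms"),
  deaths     = c(600, 300, 4000, 1000),
  population = c(180000, 180000, 50000, 50000)
)

# Percentage of the cohort's population that died of each cause
d$pct_population <- 100 * d$deaths / d$population

# Percentage of the cohort's deaths attributed to each cause
d$pct_deaths <- 100 * d$deaths / ave(d$deaths, d$age, FUN = sum)

# Keep the largest causes by total deaths (the post keeps eight; two here)
top <- names(sort(tapply(d$deaths, d$cause, sum), decreasing = TRUE))[1:2]
d[d$cause %in% top, ]
```

As a percentage of the population, the oldest cohort (8% for circulatory diseases) dwarfs the youngest (0.33% for perinatal conditions), whereas the percentage of deaths puts every cohort on the same 0 to 100 scale.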
<div>
<h4>
Young</h4>
<div>
It seems that most of the risks are associated with birth. In addition, these risks have been decreasing steadily. </div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIrZoyvXyGdhHqf4G5dltkEwdzpYWL8Bh6SRaiy__a_qj8KANfFE_ABq-xqcho0AwUkL8QOCCIaHOTLJIWeT2oqnwWT-Awc5HtNcYYw3JA2sSHIdIX7lzFTI4hYCCZuZkB3HrMNZL2tmw/s1600/youngpop1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIrZoyvXyGdhHqf4G5dltkEwdzpYWL8Bh6SRaiy__a_qj8KANfFE_ABq-xqcho0AwUkL8QOCCIaHOTLJIWeT2oqnwWT-Awc5HtNcYYw3JA2sSHIdIX7lzFTI4hYCCZuZkB3HrMNZL2tmw/s1600/youngpop1.png" /></a></div>
<div>
Looking at the age cohorts above 0 years, accidents seem to be the most important cause. Most remarkable is a spike in 1953, which occurs for all four ages. After some consideration, I link this to the <a href="https://en.wikipedia.org/wiki/North_Sea_flood_of_1953">North Sea flood of 1953</a>. It is remarkable that this is visible in the plot at all; that it is says a lot about how safe we are from accidents. In the age category 15 to 20 there is also a relatively large bump from 1970 to 1975. This is more difficult to explain, but I suspect traffic, especially the <a href="https://en.wikipedia.org/wiki/Moped">moped</a>: a light motorcycle which riders preferably boosted to run much faster than the legal speed. In 1975 wearing a helmet became mandatory. It was much hated at the time, but in hindsight I can see why the government felt compelled to do something, and it did have an effect.</div>
<div>
Looking at the plots, the next biggest cause seems to be Neoplasms. This is not because they have become more deadly, but because accidents are being brought under control.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8a03OUkDMPKRBtKCUkhDlXRBCeS1nTJDZxg0bmrz42_QRoqEZW8eV8pbqRGInd6XSgKoIWfPJRQYZOOeu5I7hn1At1usJXy9dPAyXoL9yBFnT8Qk0P4fMr9CCGcbyEHK3CkA55nHHuAA/s1600/youngpop2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8a03OUkDMPKRBtKCUkhDlXRBCeS1nTJDZxg0bmrz42_QRoqEZW8eV8pbqRGInd6XSgKoIWfPJRQYZOOeu5I7hn1At1usJXy9dPAyXoL9yBFnT8Qk0P4fMr9CCGcbyEHK3CkA55nHHuAA/s1600/youngpop2.png" /></a></div>
<h4>
Elderly</h4>
<div>
For the elderly, diseases of the circulatory system are the main cause, and they have decreased quite a bit. Deaths attributed to Symptoms and Abnormal Clinical Observations seem to decrease too. Since this seems to be a polite name for the 'other' category, the decrease may reflect better diagnostics.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgm4-WfFO12ATboA_7u7X7WDopo1NR_zTuIfY7epBv9IGUBfa3678CIbAnf9bK1wSVQ0F10ka0r9JoMjduDNudhaiMCyDmGhvqa4R_e4_VPojF9QY9XB0OT6MGR_RDlZ1_R4hde2NZKOBE/s1600/oldpop.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgm4-WfFO12ATboA_7u7X7WDopo1NR_zTuIfY7epBv9IGUBfa3678CIbAnf9bK1wSVQ0F10ka0r9JoMjduDNudhaiMCyDmGhvqa4R_e4_VPojF9QY9XB0OT6MGR_RDlZ1_R4hde2NZKOBE/s1600/oldpop.png" /></a></div>
<div>
What is less visible is the increase in mental and behavioral disorders, especially after 1980 and at the oldest ages. Neoplasms also seem to be decreasing, though very slowly.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzKPpsQcn6zL01duSDKMdEBMWRFqtawwmZWOhw85Nj9-J8fYjRTb942fnQkJDd0OgpbSOGKzOFSJglpnHK5ICb2h68mmKWUA2Awx3VLWLekziywM6IKHlDoBCtDECKeg75KyL6qnAk-XQ/s1600/oldpop2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzKPpsQcn6zL01duSDKMdEBMWRFqtawwmZWOhw85Nj9-J8fYjRTb942fnQkJDd0OgpbSOGKzOFSJglpnHK5ICb2h68mmKWUA2Awx3VLWLekziywM6IKHlDoBCtDECKeg75KyL6qnAk-XQ/s1600/oldpop2.png" /></a></div>
<h3>
Code</h3>
<h4>
Data reading</h4>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(dplyr)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">library(ggplot2)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">txtlines <- readLines('Overledenen__doodsoo_170615161506.csv')</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># drop the lines containing 'Centraal' (the Centraal Bureau voor de Statistiek source notes)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">txtlines <- grep('Centraal',txtlines,value=TRUE,invert=TRUE) </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># skip the two header lines; stringsAsFactors=TRUE is needed from R 4.0.0 on,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># because the levels()<- recoding below assumes Causes and Age are factors</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">r1 <- read.csv(sep=';',header=FALSE,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> col.names=c('Causes','Causes2','Age','year','aantal','count'),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> na.strings='-',stringsAsFactors=TRUE,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> text=txtlines[3:length(txtlines)]) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> select(.,-aantal,-Causes2)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># English labels, in the same order as levels(r1$Causes)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">transcauses <- c(</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Infectious and parasitic diseases",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Diseases of skin and subcutaneous",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Diseases of musculoskeletal system and connective tissue",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Diseases of the genitourinary system",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Pregnancy, childbirth",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Conditions of perinatal period",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Congenital abnormalities",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Sympt., Abnormal clinical Observations",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "External causes of death",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Neoplasms",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Illness of blood, blood-forming organs",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Endocrine, nutritional, metabolic illness",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Mental and behavioral disorders",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Diseases of the nervous system and sense organs",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Diseases of the circulatory system",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Diseases of the respiratory organs",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Diseases of the digestive organs",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Population",</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> "Total all causes of death")</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># positional recoding: the i-th label replaces the i-th level, so check</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># the pairing first with cbind(transcauses,levels(r1$Causes))</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">levels(r1$Causes) <- transcauses</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># translate the Dutch age labels ('jaar' = year, 'tot' = to, 'of ouder' = or older)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">levels(r1$Age) <- </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> gsub('jaar','year',levels(r1$Age)) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> gsub('tot','to',.) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> gsub('of ouder','+',.) </span><br />
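Recoding with levels()<- works by position: the i-th English label replaces the i-th level, and factor levels are sorted alphabetically, so transcauses only lines up when it follows the sorted Dutch labels. A small illustration with made-up Dutch labels, comparing positional recoding with a lookup by name that does not depend on the order (the lookup vector is hypothetical and not part of the original script):

```r
# Illustrative Dutch labels; factor() sorts the levels alphabetically
f <- factor(c("Nieuwvormingen", "Externe oorzaken", "Nieuwvormingen"))
print(levels(f))

# Positional recoding: the new labels must follow the sorted level order
f_pos <- f
levels(f_pos) <- c("External causes of death", "Neoplasms")

# Name-based recoding: the pairing is explicit, so the level order does not matter
lookup <- c("Nieuwvormingen"   = "Neoplasms",
            "Externe oorzaken" = "External causes of death")
f_named <- factor(unname(lookup[as.character(f)]))

# Both give the same labels here, but only the lookup survives a change in level order
print(identical(as.character(f_pos), as.character(f_named)))
```

A named lookup costs one extra vector, but it turns a silent mislabeling into an NA when a Dutch label is missing from the lookup.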
<h4>
Preparation for plots</h4>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># deaths per cause as a percentage of all deaths in the same age group and year</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># (the column called Population here holds the total number of deaths)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">perc.of.death <- filter(r1,Causes=='Total all causes of death') %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,Population=count) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> select(.,-count,-Causes) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> merge(.,r1) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> filter(.,Causes %in% transcauses[1:17]) %>% # the 17 causes, without Population and Total</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,Percentage=100*count/Population,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Causes = factor(Causes),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> year = as.numeric(gsub('*','',year,fixed=TRUE)) # strip the '*' marker from year labels</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> )</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># deaths per cause as a percentage of the population in the same age group and year</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">perc.of.pop <- filter(r1,Causes=='Population') %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,Population=count) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> select(.,-count,-Causes) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> merge(.,r1) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> filter(.,Causes %in% transcauses[1:17]) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,Percentage=100*count/Population,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Causes = factor(Causes),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> year = as.numeric(gsub('*','',year,fixed=TRUE))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><span style="background-color: #f3f3f3;"> )</span></span><br />
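The merge in perc.of.pop joins on the columns the two tables share (Age and year), so every cause is divided by the population of its own age group and year. A base-R sketch of the same idea on a made-up table; subset() and merge() stand in for the dplyr verbs, and all numbers are invented for illustration:

```r
# Made-up long-format table: one row per cause, age group and year, plus a
# 'Population' pseudo-cause holding the size of each age group
r1 <- data.frame(
  Causes = c("Population", "Neoplasms", "External causes of death",
             "Population", "Neoplasms", "External causes of death"),
  Age    = rep(c("15 to 20 year", "80 to 85 year"), each = 3),
  year   = 1970,
  count  = c(100000, 10, 40, 50000, 900, 60)
)

# Population size per age group and year
pop <- subset(r1, Causes == "Population", select = c(Age, year, count))
names(pop)[names(pop) == "count"] <- "Population"

# Attach that population to every cause in the same age group and year
perc <- merge(pop, subset(r1, Causes != "Population"), by = c("Age", "year"))
perc$Percentage <- 100 * perc$count / perc$Population
print(perc[, c("Age", "Causes", "Percentage")])
```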
<h4>
Young</h4>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">png('youngpop1.png')</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># age groups 0, 1-5, 5-10 and 10-15 years; the levels sort as text, so level 11</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># is presumably '5 to 10 year' and c(1,2,11,3) restores the numeric order</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tmp1 <- perc.of.pop %>% filter(.,Age %in% levels(perc.of.pop$Age)[c(1,2,11,3)],</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> !is.na(Percentage)) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,Age=factor(Age,levels=levels(perc.of.pop$Age)[c(1,2,11,3)]),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Causes =factor(Causes)) </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># keep the 8 causes with the highest maximum percentage over the years</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">p1 <- group_by(tmp1,Causes)%>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> summarize(.,mp = max(Percentage)) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,rk=rank(-mp)) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> merge(.,tmp1) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> filter(.,rk<=8) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ggplot(.,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> aes(y=Percentage,x=year,col=Causes)) +</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_line()+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> guides(col=guide_legend(ncol=2)) + </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> facet_wrap( ~Age ) +</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> theme(legend.position="bottom")+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ylab('Percentage of Cohort')</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># print explicitly, so the plot is also drawn when the script is source()d</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">print(p1)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">dev.off()</span><br />
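The selection step ranks the causes by the highest percentage they ever reached within the plotted age groups and keeps ranks 1 to 8. A base-R sketch of that logic on made-up data, keeping the top 2 of 3 causes; aggregate() stands in for group_by() and summarize():

```r
# Made-up percentages for three causes over three years
tmp1 <- data.frame(
  Causes     = rep(c("A", "B", "C"), each = 3),
  year       = rep(1950:1952, times = 3),
  Percentage = c(0.10, 0.30, 0.20,   # A peaks at 0.30
                 0.05, 0.04, 0.06,   # B peaks at 0.06
                 0.50, 0.10, 0.10)   # C peaks at 0.50
)
k <- 2   # the post keeps 8

# Highest percentage each cause ever reached
mp <- aggregate(Percentage ~ Causes, data = tmp1, FUN = max)
names(mp)[names(mp) == "Percentage"] <- "mp"

# Rank by that maximum (1 = largest) and keep all rows of the top-k causes
mp$rk <- rank(-mp$mp)
top <- merge(mp[mp$rk <= k, ], tmp1, by = "Causes")
print(unique(as.character(top$Causes)))
```

rank() uses ties.method = "average" by default, so two causes tied at the cut-off would both get rank 8.5 and both be dropped; ties.method = "min" would keep them.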
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">###</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">png('youngpop2.png')</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># age groups 1-5, 5-10, 10-15 and 15-20 years (levels sort as text, hence index 11)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">tmp1 <- perc.of.pop %>% filter(.,Age %in% levels(perc.of.pop$Age)[c(2,11,3,4)],</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> !is.na(Percentage)) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,Age=factor(Age,levels=levels(perc.of.pop$Age)[c(2,11,3,4)]),</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Causes =factor(Causes)) </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># keep the 8 causes with the highest maximum percentage over the years</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">p2 <- group_by(tmp1,Causes)%>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> summarize(.,mp = max(Percentage)) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,rk=rank(-mp)) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> merge(.,tmp1) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> filter(.,rk<=8) %>%</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ggplot(.,</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> aes(y=Percentage,x=year,col=Causes)) +</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_line()+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> guides(col=guide_legend(ncol=2)) + </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> facet_wrap( ~Age ) +</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> theme(legend.position="bottom")+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> ylab('Percentage of Cohort')</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">print(p2)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"># the 1953 spike: https://en.wikipedia.org/wiki/North_Sea_flood_of_1953</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">dev.off()</span><br />
<h4>
Old</h4>
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;">png('oldpop.png')</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"># the four oldest age groups (levels 18 to 21)</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;">tmp2 <- perc.of.pop %>% filter(.,Age %in% levels(perc.of.pop$Age)[18:21],</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> !is.na(Percentage)) %>%</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,Age=factor(Age),</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> Causes =factor(Causes)) </span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"># keep the 8 causes with the highest maximum percentage over the years</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;">p3 <- group_by(tmp2,Causes)%>%</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> summarize(.,mp = max(Percentage)) %>%</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,rk=rank(-mp)) %>%</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> merge(.,tmp2) %>%</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> filter(.,rk<=8) %>%</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> ggplot(.,</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> aes(y=Percentage,x=year,col=Causes)) +</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_line()+</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> guides(col=guide_legend(ncol=2)) + </span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> facet_wrap( ~Age ) +</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> theme(legend.position="bottom")+</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> ylab('Percentage of Cohort')</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;">print(p3)</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;">dev.off()</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;">png('oldpop2.png')</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;">tmp2 <- perc.of.pop %>% </span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> filter(.,</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> Age %in% levels(perc.of.pop$Age)[18:21],</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> year>=1980,</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> !is.na(Percentage)) %>%</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,Age=factor(Age),</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> Causes =factor(Causes)) </span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"># keep the 8 causes with the highest maximum percentage since 1980</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;">p4 <- group_by(tmp2,Causes)%>%</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> summarize(.,mp = max(Percentage)) %>%</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,rk=rank(-mp)) %>%</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> merge(.,tmp2) %>%</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> filter(.,rk<=8) %>%</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> ggplot(.,</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> aes(y=Percentage,x=year,col=Causes)) +</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_line()+</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> guides(col=guide_legend(ncol=2)) + </span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> facet_wrap( ~Age ) +</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> theme(legend.position="bottom")+</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;"> ylab('Percentage of Cohort')</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;">print(p4)</span><br />
<span style="background-color: white; font-family: Courier New, Courier, monospace; font-size: x-small;">dev.off()</span><br />
</div>
</div>
Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com1tag:blogger.com,1999:blog-3524617892004055830.post-18855873739698295802015-06-21T10:14:00.000+02:002015-06-21T10:14:54.264+02:00SAS PROC MCMC example 12 in R: Change point model I have resumed working my way through the PROC MCMC examples. The SAS manual describes this example: <span style="background-color: white; color: #353535; font-family: lato, arial, 'Arial Unicode MS', geneva, 'Lucida Grande', sans-serif; font-size: 13.4399995803833px; line-height: 16.7999992370605px;">Consider the data set from Bacon and Watts (</span><a href="http://support.sas.com/documentation/cdl/en/statug/67523/HTML/default/statug_mcmc_references.htm#statug_mcmcbaco_d71" style="background-color: white; border: 0px; color: #0e66ba; font-family: lato, arial, 'Arial Unicode MS', geneva, 'Lucida Grande', sans-serif; font-size: 13.4399995803833px; line-height: 16.7999992370605px; margin: 0px; padding: 0px;">1971</a><span style="background-color: white; color: #353535; font-family: lato, arial, 'Arial Unicode MS', geneva, 'Lucida Grande', sans-serif; font-size: 13.4399995803833px; line-height: 16.7999992370605px;">), where </span><span style="background-color: white; color: #353535; font-family: lato, arial, 'Arial Unicode MS', geneva, 'Lucida Grande', sans-serif; font-size: 13.4399995803833px; line-height: 16.7999992370605px;"><img alt="$y_ i$" class="math" src="http://support.sas.com/documentation/cdl/en/statug/67523/HTML/default/images/statug_mcmc0084.png" height="10" style="border: 0px; font-family: 'Thorndale AMT', 'Times New Roman', Times, serif; line-height: 1.25em; margin: 0px; padding: 0px; vertical-align: -3px;" width="11" /></span><span style="background-color: white; color: #353535; font-family: lato, arial, 'Arial Unicode MS', geneva, 'Lucida Grande', sans-serif; font-size: 13.4399995803833px; line-height: 16.7999992370605px;"> is the logarithm of the height of the stagnant surface layer and the 
covariate </span><span style="background-color: white; color: #353535; font-family: lato, arial, 'Arial Unicode MS', geneva, 'Lucida Grande', sans-serif; font-size: 13.4399995803833px; line-height: 16.7999992370605px;"><img alt="$x_ i$" class="math" src="http://support.sas.com/documentation/cdl/en/statug/67523/HTML/default/images/statug_mcmc0516.png" height="9" style="border: 0px; font-family: 'Thorndale AMT', 'Times New Roman', Times, serif; line-height: 1.25em; margin: 0px; padding: 0px; vertical-align: -2px;" width="11" /></span><span style="background-color: white; color: #353535; font-family: lato, arial, 'Arial Unicode MS', geneva, 'Lucida Grande', sans-serif; font-size: 13.4399995803833px; line-height: 16.7999992370605px;"> is the logarithm of the flow rate of water. </span><br />
It is a simple example, and it posed no problems at all for Stan and JAGS. LaplacesDemon, on the other hand, gave me some trouble: it took quite some effort to obtain samples that seemed well behaved. I did not try MCMCpack, but noted that its function MCMCregressChange() uses a slightly different model. The sections below first show the results; the code is given at the bottom.<br />
<br />
Previous posts in the series 'PROC MCMC examples programmed in R' were: <a href="http://wiekvoet.blogspot.nl/2014/09/bayesian-models-in-r.html">example 61.1: sampling from a known density</a>, <a href="http://wiekvoet.blogspot.nl/2014/10/bayes-models-from-sas-proc-mixed-in-r.html">example 61.2: Box Cox transformation</a>, <a href="http://wiekvoet.blogspot.nl/2014/11/sas-proc-mcmc-example-in-r-poisson.html">example 61.5: Poisson regression</a>, <a href="http://wiekvoet.blogspot.nl/2014/12/sas-proc-mcmc-in-r-nonlinear-poisson.html">example 61.6: Non-Linear Poisson Regression</a>, <a href="http://wiekvoet.blogspot.nl/2015/01/sas-proc-mcmc-example-in-r-logistic.html">example 61.7: Logistic Regression Random-Effects Model</a>, and <a href="http://wiekvoet.blogspot.nl/2015/03/sas-proc-mcmc-example-in-r-nonlinear.html">example 61.8: Nonlinear Poisson Regression Multilevel Random-Effects Model</a>.<br />
<h3>
Data</h3>
The data are read as follows.<br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"># pairs of y (log height of the stagnant surface layer) and x (log flow rate)<br />observed <- <br />'1.12 -1.39 1.12 -1.39 0.99 -1.08 1.03 -1.08<br />0.92 -0.94 0.90 -0.80 0.81 -0.63 0.83 -0.63<br />0.65 -0.25 0.67 -0.25 0.60 -0.12 0.59 -0.12<br />0.51 0.01 0.44 0.11 0.43 0.11 0.43 0.11<br />0.33 0.25 0.30 0.25 0.25 0.34 0.24 0.34<br />0.13 0.44 -0.01 0.59 -0.13 0.70 -0.14 0.70<br />-0.30 0.85 -0.33 0.85 -0.46 0.99 -0.43 0.99<br />-0.65 1.19'<br /># collapse all whitespace to single spaces, then read the values alternately as y and x<br />observed <- scan(text=gsub('[[:space:]]+',' ',observed),<br /> what=list(y=double(),x=double()),<br /> sep=' ')<br />stagnant <- as.data.frame(observed) </span></span></span><br />
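The model is not spelled out in this section. Assuming it is the broken-stick form of the SAS example, with mean alpha + beta1*(x - cp) for x below cp, alpha + beta2*(x - cp) above it, and normal errors with variance s2, a quick non-Bayesian cross-check is to profile a least-squares fit over a grid of change points, because for a fixed cp the model is linear. This is a sketch, not one of the samplers used in the post, and the grid limits are my own choice:

```r
# Bacon and Watts (1971) data from the post, stored as (y, x) pairs
observed <- '1.12 -1.39 1.12 -1.39 0.99 -1.08 1.03 -1.08
0.92 -0.94 0.90 -0.80 0.81 -0.63 0.83 -0.63
0.65 -0.25 0.67 -0.25 0.60 -0.12 0.59 -0.12
0.51 0.01 0.44 0.11 0.43 0.11 0.43 0.11
0.33 0.25 0.30 0.25 0.25 0.34 0.24 0.34
0.13 0.44 -0.01 0.59 -0.13 0.70 -0.14 0.70
-0.30 0.85 -0.33 0.85 -0.46 0.99 -0.43 0.99
-0.65 1.19'
stagnant <- as.data.frame(scan(text = gsub('[[:space:]]+', ' ', observed),
                               what = list(y = double(), x = double()),
                               sep = ' ', quiet = TRUE))

# For a fixed change point cp the broken-stick mean is linear in its coefficients:
# mu = alpha + beta1 * min(x - cp, 0) + beta2 * max(x - cp, 0)
fit_at <- function(cp, d) {
  lm(y ~ I(pmin(x - cp, 0)) + I(pmax(x - cp, 0)), data = d)
}

# Profile the residual sum of squares over a grid of change points
cps <- seq(-0.5, 0.5, by = 0.001)
rss <- sapply(cps, function(cp) sum(residuals(fit_at(cp, stagnant))^2))
cp_hat <- cps[which.min(rss)]

cf <- unname(coef(fit_at(cp_hat, stagnant)))
est <- c(alpha = cf[1], beta1 = cf[2], beta2 = cf[3], cp = cp_hat,
         s2 = min(rss) / (nrow(stagnant) - 4))   # 4 mean parameters including cp
print(round(est, 4))
```

If the assumption about the model is right, these estimates should sit close to the posterior summaries reported by the samplers below.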
<h4>
LaplacesDemon</h4>
I have been playing around with LaplacesDemon. There is actually a function LaplacesDemon.hpc which can use multiple cores. However, on my computer it seems more efficient just to use mclapply() from the parallel package and give the result the class LaplacesDemon.hpc. Having said that, I again had quite some trouble getting LaplacesDemon to work well. In the end I used a combination of two calls to LaplacesDemon: the second call starts from the posterior medians of the first run and uses the covariance of its samples as proposal covariance, as the Call in the output below shows. The plot below shows selected samples after the first run. They are not good enough, but I do like this way of displaying the results of chains. I should add that the labels looked correct for all parameters; showing all of them, however, gave too much output for this blog. After the second call, the results looked acceptable. <br />
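The mclapply() approach amounts to running independent chains in forked R processes and pooling the draws afterwards. As a sketch of that pattern only, the code below swaps LaplacesDemon for a hand-written random-walk Metropolis sampler; the broken-stick model, the flat priors on alpha, beta1, beta2, cp and log(s2), the step sizes and the chain lengths are all assumptions, not what produced the results below:

```r
library(parallel)

# Bacon and Watts (1971) data from the post, stored as (y, x) pairs
observed <- '1.12 -1.39 1.12 -1.39 0.99 -1.08 1.03 -1.08
0.92 -0.94 0.90 -0.80 0.81 -0.63 0.83 -0.63
0.65 -0.25 0.67 -0.25 0.60 -0.12 0.59 -0.12
0.51 0.01 0.44 0.11 0.43 0.11 0.43 0.11
0.33 0.25 0.30 0.25 0.25 0.34 0.24 0.34
0.13 0.44 -0.01 0.59 -0.13 0.70 -0.14 0.70
-0.30 0.85 -0.33 0.85 -0.46 0.99 -0.43 0.99
-0.65 1.19'
stagnant <- as.data.frame(scan(text = gsub('[[:space:]]+', ' ', observed),
                               what = list(y = double(), x = double()),
                               sep = ' ', quiet = TRUE))

# Log posterior up to a constant: normal likelihood of the broken-stick mean,
# flat priors on th = (alpha, beta1, beta2, cp, log s2); the priors are an assumption
log_post <- function(th, d) {
  slope <- ifelse(d$x < th[4], th[2], th[3])   # beta1 left of cp, beta2 right of it
  mu <- th[1] + slope * (d$x - th[4])
  sum(dnorm(d$y, mean = mu, sd = exp(th[5] / 2), log = TRUE))
}

# One random-walk Metropolis chain; returns the second half of the draws
rwm_chain <- function(seed, d, n_iter = 20000,
                      init = c(0.5, -0.4, -1, 0, log(5e-4)),
                      step = c(0.02, 0.01, 0.01, 0.03, 0.3)) {
  set.seed(seed)
  th <- init
  lp <- log_post(th, d)
  out <- matrix(NA_real_, nrow = n_iter, ncol = 5,
                dimnames = list(NULL, c("alpha", "beta1", "beta2", "cp", "log_s2")))
  for (i in seq_len(n_iter)) {
    prop <- th + rnorm(5, sd = step)           # symmetric random-walk proposal
    lp_prop <- log_post(prop, d)
    if (log(runif(1)) < lp_prop - lp) {        # Metropolis acceptance step
      th <- prop
      lp <- lp_prop
    }
    out[i, ] <- th
  }
  out[-seq_len(n_iter / 2), ]                  # drop the first half as burn-in
}

# Four independent chains; mclapply() forks, which Windows does not support
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
chains <- mclapply(1:4, rwm_chain, d = stagnant, mc.cores = n_cores)
draws <- do.call(rbind, chains)
print(round(colMeans(draws), 3))
```

Calling set.seed() inside each chain keeps the chains reproducible and different from each other; keeping them as separate list elements in chains also allows per-chain convergence checks before pooling.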
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrIfyBvb1KXOPvewqFEtm0hZyAFaY8I2Bzla3kRty1AZJ018bTULH0Tni9QSCQbvuMq3Wl6jUnzazVRtnZrBLcJ-iCifP0ejEgxlotwA2KOjoE_FP-HQiYUqeJgC8lfiU9SLAumHhya1c/s1600/lpd1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrIfyBvb1KXOPvewqFEtm0hZyAFaY8I2Bzla3kRty1AZJ018bTULH0Tni9QSCQbvuMq3Wl6jUnzazVRtnZrBLcJ-iCifP0ejEgxlotwA2KOjoE_FP-HQiYUqeJgC8lfiU9SLAumHhya1c/s1600/lpd1.png" /></a></div>
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"><br /></span></span>
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">Call:<br />LaplacesDemon(Model = Model, Data = MyData, Initial.Values = apply(cc1$Posterior1, <br /> 2, median), Covar = var(cc1$Posterior1), Iterations = 1e+05, <br /> Algorithm = "RWM")<br /><br />Acceptance Rate: 0.2408<br />Algorithm: Random-Walk Metropolis<br />Covariance Matrix: (NOT SHOWN HERE; diagonal shown instead)<br /> alpha beta1 beta2 cp s2 <br />4.920676e-04 2.199525e-04 3.753738e-04 8.680339e-04 6.122862e-08 <br /><br />Covariance (Diagonal) History: (NOT SHOWN HERE)<br />Deviance Information Criterion (DIC):<br /> All Stationary<br />Dbar -144.660 -144.660<br />pD 7.174 7.174<br />DIC -137.486 -137.486<br />Initial Values:<br /> alpha beta1 beta2 cp s2 <br /> 0.5467048515 -0.4100040451 -1.0194586232 0.0166405998 0.0004800931 <br /><br />Iterations: 4e+05<br />Log(Marginal Likelihood): 56.92606<br />Minutes of run-time: 1.32<br />Model: (NOT SHOWN HERE)<br />Monitor: (NOT SHOWN HERE)<br />Parameters (Number of): 5<br />Posterior1: (NOT SHOWN HERE)<br />Posterior2: (NOT SHOWN HERE)<br />Recommended Burn-In of Thinned Samples: 0<br />Recommended Burn-In of Un-thinned Samples: 0<br />Recommended Thinning: 5<br />Specs: (NOT SHOWN HERE)<br />Status is displayed every 100 iterations<br />Summary1: (SHOWN BELOW)<br />Summary2: (SHOWN BELOW)<br />Thinned Samples: 40000<br />Thinning: 1<br /><br /><br />Summary of All Samples<br /> Mean SD MCSE ESS LB<br />alpha 5.348239e-01 0.0244216567 2.999100e-04 11442.06 4.895229e-01<br />beta1 -4.196180e-01 0.0142422533 1.661658e-04 12726.60 -4.469688e-01<br />beta2 -1.013882e+00 0.0164892337 1.681833e-04 15191.59 -1.046349e+00<br />cp 2.855852e-02 0.0308177765 3.649332e-04 11945.66 -3.406306e-02<br />s2 4.472644e-04 0.0001429674 1.383748e-06 16571.94 2.474185e-04<br />Deviance -1.446602e+02 3.7879060637 4.940488e-02 10134.91 -1.496950e+02<br />LP 4.636511e+01 1.8939530321 2.470244e-02 10134.91 4.164313e+01<br /> Median UB<br 
/>alpha 0.53339024 5.842152e-01<br />beta1 -0.41996859 -3.903572e-01<br />beta2 -1.01387256 -9.815650e-01<br />cp 0.03110570 8.398674e-02<br />s2 0.00042101 8.006666e-04<br />Deviance -145.46896682 -1.352162e+02<br />LP 46.76949458 4.888251e+01<br /><br /><br />Summary of Stationary Samples<br /> Mean SD MCSE ESS LB<br />alpha 5.348239e-01 0.0244216567 2.999100e-04 11442.06 4.895229e-01<br />beta1 -4.196180e-01 0.0142422533 1.661658e-04 12726.60 -4.469688e-01<br />beta2 -1.013882e+00 0.0164892337 1.681833e-04 15191.59 -1.046349e+00<br />cp 2.855852e-02 0.0308177765 3.649332e-04 11945.66 -3.406306e-02<br />s2 4.472644e-04 0.0001429674 1.383748e-06 16571.94 2.474185e-04<br />Deviance -1.446602e+02 3.7879060637 4.940488e-02 10134.91 -1.496950e+02<br />LP 4.636511e+01 1.8939530321 2.470244e-02 10134.91 4.164313e+01<br /> Median UB<br />alpha 0.53339024 5.842152e-01<br />beta1 -0.41996859 -3.903572e-01<br />beta2 -1.01387256 -9.815650e-01<br />cp 0.03110570 8.398674e-02<br />s2 0.00042101 8.006666e-04<br />Deviance -145.46896682 -1.352162e+02<br />LP 46.76949458 4.888251e+01</span></span><br />
<h4>
STAN</h4>
Stan caused no problems for this analysis.<br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">Inference for Stan model: smodel.</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">4 chains, each with iter=2000; warmup=1000; thin=1; <br />post-warmup draws per chain=1000, total post-warmup draws=4000.<br /><br /> mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat<br />Beta[1] -0.42 0.00 0.01 -0.45 -0.43 -0.42 -0.41 -0.39 1017 1.00<br />Beta[2] -1.01 0.00 0.02 -1.05 -1.02 -1.01 -1.00 -0.98 1032 1.00<br />Alpha 0.54 0.00 0.03 0.49 0.52 0.53 0.55 0.59 680 1.00<br />s2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1361 1.00<br />cp 0.03 0.00 0.03 -0.04 0.00 0.03 0.05 0.09 636 1.00<br />lp__ 90.63 0.06 1.78 86.17 89.71 91.00 91.91 93.03 935 1.01<br /><br />Samples were drawn using NUTS(diag_e) at Fri Jun 19 21:17:54 2015.<br />For each parameter, n_eff is a crude measure of effective sample size,<br />and Rhat is the potential scale reduction factor on split chains (at <br />convergence, Rhat=1).</span></span><br />
<h4>
JAGS</h4>
Again, no problems for JAGS.<br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">Inference for Bugs model at "/tmp/Rtmpy4a6C5/modeld4f6e9c9055.txt", fit using jags,<br /> 4 chains, each with 10000 iterations (first 5000 discarded), n.thin = 5<br /> n.sims = 4000 iterations saved<br /> mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat<br />alpha 0.534 0.027 0.479 0.518 0.533 0.552 0.586 1.040<br />beta[1] -0.420 0.015 -0.450 -0.429 -0.420 -0.410 -0.389 1.013<br />beta[2] -1.014 0.017 -1.049 -1.024 -1.014 -1.003 -0.980 1.023<br />cp 0.029 0.035 -0.038 0.006 0.032 0.051 0.100 1.037<br />s2 0.000 0.000 0.000 0.000 0.000 0.001 0.001 1.004<br />deviance -144.501 3.986 -149.634 -147.378 -145.432 -142.584 -134.378 1.021<br /> n.eff<br />alpha 160<br />beta[1] 380<br />beta[2] 290<br />cp 160<br />s2 710<br />deviance 290<br /><br />For each parameter, n.eff is a crude measure of effective sample size,<br />and Rhat is the potential scale reduction factor (at convergence, Rhat=1).<br /><br />DIC info (using the rule, pD = var(deviance)/2)<br />pD = 7.9 and DIC = -136.6<br />DIC is an estimate of expected predictive error (lower deviance is better).</span></span><br />
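As a rough, non-Bayesian cross-check on the three samplers, here is a minimal sketch that is not part of the original analysis. For a fixed cp the model is linear in alpha, beta1 and beta2, so least squares can be profiled over a grid of cp values. The data are the same (y, x) pairs as in the observed string in the code below. The grid step of 0.001 and the helper name rss_at() are my own choices. With such vague priors, the least-squares estimates should land close to the posterior medians reported above.

```r
# (y, x) pairs from the SAS change-point example, same order as in the code below
y <- c(1.12, 1.12, 0.99, 1.03, 0.92, 0.90, 0.81, 0.83, 0.65, 0.67, 0.60, 0.59,
       0.51, 0.44, 0.43, 0.43, 0.33, 0.30, 0.25, 0.24, 0.13, -0.01, -0.13, -0.14,
       -0.30, -0.33, -0.46, -0.43, -0.65)
x <- c(-1.39, -1.39, -1.08, -1.08, -0.94, -0.80, -0.63, -0.63, -0.25, -0.25,
       -0.12, -0.12, 0.01, 0.11, 0.11, 0.11, 0.25, 0.25, 0.34, 0.34, 0.44, 0.59,
       0.70, 0.70, 0.85, 0.85, 0.99, 0.99, 1.19)

# Residual sum of squares of the broken-stick model for a fixed change point:
# yhat = alpha + beta1*(x - cp) for x < cp, alpha + beta2*(x - cp) for x >= cp
rss_at <- function(cp) {
  fit <- lm(y ~ pmin(x - cp, 0) + pmax(x - cp, 0))
  sum(residuals(fit)^2)
}

# Profile over the same support as the uniform prior on cp
cps <- seq(-1.3, 1.1, by = 0.001)
rss <- sapply(cps, rss_at)
cp_hat <- cps[which.min(rss)]

# Refit at the best change point and collect the estimates
fit_hat <- lm(y ~ pmin(x - cp_hat, 0) + pmax(x - cp_hat, 0))
est <- c(coef(fit_hat), cp = cp_hat)
names(est)[1:3] <- c("alpha", "beta1", "beta2")
print(round(est, 3))
```

The pmin()/pmax() terms split x - cp into its negative and positive parts, so the fitted line is continuous at cp, just like yhat in the Model, Stan and JAGS code.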
<h3>
CODE used</h3>
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"># http://support.sas.com/documentation/cdl/en/statug/67523/HTML/default/viewer.htm#statug_mcmc_examples18.htm</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"># Example 61.12 Change Point Models </span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">##############</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">#Data </span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">##############</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">observed <- </span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">'1.12 -1.39 1.12 -1.39 0.99 -1.08 1.03 -1.08</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">0.92 -0.94 0.90 -0.80 0.81 -0.63 0.83 -0.63</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">0.65 -0.25 0.67 -0.25 0.60 -0.12 0.59 -0.12</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">0.51 0.01 0.44 0.11 0.43 0.11 0.43 0.11</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">0.33 0.25 0.30 0.25 0.25 0.34 0.24 0.34</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">0.13 0.44 -0.01 0.59 -0.13 0.70 -0.14 0.70</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">-0.30 0.85 -0.33 0.85 -0.46 0.99 -0.43 0.99</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">-0.65 1.19'</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">observed <- scan(text=gsub('[[:space:]]+',' ',observed),</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> what=list(y=double(),x=double()),</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> sep=' ')</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">stagnant <- as.data.frame(observed)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">#plot(y~x,data=stagnant)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">##############</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">#LaplacesDemon </span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">##############</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">library('LaplacesDemon')</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">library(parallel)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"><br /></span></span></span>
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">mon.names <- "LP"</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">parm.names <- c('alpha',paste('beta',1:2,sep=''),'cp','s2')</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"><br /></span></span></span>
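<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"># Parameter-generating function: GIV() can use it to draw initial values</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">PGF <- function(Data) {</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> x <- rnorm(5,0,1)</span></span></span><br />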
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> x[4] <- runif(1,-1.3,1.1)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> x[5] <- runif(1,0,2)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> x</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">}</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">MyData <- list(mon.names=mon.names, </span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> parm.names=parm.names,</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> PGF=PGF,</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> x=stagnant$x,</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> y=stagnant$y)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">#N<-1</span></span></span><br />
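<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"># Log-posterior for LaplacesDemon: broken-stick regression with change point cp</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">Model <- function(parm, Data)</span></span></span><br />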
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">{</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> alpha=parm[1]</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> beta=parm[2:3]</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> cp=parm[4]</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> s2=parm[5]</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> yhat <- alpha+(Data$x-cp)*beta[1+(Data$x>=cp)]</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> LL <- sum(dnorm(Data$y,yhat,sd=sqrt(s2),log=TRUE))</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> prior <- sum(dnorm(parm[1:3],0,1e3,log=TRUE))+</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> dunif(cp,-1.3,1.1,log=TRUE)+</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> dunif(s2,0,5,log=TRUE)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> LP=LL+prior</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LP,</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> yhat=yhat,</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> parm=parm)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> return(Modelout)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">}</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">Fit1 <- mclapply(1:4,function(i) </span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> LaplacesDemon(Model, </span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> Data=MyData,</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> Iterations=100000,</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> Algorithm='RWM',</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> Covar=c(rep(.01,4),.00001),</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> Initial.Values = c(.5,-.4,-1,.05,.001)) #Initial.Values </span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">)</span></span></span><br />
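<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"># treat the list of four chains as one parallel run for plot() and Combine()</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">class(Fit1) <- 'demonoid.hpc'</span></span></span><br />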
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">plot(Fit1,Data=MyData,Parms=c('alpha'))</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">cc1 <- Combine(Fit1,MyData)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">#</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">Fit2 <- mclapply(1:4,function(i) </span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> LaplacesDemon(Model, </span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> Data=MyData,</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> Iterations=100000,</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> Algorithm='RWM',</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> Covar=var(cc1$Posterior1),</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> Initial.Values = apply(cc1$Posterior1,2,median)) #Initial.Values </span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">class(Fit2) <- 'demonoid.hpc'</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">#plot(Fit2,Data=MyData,Parms=c('alpha'))</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">cc2 <- Combine(Fit2,MyData)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">cc2</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">##############</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">#STAN </span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">##############</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">stanData <- list(</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> N=nrow(stagnant),</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> x=stagnant$x,</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> y=stagnant$y)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"><br /></span></span></span>
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">library(rstan) </span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">smodel <- '</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> data {</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> int <lower=1> N;</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> vector[N] x;</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> vector[N] y;</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> }</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> parameters {</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> vector[2] Beta;</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> real Alpha;</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> real <lower=0,upper=1e3> s2;</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> real <lower=-1.3,upper=1.1> cp;</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> }</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> transformed parameters {</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> vector[N] yhat;</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> for (i in 1:N)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> yhat[i] = Alpha+(x[i]-cp)*Beta[1+(x[i]>cp)];</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> }</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> model {</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> y ~ normal(yhat,sqrt(s2));</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> s2 ~ uniform(0,1e3);</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> cp ~ uniform(-1.3,1.1);</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> Alpha ~ normal(0,1000);</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> Beta ~ normal(0,1000);</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> }</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> ' </span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">fstan <- stan(model_code = smodel, </span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> data = stanData, </span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> pars=c('Beta','Alpha','s2','cp'))</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">fstan</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">##############</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">#Jags </span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">##############</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">library(R2jags)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">jagsdata <- list(</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> N=nrow(stagnant),</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> x=stagnant$x,</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> y=stagnant$y)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"><br /></span></span></span>
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">jagsm <- function() {</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> for(i in 1:N) {</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> yhat[i] <- alpha+(x[i]-cp)*beta[1+(x[i]>cp)]</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> y[i] ~ dnorm(yhat[i],tau)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> }</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> tau <- 1/s2 # JAGS parameterizes dnorm by precision = 1/variance</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> s2 ~ dunif(0,1e3)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> cp ~ dunif(-1.3,1.1)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> alpha ~ dnorm(0,0.001) # precision 0.001, i.e. sd about 31.6 (same for beta)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> beta[1] ~ dnorm(0,0.001)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> beta[2] ~ dnorm(0,0.001)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">}</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">params <- c('alpha','beta','s2','cp')</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">myj <- jags(model.file=jagsm,</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> data = jagsdata,</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> n.chains=4,</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> n.iter=10000,</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> parameters.to.save=params</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">)</span></span></span><br />
<span style="background-color: #f3f3f3;"><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">myj</span></span></span>Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com0tag:blogger.com,1999:blog-3524617892004055830.post-33033392986289400752015-06-14T11:14:00.000+02:002015-06-14T11:14:19.379+02:00Parallel and a new laptopI am thinking about a new laptop. For one thing, a 1366*768 screen resolution is getting impractically small. Secondly, I want faster computations and more memory.<br />
Regarding CPU speed, my current laptop has a lowly Celeron 877. From what I see of my computer's activity, under R mostly one core does the work. This means that even though there are two cores, what I have available is the single-core CPU mark of 715 (from cpubenchmark.net). A bit of checking shows that the current batch of processors mainly has more cores. For instance, the highest rated common CPU, an Intel Core i7-4710HQ, has a CPU mark of 7935 and a single-core mark of 1870. That is about 2.6 times faster for one core (1870/715), but it rates highest because it has four cores. The same is true down the line: four cores is common, but single-core speed has not improved that much. Unless I can actually use those extra cores, what is the gain? Hence my question: can I use extra cores for real-world R computations? That is what I set out to investigate.<br />
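One way to find out whether extra cores help is simply to time the same work with lapply() and mclapply(). Below is a minimal sketch, assuming a Unix-alike; on Windows mclapply() cannot fork, so the sketch falls back to one core there. slow_task() is a hypothetical stand-in for a real computation. Seeding inside the task makes each result independent of which process computes it, so both runs should return identical values.

```r
library(parallel)

# Hypothetical CPU-bound task; set.seed(i) makes the result depend only on i
slow_task <- function(i) {
  set.seed(i)
  mean(replicate(100, median(rnorm(1e4))))
}

# mclapply forks worker processes, which is not possible on Windows;
# use a single core there (mclapply then behaves like lapply)
n_cores <- if (.Platform$OS.type == "windows") 1L else
  max(1L, detectCores(), na.rm = TRUE)

# Wall-clock time for the same eight tasks, serial versus parallel
t_serial   <- system.time(r_serial   <- lapply(1:8, slow_task))[["elapsed"]]
t_parallel <- system.time(r_parallel <- mclapply(1:8, slow_task,
                                                 mc.cores = n_cores))[["elapsed"]]

cat("cores:", n_cores, " serial:", t_serial, "s  parallel:", t_parallel, "s\n")
```

With several cores the parallel elapsed time should drop. Starting the workers and returning their results has a cost, though, so the speedup is typically somewhat below the number of cores.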
<h4>
Easy approach, Parallel</h4>
A bit of browsing shows that the parallel package is the easy way to use multiple cores: think of using mclapply() rather than lapply(). In many situations this is easy. Cross validation, for instance, is straightforward, apart from the small upfront cost of partitioning the data into chunks. Trying different settings for a machine learning problem is similar.<br />
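Here is a minimal sketch of the cross-validation case, assuming base R only. The built-in mtcars data and an ordinary lm() fit are stand-ins for a real data set and learner; neither appears in the original post. The rows are partitioned into folds once, up front, and then each mclapply() task fits and scores one fold. The fold count of 5 is an arbitrary choice.

```r
library(parallel)

set.seed(1)
k <- 5
# Upfront cost: assign each row of mtcars to one of k roughly equal folds
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

# Fit on all folds except i, return the RMSE on held-out fold i
cv_fold <- function(i) {
  train <- mtcars[folds != i, ]
  test  <- mtcars[folds == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)
  sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
}

# Forking is not available on Windows, so use a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else
  max(1L, min(k, detectCores(), na.rm = TRUE))
rmse <- unlist(mclapply(seq_len(k), cv_fold, mc.cores = n_cores))
print(round(rmse, 2))
cat("mean CV RMSE:", mean(rmse), "\n")
```

Because cv_fold() only needs the fold index, nothing has to be nested. Each forked worker reads the same mtcars and folds objects from the parent process.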
To give this a real-world setting, data was taken from the UCI machine learning repository: <a href="http://archive.ics.uci.edu/ml/datasets/Physicochemical+Properties+of+Protein+Tertiary+Structure#">Physicochemical Properties of Protein Tertiary Structure Data Set</a>, which has 45730 rows and 9 variables. The figure shows a pairs plot of 2000 randomly selected rows. It seems the problem is not so much which variables to use, but rather their interactions. The poor performance of linear regression also suggested this.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHUF6SEyWgCOycxRpw-jD87Chg0_94fBQJGXZqQmkrxoX8-goZFEeHM3jE2UQGqNOrAwYSrlkFurUmlWIX217R1kEl2B8itz-c1YT0h8petSDJXKoY7jwMqUunWoi0-mPiz_R5Zj2nCmo/s1600/pairs1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHUF6SEyWgCOycxRpw-jD87Chg0_94fBQJGXZqQmkrxoX8-goZFEeHM3jE2UQGqNOrAwYSrlkFurUmlWIX217R1kEl2B8itz-c1YT0h8petSDJXKoY7jwMqUunWoi0-mPiz_R5Zj2nCmo/s1600/pairs1.png" /></a></div>
<h4>
Random forest in parallel</h4>
Even though nine variables is a bit low for random forest, I elected to use it as the first technique. The main parameters to tune are nodesize and mtry, the number of variables tried at each split. Hence I wrapped the fits in mclapply, without even using cross validation, and took care not to nest mclapply calls. The result was heavy memory usage, which in hindsight may be obvious: each of the instances gets a complete data set. The net effect is that I ran out of RAM and data was swapped to disk, which cannot be good for performance. It may also explain comments I have read that the caret package uses too much memory. A decent machine learning setup with a four-core processor would hold four instances of the same data. Perhaps adding another 4 GB of memory and an SSD rather than an HDD would serve me just as well as a new laptop...<br />
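<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">library(parallel)<br />library(randomForest)<br /># 'train' holds the response in column 1 and the predictors in the other columns<br /># tuning grid: every combination of mtry and nodesize<br />tol <- expand.grid(mtry=1:3,<br /> nodesize=c(3,5,10))<br /># one forest per grid row; mclapply uses getOption('mc.cores', 2) cores by default<br />bomen <- mclapply(seq_len(nrow(tol)),function(i)<br /> randomForest(<br /> y=train[,1],<br /> x=train[,-1],<br /> ntree=50,<br /> mtry=tol$mtry[i],<br /> nodesize=tol$nodesize[i])<br />)</span></span><br />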
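<div>
Once the forests are fitted, a natural next step is to pick the grid row with the lowest out-of-bag error. A minimal sketch, assuming regression forests from randomForest (whose mse component holds the out-of-bag MSE after each tree) and a grid like tol above; oob_table() is a hypothetical helper. Lowering mc.cores in the mclapply() call is the simplest way to limit how many copies of the data exist at once.</div>

```r
library(randomForest)

# Hypothetical helper: out-of-bag MSE of each regression forest, attached to
# the tuning grid and sorted from best to worst. rf$mse is the OOB MSE after
# 1, 2, ... trees, so its last element belongs to the complete forest.
oob_table <- function(forests, grid) {
  grid$oob_mse <- vapply(forests, function(rf) tail(rf$mse, 1), numeric(1))
  grid[order(grid$oob_mse), ]
}

# usage with the objects from the code above:
# oob_table(bomen, tol)
```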
<h4>
Final thoughts</h4>
New hardware could also bring GPU computing into the picture, but that does not seem easy: it is unclear to me whether CUDA or OpenCL is preferable, and neither seems particularly easy to use. Then again, I could minimize my hardware needs, buy a Chromebook with a decent screen and do my work in the cloud. For now, though, I will continue to investigate how extra cores can help me.Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com6tag:blogger.com,1999:blog-3524617892004055830.post-43107271051896685442015-06-07T12:55:00.002+02:002015-06-07T12:55:30.110+02:00European debt and interestI was told the eurostat package would be interesting for me. It indeed is, and now I want to use it to plot some data related to a core topic of several European policies: debt.<br />
In these plots I only show individual countries, not aggregates over the EU or the euro zone. In addition, Norway is dropped, because it has less data, is not an EU country and has a fairly different financial position from the rest of Europe. This results in 28 countries, which can be displayed in a 4 by 7 grid.<br />
<h3>
Lending and borrowing</h3>
Lending and borrowing show the troubles of the crisis. Ireland in particular has a spike of increased borrowing: going from net lender to borrowing 30% of GDP within a few years is massive. The same can be seen in Spain and the UK. In fact, only a few countries did not increase their borrowing in that period.<br />
The red line marks 3% borrowing, the deficit limit in the Stability and Growth Pact. Quite a few countries are currently borrowing less than 3%, are near that limit, or are edging closer to it.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVAwktynrxagdkkIH1rv95SPtvIm50pnZ64tvWqBvdqDENU8ixNmzij90oYCdWHhTcCltVV6LoWXaO2klqnqpOrfZJjEjVxE4rjiqOyLoA5Bff5DAamBI44ksUBPhltA-WCrWx0At9xYs/s1600/B9.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVAwktynrxagdkkIH1rv95SPtvIm50pnZ64tvWqBvdqDENU8ixNmzij90oYCdWHhTcCltVV6LoWXaO2klqnqpOrfZJjEjVxE4rjiqOyLoA5Bff5DAamBI44ksUBPhltA-WCrWx0At9xYs/s1600/B9.png" /></a></div>
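<div>
Which countries are currently beyond the 3% line can also be listed rather than read off the plot. A sketch under assumptions: latest_vs_limit() is a hypothetical helper that expects the columns country, time and values of r2 after filtering on sector, na_item and unit, as in the code at the end of this post.</div>

```r
library(dplyr)

# Hypothetical helper: most recent non-missing value per country, flagged
# when it is below the limit (net borrowing beyond 3% of GDP by default).
latest_vs_limit <- function(df, limit = -3) {
  df %>%
    filter(!is.na(values)) %>%
    group_by(country) %>%
    filter(time == max(time)) %>%
    ungroup() %>%
    mutate(beyond_limit = values < limit) %>%
    arrange(values)
}

# usage, with r2 built as in the Code section:
# filter(r2, sector == 'S13', na_item == 'B9', unit == 'PC_GDP') %>%
#   latest_vs_limit()
```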
<h3>
Debt</h3>
The consequence of all that borrowing is debt, and all countries have debt. The red line is placed at 60%, the debt limit in the Stability and Growth Pact. Many countries had debt under 60% before the crisis, for example Spain, the Netherlands and Ireland. Italy and Belgium were well above 60% and saw some further increase. Germany, Europe's economic miracle, had debt over 60%. There seems to be no obvious link between debt before the crisis and debt after it.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVs_LmkxtUmOiOlrDxSoZxsVJHulWmtcGixTx73CWI98ExqE_3hJb7AgbHbDyM7Zcx3jjBC3dQqg2xW2FjMXAJYgDKqyvrrSHcsJLTj3QhJNpMFaWte7zBShRDy_7N8l4y-WO27kU8sxo/s1600/GD.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVs_LmkxtUmOiOlrDxSoZxsVJHulWmtcGixTx73CWI98ExqE_3hJb7AgbHbDyM7Zcx3jjBC3dQqg2xW2FjMXAJYgDKqyvrrSHcsJLTj3QhJNpMFaWte7zBShRDy_7N8l4y-WO27kU8sxo/s1600/GD.png" /></a></div>
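<div>
The impression that pre-crisis debt says little about post-crisis debt can be turned into a number by correlating each country's debt in a year before the crisis with a later year. This is a sketch, not part of the original analysis: the years 2007 and 2014 are assumptions, and debt_before_after() is a hypothetical helper for the filtered GD data with columns country, time and values.</div>

```r
# Hypothetical helper: gross debt (% GDP) per country in two years, side by
# side, plus the correlation between them across countries.
debt_before_after <- function(df, before = 2007, after = 2014) {
  df$year <- as.integer(format(df$time, "%Y"))
  b <- df[df$year == before & !is.na(df$values), c("country", "values")]
  a <- df[df$year == after & !is.na(df$values), c("country", "values")]
  # merge() keeps only countries present in both years
  both <- merge(b, a, by = "country", suffixes = c(".before", ".after"))
  list(data = both,
       correlation = cor(both$values.before, both$values.after))
}

# usage, with r2 built as in the Code section:
# debt_before_after(filter(r2, sector == 'S13', na_item == 'GD', unit == 'PC_GDP'))
```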
<h3>
Interest</h3>
The consequence of debt is interest. This is probably the most remarkable set of data. There are no targets here, hence no red line. What is visible is the decrease in interest for many countries at the end of the last century, which shows the positive influence of the euro. The strange thing is that many countries hardly suffered increased interest payments during the crisis. Italy has paid approximately 5% of GDP for more than 10 years now. The Netherlands has a decreasing curve which was flattened by the crisis and is now decreasing again. In summary, many countries currently pay historically low interest.<br />
Part of this is obviously the result of the various policies to contain the crisis. But it may also mean that debt is not the biggest problem Europe faces at this moment.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1KskMx6Xlfuwbdh2tLZSmfrqyx_tas0ttuFo6JEUjfOk5PlWU-NuM8Yyv07IM2eCcXBtkvfK-cCDeKI8jLZQqCovaml9_2lr_2wBmMK4z8KfMzUYs1Em_GSAXQnEuAGrRlKEP6Tc6UEY/s1600/D41PAY.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1KskMx6Xlfuwbdh2tLZSmfrqyx_tas0ttuFo6JEUjfOk5PlWU-NuM8Yyv07IM2eCcXBtkvfK-cCDeKI8jLZQqCovaml9_2lr_2wBmMK4z8KfMzUYs1Em_GSAXQnEuAGrRlKEP6Tc6UEY/s1600/D41PAY.png" /></a></div>
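<div>
Whether current interest payments are historically low can be checked per country by comparing the latest value with the earlier years. A sketch, assuming the filtered D41PAY data with columns country, time and values; latest_is_low() is a hypothetical helper, and a share_higher close to 1 means almost every earlier year had higher interest payments.</div>

```r
# Hypothetical helper: per country, the latest interest payment (% GDP) and
# the share of earlier years in which the payment was higher.
latest_is_low <- function(df) {
  df <- df[!is.na(df$values), ]
  rows <- lapply(split(df, df$country), function(d) {
    d <- d[order(d$time), ]
    latest <- d$values[nrow(d)]
    data.frame(country = d$country[1], latest = latest,
               share_higher = mean(d$values[-nrow(d)] > latest),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```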
<h3>
Code</h3>
<div>
The code is fairly simple. The main work is finding out what the codes mean. Country names are taken from the geo dictionary of the database. For the other codes it is easiest to open the table on the Eurostat website and look up which codes are of interest.</div>
<div>
The dplyr pipe (%>%) allows selection and plotting to be chained in one call, which eliminates the risk of forgetting to update intermediate data frames.</div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">library(eurostat)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">library(dplyr)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">library(ggplot2)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">library(scales)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">r1 <- get_eurostat('gov_10dd_edpt1')</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># add country names</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">r2 <- get_eurostat_dic('geo') %>%</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> geo=V1,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> country=V2,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> country=gsub('\\(.*$','',country)) %>%</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> select(.,geo,country) %>%</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> merge(.,r1) %>%</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># keep individual countries: drop euro area (EA*) and EU aggregates, and Norway</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> filter(.,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> !grepl('EA.*',geo),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> !grepl('EU.*',geo),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> geo!='NO')</span><br />
<br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">filter(r2,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> sector=='S13', # general government</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> na_item=='B9', # Net lending (+) /net borrowing (-)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> unit=='PC_GDP' # % GDP</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ) %>%</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ggplot(.,aes(x=time,y=values)) + </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_line()+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ylab('% GDP')+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> facet_wrap(~country,nrow=4) +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ggtitle('Net lending (+) /net borrowing (-)') +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> xlab('Year') +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_hline(yintercept=-3,colour='red') +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> scale_x_date(</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> breaks=c(as.Date("2000-01-01"),as.Date("2010-01-01") )</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ,labels = date_format("%Y"))</span><br />
<br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">#########</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">filter(r2,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> sector=='S13', # general government</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> na_item=='GD', # Gross debt</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> unit=='PC_GDP' # % GDP</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ) %>%</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ggplot(.,aes(x=time,y=values)) + </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_line()+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ylab('% GDP')+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> facet_wrap(~country,nrow=4) +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ggtitle('Gross Debt') +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> xlab('Year') +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_hline(yintercept=60,colour='red') +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> scale_x_date(</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> breaks=c(as.Date("2000-01-01"),as.Date("2010-01-01") )</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ,labels = date_format("%Y"))</span><br />
<br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">#########</span><br />
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">filter(r2,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> sector=='S13', # general government</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> na_item=='D41PAY', # Interest, payable</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> unit=='PC_GDP' # % GDP</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ) %>%</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ggplot(.,aes(x=time,y=values)) +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> geom_line()+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ylab('% GDP')+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> facet_wrap(~country,nrow=4) +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ggtitle('Interest, payable') +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> xlab('Year') +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> scale_x_date(</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> breaks=c(as.Date("2000-01-01"),as.Date("2010-01-01") )</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ,labels = date_format("%Y"))</span>Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com1tag:blogger.com,1999:blog-3524617892004055830.post-34178696133237147572015-05-31T09:11:00.000+02:002015-05-31T09:11:13.079+02:00Paper Helicopter Experiment, part IIIAs the final part of my paper helicopter experiment analysis (<a href="http://wiekvoet.blogspot.nl/2015/05/the-paper-helicopter-experiment.html">part I</a>, <a href="http://wiekvoet.blogspot.nl/2015/05/helicopter-experiment-part-ii.html">part II</a>) I reanalyse one more data set. In 2002 Erik Erhardt and Hantao Mai did an extensive experiment, see <a href="http://statacumen.com/old/projects/WPI/Erhardt_Erik_rsmproj.pdf">The Search for the Optimal Paper Helicopter</a>. Their work went through several steps, including variable screening, steepest ascent and a confirmatory experiment. I have combined all those data into one data set and checked which kind of model fits them.<br />
<h3>
Data</h3>
The extracted data contain 45 observations, including a number of replications: for instance, a central composite design has a replicated center point, and the optimum found was tested repeatedly.<br />
After creating a factor that combines all settings, it is easy to examine the replications, which are listed below. The first eight variables are the experimental settings, <i>allvl </i>is the factor combining all levels, <i>Time </i>is the response and <i>Freq </i>is the number of occurrences of each <i>allvl</i> level:<br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> RotorLength RotorWidth BodyLength FootLength FoldLength FoldWidth</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">1 8.50 4.00 3.5 1.25 8 2.0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">2 8.50 4.00 3.5 1.25 8 2.0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">3 8.50 4.00 3.5 1.25 8 2.0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">4 8.50 4.00 3.5 1.25 8 2.0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">5 8.50 4.00 3.5 1.25 8 2.0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">6 8.50 4.00 3.5 1.25 8 2.0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">7 11.18 2.94 2.0 2.00 6 1.5</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">8 11.18 2.94 2.0 2.00 6 1.5</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">9 11.18 2.94 2.0 2.00 6 1.5</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">10 11.18 2.94 2.0 2.00 6 1.5</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">11 11.18 2.94 2.0 2.00 6 1.5</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">12 11.18 2.94 2.0 2.00 6 1.5</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">13 11.50 2.83 2.0 1.50 6 1.5</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">14 11.50 2.83 2.0 1.50 6 1.5</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">15 11.50 2.83 2.0 1.50 6 1.5</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperWeight DirectionOfFold allvl Time Freq</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">1 heavy against 8.5.4.0.3.5.1.2. 8.2.0.heavy.against 13.88 3</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">2 heavy against 8.5.4.0.3.5.1.2. 8.2.0.heavy.against 15.91 3</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">3 heavy against 8.5.4.0.3.5.1.2. 8.2.0.heavy.against 16.08 3</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">4 light against 8.5.4.0.3.5.1.2. 8.2.0.light.against 10.52 3</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">5 light against 8.5.4.0.3.5.1.2. 8.2.0.light.against 10.81 3</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">6 light against 8.5.4.0.3.5.1.2. 8.2.0.light.against 10.89 3</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">7 light against 11.2.2.9.2.0.2.0. 6.1.5.light.against 17.29 6</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">8 light against 11.2.2.9.2.0.2.0. 6.1.5.light.against 19.41 6</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">9 light against 11.2.2.9.2.0.2.0. 6.1.5.light.against 18.55 6</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">10 light against 11.2.2.9.2.0.2.0. 6.1.5.light.against 15.54 6</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">11 light against 11.2.2.9.2.0.2.0. 6.1.5.light.against 16.40 6</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">12 light against 11.2.2.9.2.0.2.0. 6.1.5.light.against 19.67 6</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">13 light against 11.5.2.8.2.0.1.5. 6.1.5.light.against 16.35 3</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">14 light against 11.5.2.8.2.0.1.5. 6.1.5.light.against 16.41 3</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">15 light against 11.5.2.8.2.0.1.5. 6.1.5.light.against 17.38 3</span><br />
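<div>
One way to build such a combined factor and find the replicated settings is sketched below. It is an assumption, not the code behind the table: the hypothetical helper replicates() pastes the settings together, so its labels differ from the allvl labels shown above, and the column names are those of the data set read in the code at the end of this post.</div>

```r
# the eight experimental settings (column names as in the data set)
settings <- c("RotorLength", "RotorWidth", "BodyLength", "FootLength",
              "FoldLength", "FoldWidth", "PaperWeight", "DirectionOfFold")

# Hypothetical helper: one factor level per distinct combination of settings,
# the number of runs per level, and only the rows that were replicated.
replicates <- function(h, vars = settings) {
  h$allvl <- factor(do.call(paste, c(h[vars], sep = "_")))
  h$Freq <- ave(rep(1, nrow(h)), h$allvl, FUN = length)
  h[h$Freq > 1, ]
}
```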
<h3>
Transformation</h3>
<div>
It is also possible to regress <i>Time </i>on <i>allvl</i> and examine the residuals. Since it is easy to imagine that the error is proportional to the elapsed time, this is done for both the original and the log10-transformed data.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIDoAvN2xWJLibzvtfcd2_giuPqHepRS7N8Tkmj45JiVOhOMF3JlVsxmQtg4PvxRxs48LHy_ziCwfv87n03vmwMzJW10WfHI7PtQ_uxqFrp_nxn9tR-u2wMmEWt6eIzTZGIbiyOX1xk8Q/s1600/tranform.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIDoAvN2xWJLibzvtfcd2_giuPqHepRS7N8Tkmj45JiVOhOMF3JlVsxmQtg4PvxRxs48LHy_ziCwfv87n03vmwMzJW10WfHI7PtQ_uxqFrp_nxn9tR-u2wMmEWt6eIzTZGIbiyOX1xk8Q/s1600/tranform.png" /></a></div>
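<div>
A sketch of this residual check, assuming a data frame with only the replicated rows and the columns Time and allvl; residual_check() is a hypothetical helper that fits one mean per setting on both scales and plots residuals against fitted values.</div>

```r
# Hypothetical helper: cell-means fits on the original and log10 scale,
# with residual versus fitted plots side by side.
residual_check <- function(rep_data) {
  raw <- lm(Time ~ allvl, data = rep_data)
  logged <- lm(log10(Time) ~ allvl, data = rep_data)
  op <- par(mfrow = c(1, 2))
  on.exit(par(op))
  plot(fitted(raw), residuals(raw), main = "Time",
       xlab = "Fitted", ylab = "Residual")
  abline(h = 0, lty = 2)
  plot(fitted(logged), residuals(logged), main = "log10(Time)",
       xlab = "Fitted", ylab = "Residual")
  abline(h = 0, lty = 2)
  invisible(list(raw = raw, logged = logged))
}
```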
<div>
Larger values seem to have larger errors, but a log transformation does not correct this very much. To examine this further, a Box-Cox transformation was used. It suggests that the square root is almost optimal, though a log or no transformation should also work. It was decided to use a square root transformation.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDUxFKTJ8rpZnINan5CKNttbTQA33mNyKgQNhuqfHxiYa8JtzQs4ceuuQcw_br1w_wzI2EBI7ICTpF2eHzhFVR3KXYZ8y3uI0j9bFsddBXTTQSIvqer2OChyphenhyphen_Kpqf8vfMKw-5kn5ZjJ9A/s1600/boxcox.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDUxFKTJ8rpZnINan5CKNttbTQA33mNyKgQNhuqfHxiYa8JtzQs4ceuuQcw_br1w_wzI2EBI7ICTpF2eHzhFVR3KXYZ8y3uI0j9bFsddBXTTQSIvqer2OChyphenhyphen_Kpqf8vfMKw-5kn5ZjJ9A/s320/boxcox.png" width="320" /></a></div>
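<div>
The Box-Cox step can be sketched with boxcox() from the MASS package. The hypothetical helper boxcox_lambda() assumes a data frame with columns Time and allvl and returns the lambda with the highest profile log-likelihood together with the usual approximate 95% interval; lambda = 1 means no transformation, 0.5 the square root and 0 the log.</div>

```r
library(MASS)

# Hypothetical helper: best Box-Cox lambda for the cell-means model plus the
# approximate 95% interval (log-likelihood within qchisq(0.95, 1) / 2 of max).
boxcox_lambda <- function(rep_data) {
  fit <- lm(Time ~ allvl, data = rep_data, y = TRUE, qr = TRUE)
  bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
  best <- bc$x[which.max(bc$y)]
  inside <- bc$x[bc$y > max(bc$y) - qchisq(0.95, 1) / 2]
  c(best = best, lower = min(inside), upper = max(inside))
}
```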
<div>
Given the square root transformation, the residual mean square of a model should not be lower than about 0.02, since that is the pure error found in the replications. On the other hand, a value much higher than 0.02 is a clear sign of underfitting.</div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Analysis of Variance Table</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Response: sqrt(Time)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> Df Sum Sq Mean Sq F value Pr(>F) </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">allvl 3 1.84481 0.61494 26 2.707e-05 ***</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Residuals 11 0.26016 0.02365 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">---</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1</span></div>
</div>
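<div>
The pure-error mean square is simply the residual mean square of a model with one mean per replicated setting. A sketch: pure_error() is a hypothetical helper, and the group labels A to D stand in for the four allvl levels in the replication table. With the 15 replicated flight times from that table it should reproduce the 11 degrees of freedom and the mean square of about 0.0237 shown above.</div>

```r
# Hypothetical helper: residual degrees of freedom and mean square of a
# model that fits one mean per setting, i.e. the pure error.
pure_error <- function(y, setting) {
  fit <- lm(y ~ factor(setting))
  c(df = df.residual(fit), mean_sq = deviance(fit) / df.residual(fit))
}

# the 15 replicated flight times from the table above
Time <- c(13.88, 15.91, 16.08, 10.52, 10.81, 10.89,
          17.29, 19.41, 18.55, 15.54, 16.40, 19.67,
          16.35, 16.41, 17.38)
allvl <- rep(c("A", "B", "C", "D"), times = c(3, 3, 6, 3))
pure_error(sqrt(Time), allvl)
```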
<h3>
Model selection</h3>
<div>
Given this target residual variance, a model that is linear in the variables is not sufficient: its residual mean square is 0.198, about eight times the pure error.</div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Analysis of Variance Table</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Response: sTime</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> Df Sum Sq Mean Sq F value Pr(>F) </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">RotorLength 1 3.6578 3.6578 18.4625 0.0001257 ***</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">RotorWidth 1 1.0120 1.0120 5.1078 0.0299644 * </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">BodyLength 1 0.1352 0.1352 0.6823 0.4142439 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">FootLength 1 0.2719 0.2719 1.3725 0.2490708 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">FoldLength 1 0.0060 0.0060 0.0302 0.8629331 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">FoldWidth 1 0.0189 0.0189 0.0953 0.7592922 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">PaperWeight 1 0.6528 0.6528 3.2951 0.0778251 . </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">DirectionOfFold 1 0.4952 0.4952 2.4994 0.1226372 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Residuals 36 7.1324 0.1981 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">---</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1</span></div>
</div>
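<div>
The comparison with pure error can also be made formal with a lack-of-fit F-test: compare the model with the cell-means model that fits one mean per distinct setting. This is a sketch under assumptions: lack_of_fit() is a hypothetical helper, setting names a factor column such as allvl, and the test only makes sense when the model is nested in the cell-means model, which holds when all its terms are functions of the settings.</div>

```r
# Hypothetical helper: lack-of-fit F-test of a regression model against the
# cell-means model with the same response.
lack_of_fit <- function(formula, data, setting) {
  model <- lm(formula, data = data)
  # keep the left-hand side, replace the right-hand side by the setting factor
  cell_formula <- update(formula, paste(". ~", setting))
  cell_means <- lm(cell_formula, data = data)
  anova(model, cell_means)
}

# usage, assuming the data frame also holds the allvl factor:
# lack_of_fit(sqrt(Time) ~ RotorLength + RotorWidth, h3, "allvl")
```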
<div>
<br /></div>
Adding interactions and quadratic effects via stepwise regression did not improve much: the residual mean square dropped to 0.124, still far above the pure error.<br />
<div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Analysis of Variance Table</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Response: sTime</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> Df Sum Sq Mean Sq F value Pr(>F) </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">RotorLength 1 3.6578 3.6578 29.5262 3.971e-06 ***</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">RotorWidth 1 1.0120 1.0120 8.1687 0.007042 ** </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">FootLength 1 0.3079 0.3079 2.4851 0.123676 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">PaperWeight 1 0.6909 0.6909 5.5769 0.023730 * </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">I(RotorLength^2) 1 2.2035 2.2035 17.7872 0.000159 ***</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">I(RotorWidth^2) 1 0.3347 0.3347 2.7018 0.108941 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">FootLength:PaperWeight 1 0.4291 0.4291 3.4634 0.070922 . </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">RotorWidth:FootLength 1 0.2865 0.2865 2.3126 0.137064 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Residuals 36 4.4598 0.1239 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">---</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1</span></div>
<div>
Just adding the quadratic effects did not help either. However, using both linear and quadratic terms as the starting point did give a more extensive model, with a residual mean square of 0.035.</div>
</div>
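<div>
The stepwise search can be sketched with step(), which adds or drops terms by AIC. This is an assumption about how such a search could be set up, not a record of the original code: stepwise_fit() is a hypothetical helper that starts from all linear and quadratic terms and allows every two-way interaction.</div>

```r
# Hypothetical helper: AIC-based stepwise regression starting from all linear
# and quadratic terms, with all two-way interactions in the search scope.
stepwise_fit <- function(data, response, numeric_vars,
                         factor_vars = character(0)) {
  linear <- c(numeric_vars, factor_vars)
  quadratic <- sprintf("I(%s^2)", numeric_vars)
  start <- reformulate(c(linear, quadratic), response = response)
  upper <- reformulate(c(sprintf("(%s)^2", paste(linear, collapse = " + ")),
                         quadratic),
                       response = response)
  step(lm(start, data = data),
       scope = list(lower = ~ 1, upper = upper),
       direction = "both", trace = 0)
}

# usage, assuming the square-root response is stored as sTime in h3:
# stepwise_fit(h3, "sTime",
#              c("RotorLength", "RotorWidth", "BodyLength", "FootLength",
#                "FoldLength", "FoldWidth"),
#              c("PaperWeight", "DirectionOfFold"))
```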
<div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Analysis of Variance Table</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Response: sTime</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> Df Sum Sq Mean Sq F value Pr(>F) </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">RotorLength 1 3.6578 3.6578 103.8434 5.350e-10 ***</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">RotorWidth 1 1.0120 1.0120 28.7293 1.918e-05 ***</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">FootLength 1 0.3079 0.3079 8.7401 0.0070780 ** </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">FoldLength 1 0.0145 0.0145 0.4113 0.5276737 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">FoldWidth 1 0.0099 0.0099 0.2816 0.6007138 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">PaperWeight 1 0.7122 0.7122 20.2180 0.0001633 ***</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">DirectionOfFold 1 0.5175 0.5175 14.6902 0.0008514 ***</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">I(RotorLength^2) 1 1.7405 1.7405 49.4119 3.661e-07 ***</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">I(RotorWidth^2) 1 0.3160 0.3160 8.9709 0.0064635 ** </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">I(FootLength^2) 1 0.1216 0.1216 3.4525 0.0760048 . </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">I(FoldLength^2) 1 0.0045 0.0045 0.1272 0.7245574 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">RotorLength:RotorWidth 1 0.4181 0.4181 11.8693 0.0022032 ** </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">RotorLength:PaperWeight 1 0.3778 0.3778 10.7247 0.0033254 ** </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">RotorWidth:FootLength 1 0.6021 0.6021 17.0947 0.0004026 ***</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">PaperWeight:DirectionOfFold 1 0.3358 0.3358 9.5339 0.0051968 ** </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">RotorWidth:FoldLength 1 1.5984 1.5984 45.3778 7.167e-07 ***</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">RotorWidth:FoldWidth 1 0.3937 0.3937 11.1769 0.0028207 ** </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">RotorWidth:PaperWeight 1 0.2029 0.2029 5.7593 0.0248924 * </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">RotorWidth:DirectionOfFold 1 0.0870 0.0870 2.4695 0.1297310 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">RotorLength:FootLength 1 0.0687 0.0687 1.9517 0.1757410 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">FootLength:PaperWeight 1 0.0732 0.0732 2.0781 0.1629080 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Residuals 23 0.8102 0.0352 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">---</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1</span></div>
</div>
This model is quite extensive. For prediction purposes I would probably drop a few terms. For instance, FootLength:PaperWeight could be removed: its p-value is about 0.16, so removing it would slightly worsen the fit but might improve predictions. As it stands, the model also has some issues; for instance, quite a few points have high leverage.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZQPldvRauYEkfL3j-P7iDVa93IUMDiSppu3IKUzlTobgz5NawpHoihPvnS-_bmvTBVWv_6Rdq4Dk9ye2wqh9xaGUs234F3lODcJ9O_v56kxURyBX1eBfuvZLMFWONxn75xiTIacmfweI/s1600/fit2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZQPldvRauYEkfL3j-P7iDVa93IUMDiSppu3IKUzlTobgz5NawpHoihPvnS-_bmvTBVWv_6Rdq4Dk9ye2wqh9xaGUs234F3lODcJ9O_v56kxURyBX1eBfuvZLMFWONxn75xiTIacmfweI/s1600/fit2.png" /></a></div>
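<div>
The leverage problem, and the option of dropping weak terms, can be checked with a few base R functions. A sketch: leverage_report() is a hypothetical helper that flags points whose hat value exceeds twice the average (2p/n, a common rule of thumb) and runs drop1() with F-tests for each term that can be removed.</div>

```r
# Hypothetical helper: high-leverage points and F-tests for dropping terms.
leverage_report <- function(fit, multiple = 2) {
  h <- hatvalues(fit)
  # fit$rank is the number of estimated coefficients; mean(h) equals rank / n
  cutoff <- multiple * fit$rank / nobs(fit)
  list(high_leverage = which(h > cutoff),
       drop_terms = drop1(fit, test = "F"))
}

# usage, with the final model stored as fit:
# leverage_report(fit)
```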
<div>
<br />
<div>
<h3>
Conclusion</h3>
The paper helicopter needs quite a complex model to fit all effects on flight time. This lends some support to the complex models found in part I. </div>
<h3>
Code used</h3>
</div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">library(dplyr)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">library(car)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">h3 <- read.table(sep='\t',header=TRUE,text='</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> RotorLength<span class="Apple-tab-span" style="white-space: pre;"> </span>RotorWidth<span class="Apple-tab-span" style="white-space: pre;"> </span>BodyLength<span class="Apple-tab-span" style="white-space: pre;"> </span>FootLength<span class="Apple-tab-span" style="white-space: pre;"> </span>FoldLength<span class="Apple-tab-span" style="white-space: pre;"> </span>FoldWidth<span class="Apple-tab-span" style="white-space: pre;"> </span>PaperWeight<span class="Apple-tab-span" style="white-space: pre;"> </span>DirectionOfFold<span class="Apple-tab-span" style="white-space: pre;"> </span>Time</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 5.5<span class="Apple-tab-span" style="white-space: pre;"> </span>3<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>0<span class="Apple-tab-span" style="white-space: pre;"> </span>5<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>11.8</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 5.5<span class="Apple-tab-span" style="white-space: pre;"> </span>3<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.5<span class="Apple-tab-span" style="white-space: pre;"> </span>11<span class="Apple-tab-span" style="white-space: pre;"> </span>2.5<span class="Apple-tab-span" style="white-space: pre;"> </span>heavy<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>8.29</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 5.5<span class="Apple-tab-span" style="white-space: pre;"> </span>3<span class="Apple-tab-span" style="white-space: pre;"> </span>5.5<span class="Apple-tab-span" style="white-space: pre;"> </span>0<span class="Apple-tab-span" style="white-space: pre;"> </span>11<span class="Apple-tab-span" style="white-space: pre;"> </span>2.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>with<span class="Apple-tab-span" style="white-space: pre;"> </span>9</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 5.5<span class="Apple-tab-span" style="white-space: pre;"> </span>3<span class="Apple-tab-span" style="white-space: pre;"> </span>5.5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.5<span class="Apple-tab-span" style="white-space: pre;"> </span>5<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>heavy<span class="Apple-tab-span" style="white-space: pre;"> </span>with<span class="Apple-tab-span" style="white-space: pre;"> </span>7.21</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 5.5<span class="Apple-tab-span" style="white-space: pre;"> </span>5<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>0<span class="Apple-tab-span" style="white-space: pre;"> </span>11<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>heavy<span class="Apple-tab-span" style="white-space: pre;"> </span>with<span class="Apple-tab-span" style="white-space: pre;"> </span>6.65</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 5.5<span class="Apple-tab-span" style="white-space: pre;"> </span>5<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.5<span class="Apple-tab-span" style="white-space: pre;"> </span>5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>with<span class="Apple-tab-span" style="white-space: pre;"> </span>10.26</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 5.5<span class="Apple-tab-span" style="white-space: pre;"> </span>5<span class="Apple-tab-span" style="white-space: pre;"> </span>5.5<span class="Apple-tab-span" style="white-space: pre;"> </span>0<span class="Apple-tab-span" style="white-space: pre;"> </span>5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.5<span class="Apple-tab-span" style="white-space: pre;"> </span>heavy<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>7.98</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 5.5<span class="Apple-tab-span" style="white-space: pre;"> </span>5<span class="Apple-tab-span" style="white-space: pre;"> </span>5.5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.5<span class="Apple-tab-span" style="white-space: pre;"> </span>11<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>8.06</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.5<span class="Apple-tab-span" style="white-space: pre;"> </span>3<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>0<span class="Apple-tab-span" style="white-space: pre;"> </span>5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.5<span class="Apple-tab-span" style="white-space: pre;"> </span>heavy<span class="Apple-tab-span" style="white-space: pre;"> </span>with<span class="Apple-tab-span" style="white-space: pre;"> </span>9.2</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.5<span class="Apple-tab-span" style="white-space: pre;"> </span>3<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.5<span class="Apple-tab-span" style="white-space: pre;"> </span>11<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>with<span class="Apple-tab-span" style="white-space: pre;"> </span>19.35</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.5<span class="Apple-tab-span" style="white-space: pre;"> </span>3<span class="Apple-tab-span" style="white-space: pre;"> </span>5.5<span class="Apple-tab-span" style="white-space: pre;"> </span>0<span class="Apple-tab-span" style="white-space: pre;"> </span>11<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>heavy<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>12.08</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.5<span class="Apple-tab-span" style="white-space: pre;"> </span>3<span class="Apple-tab-span" style="white-space: pre;"> </span>5.5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.5<span class="Apple-tab-span" style="white-space: pre;"> </span>5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>20.5</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.5<span class="Apple-tab-span" style="white-space: pre;"> </span>5<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>0<span class="Apple-tab-span" style="white-space: pre;"> </span>11<span class="Apple-tab-span" style="white-space: pre;"> </span>2.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>13.58</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.5<span class="Apple-tab-span" style="white-space: pre;"> </span>5<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.5<span class="Apple-tab-span" style="white-space: pre;"> </span>5<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>heavy<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>7.47</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.5<span class="Apple-tab-span" style="white-space: pre;"> </span>5<span class="Apple-tab-span" style="white-space: pre;"> </span>5.5<span class="Apple-tab-span" style="white-space: pre;"> </span>0<span class="Apple-tab-span" style="white-space: pre;"> </span>5<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>with<span class="Apple-tab-span" style="white-space: pre;"> </span>9.79</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.5<span class="Apple-tab-span" style="white-space: pre;"> </span>5<span class="Apple-tab-span" style="white-space: pre;"> </span>5.5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.5<span class="Apple-tab-span" style="white-space: pre;"> </span>11<span class="Apple-tab-span" style="white-space: pre;"> </span>2.5<span class="Apple-tab-span" style="white-space: pre;"> </span>heavy<span class="Apple-tab-span" style="white-space: pre;"> </span>with<span class="Apple-tab-span" style="white-space: pre;"> </span>9.2</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 8.5<span class="Apple-tab-span" style="white-space: pre;"> </span>4<span class="Apple-tab-span" style="white-space: pre;"> </span>3.5<span class="Apple-tab-span" style="white-space: pre;"> </span>1.25<span class="Apple-tab-span" style="white-space: pre;"> </span>8<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>10.52</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 8.5<span class="Apple-tab-span" style="white-space: pre;"> </span>4<span class="Apple-tab-span" style="white-space: pre;"> </span>3.5<span class="Apple-tab-span" style="white-space: pre;"> </span>1.25<span class="Apple-tab-span" style="white-space: pre;"> </span>8<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>10.81</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 8.5<span class="Apple-tab-span" style="white-space: pre;"> </span>4<span class="Apple-tab-span" style="white-space: pre;"> </span>3.5<span class="Apple-tab-span" style="white-space: pre;"> </span>1.25<span class="Apple-tab-span" style="white-space: pre;"> </span>8<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>10.89</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 8.5<span class="Apple-tab-span" style="white-space: pre;"> </span>4<span class="Apple-tab-span" style="white-space: pre;"> </span>3.5<span class="Apple-tab-span" style="white-space: pre;"> </span>1.25<span class="Apple-tab-span" style="white-space: pre;"> </span>8<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>heavy<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>15.91</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 8.5<span class="Apple-tab-span" style="white-space: pre;"> </span>4<span class="Apple-tab-span" style="white-space: pre;"> </span>3.5<span class="Apple-tab-span" style="white-space: pre;"> </span>1.25<span class="Apple-tab-span" style="white-space: pre;"> </span>8<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>heavy<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>16.08</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 8.5<span class="Apple-tab-span" style="white-space: pre;"> </span>4<span class="Apple-tab-span" style="white-space: pre;"> </span>3.5<span class="Apple-tab-span" style="white-space: pre;"> </span>1.25<span class="Apple-tab-span" style="white-space: pre;"> </span>8<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>heavy<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>13.88</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 8.5<span class="Apple-tab-span" style="white-space: pre;"> </span>4<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>12.99</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 9.5<span class="Apple-tab-span" style="white-space: pre;"> </span>3.61<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>15.22</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 10.5<span class="Apple-tab-span" style="white-space: pre;"> </span>3.22<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>16.34</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.83<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>18.78</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 12.5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.44<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>17.39</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 13.5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.05<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>7.24</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 10.5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.44<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>13.65</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 12.5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.44<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>13.74</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 10.5<span class="Apple-tab-span" style="white-space: pre;"> </span>3.22<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>15.48</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 12.5<span class="Apple-tab-span" style="white-space: pre;"> </span>3.22<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>13.53</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.83<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>17.38</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.83<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>16.35</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.83<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>16.41</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 10.08<span class="Apple-tab-span" style="white-space: pre;"> </span>2.83<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>12.51</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 12.91<span class="Apple-tab-span" style="white-space: pre;"> </span>2.83<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>15.17</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.5<span class="Apple-tab-span" style="white-space: pre;"> </span>2.28<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>14.86</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.5<span class="Apple-tab-span" style="white-space: pre;"> </span>3.38<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>11.85</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.18<span class="Apple-tab-span" style="white-space: pre;"> </span>2.94<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>15.54</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.18<span class="Apple-tab-span" style="white-space: pre;"> </span>2.94<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>16.4</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.18<span class="Apple-tab-span" style="white-space: pre;"> </span>2.94<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>19.67</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.18<span class="Apple-tab-span" style="white-space: pre;"> </span>2.94<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>19.41</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.18<span class="Apple-tab-span" style="white-space: pre;"> </span>2.94<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>18.55</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 11.18<span class="Apple-tab-span" style="white-space: pre;"> </span>2.94<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>2<span class="Apple-tab-span" style="white-space: pre;"> </span>6<span class="Apple-tab-span" style="white-space: pre;"> </span>1.5<span class="Apple-tab-span" style="white-space: pre;"> </span>light<span class="Apple-tab-span" style="white-space: pre;"> </span>against<span class="Apple-tab-span" style="white-space: pre;"> </span>17.29</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ')</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">names(h3)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">h3 <- h3 %>% </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> mutate(.,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> FRL=factor(format(RotorLength,digits=2)),</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> FRW=factor(format(RotorWidth,digits=2)),</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> FBL=factor(format(BodyLength,digits=2)),</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> FFt=factor(format(FootLength,digits=2)),</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> FFd=factor(format(FoldLength,digits=2)),</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> FFW=factor(format(FoldWidth,digits=2)),</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> allvl=interaction(FRL,FRW,FBL,FFt,FFd,FFW,PaperWeight,DirectionOfFold,drop=TRUE)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> )</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">h4 <- xtabs(~allvl,data=h3) %>% </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> as.data.frame %>%</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> filter(.,Freq>1) %>%</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> merge(.,h3) %>%</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> select(.,RotorLength,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> RotorWidth,BodyLength,FootLength,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> FoldLength,FoldWidth,PaperWeight,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> DirectionOfFold,allvl,Time,Freq) %>%</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> print</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">lm(Time~allvl,data=h4) %>% anova</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">par(mfrow=c(1,2))</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">aov(Time~allvl,data=h3) %>% residualPlot(.,main='Untransformed')</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">aov(log10(Time)~allvl,data=h3) %>% residualPlot(.,main='Log10 Transform')</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">lm(Time ~ RotorLength + RotorWidth + BodyLength +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> FootLength + FoldLength + FoldWidth + </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperWeight + DirectionOfFold,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> data=h3) %>%</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> boxCox(.)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">dev.off()</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">lm(sqrt(Time)~allvl,data=h4) %>% anova</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">h3 <- mutate(h3,sTime=sqrt(Time))</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">lm(sTime ~ RotorLength + RotorWidth + BodyLength +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> FootLength + FoldLength + FoldWidth + </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperWeight + DirectionOfFold,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> data=h3) %>%</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> anova</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">s1 <- lm(sTime ~ </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> RotorLength + RotorWidth + BodyLength +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> FootLength + FoldLength + FoldWidth + </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperWeight + DirectionOfFold ,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> data=h3) %>%</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> step(.,scope=~(RotorLength + RotorWidth + BodyLength +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> FootLength + FoldLength + FoldWidth + </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperWeight + DirectionOfFold)*</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> (RotorLength + RotorWidth + BodyLength +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> FootLength + FoldLength + FoldWidth)+</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> I(RotorLength^2) + I(RotorWidth^2) + I(BodyLength^2) +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> I(FootLength^2) + I(FoldLength^2) + I(FoldWidth^2) +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperWeight*DirectionOfFold)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">anova(s1)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<br /></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">s2 <- lm(sTime ~ </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> RotorLength + RotorWidth + BodyLength +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> FootLength + FoldLength + FoldWidth + </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperWeight + DirectionOfFold +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> I(RotorLength^2) + I(RotorWidth^2) + I(BodyLength^2) +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> I(FootLength^2) + I(FoldLength^2) + I(FoldWidth^2) ,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> data=h3) %>%</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> step(.,scope=~(RotorLength + RotorWidth + BodyLength +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> FootLength + FoldLength + FoldWidth + </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperWeight + DirectionOfFold)*</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> (RotorLength + RotorWidth + BodyLength +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> FootLength + FoldLength + FoldWidth)+</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> I(RotorLength^2) + I(RotorWidth^2) + I(BodyLength^2) +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> I(FootLength^2) + I(FoldLength^2) + I(FoldWidth^2) +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperWeight*DirectionOfFold)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">anova(s2)</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace; font-size: x-small;">par(mfrow=c(2,2))</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">plot(s2)</span></div>
</div>
Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com0tag:blogger.com,1999:blog-3524617892004055830.post-68543971996722539672015-05-25T09:04:00.001+02:002015-05-25T09:04:33.793+02:00Paper Helicopter experiment, part II<a href="http://wiekvoet.blogspot.nl/2015/05/the-paper-helicopter-experiment.html">Last week</a> I created a JAGS model combining data from two paper helicopter datasets. This week, I will use the model to find the design that flies longest.<br />
<h3>
Predicting</h3>
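JAGS (and its R interfaces rjags and R2jags) has no predict() function that I know of. I therefore adapted the model so that the predictions are made during parameter estimation: the prediction settings are appended to the data with Time set to NA, mu is computed for all n+m rows, but only the first n (observed) rows enter the likelihood. Using this adapted model, predictions were made in two steps.<br />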
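The data side of this trick can be sketched in a few lines of base R. This is a minimal illustration, assuming an observed data frame and a data frame of new settings with the same predictor columns; the names augment_for_prediction, observed and newdata are hypothetical and not taken from the post. In the JAGS model the returned n and m then drive the loops: mu[i] for 1:(n+m), Time[i] ~ dnorm(...) for 1:n only, and pred[i] &lt;- mu[i+n].<br />

```r
# Hypothetical helper: stack observed runs and prediction settings so that a
# JAGS model can compute mu for all rows but use only the observed ones in
# the likelihood.
augment_for_prediction <- function(observed, newdata, response = "Time") {
  newdata[[response]] <- NA                          # responses unknown for new rows
  both <- rbind(observed, newdata[names(observed)])  # observed rows come first
  list(data = both,
       n = nrow(observed),                           # rows entering the likelihood
       m = nrow(newdata))                            # rows used only for prediction
}

# Toy data, not the helicopter data
observed <- data.frame(WingLength = c(8, 10, 12), Time = c(2.9, 3.3, 3.6))
newdata  <- data.frame(WingLength = c(9, 11))
aug <- augment_for_prediction(observed, newdata)
aug$data
```

Keeping the observed rows first is what makes the simple indexing mu[i+n] in the model valid.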
In step one, predictions were made over the whole design space. To keep the number of predictions manageable, only four levels were used for each continuous variable. This step was used to select the best region within the design space. Step two focuses on that region and provides more detailed predictions.<br />
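The coarse-then-fine idea can be illustrated on a toy problem. This sketch assumes a made-up smooth function toy_time standing in for the predicted mean time (it is not the fitted helicopter model), and the ranges are only illustrative. Step 1 uses four levels per continuous variable, as in the post; step 2 puts a finer grid within one coarse step of the best coarse point.<br />

```r
# Hypothetical response surface standing in for the predicted mean time
toy_time <- function(WingLength, BodyLength) {
  3.5 - 0.02 * (WingLength - 11)^2 - 0.05 * (BodyLength - 6)^2
}
wl_limits <- c(7.4, 12.1)   # illustrative ranges, roughly those in the post
bl_limits <- c(3.8, 12.1)

# Step 1: coarse grid, four levels per continuous variable
coarse <- expand.grid(
  WingLength = seq(wl_limits[1], wl_limits[2], length.out = 4),
  BodyLength = seq(bl_limits[1], bl_limits[2], length.out = 4))
coarse$Time <- with(coarse, toy_time(WingLength, BodyLength))
best1 <- coarse[which.max(coarse$Time), ]

# Step 2: finer grid within one coarse step of the best point,
# clipped to the experimental range
local_seq <- function(centre, limits, n_coarse = 4, n_fine = 11) {
  step <- diff(limits) / (n_coarse - 1)
  seq(max(limits[1], centre - step), min(limits[2], centre + step),
      length.out = n_fine)
}
fine <- expand.grid(WingLength = local_seq(best1$WingLength, wl_limits),
                    BodyLength = local_seq(best1$BodyLength, bl_limits))
fine$Time <- with(fine, toy_time(WingLength, BodyLength))
best2 <- fine[which.max(fine$Time), ]
best1
best2
```

The post selects a region by thresholds rather than a single best point, but the principle of spending the detailed predictions only where they matter is the same.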
<h4>
Step 1 </h4>
After predicting over the whole experimental space, the mean predicted time was plotted against the lower 5% limit of the predicted time.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgD60oo_PD3ABb7CGFOHyW0ZePKQYdVSsceLvCxKEERabKCLn9J8CtspW-aJUYH6ONdmAfkS1hSNp7KniwGDG-RRat5yberjsMvdueYpiYJ_S2_7uH9SAFuWMWMlkCgGydGIC591QS7nTs/s1600/select1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgD60oo_PD3ABb7CGFOHyW0ZePKQYdVSsceLvCxKEERabKCLn9J8CtspW-aJUYH6ONdmAfkS1hSNp7KniwGDG-RRat5yberjsMvdueYpiYJ_S2_7uH9SAFuWMWMlkCgGydGIC591QS7nTs/s1600/select1.png" /></a></div>
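Each prediction is a column of posterior draws, so the plotted mean and 5% limit are column summaries of that matrix. Below is a small sketch with simulated draws (hypothetical numbers, not the post's output); in the post the matrix is the pred[] part of jj$BUGSoutput$sims.matrix, summarized with colMeans() and quantile() in the same way.<br />

```r
# Simulated posterior draws: rows are MCMC iterations, columns are settings
set.seed(1)
predmat <- cbind(`pred[1]` = rnorm(4000, 3.6, 0.40),  # higher mean, uncertain
                 `pred[2]` = rnorm(4000, 3.4, 0.15))  # lower mean, precise

# Column-wise summaries of the draws
summarise_draws <- function(draws) {
  data.frame(
    Mean = colMeans(draws),
    u95  = apply(draws, 2, quantile, probs = 0.95),
    l05  = apply(draws, 2, quantile, probs = 0.05))
}
s <- summarise_draws(predmat)
s
```

Here the second setting has the lower mean but the higher 5% limit, which is exactly the kind of case that shows up as a point off the diagonal in a Mean versus l05 plot.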
<br />
I decided to focus on the top-right region: a lower 5% limit above 2.7 and a mean time above 3.2. The associated settings are summarized below.<br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperType WingLength BodyWidth BodyLength TapedBody</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> bond :72 Min. : 7.408 Min. :2.540 Min. : 3.810 No :114 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> regular1 :72 1st Qu.:12.065 1st Qu.:3.387 1st Qu.: 6.562 Yes: 54 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> construction:24 Median :12.065 Median :4.233 Median : 6.562 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Mean :11.871 Mean :4.163 Mean : 6.955 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 3rd Qu.:12.065 3rd Qu.:5.080 3rd Qu.: 9.313 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Max. :12.065 Max. :5.080 Max. :12.065 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> TapedWing PaperClip PaperClip2 Fold test Time </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> No : 68 No :84 No: 20 No:168 WH:168 Mode:logical </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Yes:100 Yes:84 WH:148 NA's:168 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> RH: 0 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Mean u95 l05 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Min. :3.203 Min. :3.574 Min. :2.701 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 1st Qu.:3.327 1st Qu.:3.808 1st Qu.:2.776 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Median :3.465 Median :3.975 Median :2.847 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Mean :3.486 Mean :4.108 Mean :2.873 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 3rd Qu.:3.636 3rd Qu.:4.388 3rd Qu.:2.952 </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Max. :3.877 Max. :5.044 Max. :3.165 </span><br />
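The selection itself is a simple joint threshold on both summaries, which is why the minimum Mean in the summary just exceeds 3.2 and the minimum l05 just exceeds 2.7. A sketch on a hypothetical four-row table (not the post's predictions):<br />

```r
# Hypothetical prediction summaries
helispred_toy <- data.frame(
  PaperType = c("bond", "bond", "regular1", "construction"),
  Mean      = c(3.45, 3.10, 3.80, 3.30),
  l05       = c(2.85, 2.75, 2.60, 2.72))

# Keep settings that are good on average AND have an acceptable lower limit
select_toy <- subset(helispred_toy, Mean > 3.2 & l05 > 2.7)
select_toy
```

The third row has the highest mean but is dropped because its lower limit is too low.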
<h4>
Step 2</h4>
The second prediction step varied only PaperType, BodyWidth, BodyLength and TapedWing; all other variables were fixed at their most frequent setting in the region selected in step 1. As the plot shows, there is a trade-off. Selecting the longest mean time carries some risk of a much lower time because of model uncertainty, while accepting a slightly lower mean time buys more certainty.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMptpRvbgV30h1aUenwo-vy9imJdwLiRjJCpxkT1SqArdZkwzMaDneFMWgy1TEJAOWrpieEZXwXrAkobonoD5-Tx137AcHbTZac6SUvY20NVuvofwRkXHLg0jIl998wZgnyrJtPwHotUs/s1600/select2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMptpRvbgV30h1aUenwo-vy9imJdwLiRjJCpxkT1SqArdZkwzMaDneFMWgy1TEJAOWrpieEZXwXrAkobonoD5-Tx137AcHbTZac6SUvY20NVuvofwRkXHLg0jIl998wZgnyrJtPwHotUs/s1600/select2.png" /></a></div>
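Finding the most frequent setting of each fixed variable in the selected region can be done with table(). A minimal sketch; most_frequent is a hypothetical helper and the toy data are not the post's selection (although the post's code does fix WingLength at 12.065, the value that dominates the step 1 summary).<br />

```r
# Most frequent value of a vector; ties go to the first value in table() order
most_frequent <- function(x) {
  tab <- table(x)
  value <- names(tab)[which.max(tab)]
  if (is.numeric(x)) as.numeric(value) else value  # keep numbers numeric
}

# Toy selected region
toy_select <- data.frame(
  WingLength = c(7.408, 12.065, 12.065, 12.065),
  TapedBody  = factor(c("No", "No", "Yes", "No")))

# lapply keeps each result's own type (sapply would coerce to character)
fixed <- lapply(toy_select, most_frequent)
str(fixed)
```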
I prefer to avoid the more uncertain region, so I base my choice on the lower 5% limit. This plot shows a trade-off between paper type and BodyLength: bond paper needs a slightly longer BodyLength, while regular paper can have a short one. BodyWidth should be maximized, but the lower limit is not very sensitive to it.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioLY5BYCrF6yO0cn1uxHSCOWFdmPNu6oOH2Sbll8i1A3kcSCScNllTXXujjde_TqAP4QFXkHFB6Q7_Byv45bJU2-IhV9AwApDh3C9N3PHuuc5zsLzlZBE6dJv4jOp9EB4RIAFeV6nrdZc/s1600/l05.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioLY5BYCrF6yO0cn1uxHSCOWFdmPNu6oOH2Sbll8i1A3kcSCScNllTXXujjde_TqAP4QFXkHFB6Q7_Byv45bJU2-IhV9AwApDh3C9N3PHuuc5zsLzlZBE6dJv4jOp9EB4RIAFeV6nrdZc/s1600/l05.png" /></a></div>
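The difference between an optimistic and a cautious choice comes down to which summary is maximized. A sketch on hypothetical step 2 predictions (made-up numbers, not the post's results):<br />

```r
# Hypothetical step 2 prediction summaries
step2 <- data.frame(
  PaperType  = c("bond", "bond", "regular1", "regular1"),
  BodyLength = c(3.8, 6.6, 3.8, 6.6),
  Mean       = c(3.95, 3.80, 3.90, 3.70),
  l05        = c(2.90, 3.10, 3.05, 2.95))

best_mean <- step2[which.max(step2$Mean), ]  # optimistic: highest expected time
best_l05  <- step2[which.max(step2$l05), ]   # cautious: best lower 5% limit
rbind(best_mean, best_l05)
```

Maximizing l05 gives up a little mean time in exchange for protection against the low outcomes the model considers plausible.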
For completeness, here is the mean prediction. It shows hardly any interaction between paper type and BodyLength, so the need for a longer BodyLength with bond paper in the lower-limit plot is due to a lack of experiments in that region (more uncertainty, hence a lower 5% limit) rather than a real interaction. A few final confirmation experiments seem to be in order. These could also include a low BodyWidth, since the models are unclear whether it should be maximized or minimized.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjz15Tj9jI2eGzFeFUvEP5f7N4vVSQFgypYNbdBQxU8Y9T68z02si9M0X6mgpXznEtjtDGB6N2LHoLpln3upSbv7OWf3dkJLMW31dme1vIHxT4NMAJlZYvuarfnK6ydOnvkQDlUr_mIMeY/s1600/mean.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjz15Tj9jI2eGzFeFUvEP5f7N4vVSQFgypYNbdBQxU8Y9T68z02si9M0X6mgpXznEtjtDGB6N2LHoLpln3upSbv7OWf3dkJLMW31dme1vIHxT4NMAJlZYvuarfnK6ydOnvkQDlUr_mIMeY/s1600/mean.png" /></a></div>
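One way to check "hardly any interaction" numerically is to look at the gap between paper types at each BodyLength: if the gap is constant, the mean surface is additive. A sketch on a hypothetical additive grid (constructed for illustration, not the post's predictions):<br />

```r
# Hypothetical mean predictions, additive in PaperType and BodyLength
grid <- expand.grid(PaperType  = c("bond", "regular1"),
                    BodyLength = c(3.8, 5.2, 6.6, 8.0))
grid$Mean <- 3.9 - 0.05 * grid$BodyLength +
  ifelse(grid$PaperType == "bond", 0.10, 0)

# Cross-tabulate to a BodyLength x PaperType table (one value per cell)
wide <- xtabs(Mean ~ BodyLength + PaperType, data = grid)
gap <- wide[, "bond"] - wide[, "regular1"]
gap               # paper-type difference at each BodyLength
diff(range(gap))  # near zero means no interaction in the mean
```

If the gap is flat for the mean but not for l05, the difference in the lower limit comes from uneven uncertainty across the design space, not from the fitted effects.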
<br />
<h3>
Code used</h3>
The code for reading the actual data is in the previous post; the code below starts after those data have been read in.<br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">helis <- rbind(h1,h2)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">helis$test <- factor(helis$test)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">helis$PaperClip2 <- factor(ifelse(helis$PaperClip=='No','No',as.character(helis$test)),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> levels=c('No','WH','RH'))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">library(R2jags)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">library(ggplot2)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">helispred <- expand.grid(</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperType=c('bond','regular1','construction'),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> WingLength=seq(min(helis$WingLength),max(helis$WingLength),length.out=4),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BodyWidth=seq(min(helis$BodyWidth),max(helis$BodyWidth),length.out=4),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BodyLength=seq(min(helis$BodyLength),max(helis$BodyLength),length.out=4),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> TapedBody=c('No','Yes'),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> TapedWing=c('No','Yes'),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperClip=c('No','Yes'),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperClip2=c('No','WH','RH'),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Fold='No',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> test='WH',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Time=NA)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">helisboth <- rbind(helis,helispred)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">#################################</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">datain <- list(</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperType=c(2,1,3,1)[helisboth$PaperType],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> WingLength=helisboth$WingLength,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BodyLength=helisboth$BodyLength,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BodyWidth=helisboth$BodyWidth,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperClip=c(1,2,3)[helisboth$PaperClip2],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> TapedBody=c(0,1)[helisboth$TapedBody],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> TapedWing=c(0,1)[helisboth$TapedWing],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> test=c(1,2)[helisboth$test],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Time=helisboth$Time,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> n=nrow(helis),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> m=nrow(helispred))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">parameters <- c('Mul','WL','BL','PT','BW','PC','TB','TW','StDev',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 'WLBW','WLPC', 'WLWL',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 'BLPT' ,'BLPC', 'BLBL',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 'BWPC', 'BWBW', 'other','pred')</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">jmodel <- function() {</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> for (i in 1:(n+m)) { </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> premul[i] <- (test[i]==1)+Mul*(test[i]==2)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> mu[i] <- premul[i] * (</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> WL*WingLength[i]+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BL*BodyLength[i] + </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PT[PaperType[i]] +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BW*BodyWidth[i] +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PC[PaperClip[i]] +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> TB*TapedBody[i]+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> TW*TapedWing[i]+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> WLBW*WingLength[i]*BodyWidth[i]+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> WLPC[1]*WingLength[i]*(PaperClip[i]==2)+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> WLPC[2]*WingLength[i]*(PaperClip[i]==3)+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BLPT[1]*BodyLength[i]*(PaperType[i]==2)+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BLPT[2]*BodyLength[i]*(PaperType[i]==3)+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BLPC[1]*BodyLength[i]*(PaperClip[i]==2)+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BLPC[2]*BodyLength[i]*(PaperClip[i]==3)+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BWPC[1]*BodyWidth[i]*(PaperClip[i]==2)+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BWPC[2]*BodyWidth[i]*(PaperClip[i]==3) +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> WLWL*WingLength[i]*WingLength[i]+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BLBL*BodyLength[i]*BodyLength[i]+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BWBW*BodyWidth[i]*BodyWidth[i] </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> )</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> for (i in 1:n) {</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Time[i] ~ dnorm(mu[i],tau[test[i]])</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"># residual[i] <- Time[i]-mu[i]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> for (i in 1:2) {</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> tau[i] <- pow(StDev[i],-2)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> StDev[i] ~dunif(0,3)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> WLPC[i] ~dnorm(0,1)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BLPT[i] ~dnorm(0,1)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BLPC[i] ~dnorm(0,1) </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BWPC[i] ~dnorm(0,1) </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> for (i in 1:3) {</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PT[i] ~ dnorm(PTM,tauPT)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> tauPT <- pow(sdPT,-2)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> sdPT ~dunif(0,3)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PTM ~dnorm(0,0.01)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> WL ~dnorm(0,0.01) </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BL ~dnorm(0,0.01)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BW ~dnorm(0,0.01)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PC[1] <- 0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PC[2]~dnorm(0,0.01)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PC[3]~dnorm(0,0.01) </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> TB ~dnorm(0,0.01)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> TW ~dnorm(0,0.01)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> WLBW~dnorm(0,1)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> WLTW~dnorm(0,1)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> WLWL~dnorm(0,1)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BLBL~dnorm(0,1) </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BWBW~dnorm(0,1)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> other~dnorm(0,1)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Mul ~ dnorm(1,1) %_% I(0,2)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> for (i in 1:m) {</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> pred[i] <- mu[i+n]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">}</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">jj <- jags(model.file=jmodel,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> data=datain,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> parameters.to.save=parameters,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> progress.bar='gui',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> n.chains=5,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> n.iter=4000,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> inits=function() list(Mul=1.3,WL=0.15,BL=-.08,PT=rep(1,3),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PC=c(NA,0,0),TB=0,TW=0))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">#jj</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">predmat <- jj$BUGSoutput$sims.matrix[,grep('pred',dimnames(jj$BUGSoutput$sims.matrix)[[2]],value=TRUE)]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">helispred$Mean <- colMeans(predmat)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">helispred$u95 <- apply(predmat,2,function(x) quantile(x,.95))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">helispred$l05 <- apply(predmat,2,function(x) quantile(x,.05))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">png('select1.png')</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">qplot(y=Mean,x=l05,data=helispred)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">dev.off()</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">select <- helispred[helispred$Mean>3.2 & helispred$l05>2.7,]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">summary(select)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">########</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">helispred <- expand.grid(</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperType=c('bond','regular1'),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> WingLength=12.065,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BodyWidth=seq(2.5,5,length.out=11),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BodyLength=seq(3.8,12,length.out=11),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> TapedBody=c('No'),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> TapedWing=c('No','Yes'),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperClip='No',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperClip2=c('WH'),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Fold='No',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> test='WH',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Time=NA)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">helisboth <- rbind(helis,helispred)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">datain <- list(</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperType=c(2,1,3,1)[helisboth$PaperType],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> WingLength=helisboth$WingLength,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BodyLength=helisboth$BodyLength,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> BodyWidth=helisboth$BodyWidth,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PaperClip=c(1,2,3)[helisboth$PaperClip2],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> TapedBody=c(0,1)[helisboth$TapedBody],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> TapedWing=c(0,1)[helisboth$TapedWing],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> test=c(1,2)[helisboth$test],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> Time=helisboth$Time,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> n=nrow(helis),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> m=nrow(helispred))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">jj <- jags(model.file=jmodel,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> data=datain,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> parameters=parameters,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> progress.bar='gui',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> n.chain=5,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> n.iter=4000,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> inits=function() list(Mul=1.3,WL=0.15,BL=-.08,PT=rep(1,3),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> PC=c(NA,0,0),TB=0,TW=0))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">#jj</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">predmat <- jj$BUGSoutput$sims.matrix[,grep('pred',dimnames(jj$BUGSoutput$sims.matrix)[[2]],value=TRUE)]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">helispred$Mean <- colMeans(predmat)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">helispred$u95 <- apply(predmat,2,function(x) quantile(x,.95))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">helispred$l05 <- apply(predmat,2,function(x) quantile(x,.05))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">#</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">png('select2.png')</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">qplot(y=Mean,x=l05,data=helispred)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">dev.off()</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">png('l05.png')</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">v <- ggplot(helispred, aes(BodyLength, BodyWidth, z = l05))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">v + stat_contour(aes(colour= ..level.. )) +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> scale_colour_gradient(name='Time' )+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> facet_grid(PaperType ~ TapedWing )+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ggtitle('Lower 95% prediction') </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">dev.off()</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">png('mean.png')</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">v <- ggplot(helispred, aes(BodyLength, BodyWidth, z = Mean))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">v + stat_contour(aes(colour= ..level.. )) +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> scale_colour_gradient(name='Time' )+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> facet_grid(PaperType ~ TapedWing ) +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> ggtitle('Mean prediction')</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">dev.off()</span><br />
<div>
<br /></div>
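The <code>datain</code> list above converts factors to the integer codes JAGS needs by indexing a numeric vector with a factor, as in <code>c(2,1,3,1)[helisboth$PaperType]</code>. This works because R indexes by a factor's internal integer codes (the position of each value's level), not by its label, so the mapping silently depends on the level order. A minimal sketch with hypothetical level names, since the actual levels of <code>helisboth$PaperType</code> are not shown here:

```r
# Hypothetical paper types; factor() sorts the levels alphabetically
paper <- factor(c("regular1", "bond", "construction", "regular2", "bond"))
levels(paper)

# Indexing by a factor uses its integer codes, i.e. as.integer(paper),
# so element k of c(2, 1, 3, 1) is assigned to the k-th level
codes <- c(2, 1, 3, 1)[paper]
codes

# Label-based alternative that does not depend on the level order
# (hypothetical lookup, chosen to reproduce the mapping above)
lookup <- c(bond = 2, construction = 1, regular1 = 3, regular2 = 1)
codes2 <- unname(lookup[as.character(paper)])
codes2
```

Both versions give the same codes here; the named lookup simply keeps working if a level is added or the levels are reordered.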
Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com0tag:blogger.com,1999:blog-3524617892004055830.post-11880602671634580042015-05-17T09:01:00.000+02:002015-05-17T09:01:49.865+02:00The paper helicopter experimentThe paper helicopter is a popular device for explaining the design of experiments. The aim is to create the longest-flying paper helicopter by means of experimental design.<br />
Paper helicopters make a nice example because they are cheap to make, their landing time is easy to measure, and they have enough variables to make the answer non-obvious.<br />
Rather than making and measuring my own helicopters, I decided to use data from the internet. In this post I use data from <a href="http://williamghunter.net/george-box-articles/teaching-engineers-experimental-design-with-a-paper-helicopter">williamghunter.net</a> and <a href="http://www.rose-hulman.edu/~stienstr/me421/DOE2001.htm">http://www.rose-hulman.edu</a>. There is more data on the internet, but these two sets are fairly similar: both use a 16-run fractional factorial design and they have the same variables. However, a quick check showed that the results differed and, very importantly, so did the aliasing structure.<br />
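Eight factors in 16 runs is a 2^(8-4) fractional factorial, and which effects are aliased depends on the generators chosen. The generators of the two source experiments are not stated here, so as an illustration only, the sketch below builds a 2^(8-4) design in coded units from the textbook resolution IV generators E = BCD, F = ACD, G = ABC, H = ABD (an assumption, not necessarily what either experiment used) and checks which columns coincide.

```r
# Full 2^4 factorial in the base factors A-D, coded -1/+1 (16 runs)
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1))

# Assumed generators for the remaining four factors (resolution IV)
design$E <- with(design, B * C * D)
design$F <- with(design, A * C * D)
design$G <- with(design, A * B * C)
design$H <- with(design, A * B * D)

X <- as.matrix(design)

# Main effects are mutually orthogonal: X'X = 16 * I
orthogonal <- all(crossprod(X) == 16 * diag(8))
orthogonal

# Two-factor interactions are aliased in chains, e.g. AB = CG = DH = EF
ab <- X[, "A"] * X[, "B"]
ab_aliases <- sapply(list(CG = c("C", "G"), DH = c("D", "H"), EF = c("E", "F")),
                     function(p) all(ab == X[, p[1]] * X[, p[2]]))
ab_aliases

# No main effect is aliased with any two-factor interaction
pairs <- combn(colnames(X), 2)
two_fi <- apply(pairs, 2, function(p) X[, p[1]] * X[, p[2]])
main_vs_2fi <- any(abs(crossprod(X, two_fi)) == 16)
main_vs_2fi
```

With different generators the chains change, which is why two 16-run experiments on the same variables can still confound different interactions.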
<h3>
Data</h3>
Data were taken from the locations given above. Rather than using coded units, the factor levels were converted to sizes in cm. Time to land was converted to seconds.<br />
Since these were separate experiments, it has to be assumed that they used different paper and different drop heights. It even seems that different ways were used to attach a paperclip to the helicopters.<br />
<h3>
Simple analysis</h3>
To confirm the data, an analysis in coded units was performed. The results were the same as those given on the websites (not shown here). My own analysis uses real-world units and is done by regression. A disadvantage of real-world units is that the coefficients cannot be compared directly as effect sizes; however, given the designs used, the t-statistic can be used for this purpose.<br />
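That the t-statistic does not depend on the units, while the coefficient does, can be checked directly: rescaling a factor from coded -1/+1 to centimetres divides both its coefficient and its standard error by the same constant. A minimal sketch with synthetic data (not the helicopter data); the centre of 10 cm and half-range of 2.5 cm are hypothetical.

```r
set.seed(1)
# Synthetic two-factor, two-level design with 4 replicates, coded -1/+1
d <- expand.grid(x1 = c(-1, 1), x2 = c(-1, 1), rep = 1:4)
d$time <- 2 + 0.3 * d$x1 - 0.1 * d$x2 + rnorm(nrow(d), sd = 0.1)

# Convert x1 to hypothetical real-world units: centre 10 cm, half-range 2.5 cm
d$x1_cm <- 10 + 2.5 * d$x1

coded <- summary(lm(time ~ x1 + x2, data = d))$coefficients
real  <- summary(lm(time ~ x1_cm + x2, data = d))$coefficients

# The estimates differ by the scale factor 2.5 ...
c(coded = coded["x1", "Estimate"], real = real["x1_cm", "Estimate"])
# ... but the t values are identical
c(coded = coded["x1", "t value"], real = real["x1_cm", "t value"])
```

In an orthogonal two-level design every coded coefficient has the same standard error, so ranking factors by |t| is equivalent to ranking them by coded effect size.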
The first data set shows WingLength and BodyLength to have the largest effects. <br />
<span style="font-family: Courier New, Courier, monospace;">Coefficients:</span><br />
<span style="font-family: Courier New, Courier, monospace;"> Estimate Std. Error t value Pr(>|t|) </span><br />
<span style="font-family: Courier New, Courier, monospace;">(Intercept) 1.92798 0.54903 3.512 0.009839 ** </span><br />
<span style="font-family: Courier New, Courier, monospace;">PaperTyperegular1 -0.12500 0.13726 -0.911 0.392730 </span><br />
<span style="font-family: Courier New, Courier, monospace;">WingLength 0.17435 0.03088 5.646 0.000777 ***</span><br />
<span style="font-family: Courier New, Courier, monospace;">BodyLength -0.08999 0.03088 -2.914 0.022524 * </span><br />
<span style="font-family: Courier New, Courier, monospace;">BodyWidth 0.01312 0.07205 0.182 0.860634 </span><br />
<span style="font-family: Courier New, Courier, monospace;">PaperClipYes 0.05000 0.13726 0.364 0.726403 </span><br />
<span style="font-family: Courier New, Courier, monospace;">FoldYes -0.10000 0.13726 -0.729 0.489918 </span><br />
<span style="font-family: Courier New, Courier, monospace;">TapedBodyYes -0.15000 0.13726 -1.093 0.310638 </span><br />
<span style="font-family: Courier New, Courier, monospace;">TapedWingYes 0.17500 0.13726 1.275 0.242999</span> <br />
<div>
The second data set shows WingLength, PaperClip and PaperType to have the largest effects. </div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace;">Coefficients:</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> Estimate Std. Error t value Pr(>|t|) </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">(Intercept) 0.73200 0.21737 3.368 0.01196 * </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">PaperTyperegular2 0.28200 0.06211 4.541 0.00267 ** </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">WingLength 0.16654 0.01223 13.622 2.7e-06 ***</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">BodyLength -0.02126 0.01630 -1.304 0.23340 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">BodyWidth -0.03307 0.04890 -0.676 0.52058 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">PaperClipYes -0.35700 0.06211 -5.748 0.00070 ***</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">FoldYes 0.04500 0.06211 0.725 0.49222 </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">TapedBodyYes -0.14700 0.06211 -2.367 0.04983 * </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">TapedWingYes 0.06600 0.06211 1.063 0.32320</span></div>
</div>
It seems, then, that the two experiments show somewhat different effects. WingLength is certainly important; BodyLength maybe. Regarding paper, both experiments used regular paper, but one also used bond paper and the other construction paper. It is easy to imagine that these behave quite differently.<br />
<h3>
Combined analysis</h3>
<div>
The combined analysis is programmed in JAGS. To capture the different drop heights, a <i>Mul</i> parameter is used, which defines a multiplicative effect between the two experiments. In addition, each data set has its own measurement error. There are four types of paper, two from each data set, and three levels of paperclip, where no paperclip is assumed to be the same in both experiments. In addition to the parameters given earlier, residuals are estimated to give some idea of the quality of fit.</div>
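To make the structure of the combined model concrete, the sketch below writes its log-likelihood in plain R for a given linear predictor <i>eta</i> (the bracketed sum in the JAGS code): observations from the second experiment are scaled by <i>Mul</i>, and each experiment has its own standard deviation. Note that JAGS's <code>dnorm</code> is parameterised by precision (tau = 1/sd^2), whereas R's <code>dnorm</code> takes a standard deviation. The function name and the toy numbers are hypothetical.

```r
# Hypothetical helper: log-likelihood of the combined model for fixed
# parameter values.
#   eta   : linear predictor (WL*WingLength + BL*BodyLength + PT[...] + ...)
#   time  : observed landing times (s)
#   test  : 1 or 2, the experiment each observation comes from
#   Mul   : multiplicative factor applied to experiment 2
#   stdev : length-2 vector of residual SDs, one per experiment
combined_loglik <- function(eta, time, test, Mul, stdev) {
  premul <- (test == 1) + Mul * (test == 2)   # same as premul[i] in JAGS
  mu <- premul * eta
  # JAGS: Time[i] ~ dnorm(mu[i], tau[test[i]]) with tau = stdev^-2;
  # R's dnorm() takes the SD itself
  sum(dnorm(time, mean = mu, sd = stdev[test], log = TRUE))
}

# Hypothetical toy values: two observations per experiment
eta  <- c(2.0, 2.5, 2.0, 2.5)
test <- c(1, 1, 2, 2)
time <- c(2.1, 2.4, 2.3, 2.9)
combined_loglik(eta, time, test, Mul = 1.2, stdev = c(0.13, 0.30))
```

Because <i>Mul</i> multiplies the whole predictor, every effect in experiment 2 is scaled by the same factor, which is how a single parameter can absorb a different drop height.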
<div>
The model, then, looks like this:</div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">jmodel <- function() {</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> for (i in 1:n) { </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> premul[i] <- (test[i]==1)+Mul*(test[i]==2)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mu[i] <- premul[i] * (</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> WL*WingLength[i]+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BL*BodyLength[i] + </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PT[PaperType[i]] +</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BW*BodyWidth[i] +</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PC[PaperClip[i]] +</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> FO*Fold[i]+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> TB*TapedBody[i]+</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> TW*TapedWing[i]</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> )</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Time[i] ~ dnorm(mu[i],tau[test[i]])</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> residual[i] <- Time[i]-mu[i]</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> for (i in 1:2) {</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> tau[i] <- pow(StDev[i],-2)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> StDev[i] ~dunif(0,3)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> for (i in 1:4) {</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PT[i] ~ dnorm(PTM,tauPT)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> tauPT <- pow(sdPT,-2)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> sdPT ~dunif(0,3)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PTM ~dnorm(0,0.01)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> WL ~dnorm(0,0.01)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BL ~dnorm(0,1000)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BW ~dnorm(0,1000)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PC[1] <- 0</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PC[2]~dnorm(0,0.01)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PC[3]~dnorm(0,0.01)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> </span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> FO ~dnorm(0,1000)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> TB ~dnorm(0,0.01)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> TW ~dnorm(0,0.01)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Mul ~ dnorm(1,1) %_% I(0,2)</span><br />
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">}</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Inference for Bugs model at "C:/Users/Kees/AppData/Local/Temp/Rtmp4o0rhh/model16f468e854ce.txt", fit using jags,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 4 chains, each with 3000 iterations (first 1500 discarded)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> n.sims = 6000 iterations saved</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">BL -0.029 0.014 -0.056 -0.038 -0.028 -0.019 -0.001 1.001 4400</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">BW -0.005 0.025 -0.052 -0.023 -0.006 0.011 0.044 1.002 1900</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">FO 0.005 0.028 -0.050 -0.014 0.005 0.023 0.058 1.001 6000</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Mul 1.166 0.149 0.819 1.087 1.176 1.254 1.433 1.028 130</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">PC[1] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 1</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">PC[2] 0.066 0.141 -0.208 -0.021 0.061 0.147 0.360 1.002 2300</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">PC[3] -0.362 0.070 -0.501 -0.404 -0.362 -0.319 -0.225 1.001 6000</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">PT[1] 1.111 0.397 0.516 0.864 1.059 1.286 2.074 1.021 150</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">PT[2] 1.019 0.379 0.437 0.783 0.974 1.186 1.925 1.019 160</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">PT[3] 0.728 0.170 0.397 0.615 0.728 0.840 1.068 1.002 2900</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">PT[4] 0.991 0.168 0.655 0.885 0.993 1.103 1.309 1.002 1600</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">StDev[1] 0.133 0.039 0.082 0.108 0.127 0.150 0.225 1.005 540</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">StDev[2] 0.304 0.075 0.192 0.251 0.292 0.343 0.488 1.003 1300</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">TB -0.144 0.059 -0.264 -0.181 -0.144 -0.108 -0.025 1.001 4100</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">TW 0.084 0.059 -0.033 0.045 0.084 0.122 0.203 1.001 4400</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">WL 0.164 0.013 0.138 0.156 0.164 0.172 0.188 1.004 810</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[1] 0.174 0.146 -0.111 0.079 0.173 0.268 0.464 1.002 1700</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[2] 0.466 0.158 0.162 0.361 0.463 0.567 0.780 1.004 730</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[3] 0.150 0.170 -0.173 0.041 0.147 0.253 0.499 1.003 1100</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[4] -0.416 0.162 -0.733 -0.523 -0.418 -0.308 -0.099 1.001 3800</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[5] -0.087 0.168 -0.419 -0.198 -0.084 0.026 0.238 1.005 560</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[6] -0.085 0.156 -0.397 -0.184 -0.084 0.016 0.221 1.003 1200</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[7] -0.056 0.159 -0.371 -0.156 -0.055 0.047 0.251 1.003 910</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[8] -0.203 0.157 -0.527 -0.304 -0.198 -0.100 0.095 1.001 6000</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[9] 0.150 0.150 -0.139 0.052 0.148 0.247 0.451 1.001 6000</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[10] 0.103 0.156 -0.200 0.003 0.101 0.206 0.415 1.004 720</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[11] 0.133 0.160 -0.176 0.027 0.131 0.237 0.454 1.002 2100</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[12] 0.335 0.177 -0.006 0.218 0.332 0.451 0.689 1.004 830</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[13] -0.436 0.156 -0.747 -0.536 -0.436 -0.337 -0.128 1.002 2100</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[14] 0.098 0.162 -0.227 -0.007 0.099 0.205 0.410 1.004 670</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[15] -0.018 0.160 -0.340 -0.118 -0.015 0.084 0.292 1.003 920</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[16] -0.127 0.155 -0.441 -0.224 -0.125 -0.027 0.173 1.001 3600</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[17] 0.037 0.088 -0.135 -0.018 0.037 0.093 0.215 1.002 1600</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[18] -0.088 0.090 -0.274 -0.141 -0.086 -0.031 0.081 1.002 2500</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[19] -0.074 0.088 -0.248 -0.129 -0.072 -0.018 0.100 1.002 1900</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[20] -0.079 0.088 -0.259 -0.133 -0.076 -0.023 0.091 1.001 3800</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[21] -0.037 0.087 -0.201 -0.093 -0.039 0.016 0.141 1.002 3000</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[22] 0.051 0.087 -0.128 -0.001 0.053 0.107 0.221 1.001 4800</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[23] -0.008 0.084 -0.177 -0.061 -0.009 0.046 0.159 1.001 5500</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[24] 0.129 0.086 -0.047 0.076 0.130 0.185 0.294 1.002 1900</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[25] 0.196 0.087 0.030 0.141 0.196 0.249 0.370 1.003 1400</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[26] -0.027 0.084 -0.195 -0.081 -0.026 0.029 0.138 1.001 6000</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[27] 0.070 0.088 -0.101 0.016 0.070 0.124 0.247 1.001 3700</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[28] -0.166 0.089 -0.355 -0.221 -0.163 -0.108 0.004 1.001 3700</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[29] -0.052 0.087 -0.223 -0.107 -0.053 0.002 0.124 1.001 4300</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[30] 0.039 0.089 -0.139 -0.016 0.038 0.095 0.218 1.002 2500</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[31] -0.079 0.089 -0.245 -0.135 -0.080 -0.026 0.103 1.002 2300</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[32] 0.048 0.085 -0.122 -0.006 0.049 0.102 0.214 1.002 2300</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">deviance -15.555 7.026 -26.350 -20.655 -16.487 -11.540 0.877 1.004 750</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">For each parameter, n.eff is a crude measure of effective sample size,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">and Rhat is the potential scale reduction factor (at convergence, Rhat=1).</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">DIC info (using the rule, pD = var(deviance)/2)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">pD = 24.6 and DIC = 9.0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">DIC is an estimate of expected predictive error (lower deviance is better).</span><br />
Striking in the results are the big residuals, for instance for observations 2, 4 and 13. The residuals for observations 4 and 13 are also big when a similar classical model is used; this is a clear indication of some kind of interaction. </div>
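One way to pick out the large residuals systematically is to flag those whose 95% posterior interval excludes zero. For an R2jags fit the summary table is available as <code>jj$BUGSoutput$summary</code>, a matrix whose columns include "2.5%" and "97.5%". The helper below is a hypothetical sketch; here it is applied to a few rows copied from the output above rather than to a fitted object.

```r
# Hypothetical helper: flag residuals whose 95% posterior interval
# does not contain zero.
#   summ: a matrix like jj$BUGSoutput$summary, with "2.5%"/"97.5%" columns
flag_residuals <- function(summ) {
  res <- summ[grepl("^residual\\[", rownames(summ)), , drop = FALSE]
  outside <- res[, "2.5%"] > 0 | res[, "97.5%"] < 0
  rownames(res)[outside]
}

# A few rows copied from the printed output above (mean, 2.5%, 97.5%)
summ <- rbind(
  "residual[1]"  = c(mean =  0.174, "2.5%" = -0.111, "97.5%" = 0.464),
  "residual[2]"  = c(mean =  0.466, "2.5%" =  0.162, "97.5%" = 0.780),
  "residual[4]"  = c(mean = -0.416, "2.5%" = -0.733, "97.5%" = -0.099),
  "residual[12]" = c(mean =  0.335, "2.5%" = -0.006, "97.5%" = 0.689),
  "residual[13]" = c(mean = -0.436, "2.5%" = -0.747, "97.5%" = -0.128),
  "residual[25]" = c(mean =  0.196, "2.5%" =  0.030, "97.5%" = 0.370),
  "deviance"     = c(mean = -15.555, "2.5%" = -26.350, "97.5%" = 0.877)
)
flag_residuals(summ)
```

On these rows the rule flags observations 2, 4 and 13, and also observation 25, whose interval only just excludes zero; residual 12 just misses.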
<h4>
Model with interactions</h4>
<div>
Adding the most obvious interactions, such as WingLength*TapedBody, did not really provide a suitable answer. Indeed, the large residuals at observations 4 and 13, which lie at opposite sides of the fractional factorial design, cannot be resolved with a single interaction.</div>
<div>
Hence I proceeded by adding all two-way interactions. Since this was expected to result in a model without clear estimates, all interactions were given a strong prior: mean 0 and precision (<i>tau</i>) 1000. This model was subsequently reduced: interactions which clearly differed from 0 were given a lower precision, while interactions which were clearly zero were removed. During this process the parameter <i>Fold</i> was removed from the parameter set. Finally, quadratic effects were added. There is one additional parameter, <i>other</i>; it has no function in the model, but shows what the properties of the prior for the interactions are. Parameters with a standard deviation less than that of <i>other</i> have information added from the data.</div>
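Because JAGS's <code>dnorm</code> takes a precision, it helps to convert priors to standard deviations, sd = 1/sqrt(tau): a precision of 1000 corresponds to a prior SD of about 0.032, while <code>dnorm(0,0.01)</code> has SD 10. The same prior-versus-posterior comparison can be made for the first combined model, using the posterior SDs printed above for BL, BW and FO, which all had a <code>dnorm(0,1000)</code> prior. The helper name below is hypothetical.

```r
# Hypothetical helper: JAGS parameterises dnorm by precision tau,
# so convert it to a standard deviation
prec_to_sd <- function(tau) 1 / sqrt(tau)

prec_to_sd(c(strong = 1000, vague = 0.01))
# strong ~ 0.0316, vague = 10

# Posterior SDs printed for the first combined model (sd.vect column);
# all three parameters had a dnorm(0, 1000) prior
post_sd <- c(BL = 0.014, BW = 0.025, FO = 0.028)
prior_sd <- prec_to_sd(1000)

# Ratio < 1: the posterior is narrower than the prior, i.e. the data
# added information; a ratio near 1 means the prior dominates
round(post_sd / prior_sd, 2)
```

By this measure BL is clearly informed by the data, while the posterior for FO is only slightly narrower than its prior.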
<div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">jmodel <- function() {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> for (i in 1:n) { </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> premul[i] <- (test[i]==1)+Mul*(test[i]==2)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> mu[i] <- premul[i] * (</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> WL*WingLength[i]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BL*BodyLength[i] + </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PT[PaperType[i]] +</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BW*BodyWidth[i] +</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PC[PaperClip[i]] +</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> TB*TapedBody[i]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> TW*TapedWing[i]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> WLBW*WingLength[i]*BodyWidth[i]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> WLPC[1]*WingLength[i]*(PaperClip[i]==2)+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> WLPC[2]*WingLength[i]*(PaperClip[i]==3)+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BLPT[1]*BodyLength[i]*(PaperType[i]==2)+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BLPT[2]*BodyLength[i]*(PaperType[i]==3)+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BLPC[1]*BodyLength[i]*(PaperClip[i]==2)+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BLPC[2]*BodyLength[i]*(PaperClip[i]==3)+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BWPC[1]*BodyWidth[i]*(PaperClip[i]==2)+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BWPC[2]*BodyWidth[i]*(PaperClip[i]==3) +</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> WLWL*WingLength[i]*WingLength[i]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BLBL*BodyLength[i]*BodyLength[i]+</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BWBW*BodyWidth[i]*BodyWidth[i]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> )</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Time[i] ~ dnorm(mu[i],tau[test[i]])</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> residual[i] <- Time[i]-mu[i]</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> for (i in 1:2) {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> tau[i] <- pow(StDev[i],-2)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> StDev[i] ~dunif(0,3)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> WLPC[i] ~dnorm(0,1)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BLPT[i] ~dnorm(0,1)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BLPC[i] ~dnorm(0,1) </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BWPC[i] ~dnorm(0,1) </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> for (i in 1:3) {</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PT[i] ~ dnorm(PTM,tauPT)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> }</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> tauPT <- pow(sdPT,-2)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> sdPT ~dunif(0,3)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PTM ~dnorm(0,0.01)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> WL ~dnorm(0,0.01) </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BL ~dnorm(0,0.01)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BW ~dnorm(0,0.01)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PC[1] <- 0</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PC[2]~dnorm(0,0.01)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> PC[3]~dnorm(0,0.01) </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> TB ~dnorm(0,0.01)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> TW ~dnorm(0,0.01)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> WLBW~dnorm(0,1)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> WLTW~dnorm(0,1) # WLTW does not appear in mu, so it is only sampled from its prior</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> WLWL~dnorm(0,1)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BLBL~dnorm(0,1) </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> BWBW~dnorm(0,1)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> </span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> other~dnorm(0,1) # not used in mu: shows the prior of the interactions</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;"> Mul ~ dnorm(1,1) %_% I(0,2) # %_% is dropped when the model file is written, leaving I(0,2): Mul restricted to (0,2)</span></div>
<div>
<span style="background-color: #f3f3f3; font-family: Courier New, Courier, monospace; font-size: x-small;">}</span></div>
</div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Inference for Bugs model at "C:/Users/Kees/AppData/Local/Temp/Rtmp4o0rhh/model16f472b05364.txt", fit using jags,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> 5 chains, each with 4000 iterations (first 2000 discarded), n.thin = 2</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> n.sims = 5000 iterations saved</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"> mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">BL 0.021 0.197 -0.367 -0.080 0.027 0.121 0.396 1.021 590</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">BLBL -0.001 0.015 -0.027 -0.009 -0.003 0.006 0.031 1.015 1200</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">BLPC[1] -0.099 0.105 -0.295 -0.125 -0.086 -0.053 0.021 1.100 560</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">BLPC[2] -0.110 0.111 -0.334 -0.134 -0.094 -0.060 0.018 1.130 250</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">BLPT[1] -0.038 0.190 -0.503 -0.124 0.001 0.069 0.286 1.005 600</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">BLPT[2] 0.058 0.038 -0.031 0.045 0.063 0.078 0.113 1.063 400</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">BW -0.430 0.558 -1.587 -0.657 -0.389 -0.143 0.463 1.045 960</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">BWBW 0.009 0.094 -0.160 -0.031 0.009 0.052 0.176 1.053 1300</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">BWPC[1] -0.224 0.173 -0.615 -0.295 -0.209 -0.133 0.064 1.011 5000</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">BWPC[2] -0.093 0.101 -0.285 -0.137 -0.091 -0.044 0.085 1.040 5000</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">Mul 1.053 0.145 0.680 0.997 1.069 1.139 1.281 1.098 290</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">PC[1] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 1</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">PC[2] 1.459 2.367 -3.571 0.333 1.565 2.617 6.138 1.019 420</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">PC[3] 0.401 0.732 -0.619 0.032 0.309 0.629 1.954 1.074 320</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">PT[1] 1.353 1.437 -1.364 0.556 1.318 2.088 4.128 1.032 480</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">PT[2] 1.906 1.767 -1.087 0.828 1.726 2.814 5.879 1.013 1300</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">PT[3] 0.731 1.419 -1.864 -0.058 0.682 1.444 3.535 1.032 520</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">StDev[1] 0.108 0.082 0.045 0.067 0.088 0.120 0.302 1.023 450</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">StDev[2] 0.267 0.156 0.122 0.177 0.229 0.301 0.706 1.021 390</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">TB -0.146 0.051 -0.247 -0.172 -0.145 -0.119 -0.048 1.011 5000</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">TW 0.086 0.054 -0.007 0.055 0.082 0.112 0.204 1.010 1700</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">WL 0.209 0.380 -0.496 0.007 0.188 0.394 1.035 1.014 670</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">WLBW 0.051 0.062 -0.013 0.026 0.043 0.062 0.167 1.159 220</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">WLPC[1] 0.057 0.210 -0.304 -0.063 0.024 0.152 0.556 1.004 1600</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">WLPC[2] 0.020 0.027 -0.031 0.010 0.021 0.033 0.066 1.044 2400</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">WLWL -0.014 0.026 -0.072 -0.026 -0.011 0.001 0.032 1.014 5000</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">other 0.002 1.007 -1.973 -0.680 0.000 0.684 1.952 1.002 2200</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[1] 0.227 0.272 -0.178 0.066 0.190 0.334 0.935 1.041 390</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[2] 0.035 0.231 -0.447 -0.084 0.037 0.160 0.503 1.007 2500</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[3] 0.026 0.269 -0.404 -0.118 -0.002 0.131 0.587 1.039 430</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[4] -0.123 0.279 -0.542 -0.276 -0.157 -0.018 0.530 1.053 370</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[5] -0.046 0.241 -0.535 -0.168 -0.043 0.083 0.422 1.008 5000</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[6] -0.094 0.241 -0.568 -0.221 -0.095 0.035 0.390 1.005 2600</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[7] 0.284 0.268 -0.139 0.140 0.263 0.392 0.861 1.046 430</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[8] 0.018 0.240 -0.460 -0.107 0.022 0.144 0.494 1.006 5000</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[9] 0.121 0.299 -0.310 -0.042 0.079 0.223 0.827 1.054 300</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[10] 0.038 0.237 -0.428 -0.086 0.034 0.155 0.518 1.006 3100</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[11] -0.077 0.251 -0.562 -0.204 -0.073 0.046 0.401 1.020 5000</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[12] 0.153 0.262 -0.286 0.013 0.133 0.267 0.711 1.035 610</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[13] -0.024 0.244 -0.466 -0.160 -0.035 0.095 0.493 1.008 5000</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[14] -0.019 0.244 -0.537 -0.140 -0.013 0.111 0.456 1.006 5000</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[15] -0.159 0.250 -0.663 -0.281 -0.156 -0.038 0.302 1.026 860</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[16] 0.034 0.273 -0.531 -0.076 0.056 0.178 0.486 1.037 410</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[17] 0.001 0.115 -0.185 -0.057 -0.008 0.047 0.232 1.047 890</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[18] 0.016 0.105 -0.187 -0.038 0.017 0.067 0.211 1.014 3300</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[19] -0.068 0.108 -0.262 -0.118 -0.068 -0.017 0.127 1.036 5000</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[20] 0.067 0.114 -0.138 0.017 0.067 0.115 0.270 1.046 4500</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[21] 0.003 0.117 -0.223 -0.046 0.007 0.057 0.203 1.044 3200</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[22] -0.004 0.113 -0.202 -0.059 -0.007 0.044 0.211 1.035 2000</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[23] -0.039 0.134 -0.313 -0.081 -0.023 0.027 0.145 1.097 300</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[24] 0.009 0.114 -0.197 -0.042 0.009 0.061 0.223 1.039 5000</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[25] 0.045 0.110 -0.170 -0.005 0.046 0.095 0.248 1.028 5000</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[26] -0.044 0.108 -0.252 -0.096 -0.043 0.007 0.165 1.024 4000</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[27] 0.046 0.112 -0.164 -0.005 0.046 0.100 0.264 1.022 3600</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[28] -0.062 0.115 -0.296 -0.104 -0.053 -0.004 0.112 1.047 1400</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[29] -0.025 0.143 -0.321 -0.064 -0.006 0.042 0.153 1.110 230</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[30] -0.016 0.118 -0.228 -0.066 -0.015 0.037 0.196 1.042 1400</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[31] -0.025 0.115 -0.239 -0.072 -0.021 0.028 0.174 1.047 1300</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">residual[32] 0.020 0.111 -0.176 -0.033 0.017 0.066 0.233 1.041 2600</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">deviance -32.864 19.923 -62.354 -46.843 -35.763 -22.807 16.481 1.014 420</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">For each parameter, n.eff is a crude measure of effective sample size,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">and Rhat is the potential scale reduction factor (at convergence, Rhat=1).</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">DIC info (using the rule, pD = var(deviance)/2)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">pD = 196.7 and DIC = 163.8</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">DIC is an estimate of expected predictive error (lower deviance is better).</span></div>
</div>
<div>
<h4>
Model discussion</h4>
This model does not show the large residuals seen earlier. In addition, some parameters, e.g. <i>WLWL</i> and <i>WLBW</i>, have small mean values and small standard deviations. To me this suggests that they are indeed estimated and found to be close to 0. After all, if the data contained no information about them, their standard deviation would be similar to that of the prior, which is much larger, as seen from the <i>other</i> parameter.<br />
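This comparison can be made explicit. The sketch below copies the posterior standard deviations (sd.vect) of the interaction and quadratic terms from the output above and divides them by the standard deviation of <i>other</i> (1.007), which approximates their common dnorm(0,1) prior. A ratio well below 1 indicates that the data narrowed the prior; this is a rough summary, not a formal test.

```r
# Posterior sd (sd.vect) of terms with a dnorm(0, 1) prior, copied from the output
post_sd <- c(WLWL = 0.026, BLBL = 0.015, BWBW = 0.094, WLBW = 0.062,
             `WLPC[1]` = 0.210, `WLPC[2]` = 0.027,
             `BLPT[1]` = 0.190, `BLPT[2]` = 0.038,
             `BLPC[1]` = 0.105, `BLPC[2]` = 0.111,
             `BWPC[1]` = 0.173, `BWPC[2]` = 0.101)

# 'other' has no children in the model, so its posterior is its dnorm(0, 1) prior
prior_sd <- 1.007

ratio <- post_sd / prior_sd
print(round(sort(ratio), 3))

# Terms whose spread shrank to less than a tenth of the prior spread
narrowed <- names(ratio)[ratio < 0.1]
print(narrowed)
```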
The quadratic effects were added to allow detection of a maximum. There is little evidence of these effects, except perhaps for WingLength (parameter <i>WLWL</i>).<br />
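For a single factor x with linear coefficient a and quadratic coefficient b, the fitted curve a*x + b*x^2 has a stationary point at x = -a/(2b), which is a maximum only when b &lt; 0. The sketch below applies this to WingLength with posterior means from the output, for the reference PaperClip level (PaperClip == 1), where the <i>WLPC</i> terms are zero. It is only illustrative: the slope in WingLength also depends on BodyWidth through <i>WLBW</i>, plugging in posterior means ignores their uncertainty, and the 95% interval of <i>WLWL</i> (-0.072 to 0.032) includes zero. The helper name vertex is my own.

```r
# Stationary point of a*x + b*x^2; it is a maximum only when b < 0
vertex <- function(a, b) {
  list(x = -a / (2 * b), is_max = b < 0)
}

# Posterior means copied from the output
WL <- 0.209; WLWL <- -0.014; WLBW <- 0.051

# Effective linear slope in WingLength is WL + WLBW * BodyWidth;
# evaluate it at the two BodyWidth levels used in h1
for (bw in c(3.175, 5.08)) {
  v <- vertex(WL + WLBW * bw, WLWL)
  cat(sprintf("BodyWidth %.3f: stationary point at WingLength %.1f (maximum: %s)\n",
              bw, v$x, v$is_max))
}
```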
For descriptive purposes, I will leave these parameters in. However, for predictive purposes, it may be better to remove them or shrink them closer to zero. <br />
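To show how a tighter prior pulls an estimate toward zero, the sketch below uses the normal-normal conjugate approximation: if the data alone give an estimate b_hat with standard error se, a normal prior with mean 0 and precision tau0 yields a posterior mean of b_hat * (1/se^2) / (1/se^2 + tau0). The values b_hat = 0.05 and se = 0.06 are hypothetical, and the full JAGS model is not conjugate, so this only indicates the direction and rough size of the effect.

```r
# Normal-normal approximation: prior N(0, precision tau0),
# likelihood summarised by an estimate b_hat with standard error se
shrink <- function(b_hat, se, tau0) {
  tau_data <- 1 / se^2
  c(mean = b_hat * tau_data / (tau_data + tau0),
    sd   = 1 / sqrt(tau_data + tau0))
}

b_hat <- 0.05  # hypothetical unshrunk estimate
se    <- 0.06  # hypothetical standard error

# Prior precision 1 (final interactions) versus 1000 (first modelling step)
res <- rbind(tau0_1    = shrink(b_hat, se, 1),
             tau0_1000 = shrink(b_hat, se, 1000))
print(round(res, 4))
```

With precision 1 the estimate is hardly changed, while precision 1000 pulls it to roughly a fifth of its size.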
Given the complex way in which the parameters were chosen, it is quite possible that a different model would be better. In hindsight, I might have used the BMA package to do a more thorough selection. Thus the model needs further validation. Since I found two additional data sets online, these might be used for this purpose.<br />
<h3>
Code</h3>
</div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">h1 <- read.table(sep='\t',header=TRUE,text='</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PaperType<span class="Apple-tab-span" style="white-space: pre;"> </span>WingLength<span class="Apple-tab-span" style="white-space: pre;"> </span>BodyLength<span class="Apple-tab-span" style="white-space: pre;"> </span>BodyWidth<span class="Apple-tab-span" style="white-space: pre;"> </span>PaperClip<span class="Apple-tab-span" style="white-space: pre;"> </span>Fold<span class="Apple-tab-span" style="white-space: pre;"> </span>TapedBody<span class="Apple-tab-span" style="white-space: pre;"> </span>TapedWing<span class="Apple-tab-span" style="white-space: pre;"> </span>Time</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">regular1<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>3.175<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>2.5</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">bond<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>3.175<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>2.9</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">regular1<span class="Apple-tab-span" style="white-space: pre;"> </span>12.065<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>3.175<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>3.5</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">bond<span class="Apple-tab-span" style="white-space: pre;"> </span>12.065<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>3.175<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>2.7</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">regular1<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>12.065<span class="Apple-tab-span" style="white-space: pre;"> </span>3.175<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>2</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">bond<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>12.065<span class="Apple-tab-span" style="white-space: pre;"> </span>3.175<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>2.3</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">regular1<span class="Apple-tab-span" style="white-space: pre;"> </span>12.065<span class="Apple-tab-span" style="white-space: pre;"> </span>12.065<span class="Apple-tab-span" style="white-space: pre;"> </span>3.175<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>2.9</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">bond<span class="Apple-tab-span" style="white-space: pre;"> </span>12.065<span class="Apple-tab-span" style="white-space: pre;"> </span>12.065<span class="Apple-tab-span" style="white-space: pre;"> </span>3.175<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>3</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">regular1<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>5.08<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>2.4</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">bond<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>5.08<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>2.6</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">regular1<span class="Apple-tab-span" style="white-space: pre;"> </span>12.065<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>5.08<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>3.2</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">bond<span class="Apple-tab-span" style="white-space: pre;"> </span>12.065<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>5.08<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>3.7</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">regular1<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>12.065<span class="Apple-tab-span" style="white-space: pre;"> </span>5.08<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>1.9</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">bond<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>12.065<span class="Apple-tab-span" style="white-space: pre;"> </span>5.08<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>2.2</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">regular1<span class="Apple-tab-span" style="white-space: pre;"> </span>12.065<span class="Apple-tab-span" style="white-space: pre;"> </span>12.065<span class="Apple-tab-span" style="white-space: pre;"> </span>5.08<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>3</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">bond<span class="Apple-tab-span" style="white-space: pre;"> </span>12.065<span class="Apple-tab-span" style="white-space: pre;"> </span>12.065<span class="Apple-tab-span" style="white-space: pre;"> </span>5.08<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>3</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">')</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">h2 <- read.table(sep='\t',header=TRUE,text='</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">PaperType<span class="Apple-tab-span" style="white-space: pre;"> </span>BodyWidth<span class="Apple-tab-span" style="white-space: pre;"> </span>BodyLength<span class="Apple-tab-span" style="white-space: pre;"> </span>WingLength<span class="Apple-tab-span" style="white-space: pre;"> </span>PaperClip<span class="Apple-tab-span" style="white-space: pre;"> </span>Fold<span class="Apple-tab-span" style="white-space: pre;"> </span>TapedBody<span class="Apple-tab-span" style="white-space: pre;"> </span>TapedWing<span class="Apple-tab-span" style="white-space: pre;"> </span>Time</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">regular2<span class="Apple-tab-span" style="white-space: pre;"> </span>2.54<span class="Apple-tab-span" style="white-space: pre;"> </span>3.81<span class="Apple-tab-span" style="white-space: pre;"> </span>5.08<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>1.74</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">construction<span class="Apple-tab-span" style="white-space: pre;"> </span>2.54<span class="Apple-tab-span" style="white-space: pre;"> </span>3.81<span class="Apple-tab-span" style="white-space: pre;"> </span>5.08<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>1.296</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">regular2<span class="Apple-tab-span" style="white-space: pre;"> </span>3.81<span class="Apple-tab-span" style="white-space: pre;"> </span>3.81<span class="Apple-tab-span" style="white-space: pre;"> </span>5.08<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>1.2</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">construction<span class="Apple-tab-span" style="white-space: pre;"> </span>3.81<span class="Apple-tab-span" style="white-space: pre;"> </span>3.81<span class="Apple-tab-span" style="white-space: pre;"> </span>5.08<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>0.996</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">regular2<span class="Apple-tab-span" style="white-space: pre;"> </span>2.54<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>5.08<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>1.056</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">construction<span class="Apple-tab-span" style="white-space: pre;"> </span>2.54<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>5.08<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>1.104</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">regular2<span class="Apple-tab-span" style="white-space: pre;"> </span>3.81<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>5.08<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>1.668</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">construction<span class="Apple-tab-span" style="white-space: pre;"> </span>3.81<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>5.08<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>1.308</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">regular2<span class="Apple-tab-span" style="white-space: pre;"> </span>2.54<span class="Apple-tab-span" style="white-space: pre;"> </span>3.81<span class="Apple-tab-span" style="white-space: pre;"> </span>10.16<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>2.46</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">construction<span class="Apple-tab-span" style="white-space: pre;"> </span>2.54<span class="Apple-tab-span" style="white-space: pre;"> </span>3.81<span class="Apple-tab-span" style="white-space: pre;"> </span>10.16<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>1.74</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">regular2<span class="Apple-tab-span" style="white-space: pre;"> </span>3.81<span class="Apple-tab-span" style="white-space: pre;"> </span>3.81<span class="Apple-tab-span" style="white-space: pre;"> </span>10.16<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>2.46</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">construction<span class="Apple-tab-span" style="white-space: pre;"> </span>3.81<span class="Apple-tab-span" style="white-space: pre;"> </span>3.81<span class="Apple-tab-span" style="white-space: pre;"> </span>10.16<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>2.184</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">regular2<span class="Apple-tab-span" style="white-space: pre;"> </span>2.54<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>10.16<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>2.316</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">construction<span class="Apple-tab-span" style="white-space: pre;"> </span>2.54<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>10.16<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>2.208</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">regular2<span class="Apple-tab-span" style="white-space: pre;"> </span>3.81<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>10.16<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>No<span class="Apple-tab-span" style="white-space: pre;"> </span>1.98</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">construction<span class="Apple-tab-span" style="white-space: pre;"> </span>3.81<span class="Apple-tab-span" style="white-space: pre;"> </span>7.62<span class="Apple-tab-span" style="white-space: pre;"> </span>10.16<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>Yes<span class="Apple-tab-span" style="white-space: pre;"> </span>1.788</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">')</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">l1 <- lm(Time ~ PaperType<span class="Apple-tab-span" style="white-space: pre;"> </span>+ WingLength<span class="Apple-tab-span" style="white-space: pre;"> </span>+ BodyLength +<span class="Apple-tab-span" style="white-space: pre;"> </span>BodyWidth<span class="Apple-tab-span" style="white-space: pre;"> </span>+ </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PaperClip<span class="Apple-tab-span" style="white-space: pre;"> </span>+ Fold<span class="Apple-tab-span" style="white-space: pre;"> </span>+ TapedBody +<span class="Apple-tab-span" style="white-space: pre;"> </span>TapedWing, data=h1)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">summary(l1)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">residuals(l1)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">l2 <- lm(Time ~ PaperType<span class="Apple-tab-span" style="white-space: pre;"> </span>+ WingLength<span class="Apple-tab-span" style="white-space: pre;"> </span>+ BodyLength +<span class="Apple-tab-span" style="white-space: pre;"> </span>BodyWidth<span class="Apple-tab-span" style="white-space: pre;"> </span>+ </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PaperClip<span class="Apple-tab-span" style="white-space: pre;"> </span>+ Fold<span class="Apple-tab-span" style="white-space: pre;"> </span>+ TapedBody +<span class="Apple-tab-span" style="white-space: pre;"> </span>TapedWing, data=h2)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">summary(l2)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace; font-size: xx-small;">h1$test <- 'WH'</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"># WingLength, BodyLength</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">h2$test <- 'RH'</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"># WingLength, PaperClip, PaperType</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
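<span style="font-family: 'Courier New', Courier, monospace; font-size: xx-small;"># read.table() no longer makes factors since R 4.0.0; convert here so rbind() gives the same level order as before</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace; font-size: xx-small;">fac <- c('PaperType','PaperClip','Fold','TapedBody','TapedWing')</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace; font-size: xx-small;">h1[fac] <- lapply(h1[fac], factor)</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace; font-size: xx-small;">h2[fac] <- lapply(h2[fac], factor)</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace; font-size: xx-small;">helis <- rbind(h1,h2)</span></div>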
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">helis$test <- factor(helis$test)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">helis$PaperClip2 <- factor(ifelse(helis$PaperClip=='No','No',as.character(helis$test)),</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> levels=c('No','WH','RH'))</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace; font-size: xx-small;">library(R2jags)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">datain <- list(</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PaperType=c(1:4)[helis$PaperType],</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> WingLength=helis$WingLength,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BodyLength=helis$BodyLength,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BodyWidth=helis$BodyWidth,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PaperClip=c(1,2,3)[helis$PaperClip2],</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> Fold=c(0,1)[helis$Fold],</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> TapedBody=c(0,1)[helis$TapedBody],</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> TapedWing=c(0,1)[helis$TapedWing],</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> test=c(1,2)[helis$test],</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> Time=helis$Time,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> n=nrow(helis))</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">parameters <- c('Mul','WL','BL','PT','BW','PC','FO','TB','TW','StDev','residual')</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">jmodel <- function() {</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> for (i in 1:n) { </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> premul[i] <- (test[i]==1)+Mul*(test[i]==2)  # 1 for test 1 (RH), Mul for test 2 (WH)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> mu[i] <- premul[i] * (</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> WL*WingLength[i]+</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BL*BodyLength[i] + </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PT[PaperType[i]] +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BW*BodyWidth[i] +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PC[PaperClip[i]] +</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> FO*Fold[i]+</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> TB*TapedBody[i]+</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> TW*TapedWing[i]</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> )</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> Time[i] ~ dnorm(mu[i],tau[test[i]])</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> residual[i] <- Time[i]-mu[i]</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> }</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> for (i in 1:2) {</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> tau[i] <- pow(StDev[i],-2)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> StDev[i] ~dunif(0,3)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> }</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> for (i in 1:4) {</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PT[i] ~ dnorm(PTM,tauPT)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> }</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> tauPT <- pow(sdPT,-2)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> sdPT ~dunif(0,3)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PTM ~dnorm(0,0.01)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> WL ~dnorm(0,0.01)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BL ~dnorm(0,1000)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BW ~dnorm(0,1000)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PC[1] <- 0</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PC[2]~dnorm(0,0.01)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PC[3]~dnorm(0,0.01)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> </span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> FO ~dnorm(0,1000)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> TB ~dnorm(0,0.01)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> TW ~dnorm(0,0.01)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> Mul ~ dnorm(1,1) %_% I(0,2)  # %_% is removed when R2jags writes the model file, leaving dnorm(1,1) I(0,2)</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">}</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">jj <- jags(model.file=jmodel,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> data=datain,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> parameters=parameters,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> progress.bar='gui',</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> n.chain=4,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> n.iter=3000,</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> inits=function() list(Mul=1.3,WL=0.15,BL=-.08,PT=rep(1,4),</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BW=0,PC=c(NA,0,0),FO=0,TB=0,TW=0))</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">jj</span></div>
</div>
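JAGS parameterises `dnorm()` by mean and precision (1/variance) rather than by standard deviation. That is why the model computes `tau[i] <- pow(StDev[i],-2)`. A small conversion makes the priors above easier to read on a familiar scale. This is only a sketch; `prec_to_sd()` and `sd_to_prec()` are hypothetical helpers, not part of R2jags.

```r
# JAGS dnorm(mu, tau) takes a precision: tau = 1 / sd^2
prec_to_sd <- function(tau) 1 / sqrt(tau)
sd_to_prec <- function(sd) sd^-2   # what pow(StDev[i],-2) computes in the model

# Precisions used in the first model's priors
prec_to_sd(c(0.01, 1, 1000))       # 10, 1 and about 0.032

# Share of the dnorm(1,1) mass that the I(0,2) restriction on Mul keeps
pnorm(2, mean = 1, sd = 1) - pnorm(0, mean = 1, sd = 1)   # about 0.68
```

Read this way, the `dnorm(0,1000)` priors on BL, BW and FO in the first model shrink those coefficients hard toward zero (prior sd about 0.03). By contrast, `dnorm(0,0.01)` has a prior sd of 10.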
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span>
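<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">#################################</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"># Second model: regular1 and regular2 share one paper type, Fold is dropped,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"># and interaction and quadratic terms are added</span><br />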
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">datain <- list(</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PaperType=c(2,1,3,1)[helis$PaperType],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> WingLength=helis$WingLength,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BodyLength=helis$BodyLength,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BodyWidth=helis$BodyWidth,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PaperClip=c(1,2,3)[helis$PaperClip2],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> TapedBody=c(0,1)[helis$TapedBody],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> TapedWing=c(0,1)[helis$TapedWing],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> test=c(1,2)[helis$test],</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> Time=helis$Time,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> n=nrow(helis))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">parameters <- c('Mul','WL','BL','PT','BW','PC','TB','TW','StDev','residual',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> 'WLBW','WLPC', 'WLWL',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> 'BLPT' ,'BLPC', 'BLBL',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> 'BWPC', 'BWBW', 'other')</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">jmodel <- function() {</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> for (i in 1:n) { </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> premul[i] <- (test[i]==1)+Mul*(test[i]==2)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> mu[i] <- premul[i] * (</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> WL*WingLength[i]+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BL*BodyLength[i] + </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PT[PaperType[i]] +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BW*BodyWidth[i] +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PC[PaperClip[i]] +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> TB*TapedBody[i]+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> TW*TapedWing[i]+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> WLBW*WingLength[i]*BodyWidth[i]+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> WLPC[1]*WingLength[i]*(PaperClip[i]==2)+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> WLPC[2]*WingLength[i]*(PaperClip[i]==3)+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BLPT[1]*BodyLength[i]*(PaperType[i]==2)+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BLPT[2]*BodyLength[i]*(PaperType[i]==3)+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BLPC[1]*BodyLength[i]*(PaperClip[i]==2)+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BLPC[2]*BodyLength[i]*(PaperClip[i]==3)+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BWPC[1]*BodyWidth[i]*(PaperClip[i]==2)+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BWPC[2]*BodyWidth[i]*(PaperClip[i]==3) +</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> WLWL*WingLength[i]*WingLength[i]+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BLBL*BodyLength[i]*BodyLength[i]+</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BWBW*BodyWidth[i]*BodyWidth[i]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> )</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> Time[i] ~ dnorm(mu[i],tau[test[i]])</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> residual[i] <- Time[i]-mu[i]</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> }</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> for (i in 1:2) {</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> tau[i] <- pow(StDev[i],-2)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> StDev[i] ~dunif(0,3)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> WLPC[i] ~dnorm(0,1)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BLPT[i] ~dnorm(0,1)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BLPC[i] ~dnorm(0,1) </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BWPC[i] ~dnorm(0,1) </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> }</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> for (i in 1:3) {</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PT[i] ~ dnorm(PTM,tauPT)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> }</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> tauPT <- pow(sdPT,-2)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> sdPT ~dunif(0,3)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PTM ~dnorm(0,0.01)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> WL ~dnorm(0,0.01) </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BL ~dnorm(0,0.01)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BW ~dnorm(0,0.01)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PC[1] <- 0</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PC[2]~dnorm(0,0.01)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PC[3]~dnorm(0,0.01) </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> TB ~dnorm(0,0.01)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> TW ~dnorm(0,0.01)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> WLBW~dnorm(0,1)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> WLTW~dnorm(0,1)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> WLWL~dnorm(0,1)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BLBL~dnorm(0,1) </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> BWBW~dnorm(0,1)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> </span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> other~dnorm(0,1)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> Mul ~ dnorm(1,1) %_% I(0,2)</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">}</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"><br /></span>
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">jj <- jags(model.file=jmodel,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> data=datain,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> parameters=parameters,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> progress.bar='gui',</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> n.chain=5,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> n.iter=4000,</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> inits=function() list(Mul=1.3,WL=0.15,BL=-.08,PT=rep(1,3),</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;"> PC=c(NA,0,0),TB=0,TW=0))</span><br />
<span style="font-family: Courier New, Courier, monospace; font-size: xx-small;">jj</span><br />
<br /></div>
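When `helis$PaperType` is a factor (the `read.table()` default before R 4.0.0), `PaperType=c(2,1,3,1)[helis$PaperType]` indexes by the factor's integer codes, not its labels. It therefore depends on the level order that `rbind()` produces: the levels of the first data frame, then any new levels from the second.

The sketch below uses made-up toy data frames in which only the level sets mirror `h1` and `h2`. It shows that this mapping puts regular1 and regular2 into the same type. The last lines check that a hand-written term such as `WLPC[1]*WingLength[i]*(PaperClip[i]==2)` matches the column R's `model.matrix()` builds for a WingLength by PaperClip2 interaction, with 'No' as the baseline.

```r
# Made-up rows; only the factor levels mirror h1 and h2
a <- data.frame(PaperType = factor(c('regular1', 'bond')))
b <- data.frame(PaperType = factor(c('regular2', 'construction')))
ab <- rbind(a, b)
levels(ab$PaperType)   # "bond" "regular1" "construction" "regular2"

# Indexing by a factor uses its integer codes, not its labels
pt <- c(2, 1, 3, 1)[ab$PaperType]
data.frame(ab, pt)     # regular1 -> 1, bond -> 2, regular2 -> 1, construction -> 3

# Hand-coded interaction versus model.matrix() with 'No' as baseline
d <- data.frame(
  WingLength = c(5.08, 10.16, 5.08, 10.16),
  PaperClip2 = factor(c('No', 'WH', 'RH', 'WH'), levels = c('No', 'WH', 'RH')))
pc <- as.integer(d$PaperClip2)       # 1 = No, 2 = WH, 3 = RH, as in datain
mm <- model.matrix(~ WingLength * PaperClip2, data = d)
wl_pc2 <- d$WingLength * (pc == 2)   # covariate multiplying WLPC[1]
wl_pc3 <- d$WingLength * (pc == 3)   # covariate multiplying WLPC[2]
all.equal(unname(mm[, 'WingLength:PaperClip2WH']), wl_pc2)
all.equal(unname(mm[, 'WingLength:PaperClip2RH']), wl_pc3)
```

Writing the indicator products out by hand in JAGS is equivalent to R's treatment coding. `PC[1] <- 0` plays the role of the 'No' baseline.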
Wingfeethttp://www.blogger.com/profile/01585623097384646816noreply@blogger.com0