Data
The data are the stone flakes data which I analyzed previously. The first post was clustering, the second linked the clusters to hominid type, the third was regression. Together these made up the bulk of a standard analysis. In this new analysis the same starting data are used.
r2 <- read.table('StoneFlakes.txt',header=TRUE,na.strings='?')
r1 <- read.table('annotation.txt',header=TRUE,na.strings='?')
r12 <- merge(r1,r2)
r12$group <- factor(r12$group,labels=c('Lower Paleolithic',
'Levallois technique',
'Middle Paleolithic',
'Homo Sapiens'))
r12$site <- factor(c('other','gravel pit')[r12$site+1])
r12$mat <- factor(c('flint','other')[r12$mat])
r12$lmage <- log10(-r12$age)
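A quick check that the merge and recoding came out as intended could look like this (just a sanity check, assuming the two files read in as expected):
# inspect the merged data and the distribution over hominid groups
str(r12)
table(r12$group,useNA='ifany')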
Deal
The starting point of this post was to continue my analysis of last week. But when I discovered that deal could be used to discover the model, I chose to repeat last week's analysis instead. Deal does not have a vignette, but there is a paper, deal: A Package for Learning Bayesian Networks, which was very helpful in getting started.
First Model
Initially I wanted to start with a model containing only continuous variables, similar to before, but that threw an error in jointprior(). Hence I added group as a factor. autosearch() and heuristic() give a lot of output, basically one line for each step. For brevity this output is not shown. The good thing about this model is that it has a solution where 'group' is driving other variables.
library(deal)
rfin <- subset(r12,,c(names(r2)[-1],'group'))
rfin <- rfin[complete.cases(rfin),]
rfin.nw <- network(rfin)
rfin.prior <- jointprior(rfin.nw)
Imaginary sample size: 8
rfin.nw <- learn(rfin.nw,rfin,rfin.prior)$nw
rfin.search <- autosearch(rfin.nw,
rfin,
rfin.prior,
trace=FALSE)
plot(rfin.search$nw)
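Besides plotting, the network found by autosearch() can be inspected directly. A minimal sketch; per the deal paper the network score should be available in the score component:
# print nodes and parents of the best network found so far,
# and its network score
print(rfin.search$nw)
rfin.search$nw$score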
Heuristic search is used to further improve the model. In the end the model seems a bit more complex than the one from pcalg, but not unreasonably so.
rfin.heuristic <- heuristic(rfin.search$nw,
rfin,
rfin.prior,
restart=10,
trace=FALSE,
trylist=rfin.search$trylist)
plot(rfin.heuristic$nw)
Second model
For brevity I won't repeat the code. It is all the same except for the data going in, which is shown below. The second model is similar to the first, but the (potential) outliers have been removed. This model looks even cleaner.
rfin <- subset(r12,
!(r12$ID %in% c('ms','c','roe','sz','va','arn')),
c(names(r2)[-1],'group'))
rfin <- rfin[complete.cases(rfin),]
Third model
The model I actually wanted to build includes all sensible factors. However, the imaginary sample size then grew to 96. It is my experience that higher imaginary sample sizes produce more complex networks and longer run times. The resulting model is a bit too complex to my liking, and moving below the recommended imaginary sample size gave runtime errors.
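The code for this attempt is not shown; as a rough sketch under my own assumptions (exactly which factors counted as 'sensible' is my reconstruction, based on what the final model removes again), only the selected columns change:
# third model (sketch): keep site, mat, region and log(-age) as well;
# only ID, number, age and dating are dropped
rfin <- subset(r12,
!(r12$ID %in% c('ms','c','roe','sz','va','arn')),
c(-ID,-number,-age,-dating))
rfin <- rfin[complete.cases(rfin),]
rfin.nw <- network(rfin)
# with the extra discrete variables jointprior() needs a much larger
# imaginary database; for this model it reported 96
rfin.prior <- jointprior(rfin.nw)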
Final model
In the final model (a few intermediate models are skipped now) a number of simplifications were made. Region is removed as a factor and log(-age) as a continuous variable. Group has lost Homo Sapiens, since that category had only three records. The model is restricted in the sense that group cannot be the result of other variables; in the plot these banned arrows are shown in red.
rfin <- subset(r12,
!(r12$ID %in% c('ms','c','roe','sz','va','arn')),
c(-ID,-number,-age,-dating,-region,-lmage))
rfin <- rfin[rfin$group !='Homo Sapiens',]
rfin <- rfin[complete.cases(rfin),]
rfin.nw <- network(rfin)
rfin.prior <- jointprior(rfin.nw)
mybanlist <- matrix(
c(2:11,
rep(1,10)),ncol=2)
banlist(rfin.nw) <- mybanlist
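The remaining steps are presumably the same as for the first model; a minimal sketch, reusing those settings:
# learn and search as before; the search respects the banlist set above
rfin.nw <- learn(rfin.nw,rfin,rfin.prior)$nw
rfin.search <- autosearch(rfin.nw,
rfin,
rfin.prior,
trace=FALSE)
rfin.heuristic <- heuristic(rfin.search$nw,
rfin,
rfin.prior,
restart=10,
trace=FALSE,
trylist=rfin.search$trylist)
plot(rfin.heuristic$nw)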