Sunday, February 24, 2013

Earthquakes in Netherlands

In the Netherlands we have Natural Gas. Unfortunately winning this gas seems to cause some quakes. As quakes go, they are not strong. However, our buildings are not made to resist quakes, before 1986 they were unheard of, so there is some damage. It is now predicted they could get stronger and more frequent. This caused a bit of a stir in the Netherlands and I wondered if I could display the data so I can understand the quakes a bit better.

Reading the data

The source of the data is knmi.nl. I found a spreadsheet with data upto 28 January 2013 called geinduceerde-bevingen-nl.pdf. The data would not convert very well in calibre, but after zooming out, I could ctrl-A, ctrl-C it from Adobe into notepad. As so often, a bit of post processing is needed. These days I tend to do that in R.
library(ggplot2)
# read as lines of text for processing
Datain <- readLines('aardbeving.txt')
head(Datain)
[1] "Geïnduceerde aardbevingen in Nederland"                             
[2] "YYMMDD TIME LOCATION LAT LON X_RD Y_RD INT MAG.DEPTH"               
[3] "19861226 074751.00 Assen 52.992 6.548 232,924 556,587 4.5 2.8 1.0"  
[4] "19871214 204951.05 Hooghalen 52.928 6.552 233,266 549,537 4 2.5 1.5"
[5] "19891201 200914.35 Purmerend 52.529 4.971 126,697 504,593 5 2.7 1.2"
[6] "19910215 021116.54 Emmen 52.771 6.914 257,992 532,491 3.5 2.2 3.0"  
Although not clear from this head, it seems there are a number of problems running this through read.table(). The worst are; each page of the .pdf contains a header, LOCATION can contain a space, INT may be missing, MAG.DEPTH is actually two variables.
#print notes, then remove them
grep('^YYMMDD|Ge|[[:digit:]]{8}',Datain,invert=TRUE,value=TRUE)
[1] "www.knmi.nl"  
[2] "Any use of this material [text, illustrations, graphics] is only permitted when the source www.knmi.nl is acknowledged."
Datain <- grep('^(YYMMDD|[[:digit:]]{8})',Datain,value=TRUE)
#column header is repeated. remove all but first
header <- which(Datain ==Datain[1])
Datain <- Datain[-header[-1]]
# Quote around location names
Datain <- gsub('([[:digit:]]) ([[:alpha:]\'])','\\1 "\\2',Datain,fixed=FALSE)
Datain <- gsub('([[:alpha:]() ]) ([[:digit:]])','\\1" \\2',Datain,fixed=FALSE)
#quote around INT, so it can be read even if it is absent
Datain <- gsub('([[:digit:]]{2,3},[[:digit:]]{3} [[:digit:]]{3},[[:digit:]]{3} )','\\1" ',Datain,fixed=FALSE)
Datain <- gsub('( -?[[:digit:]]*\\.[[:digit:]]* [[:digit:]]*\\.[[:digit:]]*$)',' "\\1',Datain,fixed=FALSE)
# remove one occurence 'not yet known'
Datain <- grep('Nog niet bekend',Datain,value=TRUE,invert=TRUE)
# and split the last two words
Datain <- gsub('MAG.DEPTH','Mag Depth',Datain,fixed=TRUE )
# now read into a dataframe
r1 <- read.table(textConnection(Datain),header=TRUE,
colClasses=c(c('character','character'),rep(NA,8)))
To use the data, I need a proper data-time variable. My interest is limited to Groningen and Drente, so some data is removed. The projection into a map I stole from another blog (rpsychologist.com) and updated for a newer version of ggplot2.
r1$DTC <- as.POSIXct(paste(r1$YYMMDD,r1$TIME),format='%Y%m%d %H%M%S')
r1 <- r1[order(r1$DTC),]
# calc time between Quakes and limit region
r1$diftime <- c(NA,diff(r1$DTC))/3600/24
ylim <- c(52.825,53.5)
xlim <- c(6.3,7.2)
# prepare for plot in map
# source: http://rpsychologist.com/parsing-data-from-a-text-file-and-plotting-where-people-live-using-ggplot2-and-openstreetmaps/
Sys.setenv(NOAWT=1) #call this before loading OpenStreetMap
library(OpenStreetMap)
r1 <- cbind(r1,projectMercator(r1$LAT,r1$LON))
r2 <- r1[r1$LAT>min(ylim) & r1$LON>min(xlim),]

How often?

The first question is: how often are these quakes? I have been using time between quakes as a measure. Since there are loads of quakes, a loess smoothed line is added. I did not use the ggplot2 function for this as I wanted to restrict my axes and that affects the smoother. As a bonus the colour represents magnitude. Nothing much happens in magnitude. KNMI writes (in Dutch) There is an increase in quakes since 2002. I have the feeling it is increasing more and more, but they are restricting in strength categories, so that may explain the difference. I do feel sorry for people having these quakes ever so often and apparently this will continue.
# how often?
lo1 <- loess(diftime ~ as.numeric(DTC),data=r2[-1,] )
lo2 <- loess(Mag ~ as.numeric(DTC),data=r2[-1,] )

topred <- data.frame(DTC=seq(r2$DTC[2],max(r1$DTC),length.out=40))
topred$diftime <- predict(lo1,topred)
topred$Mag <- predict(lo2,topred)

png('earthquake time.png')
ggplot(r2, aes(DTC,diftime,col=Mag)) +
geom_point() +
scale_x_datetime('Date',limits=as.POSIXct(c('1995-01-01','2013-02-01'))) +
scale_colour_gradient(low="blue",high="red") +
scale_y_log10('Time between quakes (days)',limits=c(1e-1,50)) +
geom_line(size=2,data=topred)
dev.off()

Where?

I wanted a good reason to plot some data in a map for ages. I am very happy this is the good occasion. And it is actually very simple. The only difficulty was using colour for progression. ggplot2 didn't like the POSIXct variable DTC, so I had to make a second variable, mydate, which was numeric and mimics years. The red is more recent. Most quakes are north-east of Groningen city, which is the region I associate them with. There are definitely 'hot spots' and 'hot regions'. I was surprised to learn of quakes in Ter Apel (bottom right), and just south of Assen, that is (relatively) far away.
#where is the quake?
mp <- openmap(upperLeft=c(max(ylim),min(xlim)),lowerRight=c(min(ylim),max(xlim)))
r2$mydate <- 1995+(2013-1995)*(as.numeric(r2$DTC-as.POSIXct('1995-01-01')))/
as.numeric(as.POSIXct('2013-01-01')-as.POSIXct('1995-01-01'))
png('earthquake map.png')
autoplot(mp) +
geom_point(aes(x,y, color=mydate,guide='legend'),
alpha = I(7/10), data=r2) +
theme(axis.line=element_blank(),axis.text.x=element_blank(),
axis.text.y=element_blank(),axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank()) +
scale_colour_gradient(low="blue",high="red",guide=guide_legend('Date')) +
theme(legend.position = "bottom")
dev.off()

No comments:

Post a Comment