Reading the data
The source of the data is knmi.nl. I found a spreadsheet with data upto 28 January 2013 called geinduceerde-bevingen-nl.pdf. The data would not convert very well in calibre, but after zooming out, I could ctrl-A, ctrl-C it from Adobe into notepad. As so often, a bit of post processing is needed. These days I tend to do that in R.
library(ggplot2)
#
read as lines of text for processing
Datain
<-
readLines('aardbeving.txt')
head(Datain)
[1] "Geïnduceerde aardbevingen in Nederland"
[2] "YYMMDD TIME LOCATION LAT LON X_RD Y_RD INT MAG.DEPTH"
[3] "19861226 074751.00 Assen 52.992 6.548 232,924 556,587 4.5 2.8 1.0"
[4] "19871214 204951.05 Hooghalen 52.928 6.552 233,266 549,537 4 2.5 1.5"
[5] "19891201 200914.35 Purmerend 52.529 4.971 126,697 504,593 5 2.7 1.2"
[6] "19910215 021116.54 Emmen 52.771 6.914 257,992 532,491 3.5 2.2 3.0"
Although not clear from this head, it seems there are a number of problems running this through read.table(). The worst are; each page of the .pdf contains a header, LOCATION can contain a space, INT may be missing, MAG.DEPTH is actually two variables.
#print
notes, then remove them
grep('^YYMMDD|Ge|[[:digit:]]{8}',Datain,invert=TRUE,value=TRUE)
[1] "www.knmi.nl"
[2] "Any use of this material [text, illustrations, graphics] is only permitted when the source www.knmi.nl is acknowledged."
Datain
<-
grep('^(YYMMDD|[[:digit:]]{8})',Datain,value=TRUE)
#column
header is repeated. remove all but first
header
<-
which(Datain ==Datain[1])
Datain
<-
Datain[-header[-1]]
#
Quote around location names
Datain
<-
gsub('([[:digit:]])
([[:alpha:]\'])','\\1
"\\2',Datain,fixed=FALSE)
Datain
<-
gsub('([[:alpha:]() ])
([[:digit:]])','\\1"
\\2',Datain,fixed=FALSE)
#quote
around INT, so it can be read even if it is absent
Datain
<-
gsub('([[:digit:]]{2,3},[[:digit:]]{3}
[[:digit:]]{3},[[:digit:]]{3} )','\\1"
',Datain,fixed=FALSE)
Datain
<-
gsub('( -?[[:digit:]]*\\.[[:digit:]]*
[[:digit:]]*\\.[[:digit:]]*$)','
"\\1',Datain,fixed=FALSE)
#
remove one occurence 'not yet known'
Datain
<-
grep('Nog niet
bekend',Datain,value=TRUE,invert=TRUE)
#
and split the last two words
Datain
<-
gsub('MAG.DEPTH','Mag
Depth',Datain,fixed=TRUE
)
#
now read into a dataframe
r1
<-
read.table(textConnection(Datain),header=TRUE,
colClasses=c(c('character','character'),rep(NA,8)))
To use the data, I need a proper data-time variable. My interest is limited to Groningen and Drente, so some data is removed. The projection into a map I stole from another blog (rpsychologist.com) and updated for a newer version of ggplot2.
r1$DTC
<-
as.POSIXct(paste(r1$YYMMDD,r1$TIME),format='%Y%m%d
%H%M%S')
r1
<-
r1[order(r1$DTC),]
# calc time between Quakes and limit region
r1$diftime
<-
c(NA,diff(r1$DTC))/3600/24
ylim
<-
c(52.825,53.5)
xlim
<-
c(6.3,7.2)
# prepare for plot in map
# source:
http://rpsychologist.com/parsing-data-from-a-text-file-and-plotting-where-people-live-using-ggplot2-and-openstreetmaps/
Sys.setenv(NOAWT=1)
#call this before loading OpenStreetMap
library(OpenStreetMap)
r1
<-
cbind(r1,projectMercator(r1$LAT,r1$LON))
r2
<-
r1[r1$LAT>min(ylim)
&
r1$LON>min(xlim),]
How often?
The first question is: how often are these quakes? I have been using time between quakes as a measure. Since there are loads of quakes, a loess smoothed line is added. I did not use the ggplot2 function for this as I wanted to restrict my axes and that affects the smoother. As a bonus the colour represents magnitude. Nothing much happens in magnitude. KNMI writes (in Dutch) There is an increase in quakes since 2002. I have the feeling it is increasing more and more, but they are restricting in strength categories, so that may explain the difference. I do feel sorry for people having these quakes ever so often and apparently this will continue.
# how often?
lo1
<-
loess(diftime ~
as.numeric(DTC),data=r2[-1,]
)
lo2
<-
loess(Mag ~
as.numeric(DTC),data=r2[-1,]
)
topred
<-
data.frame(DTC=seq(r2$DTC[2],max(r1$DTC),length.out=40))
topred$diftime
<-
predict(lo1,topred)
topred$Mag
<-
predict(lo2,topred)
png('earthquake
time.png')
ggplot(r2,
aes(DTC,diftime,col=Mag)) +
geom_point()
+
scale_x_datetime('Date',limits=as.POSIXct(c('1995-01-01','2013-02-01')))
+
scale_colour_gradient(low="blue",high="red")
+
scale_y_log10('Time
between quakes (days)',limits=c(1e-1,50))
+
geom_line(size=2,data=topred)
dev.off()
Where?
I wanted a good reason to plot some data in a map for ages. I am very happy this is the good occasion. And it is actually very simple. The only difficulty was using colour for progression. ggplot2 didn't like the POSIXct variable DTC, so I had to make a second variable, mydate, which was numeric and mimics years. The red is more recent. Most quakes are north-east of Groningen city, which is the region I associate them with. There are definitely 'hot spots' and 'hot regions'. I was surprised to learn of quakes in Ter Apel (bottom right), and just south of Assen, that is (relatively) far away.
#where
is the quake?
mp
<-
openmap(upperLeft=c(max(ylim),min(xlim)),lowerRight=c(min(ylim),max(xlim)))
r2$mydate
<-
1995+(2013-1995)*(as.numeric(r2$DTC-as.POSIXct('1995-01-01')))/
as.numeric(as.POSIXct('2013-01-01')-as.POSIXct('1995-01-01'))
png('earthquake
map.png')
autoplot(mp)
+
geom_point(aes(x,y,
color=mydate,guide='legend'),
alpha
= I(7/10),
data=r2) +
theme(axis.line=element_blank(),axis.text.x=element_blank(),
axis.text.y=element_blank(),axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank())
+
scale_colour_gradient(low="blue",high="red",guide=guide_legend('Date'))
+
theme(legend.position
= "bottom")
dev.off()
No comments:
Post a Comment