Sunday, July 14, 2013

Rain in Netherlands during the past 100 years

Climate has my interest. But discussions on climate change seem to be focused on temperature. In real life, we look at temperature, rain, sunshine and wind. I was therefor happy to find a load of rain data on Royal Netherlands Meteorological Institute. In this post I plot some of the data.


The actual page for the data is The data selected were from 6 locations; De Bilt, Groningen, Heerde, Kerkwerve, Ter Apel and Winterswijk (Sibinkweg). The reason for these specific stations is the length of the series; they start January 1906 and are still continuing. The header of each file reads (retaining only English and two lines of data);

STN      = stationnumber
YYYYMMDD = date (YYYY=year MM=month DD=day)
RD       = daily precipitation amount in 0.1 mm over the period 08.00 preceding day - 08.00 UTC present day
SX       = code for the snow cover at 08.00 UTC:

           code                           snow cover
           1                                    1 cm
           ...                                   ...
           996                                996 cm

           997 broken snow cover < 1 cm
           998 broken snow cover >=1 cm
           999 snow dunes

5 spaces represents a missing value

745,19060101,    0,     ,
It is a bit inconvenient to have a comma at the end of data fields, R expects more data after a comma. So as manual pre-processing I removed the header until the actual column headers and added dummy as final variable. The net effect is that the data can be read via read.csv. 
files <- dir(pattern='.csv$')
# extract link station number and location name 
splits <-,strsplit(files,'_|\\.'))
splits <- data.frame(STN=splits[,3],location=splits[,2])
#read and combine data
all <- lapply(files,read.csv)
all <-,all)
all <- merge(all,splits) 
all$Date <- as.Date(as.character(all$YYYYMMDD),format='%Y%m%d')
As in many countries we have seasons, so I do want to examine by month (in the proper order). For the blog, I want the months to be in English. (the formats such as %b, %d are documented in strptime, I always have difficulty finding them, so writing that down too).
Sys.setlocale(category = "LC_TIME", locale = "C")
all$Month <-  format(all$Date,'%b')
all$Month <- factor(all$Month,unique(all$Month))
all$Day <-  format(all$Date,'%d')
all$day <- as.numeric(all$Day)
all$Year <- format(all$Date,'%Y')
all$year <- as.numeric(all$Year)

Plotting the data

There are 39263 time points, which is too much for one plot. So I aggregated the data over locations, months and years. Four aggregate variables are used; mean and standard deviation of amount of rain, 75% quantile and % of days with rain. There were a seven records without data (April 1952, at Heerde), hence na.rm=TRUE. For the smoother a 0.5 level was used rather than the 0.95. By lowering the level it was possible to add the fourth aggregate variable.
rm <- aggregate(all$RD,list(Month=all$Month,
    function(x) mean(x,na.rm=TRUE))
rm$stat <- 'mean (0.1 mm)'
rpropd <- aggregate(all$RD,list(Month=all$Month,
    function(x) 100*mean(x>0,na.rm=TRUE))
rpropd$stat <- '% days'
sd <- aggregate(all$RD,list(Month=all$Month,
    function(x) sd(x,na.rm=TRUE))
sd$stat <- 'sd (0.1 mm)'

qu75 <- aggregate(all$RD,list(Month=all$Month,

     function(x) quantile(x,0.75,na.rm=TRUE))
qu75$stat <- '75 % quantile (0.1 mm)'
summ <- rbind(rm,sd,rpropd,qu75)
c1 <- ggplot(summ, aes(x=year, y=x,col=stat))
c1 + stat_smooth(method='loess',level=0.5) +
   facet_wrap( ~ Month) +
   theme(legend.position = "bottom") +
   labs(col = "Statistic") 
   geom_vline(xintercept = 1956,col='dark grey')

I never knew weather was this depressing in winter, not only is there little day light (sunrise 8:47, sunset 16:33 on Christmas ) but there is 60 % chance on rain within any in the three dark months (November, December, January). Luckily first impressions are a bit overdone, a day with rain is not necessarily a day with rain while you are going somewhere. An amount of 20 equates to 2 mm per day (or 3 on the rainy days), which is not a lot. 
The question of climate change implies there is a standard. Beginning last century, is that early enough? On the other hand, in this 100 years CO2 levels increased a lot, in Europe and the US smog is mostly a thing of the past ( UK clean air act is from 1956, the vertical line, but more recently it increased in South East Asia). There a quite a lot of shift mid 20th century, can they be explained by smog? Last years it seems rain is increasing in summer and decreasing in spring (although 2013 had a terrible March).

Plot with quantiles

To get a slightly better handle on the distributions I made a plot with quantiles. Rain gets more extreme in July August. Other months have less change but increasing, possibly with decreasing rain in spring.
qu <- lapply(seq(.5,.9,.1), function(qq) {
      aa<-  aggregate(all$RD,list(Month=all$Month,
     function(x) quantile(x,qq,na.rm=TRUE))
     aa$stat <- paste(round(100*qq),'%')

sumq <-,qu)
c1 <- ggplot(sumq, aes(x=year, y=x/10,col=stat))
c1 + stat_smooth(method='loess',level=.5) + 
    facet_wrap( ~ Month) +
    theme(legend.position = "bottom") +
    labs(col = "Quantile") + ylab('rain per day (mm)') +
    geom_vline(xintercept = c(1956),col='dark grey')


I did not think this would make a difference in such a small country. Indeed it is all so close in the figure that I decided not to use the bands. For those interested, Heerde is near Zwolle, centre of the Netherlands, while Kerkwerve is near Zierikzee (Zeeland), southwest of Netherlands, and it appears the better part of the Netherlands for summer vacation.
c1 <- ggplot(rml, aes(x=year, y=x,col=location,linetype=location))
c1 + stat_smooth(method='loess',se=FALSE) + 
    facet_wrap( ~ Month) +
    theme(legend.position = "bottom") +
    labs(col = " ",linetype=' ') +
    geom_vline(xintercept = c(1956),col='dark grey')


  1. I could only find historic data back to 1951 on the page URL you note above. The URL below gives the daily data back to earlier years. Perhaps I have not understood the Web pager options correctly.


  2. Hi George. Most of the data starts in 1951. But the few towns I selected had the complete series. At the page your link opened the bottom row (Axel) has 19060101 t/m 19950531.

  3. This is relatively old post. After putting # before the first lines until the headerline you can simply:
    path.filename= file.choose()
    MeteoDF = read.table(path.filename, header=T, sep="," ) # I prefer to go in two steps firstly a data frame which i can check by several functions including "str(MeteoDF)"
    # second one is the time series convertion by package zoo
    MeteoZ <-read.zoo(MeteoDF[,1:4] , index=2, format= "%Y%m%d", tz="" )
    # and the data in zoo timeseries can be read as simple as that. Several functions are avaible for futher analysis.