Sunday, November 29, 2015

Wind in Netherlands II

Two weeks ago I plotted how wind measurements on the edge of the North Sea changed in the past century. This week the same dataset is used for hypothesis testing.

Data

The most important things to reiterate from previous post is that the data is from KNMI and they come with a comment: "These time series are inhomogeneous because of station relocations and changes in observation techniques. As a result, these series are not suitable for trend analysis. For climate change studies we refer to the homogenized series of monthly temperatures of De Bilt or the Central Netherlands Temperature"
Data reading has slighlty changed, mostly because I needed different variables. In addition, for testing I wanted some categorical variables, these are Month and year. For year I have chosen five chunks of 22 years, 22 was chosen since it seemed large enough and resulted in approximately equal size chunks. Finally, for display purposes, wind direction was categorized in 8 directions according to the compass rose (North, North-East, East etc.).
library(circular)
library(dplyr)
library(ggplot2)
library(WRS2)

r1 <- readLines('etmgeg_235.txt')
r2 <- r1[grep('^#',r1):length(r1)]
explain <- r1[1:(grep('^#',r1)-1)]
explain
r2 <- gsub('#','',r2)
r3 <- read.csv(text=r2)

r4 <- mutate(r3,
        Date=as.Date(format(YYYYMMDD),format='%Y%m%d'),
        year=floor(YYYYMMDD/1e4),
        rDDVEC=as.circular(DDVEC,units='degrees',template='geographics'),
        # Vector mean wind direction in degrees 
        # (360=north, 90=east, 180=south, 270=west, 0=calm/variable)
        DDVECf=as.character(cut(DDVEC,breaks=c(0,seq(15,330,45),361),left=TRUE,
            labels=c('N','NE','E','SE','S','SW','W','NW','N2'))),
        DDVECf=ifelse(DDVECf=='N2','N',DDVECf),
        DDVECf=factor(DDVECf,levels=c('N','NE','E','SE','S','SW','W','NW')),
        rFHVEC=FHVEC/10, # Vector mean windspeed (in 0.1 m/s)
        yearf=cut(year,seq(1905,2015,22),labels=c('05','27','49','71','93')),
        month=factor(format(Date,'%B'),levels=month.name),
        tcat=interaction(month,yearf)
    ) %>%
    select(.,YYYYMMDD,Date,year,month,DDVEC,rDDVEC,DDVECf,rFHVEC,yearf,tcat)

Analysis

The circular package comes with an aov.circular() function, which can do one way analysis. Since I am a firm believer that direction varies according to the seasons, the presence of a time effect (the five categories) has been examined by Month. To make result compact, only p-values are displayed, they are all significant.
sapply(month.name,function(x) {
      aa <- filter(r4,month==x)
      bb <- aov.circular(aa$rDDVEC,aa$yearf,method='F.test')
      format.pval(bb$p.value,digits=4,eps=1e-5)
    }) %>% as.data.frame
January   4.633e-05
February    < 1e-05
March       < 1e-05
April       < 1e-05
May         < 1e-05
June        0.00121
July       0.000726
August    0.0001453
September   0.02316
October     < 1e-05
November  0.0001511
December   0.003236
The associated plot with this data shows frequency of directions by year and Month. The advantage here being that the time axis is the x-axis, so changes are more easily visible.
ggplot(r4[complete.cases(r4),], aes(x=yearf))+
    geom_histogram()+
    facet_grid(DDVECf ~ month)+
    ggtitle('Frequency of Wind Direction')

The other part of wind is strength. Two weeks ago I saw clear differences. However, since that may also be effect of instrument or location change. The test I am interested here is therefore not the main effect of year categories but rather the interaction Month*Year. In the objective of robustness I wanted to go nonparametric with this. However, since I did not find anything regarding two factor interaction in my second edition of Hollander and Wolfe I googled for robust interaction. This gave a hit on rcompanion for the WRS2 package.
t2way(rFHVEC ~ yearf + month + yearf:month, 
    data = r4)
                value p.value
yearf       1063.0473   0.001
month        767.5687   0.001
yearf:month  169.4807   0.001

Conclusion

The data seems to show a change in wind measurements over these 110 years. This can be due to changes in wind or measurement instrument or instrument location. The statistical testing was chosen such as to counter some effects of these changes, hence it can be thought that the change is due to changes in wind itself.

1 comment:

  1. Is there no information available to when instruments were changed? Might be worthwile to plug that info into the ANOVA...

    Cheers,
    Andrej

    ReplyDelete