Sunday, January 13, 2013

European Migration

Last week on the radio I heard a story about southern Europeans and Irish looking for better times in northern Europe. One tale was of an Italian academic who left Italy and ended up waiting tables in an Italian restaurant in the Netherlands. Obviously this is not good: not good for Italy, which loses its academics, and not good for somebody who actually wants to wait tables. I cannot blame the Italian academic though. I also heard the tale of an Irish guy who ended up working for ASML, which is much better (except for Ireland, obviously).
Based on these tales I wondered whether this migration is big enough to be visible in population statistics and would make for nice plots. Eurostat, the statistical office of the European Union, is the place to get such data. Since I tend to look at the data first and decide afterwards, I took all years and countries. My chosen format has year and country in columns, followed by population, migration, natural change, births and deaths.

The first thing to do was to have a go at the labels, which were extremely long.

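# levels() below needs Region to be a factor; since R 4.0.0 read.csv no longer converts strings by default
r1 <- read.csv("population.eu.csv", stringsAsFactors=TRUE)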
levels(r1$Region) <- gsub('European Economic Area','EEA' ,levels(r1$Region))
levels(r1$Region) <- gsub(' plus IS, LI, NO','+' ,levels(r1$Region),fixed=TRUE)
levels(r1$Region) <- sub(' countries)',')' ,levels(r1$Region),fixed=TRUE)
levels(r1$Region) <- sub(' (under United Nations Security Council Resolution 1244/99)','' ,levels(r1$Region),fixed=TRUE)
levels(r1$Region) <- sub('European Free Trade Association','EFTA' ,levels(r1$Region),fixed=TRUE)
levels(r1$Region) <- sub('Former Yugoslav Republic of Macedonia, the','FYROM' ,levels(r1$Region),fixed=TRUE)
levels(r1$Region) <- sub('including +former GDR','Incl GDR' ,levels(r1$Region))
levels(r1$Region) <- sub('European Union','EU' ,levels(r1$Region))
levels(r1$Region)

 [1] "Albania"                      "Andorra"                     
 [3] "Armenia"                      "Austria"                     
 [5] "Azerbaijan"                   "Belarus"                     
 [7] "Belgium"                      "Bosnia and Herzegovina"      
 [9] "Bulgaria"                     "Croatia"                     
[11] "Cyprus"                       "Czech Republic"              
[13] "Denmark"                      "Estonia"                     
[15] "Euro area (15)"               "Euro area (16)"              
[17] "Euro area (17)"               "EEA (EU-25+)"                
[19] "EEA (EU-27+)"                 "EFTA"                        
[21] "EU (25)"                      "EU (27)"                     
[23] "Finland"                      "FYROM"                       
[25] "France"                       "France (metropolitan)"       
[27] "Georgia"                      "Germany (Incl GDR from 1991)"
[29] "Germany (Incl GDR)"           "Greece"                      
[31] "Hungary"                      "Iceland"                     
[33] "Ireland"                      "Italy"                       
[35] "Kosovo"                       "Latvia"                      
[37] "Liechtenstein"                "Lithuania"                   
[39] "Luxembourg"                   "Malta"                       
[41] "Moldova"                      "Monaco"                      
[43] "Montenegro"                   "Netherlands"                 
[45] "Norway"                       "Poland"                      
[47] "Portugal"                     "Romania"                     
[49] "Russia"                       "San Marino"                  
[51] "Serbia"                       "Slovakia"                    
[53] "Slovenia"                     "Spain"                       
[55] "Sweden"                       "Switzerland"                 
[57] "Turkey"                       "Ukraine"                     
[59] "United Kingdom"              
As can be seen, the list of countries is fairly long. After checking I found that "France (metropolitan)" is the part of France within Europe, while "France" also includes the parts outside Europe.
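On the label clean-up above: some calls use fixed=TRUE and some do not, and that choice matters. The sketch below shows the difference on three made-up labels that only imitate the Eurostat names; the vector names labels, step1, step2 and short are just for this illustration.

```r
# Hypothetical labels, modelled on the long Eurostat names (not read from the file)
labels <- c("European Union (27 countries)",
            "European Economic Area (EU-27 plus IS, LI, NO)",
            "Germany (including  former GDR)")

# fixed=TRUE: the pattern is a literal string, so characters such as
# ')' and '+' are matched as they are
step1 <- sub(' countries)', ')', labels, fixed=TRUE)

# Without fixed=TRUE the pattern is a regular expression:
# ' +' means "one or more spaces", which absorbs irregular spacing
step2 <- sub('including +former GDR', 'Incl GDR', step1)

# gsub replaces every match in a string, sub only the first one
short <- gsub('European Economic Area', 'EEA', step2)
short <- sub(' plus IS, LI, NO', '+', short, fixed=TRUE)
short <- sub('European Union', 'EU', short)
print(short)
```

The third label has two spaces before "former" on purpose, which is the kind of thing the regular expression version handles and a fixed string would miss.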
The next step was to calculate relative changes and make a plot.
r1$RelMig <- 100*r1$MigrationA/r1$Population
r1$RelNC <- 100*r1$NatChange/r1$Population
r1$RelCh <- 100*r1$TotChange/r1$Population
r1$RelB <- 100*r1$LiveBirths/r1$Population
r1$RelD <- 100*r1$Deaths/r1$Population
library(lattice)
xyplot(RelMig+RelNC+RelCh+RelB+RelD ~ Year | Region,data=r1,type='l',
    auto.key=TRUE)
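To make the relative measures concrete before looking at the plot, here is a small sketch with invented numbers. The data frame toy and its values are assumptions for illustration only, not Eurostat data.

```r
# Toy data; the numbers are invented for illustration only
toy <- data.frame(Region=c('A','B'),
                  Population=c(1000000, 500000),
                  Migration=c(5000, -2500),
                  NatChange=c(2000, 1000))

# Express each change as a percentage of the population,
# so that large and small countries can share one scale
toy$RelMig <- 100*toy$Migration/toy$Population
toy$RelNC <- 100*toy$NatChange/toy$Population
print(toy[,c('Region','RelMig','RelNC')])
```

A net inflow of 5000 on a million inhabitants and a net outflow of 2500 on half a million both come out at half a percent, with opposite signs.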
The most obvious 'features' of this plot are that there are too many countries and that the country names are still too long. The data show big migration in FYROM (Macedonia), Albania, and Bosnia and Herzegovina, which swamps all other information. Clearly war has a big effect on migration, more than the economy. There are also a number of countries with data for only a limited number of years.
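# drop the countries that swamp the plot, the older aggregates and the Germany series from 1991
r2 <- r1[! (r1$Region %in% grep('FYROM|Albania|Bosnia|15|16|25|1991',levels(r1$Region),value=TRUE)),]
# count the years with migration data per region
xt <- xtabs(!is.na(RelMig) ~ Region,data=r1)
table(xt)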
xt
 1  6  7 14 15 16 17 40 45 48 49 51 52 
 1  1  1  1  1  7  2  1  1  1  1  8 33 
# keep only the regions with at least 40 years of migration data
r3 <- r2[! (r2$Region %in% names(xt[xt<40])),]
levels(r3$Region)[levels(r3$Region)=='United Kingdom'] <- 'UK'
levels(r3$Region)[levels(r3$Region)=='Germany (Incl GDR)'] <- 'Germany'
levels(r3$Region)[levels(r3$Region)=='France (metropolitan)'] <- 'France'
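The counting trick used above, xtabs on !is.na(), may be easier to follow on a tiny example. This is a sketch on invented data; the threshold of 3 years and the call to droplevels() are my choices for the toy set and are not part of the code above.

```r
# Toy data: three regions, four years each, with some missing values
toy <- data.frame(
    Region=factor(rep(c('A','B','C'), each=4)),
    Year=rep(2001:2004, times=3),
    RelMig=c(0.1,0.2,0.3,0.4,  NA,NA,0.5,NA,  0.2,NA,0.1,0.3))

# Summing the logical !is.na() per Region counts the years with data
xt <- xtabs(!is.na(RelMig) ~ Region, data=toy)

# Keep regions with at least 3 years of data (threshold chosen for this toy set)
kept <- toy[!(toy$Region %in% names(xt[xt<3])),]

# Subsetting keeps unused factor levels; droplevels() removes them
kept <- droplevels(kept)
print(levels(kept$Region))
```

Region B has only one year with data and drops out, while A and C remain.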

Lattice

library(lattice)
xyplot(RelMig + RelNC + RelCh ~ Year | Region,
    data=r3[r3$Region %in% c("Portugal","Iceland","Ireland","Italy","Greece","Spain",
            "Sweden","Finland","Denmark","Germany","Netherlands","Austria",
            "UK","Luxembourg","France"),],
    type='l',
    abline = list(h = c(0,NA,NA),v=c(NA,1999,2007),col=c('grey','blue','red')),
    ylim=c(-1.6,3),ylab='% Change in Population',
    xlim=c(1989,2012),
    auto.key=list(space='bottom',columns=3,
        text=c('Migration','Natural causes','Total'),
        points=FALSE,lines=TRUE)
)


In the plot the vertical blue line marks the start of the euro zone (1999), while the red line marks the start of the crisis (2007). The data had quite some surprises for me. The crisis effects in Iceland and Ireland are large, much larger than in southern Europe. There is big immigration in Luxembourg, and there was big immigration in Italy, Spain, Iceland and Ireland before the crisis. And yes, the crisis did cause emigration from Spain and Greece. It is not big, but 2012 data is not yet available. Finally, regarding all those Poles working in western Europe, I don't see an effect in Poland.

Ggplot2 

Finally, I wanted to do the same in ggplot2. It took me ages to learn lattice, and now it starts all over again with ggplot2. Unfortunately I did not manage to lay the countries out over both rows and columns, so the number of countries is further reduced. I was not able to place the legend title above the legend either.
r4 <- r3[r3$Region %in% c("Portugal","Ireland","Italy","Greece","Spain",
        "Germany",
        "UK","France"),]
r4$Region <- factor(r4$Region)
table(r4$Region)
r5 <- reshape(r4,varying=list(c('RelMig','RelNC','RelCh')),
    idvar=c('Year','Region'),timevar='Source',direction='long',
    v.names=c('Percentage'),
    times=c('Migration','Natural Causes','Total Change'),
    drop=c('LiveBirths','Deaths','NatChange','Migration','TotChange','RelB','RelD'))

library(ggplot2)
ggplot(r5,aes(x=Year,y=Percentage,colour=Source))  + #group=Source,
    geom_line() +
    facet_grid(. ~ Region, drop=TRUE) +
    scale_x_continuous(breaks=c(1990,2000,2010),
        labels=c("'90","2000","'10"),limits=c(1989,2012)) +
    scale_y_continuous('% Change',limits=c(-1,3)) +
    theme(legend.position = "bottom") 
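The reshape call is the step that turns the separate percentage columns into one Percentage column plus a Source label, which is the layout ggplot2 wants for colour=Source. A minimal sketch on a made-up two-row data frame (the names wide and long and all values are illustrative assumptions):

```r
# Toy wide data: one row per year, one column per measure
wide <- data.frame(Year=c(2010,2011), Region=c('A','A'),
                   RelMig=c(0.5,0.4), RelNC=c(0.2,0.1))

# Stack the two measure columns into one 'Percentage' column;
# 'Source' records which column each value came from
long <- reshape(wide, varying=list(c('RelMig','RelNC')),
    idvar=c('Year','Region'), timevar='Source', direction='long',
    v.names='Percentage', times=c('Migration','Natural Causes'))
print(long)
```

Two rows with two measures become four rows, one per combination of year and source.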

Postscript. Following the comment below, facet_wrap was what I needed:
r4 <- r3[r3$Region %in% c("Portugal","Iceland","Ireland","Italy","Greece","Spain",
        "Sweden","Finland","Denmark","Germany","Netherlands","Austria",
        "UK","Luxembourg","France","Poland"),]
r4$Region <- factor(r4$Region)
table(r4$Region)
r5 <- reshape(r4,varying=list(c('RelMig','RelNC','RelCh')),
    idvar=c('Year','Region'),timevar='Source',direction='long',
    v.names=c('Percentage'),
    times=c('Migration','Natural Causes','Total Change'),
    drop=c('LiveBirths','Deaths','NatChange','Migration','TotChange','RelB','RelD'))
ggplot(r5,aes(x=Year,y=Percentage,colour=Source))  + #group=Source,
    geom_line() +
    facet_wrap( ~ Region, drop=TRUE) +
    scale_x_continuous(breaks=c(1990,2000,2010),
        labels=c("'90","2000","'10"),limits=c(1989,2012)) +
    scale_y_continuous('% Change',limits=c(-1,3)) +
    theme(legend.position = "bottom") 

2 comments:

  1. Nice post. Try replacing "facet_grid(. ~ Region, drop=TRUE)" by "facet_wrap(~ Region, ncol = 4)" - then the expected grid should be created.

    Cheers
    harald

    Replies
    1. Thanks. Facet_wrap was the trick indeed.
