Data
The data is a fixed-format file with eleven columns. Reading fixed format is not very difficult, but it requires a bit of preparation. In this case, the first row contains a description of the columns and the second a number of '#' signs that mark the start of each column. I found it most logical to first get the column widths, then use these to create the variable names, and then read the data. There is an interesting detail: the first two rows have '#' as their first character, which is the default comment character of read.fwf, so they are skipped when the data is read. For the same reason, the header row is read with a different comment character ('@').
library(dplyr)
library(ggplot2)
library(lme4)
r1 <- readLines('launchlog.txt')
colwidth <- gregexpr('#',r1[2])[[1]] %>%
    c(.,max(nchar(r1))) %>%
    diff(.)
cols <- read.fwf(textConnection(r1[1]),
        widths=colwidth,
        header=FALSE,
        comment.char='@') %>%
    unlist(.) %>%
    make.names(.) %>%
    gsub('\\.+$','',.) # remove extra . at end of string
r2 <- read.fwf('launchlog.txt',
    widths=colwidth,
    col.names=cols)
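To make the width calculation concrete, here is a minimal sketch on a small in-memory example. The toy text, its column names and the variables fmt, txt, starts, widths, header, nms and dat are made up for illustration and do not reflect the real layout of launchlog.txt. Unlike the code above, it adds 1 to the longest line length so the last column keeps its final character, and it cleans the names with trimws rather than gsub.

```r
# Toy fixed-width text (hypothetical): columns start at positions 1, 8 and 18
fmt <- "%-7s%-10s%s"
txt <- c(sprintf(fmt, "# Id", "Name", "Value"),   # header row
         sprintf(fmt, "#", "#", "#"),             # ruler row marking column starts
         sprintf(fmt, "A1", "alpha", "10"),
         sprintf(fmt, "B2", "beta", "200"))

# Start position of every column, taken from the ruler row
starts <- as.integer(gregexpr("#", txt[2], fixed = TRUE)[[1]])

# Width = distance to the next start; the last column runs to the end of the longest line
widths <- diff(c(starts, max(nchar(txt)) + 1))

# Cut the header row at the same positions and turn the pieces into valid names
header <- substring(txt[1], starts, starts + widths - 1)
nms <- make.names(trimws(sub("^#", "", header)))

# The two '#' rows are skipped because '#' is the default comment character
dat <- read.fwf(textConnection(txt), widths = widths,
                col.names = nms, strip.white = TRUE)
print(dat)
```

Building the toy rows with sprintf keeps the column positions exact, which is easier to check than hand-typed spaces.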
Some launches contain more than one payload. These are represented by rows containing just payload information. Such rows are dropped by removing all records without a success value. In addition, a date variable is created from the launch date, and extra spaces, a consequence of the fixed input format, are removed from the Suc variable.
r3 <- filter(r2,!is.na(Suc))
Sys.setlocale(category = "LC_TIME", locale = "C")
r3$Launch.Date..UTC[1:3]
r3$Date <- as.Date(r3$Launch.Date..UTC,format='%Y %b %d')
levels(r3$Suc) <- gsub(' *','',levels(r3$Suc))
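The same three steps can be tried on a toy data frame. This is only a sketch: the rows and the names raw and launches are invented, and it assumes the Suc column is a character vector (the default since R 4.0, where read.fwf no longer creates factors), so it trims with trimws instead of editing factor levels.

```r
# %b (abbreviated month name) depends on the locale, hence the "C" locale
invisible(Sys.setlocale("LC_TIME", "C"))

# Invented rows: the third one mimics a payload-only row without a success value
raw <- data.frame(
  Launch.Date = c("2001 Jan  5", "2001 Feb 17", ""),
  Suc = c("S  ", "F  ", NA),
  stringsAsFactors = FALSE
)

launches <- raw[!is.na(raw$Suc), ]                  # keep rows with a success value
launches$Date <- as.Date(launches$Launch.Date,
                         format = "%Y %b %d")       # space-padded days parse fine
launches$Suc <- trimws(launches$Suc)                # drop fixed-width padding
print(launches)
```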
Plots
Preparation
Since I wanted the proportion of days with a launch, I created a few extra data frames. alldays contains all dates between the first and the last launch. monthorder and wdorder provide additional variables intended to arrange the months and weekdays in a reasonable order.
alldays <- data.frame(
    Date=seq(
        min(r3$Date),
        max(r3$Date) ,
        by=1 ))
monthorder <- alldays %>%
    mutate(.,
        month=months(Date),
        mn=format(Date,'%m')) %>%
    select(.,month,mn) %>%
    unique(.) %>%
    arrange(.,mn)
wdorder <- alldays %>%
    mutate(.,
        wd=weekdays(Date),
        wdn=format(Date,'%u')) %>%
    select(.,wd,wdn) %>%
    unique(.) %>%
    arrange(.,wdn)
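Why the lookup tables help is easiest to see on a short date range. The sketch below uses base R only and an invented two-week range that starts mid-week; the names days and lookup are hypothetical. Sorting by the ISO weekday number from '%u' (Monday is 1) gives the calendar order regardless of the day on which the range starts, whereas sorting the names alphabetically would not.

```r
invisible(Sys.setlocale("LC_TIME", "C"))   # English day names

# Invented two-week range starting on an arbitrary day
days <- seq(as.Date("2001-01-04"), by = 1, length.out = 14)

lookup <- unique(data.frame(
  wd  = weekdays(days),        # day name, used as the label
  wdn = format(days, "%u"),    # ISO weekday number, used as the sort key
  stringsAsFactors = FALSE
))
lookup <- lookup[order(lookup$wdn), ]

# Factor levels in calendar order, ready for tabulating or plotting
wd_factor <- factor(weekdays(days), levels = lookup$wd)
print(levels(wd_factor))
```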
By month
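The code behind the month figures is not shown in this post. A minimal sketch on toy data, following the same count-and-divide pattern as the weekday code further down, might look like this. The dates and the names launches, days, occur, ndays and propbm are invented; base R's table stands in for the xtabs and dplyr steps.

```r
invisible(Sys.setlocale("LC_TIME", "C"))   # so months() matches month.name

# Invented launches and the full range of days they fall in
launches <- as.Date(c("2001-01-10", "2001-02-03", "2001-02-03",
                      "2001-02-20", "2001-03-15"))
days <- seq(as.Date("2001-01-01"), as.Date("2001-03-31"), by = 1)

# Launches per month and days per month, both in calendar order
occur <- table(factor(months(launches), levels = month.name))
ndays <- table(factor(months(days), levels = month.name))

propbm <- data.frame(month = month.name,
                     Occur = as.vector(occur),
                     NM    = as.vector(ndays))
propbm <- propbm[propbm$NM > 0, ]          # drop months outside the range
propbm$prop <- propbm$Occur / propbm$NM    # launches per day in that month
print(propbm)
```

Note that two launches on the same day count twice here, so prop is strictly launches per day rather than the share of days with at least one launch.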
January seems to get the fewest launches and December the most. I wonder if that is publicity seeking in December or the well-known January 1st deadline.
By weekdays
As expected, the middle of the week gets the most launches. The lower value for Monday than for Saturday suggests the day before a launch is just as busy as the day of the launch itself.
numlawd <- mutate(r3,
        wd =factor(weekdays(Date),levels=wdorder$wd)) %>%
    xtabs( ~ wd,.) %>%
    as.data.frame(.) %>%
    rename(.,Occur=Freq)
numwd <- mutate(alldays,
        wd =factor(weekdays(Date),levels=wdorder$wd)) %>%
    xtabs( ~ wd,.) %>%
    as.data.frame(.) %>%
    rename(.,NM=Freq)
propbwd <- merge(numlawd,numwd) %>%
    mutate(prop=Occur/NM)
qplot(y=prop,x= wd,data=propbwd) +
    ylab('Proportion') +
    xlab('Day of Week') +
    ggtitle('Proportion Week Days with Launch') +
    coord_flip()
By year
For the years I have split the launches into successes and failures. This is also the most interesting of the plots. The proportion of launches per day increases quickly in the first years, as does the chance of success. From the mid-eighties the number of launches decreases, only to pick up again in the new millennium. The end of the space race is not visible in the number of launches, but seems to coincide with fewer failed launches.
numlayr <- mutate(r3,year =factor(format(Date,'%Y'))) %>%
    xtabs( ~ year + Suc,.) %>%
    as.data.frame(.) %>%
    rename(.,Occur=Freq)
numyr <- mutate(alldays,year =factor(format(Date,'%Y'))) %>%
    xtabs( ~ year,.) %>%
    as.data.frame(.) %>%
    rename(.,NM=Freq)
propyr <- merge(numlayr,numyr) %>%
    mutate(prop=Occur/NM,
        Year=as.numeric(levels(year))[year])
qplot(y=prop,x= Year,colour=Suc,data=propyr) +
    ylab('Proportion') +
    xlab('Year') +
    ggtitle('Proportion Days with Launch') +
    scale_colour_discrete(name='Success')
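The expression as.numeric(levels(year))[year] in the code above deserves a short illustration, since calling as.numeric directly on a factor returns the internal level codes rather than the years. A tiny sketch with an invented factor (the names year, codes and values are for illustration only):

```r
# Invented factor of years, as produced by factor(format(Date, '%Y'))
year <- factor(c("1957", "1958", "1960", "1958"))

codes  <- as.numeric(year)                   # position of each value among the levels
values <- as.numeric(levels(year))[year]     # convert the levels, then index by the factor

print(codes)
print(values)
```

Converting the levels first and indexing afterwards also only converts each distinct year once, which is why this idiom is commonly preferred over as.numeric(as.character(year)).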
Most busy day?
Once there were four launches on a single day.
r3 %>% xtabs(~factor(Date,levels=as.character(alldays$Date)),.) %>%
    as.data.frame(.) %>%
    rename(.,Nlaunch=Freq) %>%
    xtabs(~Nlaunch,.) %>%
    as.data.frame(.)
  Nlaunch  Freq
1       0 16154
2       1  4224
3       2   562
4       3    44
5       4     1
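The zero row in this table exists only because the dates are turned into a factor whose levels are all days, so days without a launch are counted as 0 instead of being left out. A minimal sketch of that two-step tabulation on invented dates, using base R's table in place of xtabs (the names launches, days, perday and dist are hypothetical):

```r
# Invented launches: two on one day, one on another, within a seven-day range
launches <- as.Date(c("2001-01-02", "2001-01-02", "2001-01-05"))
days <- seq(as.Date("2001-01-01"), as.Date("2001-01-07"), by = 1)

# Step 1: launches per day, with every day in the range as a factor level
perday <- table(factor(as.character(launches), levels = as.character(days)))

# Step 2: how many days had 0, 1, 2, ... launches
dist <- as.data.frame(table(Nlaunch = as.vector(perday)))
print(dist)
```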