Current data
The information I currently have is last year's fuel consumption. I have taken a set of those data from March and April of 2014.
library(MASS)  # provides fitdistr()
r1 <- read.table(text='l km
28.25 710.6
22.93 690.4
28.51 760.5
23.22 697.9
31.52 871.2
24.68 689.6
30.85 826.9
23.04 699
29.96 845.3
30.16 894.7
25.71 696
23.6 669.8
28.57 739
27.23 727.4
18.31 499.9
24.28 689.5',header=TRUE)
r1$usage=100*r1$l/r1$km
plot(density(r1$usage),
main='Observed normal diesel usage',
xlab='l/100 km')
The data are from a distribution with a mean around 3.6 l/100 km.
fitdistr(r1$usage,'normal')
mean sd
3.59517971 0.19314598
(0.04828649) (0.03414371)
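As a cross-check, the numbers reported by fitdistr can be reproduced by hand. This is a minimal sketch, assuming that for the normal family fitdistr returns the maximum likelihood estimates (standard deviation with divisor n rather than n - 1) together with standard errors sd/sqrt(n) and sd/sqrt(2n). The variable names are my own.

```r
# Last year's fill-ups (litres and kilometres), copied from the table above
l  <- c(28.25, 22.93, 28.51, 23.22, 31.52, 24.68, 30.85, 23.04,
        29.96, 30.16, 25.71, 23.60, 28.57, 27.23, 18.31, 24.28)
km <- c(710.6, 690.4, 760.5, 697.9, 871.2, 689.6, 826.9, 699.0,
        845.3, 894.7, 696.0, 669.8, 739.0, 727.4, 499.9, 689.5)
usage <- 100 * l / km                      # l/100 km
n <- length(usage)

mu_hat <- mean(usage)                      # ML estimate of the mean
sd_hat <- sqrt(mean((usage - mu_hat)^2))   # ML estimate of the sd (divisor n)
se_mu  <- sd_hat / sqrt(n)                 # standard error of the mean
se_sd  <- sd_hat / sqrt(2 * n)             # standard error of the sd

print(round(c(mean = mu_hat, sd = sd_hat, se_mean = se_mu, se_sd = se_sd), 4))
```

The standard errors in parentheses in the fitdistr output correspond to se_mu and se_sd here.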
Approach
Analysis will be a hypothesis test and an estimate of premium diesel usage.
I will assume similar driving patterns and weather as last year. I think that should be possible, given my driving style. A cross check may be made, especially to confirm that speeds are similar. Data affected by serious traffic jams may be discarded from the analysis.
A check for outliers is not planned. However, obviously faulty data will be corrected or removed. No interim analysis is planned, unless the data seem to point to a marked increase in fuel usage.
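The test itself might look like the sketch below. It assumes a one-sided two-sample t test with equal variances, which is the design that power.t.test uses. The premium values are SIMULATED placeholders, only there to make the code run; they are not measurements. The names standard, premium and tt are my own.

```r
# Standard diesel: last year's observed usage (l/100 km), from the data above
l  <- c(28.25, 22.93, 28.51, 23.22, 31.52, 24.68, 30.85, 23.04,
        29.96, 30.16, 25.71, 23.60, 28.57, 27.23, 18.31, 24.28)
km <- c(710.6, 690.4, 760.5, 697.9, 871.2, 689.6, 826.9, 699.0,
        845.3, 894.7, 696.0, 669.8, 739.0, 727.4, 499.9, 689.5)
standard <- 100 * l / km

# Premium diesel: HYPOTHETICAL placeholder data, simulated for illustration only
set.seed(42)
premium <- rnorm(17, mean = 3.6 / 1.05, sd = 0.2)

# H0: premium usage >= standard usage; H1: premium usage is lower
tt <- t.test(premium, standard,
             alternative = "less",   # one-sided, matching the planned test
             var.equal = TRUE)       # pooled variance (assumption)

# Decision at the planned 10% significance level
reject_h0 <- tt$p.value < 0.10
```

Once real premium data are in, only the premium vector needs to be replaced.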
Power for hypothesis test
The recommended retail prices of premium and standard diesel are 1.433 and 1.363 Euro/liter according to the internet, a price increase of about 5%. Prices at the pump vary widely from these values; unbranded, unmanned fuel stations in particular may be significantly cheaper. Last year's data came from such unbranded fuel. Competition can push the price of both standard and premium fuel down a bit. I will take the 5% price increase as the target for premium diesel to be worth its price: usage has to drop by 5% of 3.6 l/100 km, which is 0.18 l/100 km. With the standard deviation rounded to 0.2 l/100 km, a significance level of 10% and a power of 90%, I arrive at 17 samples for each group. This means I will have to take a bit more data from last year, which is not a problem. The choice of alpha and beta reflects that I find both kinds of errors equally bad.
power.t.test(delta=3.6*.05,
sd=0.2,
sig.level=.1,
power=.9,
alternative='one.sided')
Two-sample t test power calculation
n = 16.66118
delta = 0.18
sd = 0.2
sig.level = 0.1
power = 0.9
alternative = one.sided
NOTE: n is number in *each* group
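The arithmetic behind this calculation can be checked with a rough sketch. It uses the textbook normal approximation for a one-sided two-sample comparison, n = 2 * ((z_alpha + z_beta) * sd / delta)^2. That is not the exact method power.t.test uses, which works with the noncentral t distribution. The variable names are my own.

```r
# Prices from the text (Euro/liter)
price_premium  <- 1.433
price_standard <- 1.363
price_ratio <- price_premium / price_standard   # about 1.05

# Effect size and spread used in the power calculation above
delta <- 3.6 * 0.05   # 5% of the observed mean usage, in l/100 km
sigma <- 0.2          # rounded standard deviation
alpha <- 0.1          # significance level
beta  <- 0.1          # 1 - power

# Normal approximation to the per-group sample size
n_approx <- 2 * ((qnorm(1 - alpha) + qnorm(1 - beta)) * sigma / delta)^2

# Power achieved by the t test with the rounded-up sample size
achieved <- power.t.test(n = ceiling(n_approx), delta = delta, sd = sigma,
                         sig.level = alpha,
                         alternative = "one.sided")$power
```

The approximation comes out slightly above 16 and rounds up to the same 17 per group. It runs a little below the t-based 16.66 because it ignores the uncertainty in the estimated standard deviation.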
Estimating usage of premium diesel
Besides a significance test, I want an estimate of usage, so that I can extrapolate the data to other scenarios. I will use a Bayesian analysis to obtain these estimates. The prior is a mixture of three beliefs: premium diesel makes no difference, there is indeed a 5% gain to be made, or something else entirely. The latter is a weakly informative component covering roughly 3 to 4 l/100 km. The combined density is plotted below.
usage <- seq(3.2,3.8,.01)
dens <- (dnorm(usage,3.6,.05)+
dnorm(usage,3.6/1.05,.05)+
dnorm(usage,(3.6+3.6/1.05)/2,.15))/3
plot(x=usage,y=dens,type='l',
ylim=c(0,4),
ylab='density',
xlab='l/100 km',
main='prior')
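One way this prior might be combined with data is a grid approximation, sketched below. The assumptions are mine: the observation standard deviation is fixed at 0.2 l/100 km, and the observations y are SIMULATED placeholders, not measurements. The names prior, grid, y and post are illustrative.

```r
# Mixture prior from above: "no difference", "5% gain", and a wide component
prior <- function(mu) {
  (dnorm(mu, 3.6, 0.05) +
   dnorm(mu, 3.6 / 1.05, 0.05) +
   dnorm(mu, (3.6 + 3.6 / 1.05) / 2, 0.15)) / 3
}

# Grid of candidate mean usage values (l/100 km)
step <- 0.001
grid <- seq(2.8, 4.4, by = step)

# HYPOTHETICAL observations, simulated only to exercise the code
set.seed(1)
y <- rnorm(17, mean = 3.5, sd = 0.2)

# Log-likelihood of the data at each candidate mean (sd fixed at 0.2: assumption)
loglik <- sapply(grid, function(m) sum(dnorm(y, m, 0.2, log = TRUE)))

# Posterior is proportional to prior times likelihood; subtract max for stability
post <- prior(grid) * exp(loglik - max(loglik))
post <- post / sum(post)          # normalise to probabilities over the grid

post_mean <- sum(grid * post)     # posterior mean usage
cdf <- cumsum(post)
ci <- c(grid[which(cdf >= 0.05)[1]], grid[which(cdf >= 0.95)[1]])  # 90% interval
```

A grid is enough here because there is only one unknown, the mean usage. Treating the standard deviation as unknown too would need a two-dimensional grid or a sampler.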