In this post I use generalized additive models (GAM). The aim is to build models for SJuiciness, SSweetness, SCrispness and SMealiness. SMealiness is included because the previous post suggested it might be of interest, so I want to keep it in the picture.
GAM appears to be a relatively simple method to apply. However, even with only four variables, the 17 rows in the data are too few to allow an extensive model, so I had to restrict the degrees of freedom of the smoothers.
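The restriction follows from simple bookkeeping. As a rough sketch, assuming every predictor gets the same number of degrees of freedom and one degree of freedom goes to the intercept, the residual degrees of freedom shrink quickly:

```r
n_obs       <- 17    # rows that enter the fit
n_terms     <- 4     # instrumental predictors
df_per_term <- 1:4   # 1 = linear term, 2 and 3 = the smoothers tried in this post

# One df for the intercept, then df_per_term for each predictor
resid_df <- n_obs - 1 - n_terms * df_per_term
data.frame(df_per_term, resid_df)
```

With 4 df per term nothing is left for the residuals, which is why the smoothers in this post stop at 3 df.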
Data preparation and SJuiciness
library(xlsReadWrite)
library(ggplot2)
library(gam)
datain <- read.xls('condensed.xls')
#remove storage conditions
datain <- datain[-grep('bag|net',datain$Products,ignore.case=TRUE),]
datain$week <- sapply(strsplit(as.character(datain$Products),'_'),
function(x) x[[2]])
dataval <- datain
vars <- names(dataval)[-1]
for (descriptor in vars) {
dataval[,descriptor] <- as.numeric(gsub('[[:alpha:]]','',
dataval[,descriptor]))
}
#SJuiciness
indepV <- grep('^A',vars,value=TRUE)
MJuicL <- gam(SJuiciness ~ AInstrumental.firmness + AJuice.release +
ASoluble.solids + ATitratable.acidity ,data=dataval)
MJuics2 <- gam(SJuiciness ~ s(AInstrumental.firmness,2) + s(AJuice.release,2) +
s(ASoluble.solids,2) + s(ATitratable.acidity,2) ,data=dataval)
MJuics3 <- gam(SJuiciness ~ s(AInstrumental.firmness,3) + s(AJuice.release,3) +
s(ASoluble.solids,3) + s(ATitratable.acidity,3) ,data=dataval)
summary(MJuicL)
Call: gam(formula = SJuiciness ~ AInstrumental.firmness + AJuice.release +
ASoluble.solids + ATitratable.acidity, data = dataval)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.9405 -2.5110 -0.6978 3.1524 7.9597
(Dispersion Parameter for gaussian family taken to be 18.5651)
Null Deviance: 1002.575 on 16 degrees of freedom
Residual Deviance: 222.7816 on 12 degrees of freedom
AIC: 103.9845
1 observation deleted due to missingness
Number of Local Scoring Iterations: 2
DF for Terms
Df
(Intercept) 1
AInstrumental.firmness 1
AJuice.release 1
ASoluble.solids 1
ATitratable.acidity 1
summary(MJuics2)
Call: gam(formula = SJuiciness ~ s(AInstrumental.firmness, 2) + s(AJuice.release,
2) + s(ASoluble.solids, 2) + s(ATitratable.acidity, 2), data = dataval)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.3781 -2.1730 -0.8181 1.9755 6.2297
(Dispersion Parameter for gaussian family taken to be 19.4993)
Null Deviance: 1002.575 on 16 degrees of freedom
Residual Deviance: 155.997 on 8.0002 degrees of freedom
AIC: 105.9262
1 observation deleted due to missingness
Number of Local Scoring Iterations: 2
DF for Terms and F-values for Nonparametric Effects
Df Npar Df Npar F Pr(F)
(Intercept) 1
s(AInstrumental.firmness, 2) 1 1 0.77410 0.4046
s(AJuice.release, 2) 1 1 1.14455 0.3159
s(ASoluble.solids, 2) 1 1 0.40054 0.5444
s(ATitratable.acidity, 2) 1 1 1.04256 0.3371
summary(MJuics3)
Call: gam(formula = SJuiciness ~ s(AInstrumental.firmness, 3) + s(AJuice.release,
3) + s(ASoluble.solids, 3) + s(ATitratable.acidity, 3), data = dataval)
Deviance Residuals:
2 3 4 5 6 7 8 9
4.06906 -0.28794 -1.14197 -0.20378 -1.77053 0.68596 2.67061 -0.38020
12 13 14 15 16 17 18 19
-4.06569 3.64093 -1.10219 -0.08048 1.03191 -1.13588 2.64431 -3.02135
20
-1.55275
(Dispersion Parameter for gaussian family taken to be 20.1907)
Null Deviance: 1002.575 on 16 degrees of freedom
Residual Deviance: 80.7624 on 4 degrees of freedom
AIC: 102.735
1 observation deleted due to missingness
Number of Local Scoring Iterations: 4
DF for Terms and F-values for Nonparametric Effects
Df Npar Df Npar F Pr(F)
(Intercept) 1
s(AInstrumental.firmness, 3) 1 2 0.78051 0.5174
s(AJuice.release, 3) 1 2 1.04430 0.4316
s(ASoluble.solids, 3) 1 2 0.31434 0.7468
s(ATitratable.acidity, 3) 1 2 1.25657 0.3772
anova(MJuics3,MJuics2,MJuicL)
Analysis of Deviance Table
Model 1: SJuiciness ~ s(AInstrumental.firmness, 3) + s(AJuice.release,
3) + s(ASoluble.solids, 3) + s(ATitratable.acidity, 3)
Model 2: SJuiciness ~ s(AInstrumental.firmness, 2) + s(AJuice.release,
2) + s(ASoluble.solids, 2) + s(ATitratable.acidity, 2)
Model 3: SJuiciness ~ AInstrumental.firmness + AJuice.release + ASoluble.solids +
ATitratable.acidity
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 4.0000 80.762
2 8.0002 155.997 -4.0002 -75.235 0.4444
3 12.0000 222.782 -3.9998 -66.785 0.5077
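The Pr(>Chi) column can be checked by hand. The sketch below uses only the numbers printed in the table and assumes that the change in deviance is scaled by the dispersion of the largest model (Model 1) before it is compared with a chi-square distribution. That assumption is consistent with the dispersion of about 20.19 reported in summary(MJuics3).

```r
# Values copied from the analysis of deviance table above
resid_df  <- c(4, 8.0002, 12)
resid_dev <- c(80.762, 155.997, 222.782)

# Assumption: dispersion is estimated from the largest model (Model 1)
dispersion <- resid_dev[1] / resid_df[1]

delta_df  <- diff(resid_df)    # extra residual df of each simpler model
delta_dev <- diff(resid_dev)   # extra residual deviance of each simpler model

p_chi <- pchisq(delta_dev / dispersion, df = delta_df, lower.tail = FALSE)
round(p_chi, 4)
```

The results should land close to the 0.4444 and 0.5077 in the table, so neither step in curvature is a clear improvement when all four variables are smoothed.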
par(mfrow=c(2,2))
plot(MJuics2,se=TRUE)
plot(MJuics3,se=TRUE)
From these plots and tables it was concluded that ASoluble.solids is not needed. AJuice.release and ATitratable.acidity may have non-linear effects, and AInstrumental.firmness has a linear effect. Based on this, two models are defined, with different levels of curvature.
MJuicsSEL2 <- gam(SJuiciness ~ AInstrumental.firmness + s(AJuice.release,2) +
s(ATitratable.acidity,2) ,data=dataval)
MJuicsSEL3 <- gam(SJuiciness ~ AInstrumental.firmness + s(AJuice.release,3) +
s(ATitratable.acidity,3) ,data=dataval)
anova(MJuicL,MJuicsSEL2,MJuicsSEL3)
Analysis of Deviance Table
Model 1: SJuiciness ~ AInstrumental.firmness + AJuice.release + ASoluble.solids +
ATitratable.acidity
Model 2: SJuiciness ~ AInstrumental.firmness + s(AJuice.release, 2) +
s(ATitratable.acidity, 2)
Model 3: SJuiciness ~ AInstrumental.firmness + s(AJuice.release, 3) +
s(ATitratable.acidity, 3)
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 12 222.78
2 11 177.48 0.99998 45.302 0.05808 .
3 9 113.53 2.00006 63.953 0.07927 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From this it is concluded that the model MJuicsSEL3 is the best model to predict SJuiciness.
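As a cross-check, AIC can be compared across all five SJuiciness models. The sketch below recomputes AIC from the residual deviances and residual degrees of freedom printed above, assuming the usual Gaussian formula with the variance counted as one extra parameter. That assumption can be verified against the three AIC values that the summaries report.

```r
n <- 17  # observations used in every fit

# Residual deviance and residual df as printed in the output above
fits <- data.frame(
  model     = c("MJuicL", "MJuics2", "MJuics3", "MJuicsSEL2", "MJuicsSEL3"),
  resid_dev = c(222.7816, 155.997, 80.7624, 177.48, 113.53),
  resid_df  = c(12, 8.0002, 4, 11, 9)
)

# Gaussian AIC; the +1 counts the estimated variance as a parameter
n_par    <- n - fits$resid_df
fits$aic <- n * (log(2 * pi * fits$resid_dev / n) + 1) + 2 * (n_par + 1)
fits
```

The first three rows should reproduce the reported 103.98, 105.93 and 102.74. Under the same formula MJuicsSEL3 comes out lowest, at roughly 98.5, which points in the same direction as the analysis of deviance.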
SSweetness, SCrispness and SMealiness
For brevity, the full results for the other three responses are not shown. The plots of the final models are shown.
SSweetness is mainly related to AJuice.release (3 df), with smaller linear effects of the other three variables.
SCrispness is non-linearly related to AInstrumental.firmness (3 df), with the other variables having much less influential linear effects.
SMealiness again has a 3 df smoother, with the other effects much less influential.
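The calls for these models are not shown in this post. Assuming they mirror the SJuiciness models, the formulas described for SSweetness and SCrispness would look roughly like the sketch below. The object names f_sweet, f_crisp, MSweet and MCrisp are hypothetical, and SMealiness is left out because the smoothed variable is not named above.

```r
# Hypothetical reconstruction: the post does not show these calls
f_sweet <- SSweetness ~ s(AJuice.release, 3) + AInstrumental.firmness +
  ASoluble.solids + ATitratable.acidity
f_crisp <- SCrispness ~ s(AInstrumental.firmness, 3) + AJuice.release +
  ASoluble.solids + ATitratable.acidity

# With the gam package loaded and dataval prepared as above, fitting would be:
# MSweet <- gam(f_sweet, data = dataval)
# MCrisp <- gam(f_crisp, data = dataval)

# Variables used by each formula
all.vars(f_sweet)
all.vars(f_crisp)
```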
Discussion
GAM
GAM is a suitable method for examining models where no interactions are expected but non-linear effects are. It is simple to use. It shows which variables have the larger effects and approximately what these effects look like. However, it does have its problems. For instance, there is an artifact in AInstrumental.firmness around 60, where there is a small bump in the responses. If this bump represents a physical reality, then the rest of the curves seem fairly smooth by comparison. It would also indicate that the amount of data (types of apples) needs to be much larger to get a good understanding of the relations between the variables. A second problem is the absence of any interactions. In many data sets, interactions between variables are needed.
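To get a feeling for how easily a 3 df smoother can pick up bumps from only 17 points, here is a small sketch on simulated data. It is not the apple data: the x range, slope and noise level are made up, and base R's smooth.spline stands in for the s() term of the gam package. The assumption is that s(x, 3) corresponds to df = 4 in smooth.spline, because smooth.spline also counts the constant.

```r
set.seed(1)
n <- 17
x <- sort(runif(n, 40, 80))            # made-up firmness-like scale
y <- 50 + 0.3 * x + rnorm(n, sd = 4)   # truly linear signal plus noise

fitL <- lm(y ~ x)                      # straight line
fit3 <- smooth.spline(x, y, df = 4)    # assumed analogue of s(x, 3)

# Residual sums of squares: the smoother can only do as well or better
rssL <- sum(residuals(fitL)^2)
rss3 <- sum((y - predict(fit3, x)$y)^2)
c(linear = rssL, smooth = rss3)

plot(x, y)
abline(fitL, lty = 2)
lines(predict(fit3, seq(40, 80, length.out = 200)))
```

Any curvature in the solid line is noise by construction, since the simulated relation is linear. This is the kind of pattern that would need more data to confirm or rule out.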
Overall model for liking
The various parts of the model have now been defined. However, these separate parts do not yet make a whole model. The whole model still needs to be assembled and subsequently compared with a more traditional model such as (multiblock) PLS.