In this post I use generalized additive models (GAM). The aim is to build models for SJuiciness, SSweetness, SCrispness and SMealiness. SMealiness is included because the previous post suggested it might be of interest, so I want to keep it in the picture.

GAM appears to be a relatively simple method to apply. However, even with only four explanatory variables, the 17 rows in the data are too few to allow an extensive model, so I had to restrict the degrees of freedom of the smoothers.
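A rough budget shows why the smoothers have to be restricted. The sketch below assumes that every term gets the same number of degrees of freedom and that one degree of freedom goes to the intercept; the variable names are illustrative only.

```r
# Rough degrees-of-freedom budget for 17 observations and 4 predictors.
n_obs   <- 17    # rows used in the fits
n_pred  <- 4     # instrumental (A...) variables
term_df <- 1:4   # df per term: 1 = linear, 2 and 3 = restricted smoothers,
                 # 4 = the default df of s() in the gam package

# one df for the intercept plus term_df for each predictor
resid_df <- n_obs - 1 - n_pred * term_df
data.frame(term_df, resid_df)
```

With 3 df per term only 4 residual degrees of freedom remain, and 4 df per term would leave none at all.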

### Data preparation and SJuiciness

library(xlsReadWrite)

library(ggplot2)

library(gam)

datain <- read.xls('condensed.xls')

#remove storage conditions

datain <- datain[-grep('bag|net',datain$Products,ignore.case=TRUE),]

#week is the second '_'-separated field of the product label

datain$week <- sapply(strsplit(as.character(datain$Products),'_'),

function(x) x[[2]])

dataval <- datain

vars <- names(dataval)[-1]

for (descriptor in vars) {

dataval[,descriptor] <- as.numeric(gsub('[[:alpha:]]','',

dataval[,descriptor]))

}
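To make the preparation steps concrete, here is a minimal sketch on a toy data frame. The product labels and values are made up, and it is an assumption that the raw cells carry letter suffixes (which is presumably why the letters are stripped before conversion).

```r
# Toy stand-in for condensed.xls; labels and values are invented.
toy <- data.frame(
  Products   = c("Elstar_0", "Elstar_4_bag", "Jonagold_2"),
  SJuiciness = c("45.2a", "38.1b", "50c"),
  stringsAsFactors = FALSE
)

# drop rows that describe storage conditions (bag or net)
toy <- toy[-grep("bag|net", toy$Products, ignore.case = TRUE), ]

# the second '_'-separated field of the label is the week
toy$week <- sapply(strsplit(as.character(toy$Products), "_"),
                   function(x) x[[2]])

# strip letters from every column except the label, then convert to numeric
for (v in names(toy)[-1]) {
  toy[, v] <- as.numeric(gsub("[[:alpha:]]", "", toy[, v]))
}
toy
```

One thing to keep in mind: `-grep(...)` selects zero rows when nothing matches the pattern, so `!grepl(...)` may be a safer way to write the same filter.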

#SJuiciness

indepV <- grep('^A',vars,value=TRUE)

MJuicL <- gam(SJuiciness ~ AInstrumental.firmness + AJuice.release +

ASoluble.solids + ATitratable.acidity ,data=dataval)

MJuics2 <- gam(SJuiciness ~ s(AInstrumental.firmness,2) + s(AJuice.release,2) +

s(ASoluble.solids,2) + s(ATitratable.acidity,2) ,data=dataval)

MJuics3 <- gam(SJuiciness ~ s(AInstrumental.firmness,3) + s(AJuice.release,3) +

s(ASoluble.solids,3) + s(ATitratable.acidity,3) ,data=dataval)
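The three model formulas differ only in the df given to `s()`, so they could also be generated from `indepV`. The helper below is hypothetical (`smooth_formula` is not part of the gam package) and only uses base R to build the formula.

```r
# Hypothetical helper: build a gam-style formula from predictor names.
smooth_formula <- function(response, predictors, df = 1) {
  # df = 1 gives plain linear terms; larger values wrap each predictor in s(., df)
  rhs <- if (df <= 1) predictors else sprintf("s(%s, %d)", predictors, as.integer(df))
  as.formula(paste(response, "~", paste(rhs, collapse = " + ")),
             env = parent.frame())
}

# the same names that grep('^A', vars, value = TRUE) is expected to return
indepV <- c("AInstrumental.firmness", "AJuice.release",
            "ASoluble.solids", "ATitratable.acidity")

f2 <- smooth_formula("SJuiciness", indepV, df = 2)
f2
```

A call such as `gam(smooth_formula("SJuiciness", indepV, 3), data = dataval)` should then fit the same model as MJuics3.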

summary(MJuicL)

Call: gam(formula = SJuiciness ~ AInstrumental.firmness + AJuice.release +

ASoluble.solids + ATitratable.acidity, data = dataval)

Deviance Residuals:

Min 1Q Median 3Q Max

-5.9405 -2.5110 -0.6978 3.1524 7.9597

(Dispersion Parameter for gaussian family taken to be 18.5651)

Null Deviance: 1002.575 on 16 degrees of freedom

Residual Deviance: 222.7816 on 12 degrees of freedom

AIC: 103.9845

1 observation deleted due to missingness

Number of Local Scoring Iterations: 2

DF for Terms

Df

(Intercept) 1

AInstrumental.firmness 1

AJuice.release 1

ASoluble.solids 1

ATitratable.acidity 1

summary(MJuics2)

Call: gam(formula = SJuiciness ~ s(AInstrumental.firmness, 2) + s(AJuice.release,

2) + s(ASoluble.solids, 2) + s(ATitratable.acidity, 2), data = dataval)

Deviance Residuals:

Min 1Q Median 3Q Max

-5.3781 -2.1730 -0.8181 1.9755 6.2297

(Dispersion Parameter for gaussian family taken to be 19.4993)

Null Deviance: 1002.575 on 16 degrees of freedom

Residual Deviance: 155.997 on 8.0002 degrees of freedom

AIC: 105.9262

1 observation deleted due to missingness

Number of Local Scoring Iterations: 2

DF for Terms and F-values for Nonparametric Effects

Df Npar Df Npar F Pr(F)

(Intercept) 1

s(AInstrumental.firmness, 2) 1 1 0.77410 0.4046

s(AJuice.release, 2) 1 1 1.14455 0.3159

s(ASoluble.solids, 2) 1 1 0.40054 0.5444

s(ATitratable.acidity, 2) 1 1 1.04256 0.3371

summary(MJuics3)

Call: gam(formula = SJuiciness ~ s(AInstrumental.firmness, 3) + s(AJuice.release,

3) + s(ASoluble.solids, 3) + s(ATitratable.acidity, 3), data = dataval)

Deviance Residuals:

2 3 4 5 6 7 8 9

4.06906 -0.28794 -1.14197 -0.20378 -1.77053 0.68596 2.67061 -0.38020

12 13 14 15 16 17 18 19

-4.06569 3.64093 -1.10219 -0.08048 1.03191 -1.13588 2.64431 -3.02135

20

-1.55275

(Dispersion Parameter for gaussian family taken to be 20.1907)

Null Deviance: 1002.575 on 16 degrees of freedom

Residual Deviance: 80.7624 on 4 degrees of freedom

AIC: 102.735

1 observation deleted due to missingness

Number of Local Scoring Iterations: 4

DF for Terms and F-values for Nonparametric Effects

Df Npar Df Npar F Pr(F)

(Intercept) 1

s(AInstrumental.firmness, 3) 1 2 0.78051 0.5174

s(AJuice.release, 3) 1 2 1.04430 0.4316

s(ASoluble.solids, 3) 1 2 0.31434 0.7468

s(ATitratable.acidity, 3) 1 2 1.25657 0.3772
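The AIC values in the three summaries can be reproduced from the residual deviance and the degrees of freedom, which makes the trade-off between fit and complexity visible. This is a sketch that assumes the usual Gaussian AIC with one extra parameter for the variance; the numbers are copied from the output above.

```r
# Values copied from the three summaries.
n        <- 17
dev      <- c(linear = 222.7816, s2 = 155.997, s3 = 80.7624)  # residual deviance
resid_df <- c(linear = 12,       s2 = 8.0002,  s3 = 4)

# Gaussian AIC: -2 * maximised log-likelihood
# plus 2 * (parameters in the mean + 1 for the variance)
n_par <- n - resid_df
aic   <- n * (log(2 * pi * dev / n) + 1) + 2 * (n_par + 1)
round(aic, 3)
```

The drop in deviance from the linear model to the 3 df smoothers is large, but the eight extra parameters absorb nearly all of it, so the AIC values end up close together.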

anova(MJuics3,MJuics2,MJuicL)

Analysis of Deviance Table

Model 1: SJuiciness ~ s(AInstrumental.firmness, 3) + s(AJuice.release,

3) + s(ASoluble.solids, 3) + s(ATitratable.acidity, 3)

Model 2: SJuiciness ~ s(AInstrumental.firmness, 2) + s(AJuice.release,

2) + s(ASoluble.solids, 2) + s(ATitratable.acidity, 2)

Model 3: SJuiciness ~ AInstrumental.firmness + AJuice.release + ASoluble.solids +

ATitratable.acidity

Resid. Df Resid. Dev Df Deviance Pr(>Chi)

1 4.0000 80.762

2 8.0002 155.997 -4.0002 -75.235 0.4444

3 12.0000 222.782 -3.9998 -66.785 0.5077
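The Pr(>Chi) column can be reproduced by hand, which shows what the test is doing. The sketch assumes that the dispersion is estimated from the model with the fewest residual degrees of freedom and that the scaled change in deviance is compared with a chi-squared distribution; the numbers are copied from the table above.

```r
# Values copied from the analysis of deviance table.
resid_df  <- c(4.0000, 8.0002, 12.0000)
resid_dev <- c(80.762, 155.997, 222.782)

# dispersion taken from the most flexible model (fewest residual df)
scale <- resid_dev[1] / resid_df[1]

# change in df and deviance between consecutive models
d_df  <- abs(diff(resid_df))
d_dev <- abs(diff(resid_dev))

# scaled deviance change against a chi-squared reference
p_chi <- pchisq(d_dev / scale, df = d_df, lower.tail = FALSE)
round(p_chi, 4)
```

Since the dispersion itself rests on only 4 residual degrees of freedom, these p-values are best read as rough indications.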

par(mfrow=c(2,2))

plot(MJuics2,se=TRUE)

plot(MJuics3,se=TRUE)

From these plots and tables I concluded that ASoluble.solids is not needed. AJuice.release and ATitratable.acidity may have non-linear effects, and AInstrumental.firmness has a linear effect. Based on this, two models are defined with different levels of curvature.

MJuicsSEL2 <- gam(SJuiciness ~ AInstrumental.firmness + s(AJuice.release,2) +

s(ATitratable.acidity,2) ,data=dataval)

MJuicsSEL3 <- gam(SJuiciness ~ AInstrumental.firmness + s(AJuice.release,3) +

s(ATitratable.acidity,3) ,data=dataval)

anova(MJuicL,MJuicsSEL2,MJuicsSEL3)

Analysis of Deviance Table

Model 1: SJuiciness ~ AInstrumental.firmness + AJuice.release + ASoluble.solids +

ATitratable.acidity

Model 2: SJuiciness ~ AInstrumental.firmness + s(AJuice.release, 2) +

s(ATitratable.acidity, 2)

Model 3: SJuiciness ~ AInstrumental.firmness + s(AJuice.release, 3) +

s(ATitratable.acidity, 3)

Resid. Df Resid. Dev Df Deviance Pr(>Chi)

1 12 222.78

2 11 177.48 0.99998 45.302 0.05808 .

3 9 113.53 2.00006 63.953 0.07927 .

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From this I conclude that MJuicsSEL3 is the best model to predict SJuiciness, although the improvements are only marginally significant (p = 0.058 and p = 0.079).

### SSweetness, SCrispness and SMealiness

For brevity, I do not show all the results for the other three responses. Only the plots of the final models are shown.

SSweetness is mainly related to AJuice.release (3 df), with smaller linear effects of the other three variables.

SCrispness is non-linearly related to AInstrumental.firmness (3 df); the other variables have much less influential linear effects.

SMealiness again has a 3 df smoother, with much less influential effects of the other variables.

### Discussion

#### GAM

GAM is a suitable method to examine models where no interactions are expected but non-linear effects are. It is simple to use. It shows which of the variables have the larger effects and approximately what these effects look like. However, it does have its problems. For instance, there is an artifact in AInstrumental.firmness around 60, where there is a small bump in the responses. If this represents a physical reality, then the rest of the curves seem fairly smooth. It would also indicate that the amount of data (types of apples) needs to be much larger to get a good understanding of the relations between the variables. A second problem is the absence of any interactions. In many data sets, interactions between variables are needed.
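The interaction problem can be illustrated with simulated data. The sketch below uses `lm` with a cubic polynomial per predictor as a stand-in for the additive smooths, so it is not a gam fit; the data are simulated and all names are illustrative.

```r
set.seed(42)
n  <- 200
x1 <- runif(n, -1, 1)
x2 <- runif(n, -1, 1)
y  <- x1 * x2 + rnorm(n, sd = 0.1)   # pure interaction, no main effects

# additive stand-in for a GAM: a separate cubic polynomial per predictor
fit_add <- lm(y ~ poly(x1, 3) + poly(x2, 3))

# the same model plus a product term
fit_int <- lm(y ~ poly(x1, 3) + poly(x2, 3) + x1:x2)

r2_add <- summary(fit_add)$r.squared
r2_int <- summary(fit_int)$r.squared
c(additive = r2_add, with_interaction = r2_int)
```

However flexible the individual curves are made, the additive fit cannot pick up the product of the two variables, while a single interaction term recovers most of the signal.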

#### Overall model for liking

The various parts of the model have now been defined. However, these separate parts do not yet make a whole model. They need to be assembled and subsequently compared with a more traditional model such as (multiblock) PLS.
