Data
Data are the coctail data from sensominer package.Algorithm
The calculation is mostly inspired by the many PLS algorithms to which I was exposed when I was doing chemometrics. Scores and loadings may be obtained from each other by multiplying with the data matrix. In this case it means I just take a set of product scores and obtain the associated descriptors via a simple matrix multiplication. The resulting product and descriptor vectors can be used to reconstruct the original matrix; the best solution minimizes difference between the constructed and original data. For dimension two subtract reconstructed data from original data and repeat on residuals.Scaling
PCA has the possibility to have unit length scores or loadings, or R and Q mode if that is your favorite jargon. If one has a more singular value decomposition look, it is just where the eigenvalues go. At this point I made the choice to do that in the variable space.
Unique solution
PCA is known not to have one unique solution; each solution is equivalent to its mirror image. It seemed most elegant to do this completely at the end, after inspection of the data it seemed the location of product 12 was suitable for making the solution unique, since it was extreme on both dimensions. The final step (generated quantities) forces the location to be top right quadrant for data reported.Code
library(rstan)nprod <- 16
ndescr <- 13
sprofile <- as.matrix(scale(senso.cocktail,scale=FALSE))
datain <- list(
nproduct=nprod,
ndescriptor=ndescr,
profile=sprofile
)
model1 <- "
data {
int<lower=0> ndescriptor;
int<lower=0> nproduct;
matrix[nproduct,ndescriptor] profile;
}
parameters {
row_vector[nproduct] prodsp1;
row_vector[nproduct] prodsp2;
real<lower=0,upper=1> sigma1;
real<lower=0,upper=1> sigma2;
}
transformed parameters {
vector [ndescriptor] descrsp1;
vector [ndescriptor] descrsp2;
matrix[nproduct,ndescriptor] expected1;
matrix[nproduct,ndescriptor] expected2;
matrix[nproduct,ndescriptor] residual1;
descrsp1 <- profile'*prodsp1';
expected1 <- (descrsp1*prodsp1)';
residual1 <- profile-expected1;
descrsp2 <- profile'*prodsp2';
expected2 <- (descrsp2*prodsp2)';
}
model {
for (r in 1:nproduct) {
prodsp1[r] ~ normal(0,1);
prodsp2[r] ~ normal(0,1);
for (c in 1:ndescriptor) {
profile[r,c] ~ normal(expected1[r,c],sigma1);
residual1[r,c] ~ normal(expected2[r,c],sigma2);
}
}
}
generated quantities {
vector [ndescriptor] descrspace1;
vector [ndescriptor] descrspace2;
row_vector [nproduct] prodspace1;
row_vector [nproduct] prodspace2;
prodspace1 <-(
((prodsp1[12]>0)*prodsp1)-
((prodsp1[12]<0)*prodsp1)
);
prodspace2 <-(
((prodsp2[12]>0)*prodsp2)-
((prodsp2[12]<0)*prodsp2)
);
descrspace1 <-(
((prodsp1[12]>0)*descrsp1)-
((prodsp1[12]<0)*descrsp1)
);
descrspace2 <-(
((prodsp2[12]>0)*descrsp2)-
((prodsp2[12]<0)*descrsp2)
);
}
"
pars <- c('prodspace1','prodspace2','descrspace1','descrspace2')
fit1 <- stan(model_code = model1,
data = datain,
pars=pars)
Results
For comparison, first a standard biplot.Product space
It is not difficult to extract the samples and plot them. See end of post. One notable property of the plot is that the products are in ellipses with the minor axis towards the center. Apparently part of variation between MCMC samples is rotational freedom between dimensions. Other than that the solution is actually pretty close to the PCADescriptor space
The rotational freedom is even more clear here.Additional code
data
library(SensoMineR)data(cocktail)
biplot
pr <- prcomp(senso.cocktail)plot(pr)
biplot(pr)
product plot
fit1samps <- as.data.frame(fit1)prod <- reshape(fit1samps,
drop=names(fit1samps)[33:59],
direction='long',
varying=list(names(fit1samps)[1:16],
names(fit1samps)[17:32]),
timevar='sample',
times=1:16,
v.names=c('PDim1','PDim2')
)
prod <- prod[order(prod$PDim1),]
plot(prod$PDim1,prod$PDim2,
col=c(2,17,3,4,6,5,7:10,13,12,11,14:16)[prod$sample],
pch=46,
cex=2,
xlim=c(-1,1)*.75,
ylim=c(-1,1)*.75)
sa <- sapply(1:16,function(x)
c(sample=x,
Dim1=mean(prod$PDim1[prod$sample==x]),
Dim2=mean(prod$PDim2[prod$sample==x])))
sa <- as.data.frame(t(sa))
text(x=sa$Dim1,y=sa$Dim2,labels=sa$sample,cex=1.5)
descriptor plot
descr <- reshape(fit1samps,drop=names(fit1samps)[c(1:32,59)],
direction='long',
varying=list(names(fit1samps)[33:45],
names(fit1samps)[46:58]),
timevar='sample',
times=1:13,
v.names=c('DDim1','DDim2')
)
descr <- descr[order(descr$DDim1),]
plot(descr$DDim1,descr$DDim2,
col=c(2,1,3:13)[descr$sample],
pch=46,
cex=2,
xlim=c(-1,1)*9,
ylim=c(-1,1)*9)
sa <- sapply(1:13,function(x)
c(sample=x,
Dim1=mean(descr$DDim1[descr$sample==x]),
Dim2=mean(descr$DDim2[descr$sample==x])))
sa <- as.data.frame(t(sa))
text(x=sa$Dim1,y=sa$Dim2,labels=names(senso.cocktail))
Sir thanks for wonderful code, while I could reproduce the figure, I got a warning
ReplyDeleteTRANSLATING MODEL 'model1' FROM Stan CODE TO C++ CODE NOW.
DIAGNOSTIC(S) FROM PARSER:
Warning (non-fatal): Left-hand side of sampling statement (~) contains a non-linear transform of a parameter or local variable.
You must call increment_log_prob() with the log absolute determinant of the Jacobian of the transform.
Sampling Statement left-hand-side expression:
get_base1(residual1,r,c,"residual1",1) ~ normal_log(...)
(on win 8.1, R 3.1.1)
I am not able to load my data, How can I load a test.csv, please
Hi,
DeleteI know it gives that message, it does so for me too. It comes from the second dimension calculation. I actually think it is a linear transform, so I am ignoring it.
Regarding reading a csv, R has a function read.csv which works pretty well.