Sunday, September 7, 2014

Mapping products in a space

I have read about people doing a Bayesian PCA at some points and always wondered how that would work. Then, at some point I thought of a way to do so. As ideas evolved my interest became not PCA as such, but rather in a prefmap. As a first step in that this post contains the mapping from a sensory space to a two dimensional space. For prefmap this step is commonly done via a PCA.

Data

Data are the coctail data from sensominer package.

Algorithm

The calculation is mostly inspired by the many PLS algorithms to which I was exposed when I was doing chemometrics. Scores and loadings may be obtained from each other by multiplying with the data matrix. In this case it means I just take a set of product scores and obtain the associated descriptors via a simple matrix multiplication. The resulting product and descriptor vectors can be used to reconstruct the original matrix; the best solution minimizes difference between the constructed and original data. For dimension two subtract reconstructed data from original data and repeat on residuals.

Scaling

PCA has the possibility to have unit length scores or loadings, or R and Q mode if that is your favorite jargon. If one has a more singular value decomposition look, it is just where the eigenvalues go. At this point I made the choice to do that in the variable space. 

Unique solution

PCA is known not to have one unique solution; each solution is equivalent to its mirror image. It seemed most elegant to do this completely at the end, after inspection of the data it seemed the location of product 12 was suitable for making the solution unique, since it was extreme on both dimensions. The final step (generated quantities) forces the location to be top right quadrant for data reported.

Code

library(rstan)
nprod <- 16
ndescr <- 13
sprofile <- as.matrix(scale(senso.cocktail,scale=FALSE))
datain <- list(
    nproduct=nprod,
    ndescriptor=ndescr,
    profile=sprofile
    )
   
model1 <- "
data {
        int<lower=0> ndescriptor;
        int<lower=0> nproduct;
        matrix[nproduct,ndescriptor] profile;
}
parameters {
    row_vector[nproduct] prodsp1;
    row_vector[nproduct] prodsp2;
    real<lower=0,upper=1> sigma1;
    real<lower=0,upper=1> sigma2;
}
transformed parameters {
   vector [ndescriptor] descrsp1;
   vector [ndescriptor] descrsp2;
   matrix[nproduct,ndescriptor] expected1;  
   matrix[nproduct,ndescriptor] expected2;  
   matrix[nproduct,ndescriptor] residual1;  

   descrsp1 <- profile'*prodsp1';
   expected1 <- (descrsp1*prodsp1)';
   residual1 <- profile-expected1;
   descrsp2 <- profile'*prodsp2';
   expected2 <- (descrsp2*prodsp2)';
}
model {  
     for (r in 1:nproduct) {
        prodsp1[r] ~ normal(0,1);
        prodsp2[r] ~ normal(0,1);
        for (c in 1:ndescriptor) {
           profile[r,c] ~ normal(expected1[r,c],sigma1);
           residual1[r,c] ~ normal(expected2[r,c],sigma2);
        }
     }
}
generated quantities {
   vector [ndescriptor] descrspace1;
   vector [ndescriptor] descrspace2;
   row_vector [nproduct] prodspace1;
   row_vector [nproduct] prodspace2;
   prodspace1 <-(
                     ((prodsp1[12]>0)*prodsp1)-
                     ((prodsp1[12]<0)*prodsp1)
                  );
   prodspace2 <-(
                     ((prodsp2[12]>0)*prodsp2)-
                     ((prodsp2[12]<0)*prodsp2)
                  ); 
   descrspace1 <-(
                     ((prodsp1[12]>0)*descrsp1)-
                     ((prodsp1[12]<0)*descrsp1)
                  );
   descrspace2 <-(
                     ((prodsp2[12]>0)*descrsp2)-
                     ((prodsp2[12]<0)*descrsp2)
                  ); 
}
"
pars <- c('prodspace1','prodspace2','descrspace1','descrspace2')
fit1 <- stan(model_code = model1,
    data = datain,
    pars=pars)

Results

For comparison, first a standard biplot.

Product space

It is not difficult to extract the samples and plot them. See end of post. One notable property of the plot is that the products are in ellipses with the minor axis towards the center. Apparently part of variation between MCMC samples is rotational freedom between dimensions. Other than that the solution is actually pretty close to the PCA

Descriptor space

The rotational freedom is even more clear here.

Additional code

data

library(SensoMineR)
data(cocktail)

biplot

pr <- prcomp(senso.cocktail) 
plot(pr)
biplot(pr)

product plot

fit1samps <- as.data.frame(fit1)

prod <- reshape(fit1samps,
    drop=names(fit1samps)[33:59],
    direction='long',
    varying=list(names(fit1samps)[1:16],
        names(fit1samps)[17:32]),
    timevar='sample',
    times=1:16,
    v.names=c('PDim1','PDim2')
)
   
prod <- prod[order(prod$PDim1),]
plot(prod$PDim1,prod$PDim2,
    col=c(2,17,3,4,6,5,7:10,13,12,11,14:16)[prod$sample],
    pch=46,
    cex=2,
    xlim=c(-1,1)*.75,
    ylim=c(-1,1)*.75)
sa <- sapply(1:16,function(x)
        c(sample=x,
            Dim1=mean(prod$PDim1[prod$sample==x]),
            Dim2=mean(prod$PDim2[prod$sample==x])))
sa <- as.data.frame(t(sa))
text(x=sa$Dim1,y=sa$Dim2,labels=sa$sample,cex=1.5)

descriptor plot

descr <- reshape(fit1samps,
    drop=names(fit1samps)[c(1:32,59)],
    direction='long',
    varying=list(names(fit1samps)[33:45],
        names(fit1samps)[46:58]),
    timevar='sample',
    times=1:13,
    v.names=c('DDim1','DDim2')
)

descr <- descr[order(descr$DDim1),]
plot(descr$DDim1,descr$DDim2,
    col=c(2,1,3:13)[descr$sample],
    pch=46,
    cex=2,
    xlim=c(-1,1)*9,
    ylim=c(-1,1)*9)
sa <- sapply(1:13,function(x)
        c(sample=x,
            Dim1=mean(descr$DDim1[descr$sample==x]),
            Dim2=mean(descr$DDim2[descr$sample==x])))
sa <- as.data.frame(t(sa))
text(x=sa$Dim1,y=sa$Dim2,labels=names(senso.cocktail))

2 comments:

  1. Sir thanks for wonderful code, while I could reproduce the figure, I got a warning

    TRANSLATING MODEL 'model1' FROM Stan CODE TO C++ CODE NOW.
    DIAGNOSTIC(S) FROM PARSER:

    Warning (non-fatal): Left-hand side of sampling statement (~) contains a non-linear transform of a parameter or local variable.
    You must call increment_log_prob() with the log absolute determinant of the Jacobian of the transform.
    Sampling Statement left-hand-side expression:
    get_base1(residual1,r,c,"residual1",1) ~ normal_log(...)
    (on win 8.1, R 3.1.1)

    I am not able to load my data, How can I load a test.csv, please

    ReplyDelete
    Replies
    1. Hi,

      I know it gives that message, it does so for me too. It comes from the second dimension calculation. I actually think it is a linear transform, so I am ignoring it.

      Regarding reading a csv, R has a function read.csv which works pretty well.

      Delete