Does LASSO select the principal components in the order of their ranks by variance explained? If the order in which LASSO selects the principal components differs from their ranking by variance, what does this indicate? Can you think of other pros (apart from removing collinearity) and cons of preprocessing predictors with PCA in regression models?
The columns in $\mathbf{T}$, the scores from PCA, are orthogonal to each other, obtaining independence for the least-squares step.
These $\mathbf{T}$ scores can be calculated even if there are missing data in $\mathbf{X}$.
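We have relaxed the assumption that there is no error in $\mathbf{X}$, since $\mathbf{X} = \mathbf{T}\mathbf{P}' + \mathbf{E}$, where $\hat{\mathbf{X}} = \mathbf{T}\mathbf{P}'$ is the part of $\mathbf{X}$ explained by the PCA model. We have replaced it with the assumption that there is no error in $\mathbf{T}$, a more realistic assumption, since PCA separates the noise from the systematic variation in $\mathbf{X}$. The scores in $\mathbf{T}$ are expected to have much less noise than the columns of $\mathbf{X}$.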
The relationship of each score column in $\mathbf{T}$ to vector $\mathbf{y}$ can be interpreted independently of the others.
Using MLR requires that $N > K$, but with PCR this changes to $N > A$, an assumption that is more easily met for short and wide $\mathbf{X}$ matrices with many correlated columns.
There is much less need to resort to selecting variables from $\mathbf{X}$; the general approach is to use the entire $\mathbf{X}$ matrix to fit the PCA model. We actually use the correlated columns in $\mathbf{X}$ to stabilize the PCA solution, much in the same way that extra data improves the estimate of a mean (recall the central limit theorem).
By far one of the greatest advantages of PCR, though, is the free consistency check that one gets on the raw data, which is not available with MLR. Always check the SPE and Hotelling's $T^2$ value for a new observation during the first step (the PCA projection). If the SPE is small, so that the observation lies close to the model plane, and $T^2$ is within the range of the previous $T^2$ values, then the prediction from the second step should be reasonable.
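The two steps and the consistency check described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `fit_pcr` and `check_and_predict` are hypothetical, the data are synthetic (a short and wide matrix with N = 20, K = 50 and A = 2 components), and the largest training SPE and $T^2$ values are used as a crude reference range rather than formal statistical limits.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Step 1: PCA on autoscaled X. Step 2: least squares of y on the scores."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - x_mean) / x_std
    # PCA via SVD: the loadings P are the first A right singular vectors.
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    P = Vt[:n_components].T              # K x A loadings
    T = Xs @ P                           # N x A scores, orthogonal columns
    score_var = T.var(axis=0, ddof=1)
    y_mean = y.mean()
    b, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    E = Xs - T @ P.T                     # residuals off the model plane
    return {
        "x_mean": x_mean, "x_std": x_std, "P": P, "T": T, "b": b,
        "y_mean": y_mean, "score_var": score_var,
        "train_spe": np.sum(E ** 2, axis=1),
        "train_t2": np.sum(T ** 2 / score_var, axis=1),
    }

def check_and_predict(model, x_new):
    """Project a new row, then return (prediction, SPE, Hotelling's T2)."""
    xs = (x_new - model["x_mean"]) / model["x_std"]
    t = xs @ model["P"]                  # step 1: scores for the new row
    resid = xs - t @ model["P"].T
    spe = float(resid @ resid)           # squared distance to the model plane
    t2 = float(np.sum(t ** 2 / model["score_var"]))
    y_hat = float(model["y_mean"] + t @ model["b"])  # step 2: prediction
    return y_hat, spe, t2

# Synthetic short-and-wide data: 50 correlated columns driven by 2 latent variables.
rng = np.random.default_rng(0)
N, K, A = 20, 50, 2
latent = rng.normal(size=(N, A))
W = rng.normal(size=(A, K))
X = latent @ W + 0.05 * rng.normal(size=(N, K))
y = latent @ np.array([1.5, -0.7]) + 0.05 * rng.normal(size=N)

model = fit_pcr(X, y, n_components=A)    # works although K > N, because N > A

# A training row, and a row that ignores the correlation structure in X.
y_ok, spe_ok, t2_ok = check_and_predict(model, X[0])
x_bad = rng.normal(size=K)
y_bad, spe_bad, t2_bad = check_and_predict(model, x_bad)

print("training max SPE:", model["train_spe"].max(), " max T2:", model["train_t2"].max())
print("training row  -> SPE:", spe_ok, " T2:", t2_ok, " prediction:", y_ok)
print("unusual row   -> SPE:", spe_bad, " T2:", t2_bad, " prediction:", y_bad)
```

In this sketch the unusual row still produces a numerical prediction, but its SPE is far larger than anything seen in training, which is the signal that the prediction should not be trusted. An MLR model fitted directly to the raw columns would return a prediction with no such warning.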