
Can someone knowledgeable please give us examples of real-life use of PCA? Not "could be used here, could be used there" kinds of toy examples, but actual use.


One case is when you want to do inference and hypothesis testing.

You need to save your degrees of freedom: you generally want 10 to 20 observations per predictor. So you can use PCA to collapse a subset of predictors while keeping the predictors you are actually making inferences about. This helps the sensitivity of your test.

Another case is linear regression, or any modeling where multicollinearity is a problem, i.e. where predictors are confounded with or affect each other. PCA changes the basis so that the new predictors are orthogonal to each other, getting rid of the multicollinearity problem.

A toy example is:

student GPA, SAT score, math grade, height, hours of study

Where student GPA is the response or what you want to predict.

If you apply PCA to SAT score (x1), math grade (x2), height (x3), and hours of study (x4), it will give you new predictors that are linear combinations of those predictors. Some statistics books refer to this combination as principal component regression.

Anyway you may get new variables as:

new_x1 = 0.4 * sat_score + 1.2 * math_grade

new_x2 = 0.1 * height + 0.5 * hours_of_study

These new predictors are orthogonal to each other, so they don't suffer from multicollinearity. You can now do linear regression using these predictors.

The problem is interpretation: sometimes you get a grouping like height + hours of study that is hard to explain.

Actually just look here for example: https://www.whitman.edu/Documents/Academics/Mathematics/2017...

Under "6.4 Example: Principal Component Analysis and Linear Regression"


I use it for exactly that kind of purpose - highlighting interesting relative strengths and weaknesses in a 42-point assessment. So much better than benchmarking against some average, with the added advantage that it will keep finding interesting points even as scores improve.

Amazingly little code too. NumPy and SciPy are awesome :-)


It's been a long time, but we used PCA in remote sensing to reduce the number of bands to a smaller subset that is easier to handle.

Satellite data is collected using multispectral/hyperspectral sensors (for example, Landsat has 11 bands, but sometimes there are over 100), which can be cumbersome to work with. PCA can be applied to the data so that you have a smaller subset that contains most of the original information, which makes further processing faster and easier.
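A rough sketch of what that looks like (the cube shape, band count, and component count here are made up, with scikit-learn assumed as the implementation):

    import numpy as np
    from sklearn.decomposition import PCA

    # Hypothetical hyperspectral cube: height x width x 100 bands
    # (placeholder random data standing in for real imagery).
    cube = np.random.rand(128, 128, 100)

    # Treat every pixel as one sample with 100 spectral features.
    pixels = cube.reshape(-1, 100)

    # Keep a handful of components; on real imagery a few of them
    # typically capture most of the spectral variance.
    pca = PCA(n_components=6)
    reduced = pca.fit_transform(pixels)
    print(pca.explained_variance_ratio_)

    # Back to an image-shaped array with far fewer "bands".
    reduced_cube = reduced.reshape(128, 128, -1)
    print(reduced_cube.shape)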


Sounds very cool. However, when you transform the data using PCA, the interpretation of the signals is different, right? How do you approach that problem?



I see, that is another way to look at it. I was asking about how to interpret the components themselves. Your link suggests converting the coefficients of the PCA regression back to coefficients for the original variables.


Since PCA is geared towards reducing dimensions, a good example is data with many features (aka dimensions). Data on errors in a manufacturing line works well, because you could be capturing a large number of variables that may be contributing to a defective product: features like ambient temperature, speed of the line, which employees were present, etc. You would (virtually) be throwing in the kitchen sink for features (variables) in the hope of finding what could be causing defective Teslas, for instance.

What PCA does (to reduce this large number of dimensions) is hang the data on a new set of dimensions, letting the data itself indicate them. PCA chooses its first axis along the direction of highest variance. The second axis is then chosen by looking perpendicular (orthogonal) to the first and finding the direction of highest variance there. You continue until you've captured a majority of the variance, which should be feasible in fewer dimensions than you started with. Mathematically, these axes are found via the eigenvectors of the covariance matrix.
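In NumPy terms that description boils down to something like this (a small sketch on random stand-in data):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 10))            # 500 observations, 10 features

    Xc = X - X.mean(axis=0)                   # center each feature
    cov = np.cov(Xc, rowvar=False)            # 10 x 10 covariance matrix

    # Eigenvectors of the covariance matrix are the principal axes;
    # eigenvalues are the variance captured along each axis.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]         # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    explained = np.cumsum(eigvals) / eigvals.sum()
    k = np.searchsorted(explained, 0.90) + 1  # components for ~90% of the variance
    scores = Xc @ eigvecs[:, :k]              # the data expressed in the new axes
    print(k, scores.shape)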


I am sure this real-life example will help:

My teacher wanted to buy a car and needed help choosing; he wanted a "good" deal, so he applied PCA to all the models of car for sale.

His real question was:

* what are the most important variables that make up a car's PRICE? or, said another way,

* if I have to compare two cars that have the same price, with which car do I get the most for my money?

The answer was pretty surprising:

the most important variable is WEIGHT

:)

So, while you choose a car, always check its weight! Do two cars have the same price? Take the heavier one :) (this result dates from the 90s; is it still valid now? not sure: we need PCA)

I've been using this result ever since, applying it in different contexts (which is, of course, not correct): when I am in doubt about which product to choose, I always choose the heavier one. I would not use this 'method' to buy a racing bicycle... or to choose the best girl ;)

Then you even have somebody claiming that we use this method even without knowing the explanation: https://www.securityinfowatch.com/integrators/article/122343...


Haha nice one.


In the area of chemoinformatics, in order to discover new types of chemicals to address some disease, one approach is to associate members of a large chemical database with some coordinate space and consider those chemicals which fall, in some sense, close to known useful pharmaceuticals. (As a simple example, say molecular weight along one axis, polarizability along another, number of hydrogen bond donors/acceptors, rotatable bonds, radius of gyration, and perhaps hundreds more.) But there are problems with such a high-dimensional space[1], particularly if one wants to do useful statistics, cluster analysis, etc. So enter PCA as a means to lower the dimensionality to something more tractable. At the same time it gives you eigenvalues with a sense of what your target "cares" about among known chemical descriptors (low variability along one axis might indicate relative importance) versus physical factors with more permissible variation.

[1] https://en.wikipedia.org/wiki/Curse_of_dimensionality
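A sketch of the reduce-then-compare step (descriptor matrix, sizes, and component count are all hypothetical; scikit-learn assumed):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # Hypothetical library: 10,000 molecules x 300 descriptors (molecular
    # weight, polarizability, H-bond donors/acceptors, rotatable bonds, ...).
    descriptors = np.random.rand(10_000, 300)
    known_drug = np.random.rand(1, 300)   # descriptors of a known pharmaceutical

    # Descriptors live on wildly different scales, so standardize first.
    scaler = StandardScaler().fit(descriptors)
    pca = PCA(n_components=10).fit(scaler.transform(descriptors))

    library = pca.transform(scaler.transform(descriptors))
    reference = pca.transform(scaler.transform(known_drug))

    # Rank library compounds by distance to the known drug in the reduced space.
    distances = np.linalg.norm(library - reference, axis=1)
    candidates = np.argsort(distances)[:50]
    print(candidates)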


It's been used in computer graphics to speed up rendering. One technique that was quite popular back in the day, IIRC, was clustered PCA for precomputed radiance transfer[1]. It even made its way into DirectX 9[2].

Can't comment on longevity, I went for realism over real-time not long after.

[1]: https://www.microsoft.com/en-us/research/video/clustered-pri...

[2]: https://docs.microsoft.com/en-us/windows/win32/direct3d9/pre...


We just published a typical GWAS paper that used PCA to sanity check whether the "ethnicity" reported by our patients aligned with what their genome told us.

We had 200,000 dimensions (ACGTs), which we reduced to 2 via PCA, and sure enough, if someone said they were "Filipino" they generally appeared close to the other folks who said they were "Filipino".

https://breckuh.github.io/eopegwas/src/main.nb.html (chart titled: QC: PCA of SNPs shows clustering by reported ethnicity, as expected)
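For anyone curious about the mechanics, a toy version of that kind of check (entirely synthetic genotypes and labels, not the paper's actual pipeline, and far fewer SNPs than the real 200,000):

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    # Synthetic genotype matrix: one row per patient, one column per SNP,
    # coded 0/1/2 for the number of alternate alleles.
    rng = np.random.default_rng(0)
    genotypes = rng.integers(0, 3, size=(500, 5_000)).astype(float)
    ethnicity = rng.choice(["Filipino", "Other"], size=500)

    # Reduce the SNP dimensions to 2 principal components and plot,
    # coloring by the self-reported label.
    pcs = PCA(n_components=2).fit_transform(genotypes)
    for label in np.unique(ethnicity):
        mask = ethnicity == label
        plt.scatter(pcs[mask, 0], pcs[mask, 1], s=5, label=label)
    plt.xlabel("PC1"); plt.ylabel("PC2"); plt.legend()
    plt.show()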


You can perform outlier detection with the 'autoencoder' architecture. You usually hear this term in the context of neural networks, but it actually applies to any method that performs dimensionality reduction and also has an inverse transform defined.

---

1) Reduce the dimensionality of your data, then perform the inverse transform. This projects your data onto a subspace of the original space.

2) Measure the distance between the original data and this 'autoencoded' data. This measures the distance from each data point to that particular subspace. Data which is 'described better' by the transform will be closer to the subspace and is more 'typical' of the data and its underlying generative process. Conversely, data which is far away is atypical and can be considered an outlier or anomalous.

---

Precisely which dimensionality reduction technique (PCA, neural networks, etc.) you choose depends on which assumptions you wish to encode into the model. The vanilla technique for anomaly/outlier detection using neural networks relies on this idea, but encodes almost no assumptions beyond smoothness in the reduction operation and its inverse. With plain PCA it looks like the sketch below.
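A minimal sketch on synthetic data (the component count and threshold are arbitrary choices):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1_000, 20))        # "normal" data in 20 dimensions

    # 1) Reduce, then invert: this projects each point onto the
    #    low-dimensional principal subspace.
    pca = PCA(n_components=5).fit(X)
    X_hat = pca.inverse_transform(pca.transform(X))

    # 2) Reconstruction error = distance from each point to that subspace.
    error = np.linalg.norm(X - X_hat, axis=1)

    # Points with unusually large error are poorly described by the
    # low-dimensional model and get flagged as outliers.
    threshold = np.quantile(error, 0.99)
    outliers = np.where(error > threshold)[0]
    print(outliers)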


In addition to everything else that's been mentioned - you can simply use PCA as a preprocessing step for other algorithms. For example, you can apply a linear regression algorithm using the principal components as input instead of the original features in the dataset.


They’re often used to construct deprivation indexes for geographic areas (neighbourhoods and administrative areas). They combine multiple socioeconomic indicators into a single measure (usually each area's score on the first principal component).
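A minimal sketch of that construction (the indicators and area count are hypothetical; scikit-learn assumed):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # Hypothetical indicators per area: unemployment rate, % with no
    # qualifications, overcrowding, % on income support.
    indicators = np.random.rand(300, 4)

    # The index is each area's score on the first principal component
    # of the standardized indicators.
    z = StandardScaler().fit_transform(indicators)
    index = PCA(n_components=1).fit_transform(z).ravel()

    # The sign of a principal component is arbitrary, so check the
    # loadings before deciding which end means "more deprived".
    ranking = np.argsort(index)
    print(ranking[:10])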


PCA

An intro example: principal component regression -- it simplifies the inputs to a regression technique (and can be used with other ML methods).

The general algorithm underneath is singular value decomposition (SVD), which can also be used for lossy compression and other similar simplifications.
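For the lossy-compression angle, a small sketch with plain NumPy (matrix size and rank are arbitrary):

    import numpy as np

    A = np.random.rand(400, 600)          # e.g. a grayscale image as a matrix

    # Thin SVD, then keep only the k largest singular values/vectors.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = 50
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]  # best rank-k approximation

    # Storage drops from 400*600 values to k*(400 + 600 + 1).
    print(np.linalg.norm(A - A_k) / np.linalg.norm(A))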


Chemical spectroscopy. You might have spectra collected from a variety of samples, and want to highlight how they actually differ from one another, possibly en route to identifying an impurity or a manufacturing variation.


Topic modeling of text documents. The so-called LSA topic-modeling technique is basically SVD applied to text. And, as we all know, SVD is simply PCA without centering the data first.
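A tiny LSA sketch (toy documents, two "topics", scikit-learn's TruncatedSVD assumed):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    docs = [
        "the cat sat on the mat",
        "dogs and cats are pets",
        "stock prices fell on tuesday",
        "the market rallied after the earnings report",
    ]

    # Term-document matrix (tf-idf weighted here), then truncated SVD = LSA.
    tfidf = TfidfVectorizer().fit_transform(docs)
    topic_space = TruncatedSVD(n_components=2).fit_transform(tfidf)
    print(topic_space)   # each document as a mixture of two latent "topics"

Note that, unlike PCA, TruncatedSVD does not center the (sparse) matrix first, which is exactly the distinction mentioned above.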




