5.1.3 Dimensionality reduction
If some characters seem to be missing, it's because MathJax is not loaded correctly. Refreshing the page should fix it.
- [E] Why do we need dimensionality reduction?
- [E] Eigendecomposition is a common factorization technique used for dimensionality reduction. Is the eigendecomposition of a matrix always unique?
- [M] Name some applications of eigenvalues and eigenvectors.
- [M] We want to do PCA on a dataset of multiple features in different ranges. For example, one is in the range 0-1 and one is in the range 10 - 1000. Will PCA work on this dataset?
- [H] Under what conditions can one apply eigendecomposition? What about SVD?
- What is the relationship between SVD and eigendecomposition?
- What’s the relationship between PCA and SVD?
- [H] How does t-SNE (T-distributed Stochastic Neighbor Embedding) work? Why do we need it?
In case you need a refresh on PCA, here's an explanation without any math.
Assume that your grandma likes wine and would like to find characteristics that best describe wine bottles sitting in her cellar. There are many characteristics we can use to describe a bottle of wine including age, price, color, alcoholic content, sweetness, acidity, etc. Many of these characteristics are related and therefore redundant. Is there a way we can choose fewer characteristics to describe our wine and answer questions such as: which two bottles of wine differ the most?
PCA is a technique to construct new characteristics out of the existing characteristics. For example, a new characteristic might be computed as
age - acidity + price
or something like that, which we call a linear combination.To differentiate our wines, we'd like to find characteristics that strongly differ across wines. If we find a new characteristic that is the same for most of the wines, then it wouldn't be very useful. PCA looks for characteristics that show as much variation across wines as possible, out of all linear combinations of existing characteristics. These constructed characteristics are principal components of our wines.
If you want to see a more detailed, intuitive explanation of PCA with visualization, check out amoeba's answer on StackOverflow. This is possibly the best PCA explanation I've ever read.