Dimensionality Reduction Methods

  • A series on dimensionality reduction for dummies on Medium.

  • Dimensionality reduction in TensorFlow.

TSNE

PCA

  1. Machine Learning Mastery tutorials on PCA; what is missing there is how the eigendecomposition is actually calculated.

  2. Randomized SVD

  3. Incremental SVD (both variants are sketched below)
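
A minimal sketch of the randomized and incremental variants above (my own illustration, assuming scikit-learn; the data and parameter values are placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA

X = np.random.RandomState(0).randn(1000, 50)   # placeholder data

# Randomized SVD solver: quickly approximates the top components on large matrices.
pca_rand = PCA(n_components=10, svd_solver="randomized", random_state=0)
X_rand = pca_rand.fit_transform(X)

# Incremental PCA: fits the decomposition batch by batch,
# useful when the data does not fit in memory.
ipca = IncrementalPCA(n_components=10)
for batch in np.array_split(X, 10):
    ipca.partial_fit(batch)
X_inc = ipca.transform(X)
```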

There are two things we are trying to accomplish with whitening:

  1. Make the features less correlated with one another.

  2. Give all of the features the same variance.

Whitening has two simple steps:

  1. Project the dataset onto the eigenvectors. This rotates the dataset so that there is no correlation between the components.

  2. Normalize the dataset to have a variance of 1 for all components. This is done by simply dividing each component by the square root of its eigenvalue (sketched below).
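
A minimal NumPy sketch of these two whitening steps (my own illustration; the data, variable names, and the small epsilon are assumptions, not from the original text):

```python
import numpy as np

X = np.random.RandomState(0).randn(500, 5)
Xc = X - X.mean(axis=0)                      # center the data

cov = np.cov(Xc, rowvar=False)               # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)       # eigendecomposition (symmetric matrix)

X_rot = Xc @ eigvecs                         # step 1: rotate onto the eigenvectors
eps = 1e-8                                   # assumed small constant to avoid division by zero
X_white = X_rot / np.sqrt(eigvals + eps)     # step 2: unit variance per component

# The whitened components are now uncorrelated with variance ~1.
print(np.round(np.cov(X_white, rowvar=False), 2))
```

Dividing by the square root of each eigenvalue is exactly what gives every rotated component unit variance; the epsilon only guards against near-zero eigenvalues.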

SVD

KPCA

  1. Finally, results showing how well KPCA works on noisy images compared to PCA.

LDA - Linear discriminant analysis

PCA vs LDA:

Both Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) are linear transformation techniques used for dimensionality reduction.

  • PCA can be described as an “unsupervised” algorithm, since it “ignores” class labels and its goal is to find the directions (the so-called principal components) that maximize the variance in a dataset.

  • In contrast to PCA, LDA is “supervised” and computes the directions (“linear discriminants”) that will represent the axes that maximize the separation between multiple classes.

Although it might sound intuitive that LDA is superior to PCA for a multi-class classification task where the class labels are known, this might not always be the case.

For example, comparisons between classification accuracies for image recognition after using PCA or LDA show that PCA tends to outperform LDA if the number of samples per class is relatively small (A.M. Martinez et al., 2001).

  • In practice, it is also not uncommon to use both LDA and PCA in combination:

Best practice: PCA for dimensionality reduction can be followed by an LDA. To quickly recapitulate the purposes of the two linear transformations: PCA finds the axes with maximum variance for the whole dataset, whereas LDA tries to find the axes for best class separability. In practice, PCA for dimensionality reduction is therefore often done first, followed by an LDA.
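
A minimal sketch of the "PCA followed by LDA" combination described above (my own example, assuming scikit-learn and the Iris dataset; the component counts are arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# Unsupervised step: keep the directions of maximum variance.
# Supervised step: project onto the axes that best separate the classes.
pipe = make_pipeline(PCA(n_components=3), LinearDiscriminantAnalysis(n_components=2))
X_lda = pipe.fit_transform(X, y)
print(X_lda.shape)  # (150, 2)
```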

To fully understand the details, please follow the LDA link to the original and very informative article.

TODO: some benchmarking is needed for PCA / LDA / LSA, etc.

KDA - KERNEL DISCRIMINANT ANALYSIS

LSA

LSA is used for:

  • reduction of the dimensionality

  • noise reduction

  • incorporating relations between terms into the representation.

  • SVD, PCA, and "total least-squares" (among several other names) are the same thing: they compute the orthogonal transform that decorrelates the variables and keeps the ones with the largest variance. There are two numerical approaches: SVD of the (centered) data matrix, or eigendecomposition of that matrix "squared" (the covariance matrix). A quick check of this equivalence is sketched below.
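
A quick NumPy check of this equivalence (my own sketch; the random data is just a placeholder): PCA via SVD of the centered data matrix and PCA via eigendecomposition of the covariance agree, up to the sign of the components.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 4)
Xc = X - X.mean(axis=0)

# Approach 1: SVD of the centered data matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components_svd = Vt                          # principal directions (rows)
explained_var_svd = S**2 / (len(X) - 1)

# Approach 2: eigendecomposition of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
components_eig = eigvecs[:, order].T
explained_var_eig = eigvals[order]

print(np.allclose(explained_var_svd, explained_var_eig))            # True
print(np.allclose(np.abs(components_svd), np.abs(components_eig)))  # True (signs may flip)
```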

ICA

  1. PCA is global: it finds global variables (with images we get eigenfaces, which are good for reconstruction) that maximize variance in orthogonal directions, and it is not influenced by transposing the data matrix.

  2. ICA, on the other hand, is local: it finds local variables (with images we get eyes, ears, mouths, basically edges). Unlike PCA, ICA gives different results on transposed matrices, and it is also "directional"; consider the "cocktail party" problem. On documents, ICA gives topics.

  3. Like PCA, it helps us analyze our data (a small sketch follows below).
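
A minimal FastICA sketch of the "cocktail party" idea mentioned above (my own illustration, assuming scikit-learn; the two synthetic sources and the mixing matrix are arbitrary):

```python
import numpy as np
from sklearn.decomposition import FastICA, PCA

rng = np.random.RandomState(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                       # source 1: sinusoid
s2 = np.sign(np.sin(3 * t))              # source 2: square wave
S = np.c_[s1, s2]

A = np.array([[1.0, 0.5], [0.5, 1.0]])   # mixing matrix (the "room")
X = S @ A.T                              # observed mixtures (the "microphones")

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)             # recovered independent ("local") sources

pca = PCA(n_components=2)
S_pca = pca.fit_transform(X)             # PCA only decorrelates; it does not unmix
```

As is inherent to ICA, the recovered sources match the originals only up to permutation, sign, and scale.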

MANIFOLD

T-SNE

Sammons embedding mapping

IVIS

- The gist of t-SNE: we assume a t-distribution on distances, normalized for density, and discard those that are farther away; the t-distribution is used so that clusters are not clamped in the middle (see the sketch below).
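
A minimal t-SNE sketch (my own example, assuming scikit-learn and the digits dataset) of the common "PCA first, then t-SNE" pattern recommended in the links below:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

# Reduce with PCA first (denoises and speeds things up), then embed with t-SNE.
X_pca = PCA(n_components=30, random_state=0).fit_transform(X)
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_pca)
print(X_tsne.shape)  # (1797, 2)
```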

Iteratively moving from the left to the right

How and why, all here.

(Remove the mean from A, calculate cov(A), calculate eig(cov); then A times the top-k eigenvectors gives the PCA projection.)

- What is an eigenvector? Simply put, it is a vector v that satisfies A*v = lambda*v; how to use eig(), how to confirm an eigenvector/eigenvalue pair, and how to reconstruct the original matrix A.
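
A small NumPy sketch of that definition (my own illustration; the 2x2 matrix is arbitrary): verify A*v = lambda*v for a computed pair and reconstruct A from its eigendecomposition.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                 # a small symmetric matrix

lam, V = np.linalg.eig(A)                  # eigenvalues and eigenvectors (as columns)

v, l = V[:, 0], lam[0]
print(np.allclose(A @ v, l * v))           # True: v satisfies A*v = lambda*v

# Reconstruct the original matrix: A = V diag(lam) V^-1
A_rec = V @ np.diag(lam) @ np.linalg.inv(V)
print(np.allclose(A_rec, A))               # True
```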

(Several additional linked tutorials, not yet read.)

(It is unclear what the author is trying to show.)


How to use PCA with cross-validation and train/test splits: bottom line, fit it on the training set only.

PCA/ZCA whitening: see the Stanford tutorial and Stack Overflow links below.

They first say that an autoencoder is equivalent to PCA based on its objective, i.e. minimizing the reconstruction error. Then they say that PCA cannot separate certain non-linear situations (a circle within a circle), and therefore introduce kernel-based PCA (using the kernel trick, as in SVM), which maps the space to another, linearly separable space and performs PCA on it.
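
A minimal sketch of the "circle within a circle" case (my own example, assuming scikit-learn; the gamma value is arbitrary): linear PCA leaves the two rings entangled, while kernel PCA with an RBF kernel makes them linearly separable.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_pca = PCA(n_components=2).fit_transform(X)    # still two concentric rings
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)
# In the kernel PCA space the inner and outer circles become linearly separable.
```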

Linear Discriminant Analysis (LDA) is most commonly used as a dimensionality reduction technique in the pre-processing step for pattern-classification and machine learning applications. The goal is to project a dataset onto a lower-dimensional space with good class separability in order to avoid overfitting ("curse of dimensionality") and also to reduce computational costs.

PCA tends to outperform LDA if the number of samples per class is relatively small (A.M. Martinez et al., 2001).

- The pyDML package has KDA; it provides the classic algorithms of supervised distance metric learning, together with some of the newest proposals.

LSA is quite simple: you just use SVD to perform dimensionality reduction on the tf-idf vectors; that's really all there is to it.

There is a very nice tutorial with code explaining what the three matrices are, plus word clustering, sentence clustering, and vector importance. It says that for the sentence space we need to remove the first vector, as it is correlated with sentence length.
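
A minimal LSA sketch along those lines (my own example, assuming scikit-learn; the toy documents and component count are placeholders): SVD on tf-idf vectors, with the optional drop of the first component shown as a slice.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors sold shares on the market",
]

tfidf = TfidfVectorizer().fit_transform(docs)      # sparse tf-idf matrix
lsa = TruncatedSVD(n_components=3, random_state=0)
doc_topics = lsa.fit_transform(tfidf)              # documents in the latent "topic" space

doc_topics_no_first = doc_topics[:, 1:]            # optionally drop the first component
```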

How to interpret LSA vectors (link below).

PCA vs. LSA: see the two intuition links below.

Sparse

A comparison of manifold methods (PCA, Sammon, Isomap, t-SNE).


This is a very important read.

Contrary to what it says on sklearn's website, t-SNE is not suited only for visualization; you can also use it for data reduction.

"t-Distributed Stochastic Neighbor Embedding (t-SNE) is a (prize-winning) technique for dimensionality reduction that is particularly well suited for the visualization of high-dimensional datasets."

More links:

  • Parallax by Uber for t-SNE / PCA visualization
  • About t-SNE / AE / PCA
  • Does dimensionality reduction lose information? Yes and no; in PCA, only if you use fewer than all the components
  • StatQuest
  • The t-SNE algorithm
  • Are there cases where PCA is more suitable than t-SNE?
  • Does PCA preserve pairwise distances better than t-SNE?
  • More advice about using t-SNE and possible misinterpretations
  • Expected value, variance, covariance
  • PCA
  • EigenDecomposition
  • SVD
  • PCA on large matrices!
  • PCA on Iris
  • What is PCA?
  • What is a covariance matrix?
  • Variance-covariance matrix
  • Visualization of the first PCA vectors
  • A very nice introductory tutorial on how to use PCA
  • An in-depth tutorial on PCA (paper)
  • Yet another tutorial paper on PCA (looks good)
  • How to use PCA in cross-validation and for train/test splits
  • Another tutorial paper - looks decent
  • PCA whitening
  • Stanford tutorial
  • Stack Overflow (really good)
  • An explanation of SVD's formulas
  • A comparison / tutorial with code on PCA vs LDA - read!
  • A comprehensive tutorial on LDA - read!
  • Dimensionality reduction with LDA - nice examples
  • Not to be confused with the other LDA (latent Dirichlet allocation)
  • PCA vs. LDA
  • The pyDML package
  • LSA
  • LSA clustering
  • A tutorial about LSA
  • How to interpret LSA vectors
  • Intuition 1
  • Intuition 2
  • LSA vs W2V
  • Info on ICA with security returns
  • The best tutorial that explains manifolds (high-to-low-dimension projection/mapping/visualization)
  • Many manifold methods used to visualize high-dimensional data
  • Comparing manifold methods
  • Code and an in-depth tutorial on t-SNE, mapping probabilities to distributions
  • A great example of using PCA and then t-SNE to see clusters that aren't visible with PCA alone
  • Misreading t-SNE
  • Comparing PCA and t-SNE, then pushing PCA into t-SNE and seeing what happens (as recommended in sklearn)
  • TSNE + autoencoder example (also in TensorFlow)
  • IVIS: paper, Git, docs, Ivis animate, Ivis explain, and examples (1, 2, 3)
  • A small blog post about PCA, AE & TSNE
  • Visualizing PCA / t-SNE using plots
  • Performance comparison between dimensionality-reduction implementations (t-SNE etc.)