PCA: A Basic Machine Learning Technique to Reduce Features with Basic Math
Principal Component Analysis (PCA) is a technique used to reduce the dimensions of a dataset (its columns, or features) while preserving as much information as possible.

To do this, we need to understand three main concepts:
- Eigenvectors and Eigenvalues
- Variance, Covariance and the Covariance Matrix
- The Projection Matrix with the required number of features/columns
Let us understand each one of them in detail.
Eigenvectors and Eigenvalues
Linear Algebra made simple — Eigen Vectors and Eigen Values
Variance, Covariance and Covariance Matrix
Variance — This is simply a measure of how spread out the values are from the mean. For a data set X with n values, the variance is given by

$$\mathrm{Var}(X) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{X}\right)^2$$

- If the variance is large, the data points are spread out from the mean.
- If the variance is small, the data points are close to the mean.
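As a quick check, here is a minimal NumPy sketch of the variance formula; the values are made up for illustration, not taken from the article.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])               # hypothetical values
var_manual = np.mean((X - X.mean()) ** 2)   # (1/n) * sum((x_i - mean)^2)
print(var_manual)                           # 2.0
print(np.var(X))                            # NumPy's population variance agrees
```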
Covariance — Covariance is a measure of how two variables change together. It tells us whether there is a positive or negative linear relationship between two variables.
For two datasets X and Y, the covariance is calculated as:

$$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{X}\right)\left(y_i - \bar{Y}\right)$$
- Positive Covariance means that as X increases, Y tends to increase.
- Negative Covariance means that as X increases, Y tends to decrease.
- Zero Covariance means there is no linear relationship between the two variables.
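A similar sketch for covariance, again with hypothetical values chosen so that Y grows with X and the covariance comes out positive.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 4, 6, 8, 10])              # Y grows with X, so covariance is positive
cov_manual = np.mean((X - X.mean()) * (Y - Y.mean()))
print(cov_manual)                           # 4.0
print(np.cov(X, Y, bias=True)[0, 1])        # bias=True divides by n, matching the formula above
```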
Covariance Matrix — It is a square matrix that shows the covariances between multiple variables.
For example, for three variables X, Y and Z, the covariance matrix is

$$\Sigma = \begin{bmatrix} \mathrm{Var}(X) & \mathrm{Cov}(X,Y) & \mathrm{Cov}(X,Z) \\ \mathrm{Cov}(Y,X) & \mathrm{Var}(Y) & \mathrm{Cov}(Y,Z) \\ \mathrm{Cov}(Z,X) & \mathrm{Cov}(Z,Y) & \mathrm{Var}(Z) \end{bmatrix}$$

It contains the variances of each variable along the diagonal and the covariances between pairs of variables in the off-diagonal entries.
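To see that structure in practice, here is a small sketch that builds the covariance matrix of three hypothetical variables with NumPy; the data and the use of np.cov are my additions, purely for illustration.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 4, 6, 8, 10])
Z = np.array([5, 3, 4, 1, 2])
data = np.vstack([X, Y, Z])                 # one row per variable
cov_matrix = np.cov(data, bias=True)        # bias=True -> divide by n, as in the formulas above
print(cov_matrix)                           # variances on the diagonal, covariances elsewhere
```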
Let’s consider a simple dataset of two variables, X and Y:

Step 1: Calculate Variance
Mean of X is $\bar{X} = 3$
Mean of Y is $\bar{Y} = 6$
Variance of X :

Variance of Y :

Step 2: Calculate the covariance of X and Y

Step 3: Covariance Matrix


The formula for the covariance matrix in terms of matrices can be given as follows. If $X_c$ is the mean-centred data matrix (one row per observation, with each column's mean subtracted), then

$$\Sigma = \frac{1}{n}\, X_c^{\top} X_c$$
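The data table for the example above did not survive, so the sketch below assumes X = [1, 2, 3, 4, 5] and Y = [2, 4, 6, 8, 10], chosen only because they match the stated means of 3 and 6. It walks through Steps 1 to 3 and checks them against the matrix form.

```python
import numpy as np

# Hypothetical data, chosen only because the means match the stated 3 and 6.
X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 4, 6, 8, 10])

# Step 1: variances
var_x = np.mean((X - X.mean()) ** 2)                 # 2.0 with these assumed values
var_y = np.mean((Y - Y.mean()) ** 2)                 # 8.0

# Step 2: covariance of X and Y
cov_xy = np.mean((X - X.mean()) * (Y - Y.mean()))    # 4.0

# Step 3: covariance matrix via the matrix form (1/n) * Xc^T Xc
data = np.column_stack([X, Y])                       # n rows, 2 columns
Xc = data - data.mean(axis=0)                        # mean-centred data matrix
cov_matrix = (Xc.T @ Xc) / len(data)
print(var_x, var_y, cov_xy)
print(cov_matrix)                                    # [[2. 4.] [4. 8.]]
```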
Projection Matrix
Projection is nothing but the shadow of a point or a vector on another plane or line.
Think of a flashlight shining onto an object. The shadow it casts on a flat surface is the “projection” of the object. In the same way, a projection matrix calculates the shadow (or projection) of a vector onto a lower-dimensional space like a line or plane.
A projection matrix is a mathematical tool used to project a vector onto a subspace (like a line or plane). In simpler terms, it “projects” or “shadows” a vector onto a lower dimension while preserving some geometric properties.
Let’s take an example with vectors in 2D space:
- Suppose you have a vector v, and you want to project it onto another vector a, which lies along a line.
The projection of v onto a is essentially a “shadow” of v on the line defined by a.
If you want to project v onto a, the projection is given by
$$\mathrm{proj}_{a}(v) = \frac{v \cdot a}{a \cdot a}\, a$$
This gives you the vector projection of v onto a. The projection matrix helps automate this process for any vector you want to project onto a subspace.
For any vector a, the projection matrix P that projects any vector onto a is
$$P = \frac{a\, a^{\top}}{a^{\top} a}$$
This matrix can be used to project any vector onto the direction of a.
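As a small sketch, this builds the projection matrix for a hypothetical direction a and checks the defining property that projecting twice changes nothing.

```python
import numpy as np

a = np.array([3.0, 1.0])                    # hypothetical direction vector
P = np.outer(a, a) / (a @ a)                # P = a a^T / (a^T a)
print(P)
print(np.allclose(P @ P, P))                # True: projecting twice changes nothing
```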
Consider two vectors in 2D space:

Dot Product — v · a

Projection Calculation



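The actual vectors from this example did not survive extraction, so the sketch below assumes v = [2, 3] and a = [4, 0] purely for illustration; it computes the dot product and the projection both directly and through the projection matrix.

```python
import numpy as np

v = np.array([2.0, 3.0])                    # hypothetical vector to project
a = np.array([4.0, 0.0])                    # hypothetical direction to project onto

dot_va = v @ a                              # v . a = 8.0
proj = (dot_va / (a @ a)) * a               # (v . a / a . a) * a -> [2. 0.]

P = np.outer(a, a) / (a @ a)                # same result via the projection matrix
print(proj, P @ v)                          # [2. 0.] [2. 0.]
```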
Let us see how it looks on the graph

PCA Analysis
From the above concepts, the number of features can be reduced by projecting the data onto a smaller space, following the steps defined in the first diagram.
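Putting the pieces together, here is a hedged end-to-end sketch of PCA in plain NumPy; the random 3-feature dataset and the choice of k = 1 component are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))            # hypothetical dataset: 100 samples, 3 features
k = 1                                       # number of components to keep (an assumption)

Xc = data - data.mean(axis=0)               # 1. mean-centre each feature
cov = (Xc.T @ Xc) / len(Xc)                 # 2. covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)      # 3. eigenvalues/eigenvectors (eigh: symmetric matrix)
order = np.argsort(eigvals)[::-1]           # 4. rank components by explained variance
W = eigvecs[:, order[:k]]                   # 5. projection matrix: top-k eigenvectors as columns
reduced = Xc @ W                            # 6. project the data onto the k-dimensional space
print(reduced.shape)                        # (100, 1)
```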