Principal component analysis, or PCA, turns several numeric variables into a smaller set of new variables that preserve as much variation as possible. If you searched for "what is PCA," the short answer is: it rotates the data to a new set of axes, then keeps the axes that explain the most spread.

Those new axes are called principal components. In standard PCA, the first component captures the greatest possible variance, the second captures the greatest remaining variance while staying orthogonal to the first, and later components continue the same pattern.

What PCA Is Trying To Find

Imagine a cloud of points in a high-dimensional space. PCA looks for the directions where that cloud spreads out the most.

If most of the spread happens along one or two directions, the data may be summarized well with one or two principal components instead of the full original variable set. That is why PCA is used for dimensionality reduction, visualization, compression, and preprocessing.

For centered data, the first principal component solves

$$\text{maximize } \mathrm{Var}(Xw) \quad \text{subject to } \|w\| = 1,$$

where $X$ is the centered data matrix and $w$ is a direction vector.

The centering condition matters. Without centering, the chosen directions can be driven by the mean level of the variables instead of by how the data varies around that mean.
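To make that objective concrete, here is a minimal numpy sketch (the data and variable names are invented for illustration) that sweeps unit vectors $w = (\cos t, \sin t)$ in 2D and confirms that $\mathrm{Var}(Xw)$ peaks at the top eigenvector of the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])  # correlated toy data
X = X - X.mean(axis=0)  # center each variable

# Sweep unit vectors w = (cos t, sin t) and record Var(Xw).
angles = np.linspace(0.0, np.pi, 2000)
variances = [np.var(X @ np.array([np.cos(t), np.sin(t)])) for t in angles]
t_best = angles[int(np.argmax(variances))]
w_best = np.array([np.cos(t_best), np.sin(t_best)])

# The maximizer matches the top eigenvector of the covariance matrix (up to sign).
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
print(w_best, eigvecs[:, -1])  # eigh returns eigenvalues in ascending order
```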

How To Compute PCA

The standard workflow is short:

  1. Put observations in rows and variables in columns.
  2. Center each variable by subtracting its mean.
  3. If the variables use very different units and scale should not dominate, standardize them as well.
  4. Compute the covariance matrix of the centered data.
  5. Find its eigenvectors and eigenvalues.

The eigenvectors give the principal directions. The eigenvalues tell you how much variance each direction explains.

You will also see PCA computed with the singular value decomposition, or SVD. For centered data, that gives the same principal subspaces and is often the preferred numerical method in practice.
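As a sketch of the workflow above in numpy (the helper names pca_eig and pca_svd are my own), with the SVD route shown for comparison:

```python
import numpy as np

def pca_eig(X):
    """PCA via the covariance matrix: steps 2, 4, and 5 of the list above."""
    Xc = X - X.mean(axis=0)               # center each variable
    C = np.cov(Xc, rowvar=False)          # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigendecomposition
    order = np.argsort(eigvals)[::-1]     # sort by explained variance, descending
    return eigvecs[:, order], eigvals[order]

def pca_svd(X):
    """PCA via SVD of the centered data; same subspaces, often better numerics."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt.T, s**2 / (len(X) - 1)      # singular values -> variances

X = np.random.default_rng(0).normal(size=(50, 3))
W1, v1 = pca_eig(X)
W2, v2 = pca_svd(X)
print(np.allclose(v1, v2))  # same explained variances; directions match up to sign
```

The eigenvector route mirrors the numbered steps directly; the SVD route avoids forming the covariance matrix explicitly, which is part of why numerical libraries tend to prefer it.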

PCA Worked Example In 2D

Take three 2D observations:

$$(1,1), \quad (2,2), \quad (3,3).$$

These points lie exactly on the line $y = x$, so we already expect one dominant direction.

First center the data by subtracting the mean $(2,2)$:

$$(-1,-1), \quad (0,0), \quad (1,1).$$

For this centered dataset, the covariance matrix is proportional to

$$\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$

Its two orthogonal eigenvector directions are

$$\frac{1}{\sqrt{2}}(1,1) \quad \text{and} \quad \frac{1}{\sqrt{2}}(1,-1).$$

The first direction points along the line where the data actually varies. The second points across that line.

Project the centered points onto the first direction:

$$(-1,-1) \mapsto -\sqrt{2}, \quad (0,0) \mapsto 0, \quad (1,1) \mapsto \sqrt{2}.$$

Project them onto the second direction:

$$(-1,-1) \mapsto 0, \quad (0,0) \mapsto 0, \quad (1,1) \mapsto 0.$$

So all of the variation is along $\frac{1}{\sqrt{2}}(1,1)$, and none is along $\frac{1}{\sqrt{2}}(1,-1)$. In this special case, one principal component keeps the full pattern of variation with one number per point.

That is PCA in its simplest form. It rotates the coordinate system to line up with the data, then asks which rotated coordinates are worth keeping.
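The worked example is easy to check numerically; a short sketch:

```python
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
Xc = X - X.mean(axis=0)                  # centered: (-1,-1), (0,0), (1,1)

w1 = np.array([1.0, 1.0]) / np.sqrt(2)   # first principal direction
w2 = np.array([1.0, -1.0]) / np.sqrt(2)  # second, orthogonal direction

print(Xc @ w1)  # [-1.414..., 0., 1.414...]: all of the variation
print(Xc @ w2)  # [0., 0., 0.]: none of it
```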

What Principal Components Mean

Each principal component is a linear combination of the original variables.

If the first component looks like

$$z_1 = 0.7x_1 + 0.7x_2,$$

that means the main direction of variation is roughly an equal-weight combination of the first two variables. The exact interpretation depends on the variables and on whether the data was only centered or also standardized.

The scores are the coordinates of each observation after projection onto the principal directions. The loadings describe how strongly each original variable contributes to a component.
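Here is a minimal sketch of that split into scores and loadings, with invented data. Note that "loadings" here means the unit-norm direction vectors; some texts scale them by each component's standard deviation.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
X[:, 1] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=100)  # variables 0 and 1 move together

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

loadings = Vt.T         # column j: how each original variable contributes to component j
scores = Xc @ loadings  # row i: coordinates of observation i on the components

print(loadings[:, 0])   # heavy, similar weights on the two correlated variables
print(scores[:3, 0])    # first-component score for the first three observations
```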

Common PCA Mistakes

Skipping Centering

Standard PCA is usually applied to centered data. If you skip centering, the result may reflect the average level of the variables more than the variation you actually care about.
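A small sketch of the failure mode (toy data, invented for illustration): points far from the origin whose actual variation runs along $(1,-1)$. Without centering, the top direction chases the mean instead.

```python
import numpy as np

rng = np.random.default_rng(2)
# Points near (100, 100) whose actual variation runs along (1, -1).
X = np.array([100.0, 100.0]) + rng.normal(size=(200, 1)) * np.array([1.0, -1.0])

_, _, Vt_raw = np.linalg.svd(X, full_matrices=False)                   # no centering
_, _, Vt_ctr = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)  # centered

print(Vt_raw[0])  # approx (0.707, 0.707), up to sign: points at the mean level
print(Vt_ctr[0])  # approx (0.707, -0.707), up to sign: the real spread
```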

Ignoring Scale

If one variable is measured in dollars and another in millimeters, the larger-scale variable can dominate the variance calculation. Standardizing is often appropriate when units differ and relative scale should not decide the answer.
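A sketch of the scale effect with invented data: the same relationship measured in dollars and millimeters gives a first component dominated by the dollar variable until you standardize.

```python
import numpy as np

rng = np.random.default_rng(3)
dollars = rng.normal(scale=1000.0, size=500)                             # big numbers
millimeters = 0.8 * dollars / 1000.0 + rng.normal(scale=0.6, size=500)  # small numbers

Xc = np.column_stack([dollars, millimeters])
Xc = Xc - Xc.mean(axis=0)

_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
print(Vt[0])  # approx (1, 0.0008): dollars wins purely through its units

Xs = Xc / Xc.std(axis=0)  # standardize to unit variance per variable
_, _, Vt_s = np.linalg.svd(Xs, full_matrices=False)
print(Vt_s[0])  # weights of comparable size once scale is removed
```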

Thinking PCA Finds The Most Meaningful Feature

PCA finds directions of large variance, not necessarily directions with the best causal meaning or the best class separation. High variance and high usefulness are not always the same thing.

Treating Low-Dimensional Projections As Lossless

Keeping only the first few components is an approximation. It can be excellent, but it still discards some information unless the remaining components have exactly zero variance.
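One way to see the loss directly (a sketch with synthetic data): project onto the first $k$ components, map back, and track how much variance survives.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))  # correlated 5D data
Xc = X - X.mean(axis=0)

_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
total = np.sum(s**2)
for k in range(1, 6):
    Xk = (Xc @ Vt[:k].T) @ Vt[:k]  # keep k components, then reconstruct
    retained = 1 - np.linalg.norm(Xc - Xk)**2 / total
    print(k, round(retained, 4))   # hits 1.0 only when every component is kept
```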

When PCA Is Useful

PCA is common when variables are correlated and you want a simpler representation of the data.

Typical uses include:

  • reducing the number of input features before modeling
  • visualizing high-dimensional data in two or three dimensions
  • compressing measurements while keeping most of the variance
  • identifying dominant patterns in finance, biology, image analysis, and signal processing

The method is most useful when variance-based structure is a reasonable summary of the problem.

Try A Similar Problem

Plot the points $(1,2)$, $(2,3)$, $(3,4)$, and $(4,5)$. Center them, then compare their spread along the directions $(1,1)$ and $(1,-1)$. That small exercise makes it clear why PCA picks one direction as important and treats the other as mostly redundant.

If you want to go one step further, try your own version with points that are not perfectly on a line and compare how much variance the first component explains with how much the second explains.
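If you want to check your work on the first part, here is a minimal sketch for the four points above:

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]])
Xc = X - X.mean(axis=0)  # mean is (2.5, 3.5)

w1 = np.array([1.0, 1.0]) / np.sqrt(2)
w2 = np.array([1.0, -1.0]) / np.sqrt(2)

print(np.var(Xc @ w1))  # 2.5: all of the spread lies along (1, 1)
print(np.var(Xc @ w2))  # 0.0: nothing along (1, -1)
```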
