Principal component analysis, or PCA, turns several numeric variables into a smaller set of new variables that preserve as much variation as possible. If you searched for "what is PCA," the short answer is: it rotates the data to a new set of axes, then keeps the axes that explain the most spread.

Those new axes are called principal components. In standard PCA, the first component captures the greatest possible variance, the second captures the greatest remaining variance while staying orthogonal to the first, and later components continue the same pattern.

What PCA Is Trying To Find

Imagine a cloud of points in a high-dimensional space. PCA looks for the directions where that cloud spreads out the most.

If most of the spread happens along one or two directions, the data may be summarized well with one or two principal components instead of the full original variable set. That is why PCA is used for dimensionality reduction, visualization, compression, and preprocessing.

For centered data, the first principal component solves

$$\text{maximize } \mathrm{Var}(Xw) \quad \text{subject to } \|w\| = 1,$$

where $X$ is the centered data matrix and $w$ is a direction vector.
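To make the objective concrete, here is a minimal pure-Python sketch that solves it by brute force for 2D data. It assumes the data is already centered, and it scans unit vectors $w = (\cos t, \sin t)$ over half a turn. The helper names and the data are hypothetical. This is only an illustration of what is being maximized; in practice PCA is computed from the covariance matrix or the SVD, not by scanning directions.

```python
import math


def projected_variance(points, w):
    """Sample variance of the projections x . w for centered 2D points."""
    n = len(points)
    projections = [x * w[0] + y * w[1] for x, y in points]
    # The data is centered, so the projections already have mean zero.
    return sum(p * p for p in projections) / (n - 1)


def best_direction(points, steps=3600):
    """Brute-force scan of unit vectors w = (cos t, sin t) for 0 <= t < pi."""
    best_w, best_var = None, -1.0
    for i in range(steps):
        t = math.pi * i / steps  # w and -w give the same variance, so half a turn suffices
        w = (math.cos(t), math.sin(t))  # ||w|| = 1 by construction
        v = projected_variance(points, w)
        if v > best_var:
            best_w, best_var = w, v
    return best_w, best_var


# Hypothetical centered data: both column means are exactly zero.
centered = [(-2.0, -1.0), (-1.0, -1.0), (0.0, 0.5), (1.0, 0.0), (2.0, 1.5)]
w, var = best_direction(centered)
print(w, var)
```

The best variance found this way matches, up to the scan resolution, the largest eigenvalue of the sample covariance matrix, and the best $w$ lines up with the corresponding eigenvector. Whether the variance divides by $n$ or $n-1$ changes the value but not which $w$ wins.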

The centering condition matters. Without centering, the chosen directions can be driven by the mean level of the variables instead of by how the data varies around that mean.
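A small hypothetical comparison shows the effect. The sketch below takes the leading eigenvector of the uncentered second-moment matrix $\frac{1}{n}\sum_i x_i x_i^\top$ and of the same matrix computed after centering, for points that vary along $(1,-1)$ but sit far from the origin. The helper names are made up for illustration, and the 2x2 eigenvector uses the standard closed form for a symmetric matrix.

```python
import math


def top_eigvec_2x2(a, b, c):
    """Leading unit eigenvector of the symmetric matrix [[a, b], [b, c]]."""
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if b != 0:
        vx, vy = b, lam - a  # solves (M - lam*I) v = 0 when b != 0
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)  # matrix is already diagonal
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm


def second_moment(points):
    """Entries of (1/n) * sum x x^T. No mean is subtracted here on purpose."""
    n = len(points)
    sxx = sum(x * x for x, _ in points) / n
    sxy = sum(x * y for x, y in points) / n
    syy = sum(y * y for _, y in points) / n
    return sxx, sxy, syy


# Hypothetical points that vary along (1, -1) but sit around a large mean (10, 10).
raw = [(8.0, 12.0), (9.0, 11.0), (10.0, 10.0), (11.0, 9.0), (12.0, 8.0)]
mx = sum(x for x, _ in raw) / len(raw)
my = sum(y for _, y in raw) / len(raw)
centered = [(x - mx, y - my) for x, y in raw]

w_raw = top_eigvec_2x2(*second_moment(raw))            # skips centering
w_centered = top_eigvec_2x2(*second_moment(centered))  # standard PCA direction
print("without centering:", w_raw)
print("with centering:   ", w_centered)
```

Here `w_raw` comes out as $\frac{1}{\sqrt{2}}(1,1)$, which is just the direction of the mean, while `w_centered` is $\pm\frac{1}{\sqrt{2}}(1,-1)$, the direction along which the points actually vary.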

How To Compute PCA

The standard workflow is short:

  1. Put observations in rows and variables in columns.
  2. Center each variable by subtracting its mean.
  3. If the variables use very different units and scale should not dominate, standardize them as well.
  4. Compute the covariance matrix of the centered data.
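  5. Find its eigenvectors and eigenvalues.
  6. Sort the eigenvectors by eigenvalue, largest first, and keep the top k.
  7. Project the centered (or standardized) data onto those k eigenvectors to get the new coordinates.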
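The workflow above can be sketched in pure Python. This is a minimal illustration, assuming small dense data stored as a list of rows; the helper names (`mean_center`, `covariance`, `jacobi_eigen`, `pca`) are hypothetical. The cyclic Jacobi method is just one symmetric eigen solver, chosen here because it is short, and the optional standardizing step is left out. Real projects would normally call a linear-algebra library instead.

```python
import math


def mean_center(rows):
    """Step 2: subtract each column's mean (rows are observations)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Step 4: sample covariance matrix with the n - 1 denominator."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def jacobi_eigen(sym, tol=1e-20, max_sweeps=100):
    """Step 5: eigen-decompose a symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, V); column k of V is the unit eigenvector for eigenvalues[k].
    """
    d = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(d)] for i in range(d)]
    scale = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
        if off <= tol * scale:  # off-diagonal part is negligible: a is diagonal
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if a[p][q] == 0.0:
                    continue
                # Angle chosen so the rotated matrix has a zero in position (p, q).
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                rot = [[float(i == j) for j in range(d)] for i in range(d)]
                rot[p][p], rot[q][q], rot[p][q], rot[q][p] = c, c, s, -s
                rot_t = [list(col) for col in zip(*rot)]
                a = matmul(matmul(rot_t, a), rot)  # similarity transform keeps eigenvalues
                v = matmul(v, rot)                 # accumulate the eigenvectors
    return [a[k][k] for k in range(d)], v


def pca(rows, k):
    """Steps 6-7: eigenvalues largest first, the top-k directions, and the scores."""
    centered = mean_center(rows)
    evals, v = jacobi_eigen(covariance(centered))
    order = sorted(range(len(evals)), key=lambda i: evals[i], reverse=True)
    directions = [[v[r][i] for r in range(len(v))] for i in order[:k]]
    scores = [[sum(x * w for x, w in zip(row, dvec)) for dvec in directions]
              for row in centered]
    return [evals[i] for i in order], directions, scores


evals, directions, scores = pca([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]], k=1)
print(evals, directions, scores)
```

For the points $(1,1), (2,2), (3,3)$ used in the worked example, this returns eigenvalues close to 2 and 0, a first direction of $\pm\frac{1}{\sqrt{2}}(1,1)$, and scores close to $-\sqrt{2}, 0, \sqrt{2}$ (eigenvector signs are arbitrary, so all scores may come out negated).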

The eigenvectors give the principal directions. The eigenvalues tell you how much variance each direction explains.
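Turning eigenvalues into "share of variance explained" is a simple normalization. The sketch below uses hypothetical eigenvalues for a 4-variable covariance matrix, and `components_needed` is an illustrative helper that picks the smallest number of components reaching a chosen cumulative share. The 90% threshold is one common choice, not a rule.

```python
def explained_variance_ratios(eigenvalues):
    """Share of total variance carried by each component, largest first.

    Assumes at least one eigenvalue is positive, so the total is nonzero.
    """
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    return [v / total for v in vals]


def components_needed(eigenvalues, threshold):
    """Smallest k whose cumulative explained variance reaches `threshold`."""
    cumulative = 0.0
    for k, ratio in enumerate(explained_variance_ratios(eigenvalues), start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues of a 4-variable covariance matrix.
eigenvalues = [5.0, 3.0, 1.5, 0.5]
ratios = explained_variance_ratios(eigenvalues)
k = components_needed(eigenvalues, threshold=0.90)
print(ratios, k)
```

With these numbers the ratios are 0.5, 0.3, 0.15, and 0.05, so the cumulative share goes 0.5, 0.8, 0.95, and three components are needed to pass 90%.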

You will also see PCA computed with the singular value decomposition, or SVD. For centered data, that gives the same principal subspaces and is often the preferred numerical method in practice.
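A short derivation shows why the two routes agree. Assume $X$ is the centered $n \times p$ data matrix, the covariance uses the $n-1$ denominator, and $X = U \Sigma V^\top$ is the thin SVD (so $U^\top U = I$ and $\Sigma$ is diagonal with singular values $\sigma_i$):

$$C = \frac{1}{n-1} X^\top X = \frac{1}{n-1} V \Sigma U^\top U \Sigma V^\top = V \, \frac{\Sigma^2}{n-1} \, V^\top.$$

So the columns of $V$ (the right singular vectors) are the principal directions, the covariance eigenvalues are $\lambda_i = \sigma_i^2 / (n-1)$, and the scores are $XV = U\Sigma$. Working with $X$ directly also avoids forming $X^\top X$, which squares the condition number; that is one reason the SVD route is often preferred numerically.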

PCA Worked Example In 2D

Take three 2D observations:

$$(1,1), \quad (2,2), \quad (3,3).$$

These points lie exactly on the line $y=x$, so we already expect one dominant direction.

First center the data by subtracting the mean $(2,2)$:

$$(-1,-1), \quad (0,0), \quad (1,1).$$

For this centered dataset, the covariance matrix is proportional to

$$\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$
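Spelling out the arithmetic with the sample covariance $C = \frac{1}{n-1}\sum_i x_i x_i^\top$ over the three centered points $x_i$ (so $n - 1 = 2$):

$$C = \frac{1}{2}\left[\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}\right] = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$

With the $1/n$ convention the matrix is $\tfrac{2}{3}$ of this instead. Either way the eigenvectors are the same, which is why only proportionality matters here.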

Its two orthogonal eigenvector directions are

$$\frac{1}{\sqrt{2}}(1,1) \quad \text{and} \quad \frac{1}{\sqrt{2}}(1,-1).$$
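Multiplying each direction by the matrix confirms that both are eigenvectors, with eigenvalues 2 and 0:

$$\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 2 \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} = 0 \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$

The larger eigenvalue belongs to $\frac{1}{\sqrt{2}}(1,1)$, so that is the first principal direction.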

The first direction points along the line where the data actually varies. The second points across that line.

Project the centered points onto the first direction:

$$(-1,-1) \mapsto -\sqrt{2}, \quad (0,0) \mapsto 0, \quad (1,1) \mapsto \sqrt{2}.$$

Project them onto the second direction:

$$(-1,-1) \mapsto 0, \quad (0,0) \mapsto 0, \quad (1,1) \mapsto 0.$$

So all of the variation is along $\frac{1}{\sqrt{2}}(1,1)$, and none is along $\frac{1}{\sqrt{2}}(1,-1)$. In this special case, one principal component keeps the full pattern of variation with one number per point.
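The worked example is small enough to check in a few lines of pure Python. This sketch hard-codes the two directions found above and computes each point's score as a dot product; the `project` helper is a hypothetical name. It also computes the sample variance of each list of scores.

```python
import math

# The three observations from the example, centered by their mean (2, 2).
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
mean_x = sum(x for x, _ in points) / len(points)
mean_y = sum(y for _, y in points) / len(points)
centered = [(x - mean_x, y - mean_y) for x, y in points]

r = 1 / math.sqrt(2)
pc1 = (r, r)    # first principal direction
pc2 = (r, -r)   # second, orthogonal direction


def project(point, direction):
    """Score of a centered point on a unit direction (a dot product)."""
    return point[0] * direction[0] + point[1] * direction[1]


scores_pc1 = [project(p, pc1) for p in centered]
scores_pc2 = [project(p, pc2) for p in centered]

# Sample variance of each score list (the mean is zero because the data is centered).
var_pc1 = sum(s * s for s in scores_pc1) / (len(points) - 1)
var_pc2 = sum(s * s for s in scores_pc2) / (len(points) - 1)
print(scores_pc1, scores_pc2, var_pc1, var_pc2)
```

The variances of the two score lists come out as 2 and 0 (up to floating-point rounding), matching the eigenvalues of the covariance matrix when it is normalized by $n-1$.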

That is PCA in its simplest form. It rotates the coordinate system to line up with the data, then asks which rotated coordinates are worth keeping.

What Principal Components Mean

Each principal component is a linear combination of the original variables.

If the first component looks like

$$z_1 = 0.7x_1 + 0.7x_2,$$

that means the main direction of variation is roughly an equal-weight combination of the first two variables. The exact interpretation depends on the variables and on whether the data was only centered or also standardized.

The scores are the coordinates of each observation after projection onto the principal directions. The loadings describe how strongly each original variable contributes to a component.
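A minimal sketch of scores and loadings in pure Python, assuming a hypothetical 2-variable loading matrix whose rows are unit-length principal directions. The first row is the exact version of the "roughly 0.7, 0.7" component above, since $1/\sqrt{2} \approx 0.707$. Computing scores is a rotation, and because the rows are orthonormal, keeping every component lets you rotate back exactly. Conventions vary: some texts and libraries reserve "loadings" for these directions scaled by the square root of their eigenvalues, so check which one a given tool uses.

```python
import math

# Hypothetical loading matrix: one row per principal direction (unit length),
# one column per original variable.
r = 1 / math.sqrt(2)
loadings = [
    [r, r],    # PC1: equal-weight combination of x1 and x2
    [r, -r],   # PC2: contrast between x1 and x2
]


def compute_scores(centered_rows, loadings):
    """Score on component k: z_k = sum_j loadings[k][j] * x_j."""
    return [[sum(w * x for w, x in zip(component, row)) for component in loadings]
            for row in centered_rows]


def reconstruct(score_rows, loadings):
    """Map scores back: x_j = sum_k z_k * loadings[k][j] (exact if all components are kept)."""
    d = len(loadings[0])
    return [[sum(z[k] * loadings[k][j] for k in range(len(loadings))) for j in range(d)]
            for z in score_rows]


# Hypothetical centered observations (each column sums to zero).
centered = [[-1.0, -0.5], [0.5, 1.0], [0.5, -0.5]]
z = compute_scores(centered, loadings)
back = reconstruct(z, loadings)
print(z)
print(back)
```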

Common PCA Mistakes

Skipping Centering

Standard PCA is usually applied to centered data. If you skip centering, the result may reflect the average level of the variables more than the variation you actually care about.

Ignoring Scale

If one variable is measured in dollars and another in millimeters, the larger-scale variable can dominate the variance calculation. Standardizing is often appropriate when units differ and relative scale should not decide the answer.
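The sketch below illustrates the scale effect on hypothetical data: a price in dollars and a thickness in millimetres for five items. It compares the leading principal direction after centering only with the one after standardizing (z-scores using `statistics.stdev`, which divides by $n-1$). Standardizing turns the covariance matrix into the correlation matrix. The helper names are illustrative.

```python
import math
import statistics


def center(column):
    """Subtract the column mean."""
    m = statistics.mean(column)
    return [v - m for v in column]


def standardize(column):
    """z-score: subtract the mean, divide by the sample standard deviation."""
    m, s = statistics.mean(column), statistics.stdev(column)
    return [(v - m) / s for v in column]


def cov2(a, b):
    """Sample covariance entries (var_a, cov_ab, var_b) for two mean-zero columns."""
    n = len(a)
    return (sum(x * x for x in a) / (n - 1),
            sum(x * y for x, y in zip(a, b)) / (n - 1),
            sum(y * y for y in b) / (n - 1))


def top_direction(sxx, sxy, syy):
    """Leading unit eigenvector of [[sxx, sxy], [sxy, syy]] in closed form."""
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    if sxy != 0:
        vx, vy = sxy, lam - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm


# Hypothetical measurements for five items: price in dollars, thickness in millimetres.
price = [1200.0, 950.0, 1800.0, 1500.0, 700.0]
thickness = [3.1, 2.9, 3.6, 3.0, 2.7]

raw_dir = top_direction(*cov2(center(price), center(thickness)))
std_dir = top_direction(*cov2(standardize(price), standardize(thickness)))
print("centered only:", raw_dir)
print("standardized: ", std_dir)
```

With centering only, the first direction is almost exactly the price axis, because the price variance is more than a million times the thickness variance. After standardizing, both variables have variance 1, and for two positively correlated variables the leading eigenvector of the correlation matrix is $\frac{1}{\sqrt{2}}(1,1)$.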

Thinking PCA Finds The Most Meaningful Feature

PCA finds directions of large variance, not necessarily directions with the best causal meaning or the best class separation. High variance and high usefulness are not always the same thing.
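A contrived, hypothetical two-class dataset makes the point concrete. The classes are spread widely along $x$ but differ only in $y$. The covariance matrix comes out diagonal with the larger variance on $x$, so the first principal component is the $x$-axis, yet the two classes have identical scores on it; the low-variance second component is the one that separates them.

```python
# Hypothetical two-class data: wide spread along x, and the classes differ only in y.
class_a = [(-4.0, 1.0), (-2.0, 1.0), (0.0, 1.0), (2.0, 1.0), (4.0, 1.0)]
class_b = [(-4.0, -1.0), (-2.0, -1.0), (0.0, -1.0), (2.0, -1.0), (4.0, -1.0)]
points = class_a + class_b  # the combined data already has mean (0, 0)

n = len(points)
sxx = sum(x * x for x, _ in points) / (n - 1)
syy = sum(y * y for _, y in points) / (n - 1)
sxy = sum(x * y for x, y in points) / (n - 1)

# sxy is 0, so the covariance matrix is diagonal and the coordinate axes are the
# principal directions; sxx > syy makes the x-axis PC1 and the y-axis PC2.
# The data is centered, so each point's scores are just its coordinates.
pc1_a = sorted(x for x, _ in class_a)
pc1_b = sorted(x for x, _ in class_b)
pc2_a = [y for _, y in class_a]
pc2_b = [y for _, y in class_b]

print("PC1 variance:", sxx, "PC2 variance:", syy)
print("PC1 scores overlap completely:", pc1_a == pc1_b)
print("PC2 separates the classes:", min(pc2_a) > max(pc2_b))
```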

Treating Low-Dimensional Projections As Lossless

Keeping only the first few components is an approximation. It can be excellent, but it still discards some information unless the remaining components have exactly zero variance.
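The information lost by truncation can be measured. For centered data, the total squared reconstruction error after keeping the top components equals $(n-1)$ times the sum of the discarded covariance eigenvalues, because each residual is exactly the part of a point that lies along the dropped directions. The sketch below checks this on hypothetical 2D data, keeping one component and using the closed-form eigendecomposition of a symmetric 2x2 matrix.

```python
import math

# Hypothetical centered 2D data (both column means are zero).
centered = [(-2.0, -1.0), (-1.0, -1.0), (0.0, 0.5), (1.0, 0.0), (2.0, 1.5)]
n = len(centered)

# Sample covariance entries of [[a, b], [b, c]].
a = sum(x * x for x, _ in centered) / (n - 1)
b = sum(x * y for x, y in centered) / (n - 1)
c = sum(y * y for _, y in centered) / (n - 1)

# Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
mid = (a + c) / 2
rad = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
lam1, lam2 = mid + rad, mid - rad

# Leading unit eigenvector (valid because b != 0 for this data).
vx, vy = b, lam1 - a
norm = math.hypot(vx, vy)
w = (vx / norm, vy / norm)

# Keep only PC1: compute each score, then map it back into the original 2D space.
recon = [(z * w[0], z * w[1]) for z in (x * w[0] + y * w[1] for x, y in centered)]

sse = sum((x - rx) ** 2 + (y - ry) ** 2 for (x, y), (rx, ry) in zip(centered, recon))
kept = lam1 / (lam1 + lam2)
print("squared reconstruction error:", sse)
print("(n - 1) * discarded eigenvalue:", (n - 1) * lam2)
print("share of variance kept:", kept)
```

For this data one component keeps most of the variance, but the error is not zero; it vanishes only when the discarded eigenvalue is zero, as in the 2D worked example.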

When PCA Is Useful

PCA is common when variables are correlated and you want a simpler representation of the data.

Typical uses include:

  • reducing the number of input features before modeling
  • visualizing high-dimensional data in two or three dimensions
  • compressing measurements while keeping most of the variance
  • identifying dominant patterns in finance, biology, image analysis, and signal processing

The method is most useful when variance-based structure is a reasonable summary of the problem.

Key Takeaway

PCA finds the directions of greatest variance and uses them as a new set of axes for the data. Centering first is essential, and the share of variance each component explains tells you how much of the data's spread you keep when you reduce dimensions.

Frequently Asked Questions

What does PCA actually do?
Principal component analysis turns several numeric variables into a smaller set of new variables that preserve as much variation as possible. It rotates the data to a new set of axes, called principal components, and keeps the axes that explain the most spread. The first component captures the greatest variance, and each later component captures the most remaining variance while staying orthogonal to the earlier ones.
Why do you center the data before PCA?
Centering means subtracting each variable's mean. Without it, the chosen directions can be driven by the mean level of the variables instead of by how the data varies around that mean. Standard PCA is defined on centered data, where the first component is the unit direction that maximizes the variance of the projected data.
When should you standardize variables before PCA?
Standardize when the variables use very different units and you do not want scale to dominate the result. Because PCA looks for directions of maximum variance, a variable measured in large units can dominate the components purely because of its scale. Standardizing puts variables on a comparable footing before computing the covariance matrix.
How are eigenvalues and eigenvectors used in PCA?
After centering the data, you compute the covariance matrix and find its eigenvectors and eigenvalues. The eigenvectors give the principal directions, and the eigenvalues tell you how much variance each direction explains. PCA can also be computed with the singular value decomposition, which gives the same principal subspaces for centered data and is often preferred numerically.
What is PCA used for?
PCA is used for dimensionality reduction, visualization, compression, and preprocessing. If most of the spread in a high-dimensional point cloud happens along one or two directions, the data can be summarized well with one or two principal components instead of the full original variable set, making it easier to plot, store, or feed into other methods.
