Covariance and correlation both describe how two variables move together, but they answer slightly different questions. The one-line version: covariance gives the direction of joint movement and keeps the original units, while correlation standardizes that relationship into a unitless number between -1 and 1.

Covariance itself measures whether two variables tend to be above or below their means together. A positive covariance means the variables usually move the same way relative to their averages; a negative covariance means one tends to be above average when the other is below.

Covariance Vs. Correlation, Side By Side

                 Covariance                  Correlation
Measures         direction of joint movement direction + standardized strength
Units            original units (x times y)  unitless
Range            no fixed range              between -1 and 1
Best for         original-units variation,   comparing across data sets
                 covariance matrices
Formula          s_xy                        r = s_xy / (s_x s_y)

Correlation standardizes covariance by dividing it by the product of the two standard deviations, provided both are nonzero:

$$r = \frac{s_{xy}}{s_x s_y}$$

The units of $s_x s_y$ cancel the units of $s_{xy}$, which is why correlation is unitless and easy to compare across data sets, while covariance has no fixed range.

The Formulas, For Samples And Populations

For a sample of $n$ paired observations $(x_i, y_i)$, a common formula is:

$$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i-\bar{x})(y_i-\bar{y})$$

where $\bar{x}$ and $\bar{y}$ are the sample means. Each product $(x_i-\bar{x})(y_i-\bar{y})$ is positive when $x_i$ and $y_i$ fall on the same side of their respective means, negative when they fall on opposite sides, and zero when either value equals its mean.
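A direct translation of the sample formula into Python, as a minimal sketch: `sample_covariance` is a hypothetical helper name, and the two lists are assumed to hold the paired $x_i$ and $y_i$ values in the same order.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of paired deviation products divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    x_bar = sum(xs) / n  # sample mean of x
    y_bar = sum(ys) / n  # sample mean of y
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (n - 1)

print(sample_covariance([1, 2, 3], [70, 80, 90]))  # 10.0
```

The input here is the same three study-hour pairs used in the worked example later in this article. Python 3.10+ also ships `statistics.covariance`, which uses the same $n-1$ denominator.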

For a full population rather than a sample, the denominator is typically $N$ instead of $n-1$:

$$\mathrm{Cov}(X,Y) = \frac{1}{N}\sum_{i=1}^{N} (x_i-\mu_x)(y_i-\mu_y)$$

where $\mu_x$ and $\mu_y$ are the population means.

Use the sample version for sample data and the population version only when the data represents the entire population.
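The two versions differ only in the final division. A minimal sketch, where the `population` flag is a hypothetical parameter chosen for illustration:

```python
def covariance(xs, ys, population=False):
    """Covariance with denominator N (population=True) or n - 1 (sample, default)."""
    n = len(xs)
    if n != len(ys) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if population:
        return total / n  # divide by N
    if n < 2:
        raise ValueError("sample covariance needs at least two pairs")
    return total / (n - 1)  # divide by n - 1

xs, ys = [1, 2, 3], [70, 80, 90]
print(covariance(xs, ys))                   # sample: 20 / 2
print(covariance(xs, ys, population=True))  # population: 20 / 3
```

For small samples the gap is noticeable (10 versus about 6.67 here); it shrinks as $n$ grows. If you use NumPy, `numpy.cov` divides by $n-1$ by default and by $N$ when `bias=True`.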

Reading The Sign

Covariance is built from paired deviations from the means. If both deviations are positive, their product is positive; if both are negative, their product is also positive. Those pairs push covariance upward, because the variables move together relative to their centers. If one deviation is positive and the other negative, the product is negative, pulling covariance downward. So covariance is essentially an average of these products: a summary of joint movement around the means.
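To see each pair's contribution, the sketch below lists the products $(x_i-\bar{x})(y_i-\bar{y})$ for a small made-up data set. `deviation_products` is a hypothetical helper, not a library function.

```python
def deviation_products(xs, ys):
    """Return (x - x_bar) * (y - y_bar) for each pair: the terms covariance sums."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]

# Made-up data: x_bar = 3, y_bar = 5.
xs = [1, 2, 4, 5]
ys = [9, 3, 6, 2]
products = deviation_products(xs, ys)
for x, y, p in zip(xs, ys, products):
    effect = "pushes up" if p > 0 else "pulls down" if p < 0 else "no effect"
    print(f"({x}, {y}): product {p:+.1f}, {effect}")
print("sum of products:", sum(products))
```

Two pairs push upward and two pull downward, but the downward products are larger, so the sum is -11 and the sample covariance ($-11/3$) is negative.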

When To Use Which

  • Use covariance when you care about joint variation in the original units, or when it appears inside a larger calculation such as a covariance matrix.
  • Use correlation when you want a unitless summary that is easier to compare across data sets.

Covariance is especially common in covariance matrices, where each off-diagonal entry summarizes how two variables vary jointly and each diagonal entry is a variable's variance (its covariance with itself). That matters in portfolio risk, principal component analysis, and multivariable modeling.
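A covariance matrix is just the sample covariance computed for every pair of variables. This is a minimal pure-Python sketch; `covariance_matrix` is a hypothetical helper, and the "sleep" column is made-up data added only to give a third variable. In practice, `numpy.cov` computes the same kind of matrix (treating each row as a variable by default).

```python
def covariance_matrix(columns):
    """Sample covariance matrix: entry [i][j] is the covariance of columns i and j."""
    n = len(columns[0])
    if n < 2 or any(len(col) != n for col in columns):
        raise ValueError("columns must share a length of at least 2")
    means = [sum(col) / n for col in columns]
    k = len(columns)
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):  # compute the upper triangle, mirror it to the lower
            total = sum((a - means[i]) * (b - means[j])
                        for a, b in zip(columns[i], columns[j]))
            matrix[i][j] = matrix[j][i] = total / (n - 1)
    return matrix

# Hypothetical columns: study hours, quiz scores, hours of sleep.
hours = [1, 2, 3]
scores = [70, 80, 90]
sleep = [8, 7, 6]
for row in covariance_matrix([hours, scores, sleep]):
    print(row)
```

The diagonal holds each variable's variance (1, 100, and 1 here), and the matrix is symmetric because the covariance of $X$ with $Y$ equals the covariance of $Y$ with $X$.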

Worked Example: Study Hours And Quiz Scores

Suppose a small sample records study hours and quiz scores:

$$(1,70),\ (2,80),\ (3,90)$$

First find the means:

$$\bar{x} = \frac{1+2+3}{3} = 2, \qquad \bar{y} = \frac{70+80+90}{3} = 80$$

Now the deviations and their products:

  • For $(1,70)$: $(1-2)(70-80) = (-1)(-10) = 10$
  • For $(2,80)$: $(2-2)(80-80) = 0$
  • For $(3,90)$: $(3-2)(90-80) = (1)(10) = 10$

Add the products:

$$10 + 0 + 10 = 20$$

Because this is sample covariance, divide by $n-1 = 2$:

$$s_{xy} = \frac{20}{2} = 10$$

The covariance is positive, so more study time goes with higher quiz scores here. But 10 is not a universal strength scale: its size depends on the units, hours times score points. Change the measurement scale and the covariance changes too, even though the underlying pattern is the same. This is exactly the case where correlation helps, because it strips out the units. Here $s_x = 1$ and $s_y = 10$, so $r = \frac{10}{1 \cdot 10} = 1$: the three points lie exactly on a line, something the covariance of 10 does not reveal on its own.

Confusion Points To Watch

Treating A Large Covariance As Automatically Strong

A covariance of 100 is not automatically stronger than a covariance of 5. The variables may simply be measured on larger scales.

Mixing Up Sample And Population Formulas

If your data is a sample, divide by $n-1$. If it is the whole population, divide by $N$.

Thinking Zero Covariance Means No Relationship At All

A covariance near 0 means little linear co-movement around the means. It does not rule out a nonlinear relationship. If two variables are independent and their covariance exists, the covariance is 0; the reverse is not always true, because zero covariance does not imply independence.

Reading Covariance As Causation

Covariance only describes how variables vary together. It does not explain why.

Frequently Asked Questions

What does covariance measure?
Covariance measures whether two variables tend to be above their means together, below their means together, or move in opposite directions.
Can covariance be negative?
Yes. A negative covariance means higher values of one variable tend to occur with lower values of the other, relative to their means.
What is the difference between covariance and correlation?
Covariance keeps the original units and scale, while correlation standardizes the relationship so the result is unitless and easier to compare across data sets.
