Covariance measures whether two variables tend to be above or below their means together. A positive covariance means the variables usually move the same way relative to their averages. A negative covariance means one tends to be above average when the other is below average.

For most students, the key idea is this: the sign is usually more useful than the raw number. The size of covariance depends on the units of both variables, so it is not a clean strength scale by itself.

Covariance Formula For Samples And Populations

For a sample of paired data, a common formula is

sxy=1n1i=1n(xixˉ)(yiyˉ)s_{xy} = \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})

Here xˉ\bar{x} and yˉ\bar{y} are the sample means. Each product (xixˉ)(yiyˉ)(x_i-\bar{x})(y_i-\bar{y}) is positive when the pair falls on the same side of both means, and negative when the pair falls on opposite sides.

If you are working with a full population rather than a sample, the denominator is typically NN instead of n1n-1:

Cov(X,Y)=1Ni=1N(xiμx)(yiμy)\mathrm{Cov}(X,Y) = \frac{1}{N}\sum_{i=1}^N (x_i-\mu_x)(y_i-\mu_y)

Use the sample version for sample data and the population version only when the data represents the entire population you want to describe.

How To Read The Sign Of Covariance

Covariance is built from paired deviations from the mean.

If both deviations are positive, their product is positive. If both are negative, their product is also positive. Those pairs push covariance upward because the variables are moving together relative to their centers.

If one deviation is positive and the other is negative, the product is negative. Those pairs pull covariance downward because the variables are moving in opposite directions.

So covariance is really an average of "joint movement around the mean."

Worked Example: Study Hours And Quiz Scores

Suppose a small sample records study hours and quiz scores:

(1,70), (2,80), (3,90)(1,70),\ (2,80),\ (3,90)

First find the means:

xˉ=1+2+33=2\bar{x} = \frac{1+2+3}{3} = 2 yˉ=70+80+903=80\bar{y} = \frac{70+80+90}{3} = 80

Now compute the deviations and their products:

  • For (1,70)(1,70): (12)(7080)=(1)(10)=10(1-2)(70-80) = (-1)(-10) = 10
  • For (2,80)(2,80): (22)(8080)=0(2-2)(80-80) = 0
  • For (3,90)(3,90): (32)(9080)=(1)(10)=10(3-2)(90-80) = (1)(10) = 10

Add the products:

10+0+10=2010 + 0 + 10 = 20

Because this is sample covariance, divide by n1=2n-1 = 2:

sxy=202=10s_{xy} = \frac{20}{2} = 10

The covariance is positive, so the variables move together in this sample. More study time goes with higher quiz scores here.

The important caution is that 1010 is not a universal strength scale. Its size depends on the units here: hours times score points. If you changed the measurement scale, the covariance would change too, even if the overall pattern stayed similar.

Covariance Vs Correlation: The Key Difference

Covariance and correlation are closely related, but they answer slightly different questions.

Covariance tells you the direction of joint movement and keeps the original scale. Correlation standardizes that relationship by dividing covariance by the standard deviations, when those standard deviations are nonzero:

r=sxysxsyr = \frac{s_{xy}}{s_x s_y}

That is why correlation is unitless and easier to compare across different data sets. Its value stays between 1-1 and 11, while covariance has no fixed range.

In practice:

  • Use covariance when you care about joint variation in the original units or when it appears inside a larger calculation, such as a covariance matrix.
  • Use correlation when you want a unitless summary that is easier to compare across data sets.

Common Mistakes With Covariance

Treating A Large Covariance As Automatically Strong

A covariance of 100100 is not automatically "stronger" than a covariance of 55. The variables may simply be measured on larger scales.

Mixing Up Sample And Population Formulas

If your data is a sample, dividing by n1n-1 is standard. If your data is the whole population of interest, dividing by NN is the population version.

Thinking Zero Covariance Means No Relationship At All

A covariance near 00 means little linear co-movement around the means. It does not rule out a nonlinear relationship.

If two variables are independent and the covariance exists, then the covariance is 00. The reverse is not always true.

Reading Covariance As Causation

Covariance only describes how variables vary together. It does not explain why they vary together.

When Covariance Is Used

Covariance appears in statistics, finance, machine learning, and data analysis whenever paired variables need to be studied together.

It is especially common in covariance matrices, where each entry summarizes how two variables vary jointly. That matters in areas like portfolio risk, principal component analysis, and multivariable modeling.

Try A Similar Problem

Take any three or four paired values, compute the two means, then multiply the paired deviations before averaging them. That one routine makes the sign of covariance feel much more concrete.

If you want the next step, compare the same data with the correlation coefficient and notice how standardizing the scales changes the interpretation.

Need help with a problem?

Upload your question and get a verified, step-by-step solution in seconds.

Open GPAI Solver →