Probability and statistics share one toolkit for handling uncertainty. Probability turns assumptions about random events into numbers between $0$ and $1$, while descriptive statistics compress a dataset into a few summary numbers: the mean, the variance, and the standard deviation. This guide covers the core probability rules, conditional probability, and the spread measures, with the formulas, short derivations, and worked examples you need to compute answers.

$$0 \le P(A) \le 1, \qquad \sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

Probability rules

The probability of an event $A$ is the long-run fraction of trials in which $A$ happens. For a finite sample space where every outcome is equally likely,

$$P(A) = \frac{\text{number of favorable outcomes}}{\text{total number of outcomes}}.$$

Three rules generate almost everything else.

Complement rule

Every event either happens or it does not, and those two cases cover the whole sample space:

$$P(A) + P(A^c) = 1 \quad\Longrightarrow\quad P(A^c) = 1 - P(A).$$

This is the fastest route to "at least one" problems, where the complement ("none") is easier to count.
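
As a minimal sketch of that shortcut, the Python below computes the chance of at least one six in four rolls of a fair die in two ways: through the complement, and by counting every outcome. The four-roll scenario and the function names are illustrative assumptions, not part of the guide.

```python
from fractions import Fraction
from itertools import product


def at_least_one_six_by_complement(rolls: int) -> Fraction:
    """P(at least one six) = 1 - P(no six) for a fair six-sided die."""
    p_no_six_single = Fraction(5, 6)
    return 1 - p_no_six_single ** rolls


def at_least_one_six_by_counting(rolls: int) -> Fraction:
    """Brute force: enumerate every equally likely sequence of rolls."""
    outcomes = list(product(range(1, 7), repeat=rolls))
    favorable = sum(1 for seq in outcomes if 6 in seq)
    return Fraction(favorable, len(outcomes))


p_complement = at_least_one_six_by_complement(4)
p_counting = at_least_one_six_by_counting(4)
print(p_complement, p_counting)
```

Both routes agree, but the complement version needs one multiplication per roll while the counting version has to walk through all 1296 sequences.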

Addition rule

For the probability that $A$ or $B$ occurs, add the two probabilities but subtract the overlap so it is not counted twice:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

If $A$ and $B$ are mutually exclusive (they cannot both happen), then $P(A \cap B) = 0$ and the rule reduces to $P(A \cup B) = P(A) + P(B)$.

Multiplication rule

For the probability that $A$ and $B$ both occur,

$$P(A \cap B) = P(A)\, P(B \mid A).$$

If the events are independent, knowing $A$ tells you nothing about $B$, so $P(B \mid A) = P(B)$ and the rule simplifies to $P(A \cap B) = P(A)\,P(B)$.

Situation              Use this
---------------------  ----------------------------------
"at least one"         complement: 1 - P(none)
"A or B"               addition: P(A)+P(B)-P(A∩B)
"A and B"              multiplication: P(A)·P(B|A)
mutually exclusive     P(A∩B) = 0
independent            P(B|A) = P(B)
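
The addition and multiplication rules can be checked by brute-force counting. The sketch below assumes two fair dice and two events chosen purely for illustration (first die shows 6; total is at least 10); the helper `prob` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
space = set(product(range(1, 7), repeat=2))


def prob(event: set) -> Fraction:
    """Favorable outcomes divided by total outcomes."""
    return Fraction(len(event), len(space))


# Hypothetical events, chosen only to illustrate the rules.
A = {o for o in space if o[0] == 6}           # first die shows 6
B = {o for o in space if o[0] + o[1] >= 10}   # total is at least 10

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(A | B)
p_addition = prob(A) + prob(B) - prob(A & B)

# Multiplication rule: P(A and B) = P(A) * P(B | A),
# where P(B | A) is counted inside A as the new sample space.
p_b_given_a = Fraction(len(A & B), len(A))
p_joint = prob(A) * p_b_given_a

print(p_union, p_addition, p_joint, prob(A & B))
```

Here $P(B \mid A)$ differs from $P(B)$, so these two events are dependent and the simplified product $P(A)\,P(B)$ would give the wrong joint probability.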

Conditional probability

Conditional probability asks: given that $B$ already happened, how likely is $A$? It is defined as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0.$$

Why it is defined this way

Once you know $B$ occurred, $B$ becomes the new sample space, because outcomes outside $B$ are now impossible. So you restrict attention to the part of $A$ that lies inside $B$, namely $A \cap B$, and rescale by the total probability of $B$ so the conditional probabilities still sum to $1$. Dividing by $P(B)$ is exactly that rescaling. Rearranging the definition gives back the multiplication rule $P(A \cap B) = P(B)\,P(A \mid B)$, which is why the two are really one idea.

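From here, swapping the roles of $A$ and $B$ leads to Bayes' theorem. The same intersection can also be written as $P(A \cap B) = P(A)\,P(B \mid A)$; substituting that into the definition gives

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}.$$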
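
To see Bayes' theorem produce a number, here is a small sketch with made-up inputs: a condition with a 1% base rate, a test that flags 95% of true cases, and a 5% false-positive rate. All three figures and the function name `bayes_posterior` are hypothetical. The denominator $P(B)$ is expanded with the law of total probability, a standard identity that this guide does not derive.

```python
from fractions import Fraction


def bayes_posterior(prior: Fraction, likelihood: Fraction,
                    false_positive_rate: Fraction) -> Fraction:
    """Return P(A | B) = P(B | A) P(A) / P(B).

    P(B) is expanded as P(B | A) P(A) + P(B | not A) P(not A).
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers, for illustration only.
p_condition = Fraction(1, 100)           # P(A): base rate
p_pos_given_condition = Fraction(95, 100)  # P(B | A)
p_pos_given_healthy = Fraction(5, 100)     # P(B | not A)

posterior = bayes_posterior(p_condition, p_pos_given_condition,
                            p_pos_given_healthy)
print(posterior, float(posterior))
```

With these assumed inputs $P(B \mid A)$ is $0.95$ but $P(A \mid B)$ is only about $0.16$, which shows how far apart the two conditional probabilities can be.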

Worked example 1 — conditional probability

A box has $10$ marbles: $4$ red and $6$ blue. You draw two without replacement. What is the probability that both are red?

Let $A$ be "first red" and $B$ be "second red." Then $P(A) = \tfrac{4}{10}$. After removing one red, $3$ reds remain out of $9$ marbles, so $P(B \mid A) = \tfrac{3}{9}$. By the multiplication rule,

$$P(A \cap B) = \frac{4}{10} \times \frac{3}{9} = \frac{12}{90} = \frac{2}{15} \approx 0.133.$$

The "without replacement" detail is what makes $P(B \mid A) \ne P(B)$: the draws are dependent.
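
The marble result can be confirmed by listing every ordered pair of draws. This is a sketch only; the variable names are assumptions, and the with-replacement figure is included just for contrast.

```python
from fractions import Fraction
from itertools import permutations

marbles = ["red"] * 4 + ["blue"] * 6

# permutations() treats the 10 marbles as distinct by position,
# so this yields all 10 * 9 = 90 ordered draws without replacement.
draws = list(permutations(marbles, 2))
both_red = sum(1 for first, second in draws
               if first == "red" and second == "red")

p_enumerated = Fraction(both_red, len(draws))
p_formula = Fraction(4, 10) * Fraction(3, 9)   # P(A) * P(B | A)

# For contrast: with replacement the draws would be independent.
p_with_replacement = Fraction(4, 10) * Fraction(4, 10)

print(p_enumerated, p_formula, p_with_replacement)
```

The enumeration and the multiplication rule agree, while the with-replacement product is larger because the second draw still sees all four reds.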

Mean, variance, and standard deviation

These three numbers describe a dataset's center and spread.

The mean (average) is

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i.$$

The variance is the average squared distance from the mean:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2.$$

The standard deviation is the square root of the variance:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}.$$

Why we square the deviations

If you simply added the raw deviations $x_i - \mu$, the positives and negatives would cancel and the sum would always be $0$, which is useless as a spread measure. Squaring removes the sign and weights large deviations more heavily. But squaring also changes the units (dollars become dollars-squared), so taking the square root at the end returns the standard deviation to the original units, which is why $\sigma$ is the number people actually report.

Population vs. sample

When your data is a sample meant to estimate a larger population, divide by $N-1$ instead of $N$:

$$s^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2.$$

Dividing by the smaller $N-1$ corrects a bias that otherwise makes the sample variance too small on average. Use $N$ for a full population, $N-1$ for a sample.
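
Python's standard `statistics` module exposes both conventions: `pvariance` divides by $N$ and `variance` divides by $N-1$. The sketch below applies them to a tiny made-up dataset (the values `[2, 4, 6]` are an assumption chosen so the arithmetic is easy to follow by hand).

```python
import statistics

# Hypothetical data: mean is 4, squared deviations are 4, 0, 4 (sum 8).
data = [2, 4, 6]

population_variance = statistics.pvariance(data)  # 8 / N     = 8 / 3
sample_variance = statistics.variance(data)       # 8 / (N-1) = 8 / 2

print(population_variance, sample_variance)
```

The gap between the two is large here because $N$ is tiny; as $N$ grows, dividing by $N$ or $N-1$ makes less and less difference.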

Worked example 2 — mean, variance, standard deviation

Find the mean, variance, and standard deviation of the dataset $\{2, 4, 4, 4, 5, 5, 7, 9\}$, treating it as a full population ($N = 8$).

Step 1 — mean:

$$\mu = \frac{2+4+4+4+5+5+7+9}{8} = \frac{40}{8} = 5.$$

Step 2 — squared deviations:

x    x - μ    (x - μ)²
2     -3         9
4     -1         1
4     -1         1
4     -1         1
5      0         0
5      0         0
7      2         4
9      4        16
                ---
   sum =         32

Step 3 — variance and standard deviation:

$$\sigma^2 = \frac{32}{8} = 4, \qquad \sigma = \sqrt{4} = 2.$$

So the data is centered at $5$ and typically lies about $2$ units away from the mean.
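
The three steps translate directly into a few lines of Python. This is a sketch that mirrors the hand calculation rather than an optimized routine; the variable names are assumptions.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

# Step 1: mean
mu = sum(data) / n

# Step 2: deviations and squared deviations
deviations = [x - mu for x in data]
squared_deviations = [d ** 2 for d in deviations]

# Raw deviations cancel out, which is why they are squared first.
raw_sum = sum(deviations)

# Step 3: population variance (divide by N) and standard deviation
variance = sum(squared_deviations) / n
sigma = math.sqrt(variance)

print(mu, raw_sum, sum(squared_deviations), variance, sigma)
```

Changing the divisor in step 3 from `n` to `n - 1` would give the sample variance instead.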

Try it yourself, then check the answer

A fair six-sided die is rolled once. What is the probability of rolling at least a 5 (that is, a $5$ or a $6$), and what is the probability of not doing so?

There are $2$ favorable outcomes out of $6$, so $P(\ge 5) = \tfrac{2}{6} = \tfrac{1}{3}$. By the complement rule, $P(< 5) = 1 - \tfrac{1}{3} = \tfrac{2}{3}$. Notice it was faster to count the small favorable set and use the complement than to count all four "not" outcomes by hand.
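
Since probability was introduced as a long-run fraction of trials, a simulation offers a rough check on this answer. The sketch below assumes 100,000 simulated rolls and a fixed seed, both arbitrary choices; the function name is hypothetical, and the estimate will be close to $\tfrac{1}{3}$ rather than exactly equal to it.

```python
import random


def estimate_at_least_five(trials: int, seed: int = 0) -> float:
    """Fraction of simulated fair-die rolls that show 5 or 6."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) >= 5)
    return hits / trials


estimate = estimate_at_least_five(100_000)
complement_estimate = 1 - estimate  # estimated P(< 5)

print(f"{estimate:.3f} {complement_estimate:.3f}")
```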

Calculation traps to watch for

  • Adding probabilities that overlap. For $P(A \cup B)$, forgetting to subtract $P(A \cap B)$ double-counts the shared outcomes. Only skip the subtraction when the events are mutually exclusive.
  • Confusing $P(A \mid B)$ with $P(B \mid A)$. These are generally different. The probability of the condition (what is given) goes in the denominator. Reversing them is the classic base-rate mistake.
  • Forgetting the square root for standard deviation. Variance is in squared units; the standard deviation is its square root. Reporting $\sigma^2$ as if it were $\sigma$ misstates the spread, overstating it whenever $\sigma > 1$.
  • Mixing up $N$ and $N-1$. Divide by $N$ for a full population and by $N-1$ for a sample estimate. Using the wrong one shifts every variance and standard deviation you report.
  • Treating dependent draws as independent. "Without replacement" changes the conditional probability of later draws; using $P(B)$ instead of $P(B \mid A)$ gives the wrong joint probability.

Master these three ideas — the probability rules, the conditioning step, and the spread formulas — and most introductory statistics problems become a matter of identifying which tool the wording is pointing at.

Frequently Asked Questions

What is the difference between probability and statistics?
Probability starts from assumptions about random events and predicts how likely outcomes are. Statistics starts from observed data and summarizes or draws conclusions from it. They use the same core ideas of distributions, mean, and variance.
How do you calculate conditional probability?
Use P(A given B) = P(A and B) divided by P(B), with P(B) greater than zero. You restrict attention to outcomes inside B, then rescale by the total probability of B.
Why do we square the deviations when finding variance?
If you added raw deviations from the mean, positives and negatives would cancel to zero. Squaring removes the sign and weights large deviations more, giving a usable measure of spread.
What is the difference between variance and standard deviation?
Variance is the average squared distance from the mean, so it is in squared units. Standard deviation is the square root of the variance, which returns the measure to the original units of the data.
When do you divide by N versus N minus 1?
Divide by N for a full population. Divide by N minus 1 for a sample used to estimate a larger population, which corrects a bias that would otherwise make the variance too small.
