Probability and statistics share one toolkit for handling uncertainty. Probability turns assumptions about random events into numbers between 0 and 1, while descriptive statistics compress a dataset into a few summary numbers: the mean, the variance, and the standard deviation. This guide covers the core probability rules, conditional probability, and the spread measures, with the formulas, short derivations, and worked examples you need to compute answers.
Probability rules
The probability of an event $A$ is the long-run fraction of trials in which $A$ happens. For a finite sample space where every outcome is equally likely,
$$P(A) = \frac{\text{number of outcomes in } A}{\text{total number of outcomes}}$$
Three rules generate almost everything else.
Complement rule
Every event either happens or it does not, and those two cases cover the whole sample space:
$$P(\text{not } A) = 1 - P(A)$$
This is the fastest route to "at least one" problems, where the complement ("none") is easier to count.
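As a minimal sketch of the complement shortcut, the Python snippet below computes the chance of at least one six in four rolls of a fair die, first as 1 - P(none) and then by brute-force enumeration. The four-roll setup and the function names are illustrative assumptions, not one of this guide's worked examples.

```python
from fractions import Fraction
from itertools import product


def at_least_one_six_complement(rolls: int) -> Fraction:
    """P(at least one six) = 1 - P(no six on any roll)."""
    # Assumes the rolls are independent, so P(no six) multiplies across rolls.
    p_no_six = Fraction(5, 6) ** rolls
    return 1 - p_no_six


def at_least_one_six_enumerated(rolls: int) -> Fraction:
    """Count outcomes containing a six over the full sample space."""
    outcomes = list(product(range(1, 7), repeat=rolls))
    favorable = sum(1 for outcome in outcomes if 6 in outcome)
    return Fraction(favorable, len(outcomes))


if __name__ == "__main__":
    print(at_least_one_six_complement(4))   # 671/1296
    print(at_least_one_six_enumerated(4))   # 671/1296
```

Both routes agree, but the complement version needs one multiplication while the enumeration walks all 1296 outcomes.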
Addition rule
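For the probability that $A$ or $B$ occurs, add the two probabilities but subtract the overlap so it is not counted twice:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
If $A$ and $B$ are mutually exclusive (they cannot both happen), then $P(A \cap B) = 0$ and the rule reduces to $P(A \cup B) = P(A) + P(B)$.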
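The addition rule can be checked by counting outcomes directly. The sketch below uses one roll of a fair die with two overlapping events chosen purely for illustration ("even" and "greater than 3"); the helper `prob` is a hypothetical name.

```python
from fractions import Fraction

SAMPLE_SPACE = set(range(1, 7))  # one roll of a fair six-sided die


def prob(event: set) -> Fraction:
    """Equally likely outcomes: favorable count over total count."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


even = {2, 4, 6}             # event A (illustrative choice)
greater_than_3 = {4, 5, 6}   # event B (illustrative choice)

# Count the union directly...
direct = prob(even | greater_than_3)
# ...and compare with P(A) + P(B) - P(A and B).
by_rule = prob(even) + prob(greater_than_3) - prob(even & greater_than_3)

if __name__ == "__main__":
    print(direct, by_rule)  # 2/3 2/3
```

Dropping the subtracted term would give 1, because the shared outcomes 4 and 6 would be counted twice.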
Multiplication rule
For the probability that $A$ and $B$ both occur,
$$P(A \cap B) = P(A)\,P(B \mid A)$$
If the events are independent, knowing $A$ tells you nothing about $B$, so $P(B \mid A) = P(B)$ and the rule simplifies to $P(A \cap B) = P(A)\,P(B)$.
Situation            Use this
-------------------  ----------------------------------
"at least one"       complement: 1 - P(none)
"A or B"             addition: P(A) + P(B) - P(A∩B)
"A and B"            multiplication: P(A)·P(B|A)
mutually exclusive   P(A∩B) = 0
independent          P(B|A) = P(B)
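One way to test whether two events are independent is to compare the joint probability with the product of the individual probabilities. The sketch below does this over the 36 outcomes of two fair dice; the events and helper names are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered results of two fair dice (36 equally likely outcomes).
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(predicate) -> Fraction:
    """Fraction of outcomes for which the predicate is true."""
    favorable = sum(1 for o in OUTCOMES if predicate(o))
    return Fraction(favorable, len(OUTCOMES))


def is_independent(event_a, event_b) -> bool:
    """Independent exactly when P(A and B) == P(A) * P(B)."""
    joint = prob(lambda o: event_a(o) and event_b(o))
    return joint == prob(event_a) * prob(event_b)


def first_even(o):
    return o[0] % 2 == 0


def sum_is_7(o):
    return o[0] + o[1] == 7


def sum_is_8(o):
    return o[0] + o[1] == 8


if __name__ == "__main__":
    print(is_independent(first_even, sum_is_7))  # True
    print(is_independent(first_even, sum_is_8))  # False
```

A sum of 7 is reachable from any first die, so learning the first die is even changes nothing. A sum of 8 is not reachable from a first die of 1, so the two events are linked.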
Conditional probability
Conditional probability asks: given that $B$ already happened, how likely is $A$? It is defined as
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0$$
Why it is defined this way
Once you know $B$ occurred, $B$ becomes the new sample space: outcomes outside $B$ are now impossible. So you restrict attention to the part of $A$ that lives inside $B$, namely $A \cap B$, and rescale by the total probability of $B$ so the conditional probabilities still sum to 1. Dividing by $P(B)$ is exactly that rescaling. Rearranging the definition gives back the multiplication rule $P(A \cap B) = P(B)\,P(A \mid B)$, which is why the two are really one idea.
From here, swapping the roles of $A$ and $B$ leads to Bayes' theorem:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
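The definition and Bayes' theorem can both be checked by counting. The following sketch conditions on events over two fair dice; the events ("sum is 8", "first die is even") and the helper names `prob` and `conditional` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

OUTCOMES = list(product(range(1, 7), repeat=2))  # two fair dice


def prob(predicate) -> Fraction:
    """Fraction of outcomes for which the predicate is true."""
    favorable = sum(1 for o in OUTCOMES if predicate(o))
    return Fraction(favorable, len(OUTCOMES))


def conditional(event, given) -> Fraction:
    """P(event | given) = P(event and given) / P(given)."""
    p_given = prob(given)
    if p_given == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return prob(lambda o: event(o) and given(o)) / p_given


def sum_is_8(o):
    return o[0] + o[1] == 8


def first_even(o):
    return o[0] % 2 == 0


p_a_given_b = conditional(sum_is_8, first_even)   # P(A | B)
p_b_given_a = conditional(first_even, sum_is_8)   # P(B | A)
# Bayes' theorem recovers P(A | B) from P(B | A), P(A), and P(B).
via_bayes = p_b_given_a * prob(sum_is_8) / prob(first_even)

if __name__ == "__main__":
    print(p_a_given_b, p_b_given_a, via_bayes)  # 1/6 3/5 1/6
```

The two conditional probabilities differ (1/6 versus 3/5), which is a concrete reminder that the order of conditioning matters.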
Worked example 1 — conditional probability
A box has marbles: red and blue. You draw two without replacement. What is the probability that both are red?
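Let $A$ be "first red" and $B$ be "second red." Then $P(A)$ is the fraction of marbles that are red. After removing one red, one fewer red remains out of one fewer marble, and that ratio is $P(B \mid A)$. By the multiplication rule,
$$P(A \cap B) = P(A)\,P(B \mid A)$$
The "without replacement" detail is what makes $P(B \mid A) \neq P(B)$: the draws are dependent.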
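A small sketch of the same calculation for arbitrary marble counts follows. The counts used in the demo (3 red, 2 blue) are hypothetical and are not taken from the example above; the function names are likewise assumptions.

```python
from fractions import Fraction


def both_red_without_replacement(red: int, blue: int) -> Fraction:
    """P(first red) * P(second red | first red), drawing without replacement."""
    total = red + blue
    if red < 2:
        return Fraction(0)  # two reds cannot be drawn
    p_first_red = Fraction(red, total)
    # One red is gone, so both the red count and the total drop by one.
    p_second_red_given_first = Fraction(red - 1, total - 1)
    return p_first_red * p_second_red_given_first


def both_red_with_replacement(red: int, blue: int) -> Fraction:
    """Independent draws: the second draw sees the same box as the first."""
    p_red = Fraction(red, red + blue)
    return p_red * p_red


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(both_red_without_replacement(3, 2))  # 3/10
    print(both_red_with_replacement(3, 2))     # 9/25
```

Comparing the two results shows how much the answer shifts when dependent draws are treated as independent.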
Mean, variance, and standard deviation
These three numbers describe a dataset's center and spread.
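The mean (average) is
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
The variance is the average squared distance from the mean:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$
The standard deviation is the square root of the variance:
$$\sigma = \sqrt{\sigma^2}$$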
Why we square the deviations
If you simply added the raw deviations $x_i - \mu$, the positives and negatives would cancel and the sum would always be 0, which is useless as a spread measure. Squaring removes the sign and weights large deviations more heavily. But squaring also changes the units (dollars become dollars-squared), so taking the square root at the end returns the standard deviation to the original units, which is why $\sigma$ is the number people actually report.
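The cancellation takes one line to verify, assuming the usual definition of the mean, $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$, so that $\sum_{i=1}^{N} x_i = N\mu$:

$$\sum_{i=1}^{N} (x_i - \mu) = \sum_{i=1}^{N} x_i - N\mu = N\mu - N\mu = 0$$

This holds for every dataset, so the raw sum carries no information about spread.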
Population vs. sample
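When your data is a sample meant to estimate a larger population, divide by $N - 1$ instead of $N$, measuring deviations from the sample mean $\bar{x}$:
$$s^2 = \frac{1}{N - 1}\sum_{i=1}^{N} (x_i - \bar{x})^2$$
Dividing by the smaller $N - 1$ corrects a bias that otherwise makes the sample variance too small. Use $N$ for a full population, $N - 1$ for a sample.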
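The bias can be seen exactly, without simulation, by averaging the variance estimate over every possible sample. The sketch below assumes a hypothetical four-value population and samples of size 2 drawn with replacement; the `ddof` parameter name (how much to subtract from the divisor) is a convention of this sketch.

```python
from fractions import Fraction
from itertools import product


def mean(values) -> Fraction:
    return Fraction(sum(values), len(values))


def variance(values, ddof: int = 0) -> Fraction:
    """Sum of squared deviations divided by (len(values) - ddof)."""
    m = mean(values)
    squared = sum((Fraction(v) - m) ** 2 for v in values)
    return squared / (len(values) - ddof)


POPULATION = [1, 2, 3, 4]              # hypothetical tiny population
true_variance = variance(POPULATION)   # divide by N: the population value

# Every ordered sample of size 2, drawn with replacement (16 samples).
samples = list(product(POPULATION, repeat=2))

# Average each estimator over all equally likely samples.
avg_divide_by_n = mean([variance(s, ddof=0) for s in samples])
avg_divide_by_n_minus_1 = mean([variance(s, ddof=1) for s in samples])

if __name__ == "__main__":
    print(true_variance)             # 5/4
    print(avg_divide_by_n)           # 5/8, too small on average
    print(avg_divide_by_n_minus_1)   # 5/4, matches the population variance
```

Under these assumptions the divide-by-$N$ estimate averages to half the true variance, while the divide-by-$(N - 1)$ estimate averages to exactly the true value.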
Worked example 2 — mean, variance, standard deviation
Find the mean, variance, and standard deviation of the dataset $2, 4, 4, 4, 5, 5, 7, 9$, treating it as a full population ($N = 8$).
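Step 1: mean.
$$\mu = \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5$$
Step 2: squared deviations.
x    x - μ    (x - μ)²
2    -3       9
4    -1       1
4    -1       1
4    -1       1
5     0       0
5     0       0
7     2       4
9     4       16
              ---
              sum = 32
Step 3: variance and standard deviation.
$$\sigma^2 = \frac{32}{8} = 4, \qquad \sigma = \sqrt{4} = 2$$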
So the data is centered at 5 and typically lies about 2 units away from the mean.
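The hand calculation can be cross-checked with Python's standard `statistics` module, as in the sketch below. It uses the dataset from the table above, with `pvariance` and `pstdev` dividing by $N$ and `variance` dividing by $N - 1$.

```python
import statistics

# Dataset from the worked example, treated as a full population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.mean(data)               # population mean
pop_var = statistics.pvariance(data)     # divides by N
pop_sd = statistics.pstdev(data)         # square root of pvariance
sample_var = statistics.variance(data)   # divides by N - 1 instead

if __name__ == "__main__":
    print(mu, pop_var, pop_sd)
    print(sample_var)
```

The sample version comes out larger than the population variance because the same sum of 32 is divided by 7 rather than 8.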
Try it yourself, then check the answer
A fair six-sided die is rolled once. What is the probability of rolling at least a 5 (that is, a 5 or a 6), and what is the probability of not doing so?
There are 2 favorable outcomes out of 6, so $P(\text{at least } 5) = \frac{2}{6} = \frac{1}{3}$. By the complement rule, $P(\text{less than } 5) = 1 - \frac{1}{3} = \frac{2}{3}$. Notice it was faster to count the small favorable set and use the complement than to count all four "not" outcomes by hand.
If you want to confirm a longer calculation quickly, drop the dataset or probability setup into the GPAI Solver and compare each intermediate line against your own work.
Calculation traps to watch for
- Adding probabilities that overlap. For $P(A \cup B)$, forgetting to subtract $P(A \cap B)$ double-counts the shared outcomes. Only skip the subtraction when the events are mutually exclusive.
- Confusing $P(A \mid B)$ with $P(B \mid A)$. These are generally different. The condition (what is given) goes in the denominator. Reversing them is the classic base-rate mistake.
- Forgetting the square root for standard deviation. Variance is in squared units; the standard deviation is its square root. Reporting $\sigma^2$ as if it were $\sigma$ misstates the spread, and overstates it whenever $\sigma > 1$.
- Mixing up $N$ and $N - 1$. Divide by $N$ for a full population and by $N - 1$ for a sample estimate. Using the wrong one shifts every variance and standard deviation you report.
- Treating dependent draws as independent. "Without replacement" changes the conditional probability of later draws; using $P(B)$ instead of $P(B \mid A)$ gives the wrong joint probability.
Master these three ideas — the probability rules, the conditioning step, and the spread formulas — and most introductory statistics problems become a matter of identifying which tool the wording is pointing at.
Frequently Asked Questions
- What is the difference between probability and statistics?
- Probability starts from assumptions about random events and predicts how likely outcomes are. Statistics starts from observed data and summarizes or draws conclusions from it. They use the same core ideas of distributions, mean, and variance.
- How do you calculate conditional probability?
- Use P(A given B) = P(A and B) divided by P(B), with P(B) greater than zero. You restrict attention to outcomes inside B, then rescale by the total probability of B.
- Why do we square the deviations when finding variance?
- If you added raw deviations from the mean, positives and negatives would cancel to zero. Squaring removes the sign and weights large deviations more, giving a usable measure of spread.
- What is the difference between variance and standard deviation?
- Variance is the average squared distance from the mean, so it is in squared units. Standard deviation is the square root of the variance, which returns the measure to the original units of the data.
- When do you divide by N versus N minus 1?
- Divide by N for a full population. Divide by N minus 1 for a sample used to estimate a larger population, which corrects a bias that would otherwise make the variance too small.