A chi square test checks whether categorical count data sits too far from what a model would expect by chance alone. It works on counts in categories, never on averages or raw measurements.
The Statistic And Its Symbols
In the most common setup you have an observed count $O_i$ and an expected count $E_i$ for each category $i$. The test statistic is
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
The number grows when observed counts drift farther from expected counts. Squaring makes bigger mismatches matter more, and dividing by the expected count $E_i$ scales each category's squared gap relative to how many counts were expected there. The core comparison is always the same: what you observed versus what the null hypothesis predicts.
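The formula can be sketched in a few lines of Python. The function name `chi_square_statistic` and the counts below are hypothetical, chosen only to illustrate the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (illustration only).
observed = [18, 22, 20]
expected = [20, 20, 20]

statistic = chi_square_statistic(observed, expected)
print(f"chi-square statistic: {statistic:.3f}")
```

Identical observed and expected counts give a statistic of zero; every mismatch, in either direction, pushes it upward.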
The two common versions differ mainly in how the expected counts $E_i$ are computed. A goodness-of-fit test asks whether one categorical variable follows a claimed distribution; a test of independence asks whether two categorical variables are associated in a contingency table.
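As a rough sketch of that difference, the two helpers below compute expected counts for each version. The article does not spell out the independence rule; the code assumes the standard one (row total times column total, divided by the grand total). The function names and the example numbers are hypothetical.

```python
def expected_goodness_of_fit(total, claimed_probabilities):
    """Expected counts when one variable follows a claimed distribution."""
    return [total * p for p in claimed_probabilities]


def expected_independence(table):
    """Expected counts for a contingency table under independence.

    Assumes the standard rule: row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical data (illustration only).
gof_expected = expected_goodness_of_fit(100, [0.5, 0.25, 0.25])
ind_expected = expected_independence([[30, 20],
                                      [20, 30]])
print(gof_expected)
print(ind_expected)
```

In both cases the expected counts come from the null hypothesis, never from the observed pattern itself.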
Why The Formula Has This Shape
Each term is a squared deviation measured in units of its own expected count. Squaring removes the sign so surpluses and shortfalls both add to the evidence, and dividing by the expected count $E_i$ is what lets you compare categories of very different sizes on the same scale: a gap of a given size is dramatic when only a handful were expected but trivial when thousands were. Summing those scaled gaps gives a single number whose distribution, under the null hypothesis, is known, which is what makes a threshold comparison possible.
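A small numeric illustration of that scaling, using made-up numbers: the same gap of 10 contributes very differently depending on the expected count, and a surplus and a shortfall of equal size contribute the same amount.

```python
def contribution(observed, expected):
    """One category's term in the chi-square sum."""
    return (observed - expected) ** 2 / expected


# Hypothetical values: a gap of 10 in each case.
surplus_small = contribution(30, 20)        # 10 above a small expected count
shortfall_small = contribution(10, 20)      # 10 below the same expected count
surplus_large = contribution(2010, 2000)    # 10 above a large expected count

print(surplus_small, shortfall_small, surplus_large)
```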
Worked Example: Goodness Of Fit
A cafe wants to know whether three drink sizes are chosen equally often. Over $n$ orders, the observed counts are Small $O_1$, Medium $O_2$, and Large $O_3$.
If the null hypothesis says all three sizes are equally likely, the expected count in each category is
$$E_i = \frac{n}{3}$$
Now compute the statistic:
$$\chi^2 = \sum_{i=1}^{3} \frac{(O_i - E_i)^2}{E_i}$$
That is the test statistic, not the conclusion by itself. You compare $\chi^2$ to a chi square distribution with the appropriate degrees of freedom. Here the degrees of freedom are $df = 3 - 1 = 2$, because there are three categories and no parameters were estimated from the data. With $df = 2$, the statistic from this sample is not strong evidence against equal preference at the chosen significance level. The practical reading: the counts differ from perfect equality, but not by enough to confidently call the true preferences unequal from this sample alone.
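To see the whole procedure with numbers, the sketch below runs a three-category goodness-of-fit test on hypothetical counts (34, 30, and 26 over 90 orders). These are illustrative values, not the cafe's figures, and the 0.05 significance level is an assumption. It relies on one convenient fact: for exactly 2 degrees of freedom, the chi square tail probability is $e^{-x/2}$, so no statistics library is needed. For other degrees of freedom you would use a table or a statistics package instead.

```python
import math


def chi_square_statistic(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical observed counts for Small, Medium, Large (illustration only).
observed = [34, 30, 26]
total = sum(observed)

# Null hypothesis: all three sizes equally likely.
expected = [total / len(observed)] * len(observed)

statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1  # three categories, no estimated parameters

alpha = 0.05  # assumed significance level

# Valid for df == 2 only: P(X > x) = exp(-x / 2).
p_value = math.exp(-statistic / 2)
critical_value = -2 * math.log(alpha)

reject_null = statistic > critical_value
print(f"statistic={statistic:.3f}, df={df}, p={p_value:.3f}, "
      f"critical={critical_value:.3f}, reject={reject_null}")
```

With these made-up counts the statistic falls well below the critical value, so the null hypothesis of equal preference is not rejected.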
Practice And Conditions To Check
Before trusting any chi square result, confirm three conditions hold: the data are counts in categories, the observations are independent (or close enough for your model), and the expected counts are not too small for the approximation you are using. A common introductory rule of thumb is that each expected count should be at least about 5; that is a useful warning sign, not a universal law.
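The expected-count condition is easy to check mechanically. The helper below is a hypothetical sketch that flags categories whose expected count falls under a threshold; the default of 5 is the commonly taught rule of thumb, assumed here rather than a fixed requirement.

```python
def small_expected_cells(expected, minimum=5):
    """Return the indices of categories whose expected count is below `minimum`."""
    return [i for i, e in enumerate(expected) if e < minimum]


# Hypothetical expected counts (illustration only).
expected = [12.0, 3.5, 8.0, 4.9]
flagged = small_expected_cells(expected)

if flagged:
    print(f"Expected counts below threshold at positions {flagged}")
else:
    print("All expected counts meet the threshold")
```

A flagged category is a prompt to reconsider the approach, for example by merging sparse categories or using an exact method, not an automatic disqualification.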
For your own practice, take a small table of category counts and write down the null hypothesis before doing any arithmetic. That single step prevents the biggest error in chi square problems: using the right formula with the wrong expected counts.
Calculation Pitfalls To Watch
- Using the test on means, measurements, or percentages instead of category counts.
- Treating the observed counts as expected counts; the expected counts must come from the null hypothesis.
- Ignoring small expected counts, which can make the usual chi square approximation unreliable.
- Reading "statistically significant" as "important in practice"; the test only addresses evidence against the null model.
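The first pitfall can be shown directly. In the sketch below (hypothetical numbers, equal-split null hypothesis), two samples have the same 40/30/30 percentage split but different sizes. Their statistics differ by a factor of ten, while feeding in the percentages themselves gives a third, meaningless value.

```python
def chi_square_equal_split(values):
    """Chi-square statistic against a null of equal counts in every category."""
    expected = sum(values) / len(values)
    return sum((v - expected) ** 2 / expected for v in values)


# Same proportions (40% / 30% / 30%), different sample sizes. Illustration only.
small_sample = [20, 15, 15]      # 50 observations
large_sample = [200, 150, 150]   # 500 observations
as_percentages = [40, 30, 30]    # NOT counts: wrong input for this test

stat_small = chi_square_equal_split(small_sample)
stat_large = chi_square_equal_split(large_sample)
stat_wrong = chi_square_equal_split(as_percentages)

print(f"n=50: {stat_small:.2f}, n=500: {stat_large:.2f}, "
      f"percentages: {stat_wrong:.2f}")
```

The statistic depends on how many observations back each proportion, which is exactly the information percentages throw away.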
Chi square tests appear in surveys, genetics, quality control, and market research, anywhere outcomes fall into categories and the real question is whether a pattern is surprising or whether two categorical variables seem related. If the data are numerical rather than categorical, a different tool such as a $t$ test or ANOVA is usually better.
Frequently Asked Questions
- What does a chi square test check?
- It checks whether categorical count data looks too far from what a null model would expect by chance alone. It applies to counts in categories, not to averages or raw measurements. If the gaps between observed and expected counts are large enough, the data counts as evidence against the null hypothesis.
- How do you calculate the chi square statistic?
- For each category, take the observed count minus the expected count, square it, and divide by the expected count, then add the results across categories. The statistic grows when observed counts drift farther from expected counts, with bigger mismatches mattering more.
- What is the difference between a goodness-of-fit test and a test of independence?
- A goodness-of-fit test asks whether one categorical variable follows a claimed distribution, while a test of independence asks whether two categorical variables are associated in a contingency table. Both use the same statistic family, but the expected counts are computed differently depending on the version.
- How do you find degrees of freedom for a chi square goodness-of-fit test?
- Use the number of categories minus one, provided no parameters were estimated from the data. For example, with three drink sizes the degrees of freedom are 3 minus 1, which is 2. The statistic is then compared to a chi square distribution with that many degrees of freedom.
- When should you not use a chi square test?
- Avoid it when the data are not counts in categories, when observations are not reasonably independent, or when expected counts are too small for the chi square approximation. In introductory settings a common rule of thumb sets a minimum expected count per category before the approximation is trusted.