Chi Square Test | GPAI STEM

A chi square test checks whether categorical count data looks too far from what a model would expect by chance alone. It is used for counts in categories, not for averages or raw measurements.

The core idea is simple: compare what you observed to what you would expect if the null hypothesis were true. If the gaps are large enough, the chi square statistic becomes large, and the data counts as evidence against that null model.

What The Test Is Actually Comparing

In the most common setup, you have observed counts $O$ and expected counts $E$ for each category. The test statistic is

\chi^2 = \sum \frac{(O - E)^2}{E}

This number grows as observed counts drift farther from expected counts. Squaring each gap makes large mismatches count disproportionately. Dividing by $E$ puts each gap on a relative scale: a gap of $10$ is a big surprise when $E = 10$ but a small one when $E = 100$.
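A minimal Python sketch of the formula shows the scaling effect: the same absolute gap of $10$ contributes far more when the expected count is small. The helper name `chi_square_statistic` is hypothetical, chosen for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over matching categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# The same gap of 10 contributes very differently depending on E.
print(chi_square_statistic([20], [10]))    # 10.0
print(chi_square_statistic([110], [100]))  # 1.0
```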

The expected counts are not guesses. They come from the null hypothesis. For a goodness-of-fit test, the null hypothesis might say the categories are equally likely. For a test of independence, the null hypothesis says the two categorical variables are unrelated.

Two Common Versions

The phrase "chi square test" usually refers to one of these:

A goodness-of-fit test, which asks whether one categorical variable follows a claimed distribution.
A test of independence, which asks whether two categorical variables are associated in a contingency table.

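The same statistic is used in both cases, but the way you compute the expected counts depends on the version. For goodness of fit, each expected count is the total number of observations times the proportion the null hypothesis claims for that category, $E = np$. For independence, the expected count in each cell of the table is $E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$, and a table with $r$ rows and $c$ columns has $(r-1)(c-1)$ degrees of freedom.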
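For the independence version, a short sketch builds each expected count as row total times column total divided by the grand total, then sums $(O - E)^2 / E$ over every cell, with $(r-1)(c-1)$ degrees of freedom. The helper names `expected_counts` and `independence_test` are hypothetical, and the $2 \times 2$ table in the usage lines is made up for illustration.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def independence_test(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical 2x2 table: rows are two groups, columns are two answers.
stat, dof = independence_test([[30, 10], [20, 40]])
print(f"chi-square = {stat:.2f}, df = {dof}")  # chi-square = 16.67, df = 1
```

This sketch applies no continuity correction. Some libraries, for example SciPy's `chi2_contingency` with its default settings, apply Yates' correction to $2 \times 2$ tables, so their statistic will be somewhat smaller for the same data.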

Worked Example: Goodness Of Fit

Suppose a cafe wants to know whether three drink sizes are chosen equally often. Over $60$ orders, the observed counts are:

Small: $26$
Medium: $18$
Large: $16$

If the null hypothesis says all three sizes are equally likely, the expected count in each category is

E = \frac{60}{3} = 20

Now compute the statistic:

\chi^2 = \frac{(26-20)^2}{20} + \frac{(18-20)^2}{20} + \frac{(16-20)^2}{20}

= \frac{36}{20} + \frac{4}{20} + \frac{16}{20}

= 1.8 + 0.2 + 0.8 = 2.8
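The same arithmetic can be reproduced in a few lines of Python. Printing each category's contribution makes it easy to see that the Small category drives most of the statistic.

```python
observed = {"Small": 26, "Medium": 18, "Large": 16}
n = sum(observed.values())        # 60 orders
expected = n / len(observed)      # 20 per size under the equal-preference null

# Contribution (O - E)^2 / E for each drink size.
contributions = {size: (o - expected) ** 2 / expected for size, o in observed.items()}
chi2 = sum(contributions.values())

for size, c in contributions.items():
    print(f"{size}: {c:.1f}")     # Small: 1.8, Medium: 0.2, Large: 0.8
print(f"chi-square = {chi2:.1f}")  # chi-square = 2.8
```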

That is the test statistic, not the final conclusion by itself. You compare $\chi^2 = 2.8$ to a chi square distribution with the appropriate degrees of freedom. Here the degrees of freedom are $3 - 1 = 2$, because there are three categories and no parameters were estimated from the data. With $df = 2$, the $5\%$ critical value is about $5.99$, and $2.8$ falls well below it (the p-value is roughly $0.25$), so the data is not strong evidence against equal preference at the $5\%$ level.
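Turning $2.8$ into a p-value requires the upper tail of the chi square distribution. With exactly $2$ degrees of freedom that tail has a closed form, $P(\chi^2 \ge x) = e^{-x/2}$, so a dependency-free sketch is possible. It is valid only for $df = 2$; for other degrees of freedom, a library routine such as SciPy's `scipy.stats.chi2.sf(x, df)` is the usual choice. The function names here are hypothetical.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail probability P(X >= x) for a chi-square with 2 degrees of freedom.

    With df = 2 the chi-square distribution is exponential with mean 2,
    so the tail has the closed form exp(-x / 2). Not valid for other df.
    """
    return math.exp(-x / 2)


def chi2_critical_df2(alpha):
    """Value c with P(X >= c) = alpha, again only valid for df = 2."""
    return -2 * math.log(alpha)


stat = 2.8
print(f"p-value approx {chi2_sf_df2(stat):.3f}")                  # p-value approx 0.247
print(f"5% critical value approx {chi2_critical_df2(0.05):.3f}")  # 5% critical value approx 5.991
```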

The practical reading is: the counts differ from perfect equality, but not by enough to confidently say the true preferences are unequal based on this sample alone.

When The Test Makes Sense

Use a chi square test when all of these are true:

Your data is a set of counts in categories.
The observations are independent, or close enough for the model you are using.
The expected counts are not too small for the chi square approximation you plan to use.

In many introductory settings, people use the rule of thumb that expected counts should be at least about $5$ in each category. That is a practical guideline, not a universal law, but it is a useful warning sign.
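A rough check for this rule of thumb simply lists the categories whose expected counts fall short. The helper name is hypothetical, and the default threshold of $5$ is the guideline above, not a hard cutoff; the second set of expected counts is made up.

```python
def small_expected_cells(expected, threshold=5):
    """Return (index, value) pairs for expected counts below the rule-of-thumb threshold."""
    return [(i, e) for i, e in enumerate(expected) if e < threshold]


print(small_expected_cells([20, 20, 20]))      # []
print(small_expected_cells([12.5, 4.2, 3.3]))  # [(1, 4.2), (2, 3.3)]
```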

Common Mistakes

Using the test on means, measurements, or percentages instead of category counts.
Treating the observed counts as expected counts. The expected counts must come from the null hypothesis.
Ignoring small expected counts, which can make the usual chi square approximation unreliable.
Thinking "statistically significant" means "important in practice." The test only addresses evidence against the null model.

Where You See It

Chi square tests show up in surveys, genetics, quality control, market research, and any setting where outcomes fall into categories. They are especially common when the real question is whether a pattern is surprising or whether two categorical variables seem related.

If the data is numerical rather than categorical, a different tool is usually better. For example, comparing means often leads to a $t$ test or ANOVA instead.

Try Your Own Version

Take a small table of category counts and write down the null hypothesis before doing any arithmetic. That one step usually prevents the biggest mistake in chi square problems: using the right formula with the wrong expected counts.
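Writing the null hypothesis down as explicit proportions makes the expected counts mechanical: $E = np$ for each category. The sketch below uses a hypothetical helper, `expected_from_null`, on the cafe counts under the equal-preference null and under a made-up unequal null, to show that the same observations give different expected counts under different hypotheses.

```python
import math


def expected_from_null(observed, null_proportions):
    """Goodness-of-fit expected counts E_i = n * p_i for the claimed proportions p_i."""
    if len(observed) != len(null_proportions):
        raise ValueError("need one null proportion per category")
    if not math.isclose(sum(null_proportions), 1.0):
        raise ValueError("null proportions must sum to 1")
    n = sum(observed)
    return [n * p for p in null_proportions]


observed = [26, 18, 16]  # Small, Medium, Large from the cafe example

# Null 1: all three sizes equally likely.
print(expected_from_null(observed, [1 / 3, 1 / 3, 1 / 3]))
# Null 2 (hypothetical claim): 50% Small, 30% Medium, 20% Large.
print(expected_from_null(observed, [0.5, 0.3, 0.2]))
```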
