Hypothesis testing is a way to ask whether sample data looks too inconsistent with a starting claim. That starting claim is called the null hypothesis, written H₀.

The method does not prove H₀ true or false. It asks a narrower question: if H₀ were true, would data this extreme be unusual enough that we should doubt it?

The Core Idea

Every hypothesis test has two competing statements:

  1. The null hypothesis H₀, which is the default claim being tested.
  2. The alternative hypothesis H₁ (also written Hₐ), which is what you would support if the data gives enough evidence against H₀.

You then choose a significance level α, often 0.05, before looking at the result. This is the cutoff for how much evidence you require before rejecting H₀.

Two outcomes are possible:

  1. Reject H₀: the data is sufficiently inconsistent with the null model.
  2. Fail to reject H₀: the data is not strong enough to rule out the null model.

"Fail to reject" is not the same as "accept as true." It only means the sample did not provide strong enough evidence against H0H_0.

The Usual Steps

The workflow is usually:

  1. State H₀ and H₁ clearly.
  2. Choose α and a test that matches the data and assumptions.
  3. Compute a test statistic from the sample.
  4. Turn that statistic into a p-value or compare it with a critical value.
  5. Make the decision and interpret it in context.

The test statistic depends on the situation. A z-test, t-test, chi-square test, and many others are all examples of hypothesis tests. There is no single formula for all of hypothesis testing.
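As a minimal sketch, the steps can be wired together for one concrete case, a one-sample z-test with known σ (an assumption of this sketch, not of hypothesis testing in general). The standard-normal CDF comes from the exact identity Φ(x) = (1 + erf(x/√2))/2:

```python
import math

def normal_cdf(x: float) -> float:
    # Standard-normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def one_sample_z_test(xbar, mu0, sigma, n, tail="two"):
    # Step 3: compute the test statistic.
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # Step 4: turn it into a p-value for the chosen alternative.
    if tail == "left":
        p = normal_cdf(z)
    elif tail == "right":
        p = 1.0 - normal_cdf(z)
    else:  # two-tailed: both directions count as extreme
        p = 2.0 * normal_cdf(-abs(z))
    return z, p
```

Step 5 is then just comparing p with α in context. For the bottle-filling numbers used later in this article, `one_sample_z_test(496, 500, 12, 36, "left")` gives z = -2 and p ≈ 0.023.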

What The p-Value Means

A p-value is the probability, assuming H₀ is true and the test assumptions hold, of getting a result at least as extreme as the one observed.

A small p-value means the data would be unusual under H₀. That is why small p-values count as evidence against the null hypothesis.

It does not mean:

  1. The probability that H₀ is false.
  2. The probability that your result happened "by random chance" in a vague everyday sense.
  3. The size or importance of the effect.
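The "at least as extreme" definition can be made concrete with a small simulation. This sketch assumes the bottle-filling setup from the worked example later in the article (null mean 500, σ = 12, n = 36, observed sample mean 496) and estimates the left-tail p-value by repeatedly drawing samples under H₀:

```python
import random
import statistics

random.seed(0)  # reproducible sketch
mu0, sigma, n = 500.0, 12.0, 36
observed_mean = 496.0

# Fraction of simulated sample means (generated assuming H0 is true)
# that are at least as extreme as the observed one.
trials = 20_000
extreme = sum(
    statistics.fmean(random.gauss(mu0, sigma) for _ in range(n)) <= observed_mean
    for _ in range(trials)
)
p_sim = extreme / trials  # close to the exact normal-tail value of about 0.023
```

The simulated fraction lands near 0.023: under H₀, a sample mean of 496 or lower shows up only about 2% of the time, which is exactly what the p-value reports.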

Main Types Of Hypothesis Tests

There are two useful ways to group tests.

By Direction

A one-tailed test looks for change in one direction only.

  • Right-tailed: values larger than the null claim support H₁.
  • Left-tailed: values smaller than the null claim support H₁.

A two-tailed test looks for a difference in either direction. If H₁ is "not equal," the rejection region is split across both tails.
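The same statistic yields a different p-value depending on direction. A sketch using z = -2 (the value from the worked example below) and the standard-normal CDF:

```python
import math

def normal_cdf(x: float) -> float:
    # Standard-normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

z = -2.0
p_left = normal_cdf(z)              # left-tailed: P(Z <= z)
p_right = 1.0 - normal_cdf(z)       # right-tailed: P(Z >= z)
p_two = 2.0 * normal_cdf(-abs(z))   # two-tailed: both tails count
```

Here p_left ≈ 0.023 while p_two ≈ 0.046, which is one reason the direction of H₁ must be fixed before looking at the data.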

By Data Situation

  • A z-test is used for some mean-testing settings when the population standard deviation is known or a justified large-sample approximation is being used.
  • A t-test is common for means when the population standard deviation is unknown and conditions are reasonable.
  • A chi-square test is used for categorical count data.

The right test depends on the variable type, sample design, and assumptions. Choosing the formula first and the question second is a common mistake.

Worked Example

Suppose a filling machine is supposed to average 500 mL per bottle. A quality-control team takes a sample of 36 bottles and gets a sample mean of 496 mL.

Assume, for this example, that the population standard deviation is known to be σ = 12 mL and the sampling conditions justify a one-sample z-test.

Set up the hypotheses:

H₀: μ = 500
H₁: μ < 500

This is a left-tailed test because the concern is underfilling.

The standard error is

σ/√n = 12/√36 = 2

So the test statistic is

z = (x̄ - μ₀)/(σ/√n) = (496 - 500)/2 = -2

If α = 0.05 for a left-tailed z-test, the critical value is about -1.645. Because -2 < -1.645, the result falls in the rejection region.

So the decision is to reject H₀ at the 5% level. In context, the sample provides evidence that the machine is underfilling on average.

That conclusion depends on the test assumptions. If the assumptions are poor, the conclusion may be unreliable even if the arithmetic is correct.
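The arithmetic above can be checked in a few lines (standard library only; the last line converts z into the left-tail p-value via Φ(z) = (1 + erf(z/√2))/2):

```python
import math

xbar, mu0, sigma, n = 496.0, 500.0, 12.0, 36

se = sigma / math.sqrt(n)        # standard error: 12 / 6 = 2
z = (xbar - mu0) / se            # (496 - 500) / 2 = -2
z_crit = -1.645                  # left-tailed critical value at alpha = 0.05
reject = z < z_crit              # True: falls in the rejection region

p_value = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # about 0.023, below 0.05
```

The p-value route and the critical-value route agree, as they must: p ≈ 0.023 < 0.05 is the same decision as z = -2 < -1.645.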

Type I And Type II Errors

Hypothesis testing always involves error risk.

A Type I error means rejecting H₀ even though it is true. Its probability is controlled by α.

A Type II error means failing to reject H₀ even though H₁ is true. Its probability is usually written β.

Lowering α makes false alarms less likely, but it can also make true effects harder to detect if nothing else changes. That tradeoff is one reason sample size matters.
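The tradeoff can be quantified. For the left-tailed z-test in the worked example, suppose (an assumption for illustration) that the true mean really is 496 mL. The power, the chance of correctly rejecting H₀, can then be computed directly, and it drops when α is tightened:

```python
import math

def normal_cdf(x: float) -> float:
    # Standard-normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu0, mu1, sigma, n = 500.0, 496.0, 12.0, 36  # mu1: assumed true mean
se = sigma / math.sqrt(n)

def power_left_tailed(z_crit: float) -> float:
    # P(reject H0 | true mean is mu1) for a left-tailed z-test:
    # we reject when the sample mean falls below mu0 + z_crit * se.
    return normal_cdf(z_crit + (mu0 - mu1) / se)

power_05 = power_left_tailed(-1.645)  # alpha = 0.05: power about 0.64
power_01 = power_left_tailed(-2.326)  # alpha = 0.01: power drops to about 0.37
beta_05 = 1.0 - power_05              # Type II error risk at alpha = 0.05
```

Tightening α from 0.05 to 0.01 nearly halves the power here; recovering it would require a larger sample.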

Common Mistakes

One common mistake is saying a non-significant result proves there is no effect. Usually it only shows the data was not strong enough to detect one.

Another mistake is treating statistical significance as practical importance. A tiny effect can be statistically significant in a very large sample.
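A quick illustration using the bottle-filling numbers: a 0.1 mL shift is practically negligible, yet with a large enough sample it clears any common significance threshold (the sample sizes here are assumptions chosen for illustration):

```python
import math

def z_stat(effect, sigma, n):
    # z statistic for a gap 'effect' between the sample mean and the null mean.
    return effect / (sigma / math.sqrt(n))

tiny_effect, sigma = 0.1, 12.0
z_small = z_stat(tiny_effect, sigma, 36)       # 0.05: nowhere near significant
z_huge = z_stat(tiny_effect, sigma, 200_000)   # about 3.7: "significant" at alpha = 0.05 or 0.01
```

The effect is identical in both lines; only the sample size changed. Statistical significance answers "is it detectable," not "does it matter."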

People also misuse tests by ignoring assumptions about independence, distribution shape, variance, or data type. A clean-looking p-value does not rescue a mismatched test.

When Hypothesis Testing Is Used

Hypothesis testing is used in science, manufacturing, medicine, surveys, A/B testing, and policy analysis. The goal is usually the same: decide whether the sample gives enough evidence to question a default claim.

In practice, good testing is not just about the calculation. It also requires a sensible null hypothesis, a defensible design, and an interpretation that matches what the test can actually say.

Try Your Own Version

Take the same bottle-filling example, but change the sample mean to 498 mL. Recompute the test statistic and see whether the decision changes at α = 0.05. That is a quick way to see how evidence gets stronger or weaker as the sample result moves closer to the null value.
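A sketch of that exercise, reusing the setup from the worked example:

```python
import math

xbar, mu0, sigma, n = 498.0, 500.0, 12.0, 36

z = (xbar - mu0) / (sigma / math.sqrt(n))  # (498 - 500) / 2 = -1
reject = z < -1.645                        # False: fail to reject at alpha = 0.05
```

With x̄ = 498 the statistic is only -1, well short of the -1.645 cutoff, so the same question now ends in "fail to reject."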
