Hypothesis testing turns a vague worry like "is this machine off-target?" into a number you can compute and compare against a cutoff. You start from a default claim, the null hypothesis $H_0$, and ask: if $H_0$ were true, would data this extreme be unusual enough to doubt it? The method never proves $H_0$ true or false; it only measures how inconsistent the sample looks with the null model.

The Formula And Its Symbols

For a one-sample mean test with known population standard deviation, the test statistic is

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, $\mu_0$ is the value claimed by $H_0$, $\sigma$ is the population standard deviation, and $n$ is the sample size. The denominator $\sigma/\sqrt{n}$ is the standard error, the typical spread of sample means. The test statistic itself depends on the situation: a $z$-test, $t$-test, chi-square test, and many others are all hypothesis tests, so there is no single formula for all of hypothesis testing.
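As a minimal sketch, the statistic can be computed in a few lines of Python. The function name `z_statistic`, its parameter names, and the numbers in the usage line are illustrative assumptions, not part of any library or of the example later in this article.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """One-sample z statistic, assuming sigma is the known population SD."""
    if sigma <= 0 or n <= 0:
        raise ValueError("sigma and n must be positive")
    standard_error = sigma / math.sqrt(n)  # typical spread of sample means
    return (sample_mean - mu0) / standard_error

# Illustrative numbers only: sample mean 103 against a claimed mean of 100.
z = z_statistic(sample_mean=103.0, mu0=100.0, sigma=15.0, n=25)
print(z)  # standard error is 15 / 5 = 3, so z is 1.0
```

Keeping the standard error as a named intermediate makes it harder to divide by $\sigma$ alone by mistake.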

Why The Formula Works

The statistic answers a simple question: how many standard errors is the sample mean away from the null value? The numerator $\bar{x} - \mu_0$ is the raw gap between what you saw and what $H_0$ predicted. Dividing by the standard error rescales that gap into a universal unit, so a $z$ of $-2$ always means "two standard errors below the claim," whatever the original units were. Because sample means cluster tightly around the true mean (the standard error shrinks as $n$ grows), a large $|z|$ is hard to produce by chance alone if $H_0$ holds, and that is exactly what makes it evidence.
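The "universal unit" claim can be checked with a short sketch: the same hypothetical measurements expressed in grams and in kilograms should give the same $z$. The helper `z_statistic` and all numbers here are assumptions made up for illustration.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """Gap between sample mean and null value, in standard-error units."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

# Hypothetical weights in grams: mean 1010 g against a claimed 1000 g.
z_grams = z_statistic(sample_mean=1010.0, mu0=1000.0, sigma=20.0, n=16)

# The same quantities converted to kilograms.
z_kilograms = z_statistic(sample_mean=1.010, mu0=1.000, sigma=0.020, n=16)

# Both are (up to floating-point rounding) two standard errors above the claim.
print(z_grams, z_kilograms)
print(math.isclose(z_grams, z_kilograms, rel_tol=1e-9))
```

The unit cancels because it appears in both the numerator and the standard error.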

This also explains the surrounding machinery. Every test pits two statements against each other:

  1. The null hypothesis $H_0$, the default claim being tested.
  2. The alternative $H_1$ or $H_a$, what you support if the data argues strongly enough against $H_0$.

You fix a significance level $\alpha$, often $0.05$, before looking at the result. It sets how much evidence you demand before rejecting $H_0$: it is the probability of rejecting $H_0$ when it is actually true that you are willing to tolerate. Two outcomes follow: reject $H_0$ when the data is sufficiently inconsistent with the null model, or fail to reject $H_0$ when the evidence is not strong enough to rule the null model out. "Fail to reject" is not "accept as true"; it only means the sample did not provide strong enough evidence against $H_0$.
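The decision rule can be sketched with the standard library's `statistics.NormalDist` (Python 3.8+). The function `decide`, its `tail` argument, and the convention of rejecting when the p-value is at most $\alpha$ are assumptions chosen for this illustration; some texts use a strict inequality.

```python
from statistics import NormalDist

def decide(z, alpha=0.05, tail="left"):
    """Return (p_value, decision) for a z statistic under a standard normal null."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        p_value = std_normal.cdf(z)
    elif tail == "right":
        p_value = 1.0 - std_normal.cdf(z)
    elif tail == "two-sided":
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two-sided'")
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return p_value, decision

# Illustrative z values only.
p_weak, decision_weak = decide(-0.5)      # not unusual under H0
p_strong, decision_strong = decide(-2.5)  # unusual under H0
print(decision_weak, decision_strong)
```

Note that the function returns "fail to reject H0", never "accept H0", matching the wording above.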

Worked Example, Step By Step

A filling machine is supposed to average $500$ mL per bottle. A quality-control team samples $36$ bottles and gets a sample mean of $496$ mL. Assume the population standard deviation is known, $\sigma = 12$ mL, and the conditions justify a one-sample $z$-test.

Set up the hypotheses:

$$H_0: \mu = 500 \qquad H_1: \mu < 500$$

This is a left-tailed test, since the concern is underfilling. Compute the standard error:

$$\frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{36}} = 2$$

Then the test statistic:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{496 - 500}{2} = -2$$

For $\alpha = 0.05$ on a left-tailed $z$-test, the critical value is about $-1.645$. Because $-2 < -1.645$, the result falls in the rejection region, so reject $H_0$ at the $5\%$ level. In context, the sample provides evidence that the machine is underfilling on average. That conclusion still depends on the test assumptions; weak assumptions can make it unreliable even when the arithmetic is correct.
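The arithmetic above can be reproduced as a short script. This is a sketch using `statistics.NormalDist` from the standard library; the variable names are illustrative, and the p-value line is an addition that the worked example does not compute by hand.

```python
import math
from statistics import NormalDist

# Values from the bottle-filling example.
mu0 = 500.0          # mean claimed by H0, in mL
sample_mean = 496.0  # observed sample mean, in mL
sigma = 12.0         # known population standard deviation, in mL
n = 36               # sample size
alpha = 0.05         # significance level

standard_error = sigma / math.sqrt(n)         # 12 / 6 = 2.0
z = (sample_mean - mu0) / standard_error      # (496 - 500) / 2 = -2.0

std_normal = NormalDist()
critical_value = std_normal.inv_cdf(alpha)    # left-tail cutoff, about -1.645
p_value = std_normal.cdf(z)                   # left-tail area below z

reject = z < critical_value
print(f"z = {z:.3f}, critical value = {critical_value:.3f}, p = {p_value:.4f}")
print("reject H0" if reject else "fail to reject H0")
```

The critical-value comparison and the p-value comparison lead to the same decision here, since both are read off the same standard normal curve.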

Try It Yourself

Take the same bottle-filling setup but change the sample mean to $498$ mL. Recompute the standard error (it is unchanged), then the test statistic, and decide at $\alpha = 0.05$ whether the decision flips. You should find the evidence weakens as the sample mean slides toward $500$, which shows how the gap in the numerator drives the whole result.
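Once you have worked the exercise by hand, one way to check it is to sweep the sample mean across a range and watch the statistic move. The range of means below is an arbitrary choice for illustration; everything else comes from the bottle-filling setup.

```python
import math
from statistics import NormalDist

mu0, sigma, n, alpha = 500.0, 12.0, 36, 0.05
standard_error = sigma / math.sqrt(n)        # unchanged by the sample mean
critical_value = NormalDist().inv_cdf(alpha) # left-tail cutoff

rejected_means = []
for sample_mean in range(494, 501):          # 494, 495, ..., 500 mL
    z = (sample_mean - mu0) / standard_error
    reject = z < critical_value
    if reject:
        rejected_means.append(sample_mean)
    print(f"mean = {sample_mean} mL  z = {z:+.2f}  "
          f"{'reject H0' if reject else 'fail to reject H0'}")
```

Only the numerator changes from row to row, so the printed $z$ values rise in equal steps as the mean approaches the claimed value.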

Calculation Pitfalls

  • Misreading the tail. A left-tailed test compares $z$ against a negative critical value; flipping the sign sends you to the wrong region.
  • Wrong denominator. The standard error is $\sigma/\sqrt{n}$, not $\sigma$ alone. Forgetting the $\sqrt{n}$ inflates the standard error and shrinks $|z|$.
  • Misreading the $p$-value. A $p$-value is the probability, assuming $H_0$ is true and the assumptions hold, of a result at least as extreme as observed. It is not the probability that $H_0$ is false, nor a vague "happened by chance," nor the size of the effect.
  • Type I vs Type II error. A Type I error rejects a true $H_0$ (its probability is controlled by $\alpha$); a Type II error fails to reject a false $H_0$ (probability $\beta$). Lowering $\alpha$ cuts false alarms but can make true effects harder to detect, which is why sample size matters.
  • Significance is not importance. A tiny effect can be statistically significant in a very large sample, and a clean-looking $p$-value never rescues a test whose independence, distribution, variance, or data-type assumptions are wrong.
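Three of these pitfalls can be made concrete with a small sketch. The $0.1$ mL shortfall, the sample size of one million, the random seed, and the number of simulated trials are all hypothetical choices, and the simulation assumes fill volumes are normally distributed, which the article does not state.

```python
import math
import random
from statistics import NormalDist, fmean

mu0, sigma, n, alpha = 500.0, 12.0, 36, 0.05
critical_value = NormalDist().inv_cdf(alpha)  # left-tail cutoff, about -1.645

# Wrong denominator: dividing by sigma instead of sigma / sqrt(n).
gap = 496.0 - mu0
z_correct = gap / (sigma / math.sqrt(n))  # -2.0, in the rejection region
z_wrong = gap / sigma                     # about -0.33, looks like no evidence

# Significance is not importance: a hypothetical 0.1 mL shortfall.
tiny_gap = -0.1
z_small_n = tiny_gap / (sigma / math.sqrt(36))         # -0.05
z_huge_n = tiny_gap / (sigma / math.sqrt(1_000_000))   # about -8.33

# Type I error: simulate samples with H0 true and count false alarms.
rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 10_000
false_alarms = 0
for _ in range(trials):
    sample = [rng.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    if z < critical_value:
        false_alarms += 1
type_i_rate = false_alarms / trials  # should land near alpha

print(f"z_correct = {z_correct:.2f}, z_wrong = {z_wrong:.2f}")
print(f"tiny effect: z = {z_small_n:.2f} (n = 36), z = {z_huge_n:.2f} (n = 1,000,000)")
print(f"simulated Type I rate = {type_i_rate:.3f}")
```

The simulated rate should sit close to $\alpha$, which is what "controlled by $\alpha$" means in practice: even when $H_0$ is true, roughly that share of samples will trigger a rejection.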

When Hypothesis Testing Is Used

It appears in science, manufacturing, medicine, surveys, A/B testing, and policy analysis, always to decide whether a sample gives enough evidence to question a default claim. Good testing is not only the calculation: it needs a sensible null hypothesis, a defensible design, and an interpretation that matches what the test can actually say.

Frequently Asked Questions

What is the null hypothesis in hypothesis testing?
The null hypothesis, written H0, is the default claim being tested. The alternative hypothesis is what you would support if the data gives enough evidence against it. The test asks a narrow question: if the null hypothesis were true, would data this extreme be unusual enough to make us doubt the claim?
What does a p-value actually mean?
A p-value is the probability, assuming the null hypothesis is true and the test assumptions hold, of getting a result at least as extreme as the one observed. A small p-value means the data would be unusual under the null model. It is not the probability that the null hypothesis is false, and it says nothing about effect size.
What is the difference between rejecting and failing to reject the null hypothesis?
Rejecting means the data is sufficiently inconsistent with the null model at your chosen significance level. Failing to reject means the sample did not provide strong enough evidence against it. Failing to reject is not the same as accepting the null hypothesis as true; the test simply could not rule it out.
What are the usual steps of a hypothesis test?
State the null and alternative hypotheses clearly, choose a significance level such as 0.05 and a test that matches the data and assumptions, compute a test statistic from the sample, convert it into a p-value or compare it with a critical value, then make the decision and interpret it in context.
