Hypothesis testing turns a vague worry like "is this machine off-target?" into a number you can compute and compare against a cutoff. You start from a default claim, the null hypothesis $H_0$, and ask: if $H_0$ were true, would data this extreme be unusual enough to doubt it? The method never proves $H_0$ true or false; it only measures how inconsistent the sample looks with the null model.
The Formula And Its Symbols
For a one-sample mean test with known population standard deviation, the test statistic is
$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
where $\bar{x}$ is the sample mean, $\mu_0$ is the value claimed by $H_0$, $\sigma$ is the population standard deviation, and $n$ is the sample size. The denominator $\sigma / \sqrt{n}$ is the standard error, the typical spread of sample means. The test statistic itself depends on the situation: a $z$-test, $t$-test, chi-square test, and many others are all hypothesis tests, so there is no single formula for all of hypothesis testing.
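The one-sample statistic described above can be sketched in a few lines of Python. The function name `z_statistic` and the input numbers are assumptions chosen only to exercise the formula; they do not come from any dataset in this article.

```python
import math


def z_statistic(sample_mean, null_mean, sigma, n):
    """One-sample z statistic: (sample mean - null mean) / (sigma / sqrt(n))."""
    if sigma <= 0 or n <= 0:
        raise ValueError("sigma and n must be positive")
    standard_error = sigma / math.sqrt(n)  # typical spread of sample means
    return (sample_mean - null_mean) / standard_error


# Hypothetical inputs: a sample mean 2 units below the claim, sigma = 4, n = 16.
z = z_statistic(sample_mean=98.0, null_mean=100.0, sigma=4.0, n=16)
print(z)  # the standard error is 4 / 4 = 1, so the gap of -2 gives z = -2.0
```

The function checks its inputs because a zero or negative `sigma` or `n` would make the standard error meaningless.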
Why The Formula Works
The statistic answers a simple question: how many standard errors is the sample mean away from the null value? The numerator $\bar{x} - \mu_0$ is the raw gap between what you saw and what $H_0$ predicted. Dividing by the standard error rescales that gap into a universal unit, so a $z$ of $-2$ always means "two standard errors below the claim," whatever the original units were. Because sample means cluster tightly around the true mean (the standard error shrinks as $n$ grows), a large $|z|$ is hard to produce by chance alone if $H_0$ holds, and that is exactly what makes it evidence.
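To see the rescaling at work, the sketch below holds a raw gap fixed and lets the sample size grow. The values of `sigma`, `gap`, and the sample sizes are illustrative assumptions, not figures from the article.

```python
import math


def standard_error(sigma, n):
    """Standard error of the sample mean when sigma is known."""
    return sigma / math.sqrt(n)


sigma = 4.0   # assumed population standard deviation
gap = -1.0    # assumed raw gap: sample mean minus null value

rows = []
for n in (4, 16, 64, 256):
    se = standard_error(sigma, n)
    rows.append((n, se, gap / se))  # same gap, expressed in standard errors
    print(f"n={n:>3}  SE={se:.2f}  z={gap / se:.2f}")
```

Each fourfold increase in the sample size halves the standard error, so the same raw gap counts for twice as many standard errors.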
This also explains the surrounding machinery. Every test pits two statements against each other:
- The null hypothesis $H_0$, the default claim being tested.
- The alternative $H_1$ or $H_a$, what you support if the data argues strongly enough against $H_0$.
You fix a significance level $\alpha$, often $0.05$, before looking at the result. It sets how much evidence you demand before rejecting $H_0$: the smaller $\alpha$ is, the stronger the evidence required. Two outcomes follow: reject $H_0$ when the data is sufficiently inconsistent with the null model, or fail to reject $H_0$ when it is not strong enough to rule the null model out. "Fail to reject" is not "accept as true"; it only means the sample did not provide strong enough evidence against $H_0$.
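The two possible outcomes can be written as a small decision rule. This is a minimal sketch for a $z$ statistic only, using `statistics.NormalDist` from the Python standard library. The helper names `p_value` and `decide`, the `tail` argument, and the convention of rejecting when the p-value is at most alpha are assumptions made for illustration.

```python
from statistics import NormalDist


def p_value(z, tail="left"):
    """Probability under H0 of a z at least as extreme as the one observed."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1.0 - std_normal.cdf(z)
    if tail == "two-sided":
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two-sided'")


def decide(z, alpha=0.05, tail="left"):
    """Return the decision; alpha must be fixed before looking at the data."""
    return "reject H0" if p_value(z, tail) <= alpha else "fail to reject H0"


print(decide(-2.0))                    # strong evidence in the left tail
print(decide(-1.0))                    # not enough evidence at alpha = 0.05
print(decide(-1.8, tail="two-sided"))  # same z can fail a two-sided test
```

The wording of the return values is deliberate: the function never reports "accept H0", only whether the evidence was strong enough to reject it.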
Worked Example, Step By Step
A filling machine is supposed to average $\mu_0$ mL per bottle. A quality-control team samples $n$ bottles and gets a sample mean of $\bar{x}$ mL. Assume the population standard deviation $\sigma$ (in mL) is known and the conditions justify a one-sample $z$-test.
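Set up the hypotheses, writing $\mu$ for the machine's true mean fill and $\mu_0$ for the target:
$$H_0: \mu = \mu_0 \qquad H_1: \mu < \mu_0$$
This is a left-tailed test, since the concern is underfilling. Compute the standard error:
$$SE = \frac{\sigma}{\sqrt{n}}$$
Then the test statistic:
$$z = \frac{\bar{x} - \mu_0}{SE}$$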
For a left-tailed $z$-test at significance level $\alpha$, the critical value is $-z_{\alpha}$, which is about $-1.645$ when $\alpha = 0.05$. Because the computed $z$ falls below the critical value, the result is in the rejection region, so reject $H_0$ at the $\alpha$ level. In context, the sample provides evidence that the machine is underfilling on average. That conclusion still depends on the test assumptions; weak assumptions can make it unreliable even when the arithmetic is correct.
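The same sequence of steps can be run end to end in code. The numbers below (a 500 mL target, 36 bottles, a 498.2 mL sample mean, a 6 mL standard deviation, and alpha of 0.05) are placeholder assumptions for illustration, not the figures of the example above; the variable names are likewise hypothetical.

```python
import math
from statistics import NormalDist

# Placeholder inputs (assumptions for illustration only).
target_mean = 500.0  # mu_0 in mL, the value claimed by H0
sample_mean = 498.2  # x-bar in mL
sigma = 6.0          # known population standard deviation in mL
n = 36               # number of bottles sampled
alpha = 0.05         # significance level, fixed in advance

# Step 1: standard error of the sample mean.
standard_error = sigma / math.sqrt(n)

# Step 2: test statistic, the gap measured in standard errors.
z = (sample_mean - target_mean) / standard_error

# Step 3: left-tail critical value, the alpha quantile of the standard normal.
critical_value = NormalDist().inv_cdf(alpha)

# Step 4: reject H0 only if z lands in the left-tail rejection region.
reject = z < critical_value

print(f"SE = {standard_error:.3f}, z = {z:.3f}, critical = {critical_value:.3f}")
print("reject H0" if reject else "fail to reject H0")
```

With these placeholder values the standard error is 1 mL, so $z$ is about $-1.8$, which lies below the critical value of about $-1.645$.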
Try It Yourself
Take the same bottle-filling setup but move the sample mean closer to the target $\mu_0$. Recompute the standard error (it is unchanged), then the test statistic, and decide at the same $\alpha$ whether the decision flips. You should find the evidence weakens as the sample mean slides toward $\mu_0$, which shows how the gap in the numerator drives the whole result.
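One way to check your answer is to sweep the sample mean toward the target and watch the p-value. The setup values here (500 mL target, 6 mL standard deviation, 36 bottles, alpha of 0.05) and the candidate sample means are assumed placeholders, so substitute your own numbers.

```python
import math
from statistics import NormalDist

# Assumed placeholder setup; replace with the values from your own problem.
target_mean, sigma, n, alpha = 500.0, 6.0, 36, 0.05
standard_error = sigma / math.sqrt(n)  # unchanged when only the mean moves

results = []
for sample_mean in (498.2, 498.6, 499.0, 499.4):
    z = (sample_mean - target_mean) / standard_error
    p = NormalDist().cdf(z)  # left-tailed p-value
    rejected = p <= alpha
    results.append((sample_mean, z, p, rejected))
    print(f"mean={sample_mean:.1f}  z={z:.2f}  p={p:.4f}  "
          f"{'reject H0' if rejected else 'fail to reject H0'}")
```

In this sketch only the first sample mean is far enough below the target to reject $H_0$; the p-value grows steadily as the gap closes.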
Calculation Pitfalls
- Misreading the tail. A left-tailed test compares $z$ against a negative critical value; flipping the sign sends you to the wrong region.
- Wrong denominator. The standard error is $\sigma / \sqrt{n}$, not $\sigma$ alone. Forgetting the $\sqrt{n}$ inflates the standard error and shrinks $|z|$.
- Misreading the $p$-value. A $p$-value is the probability, assuming $H_0$ is true and the assumptions hold, of a result at least as extreme as observed. It is not the probability that $H_0$ is false, nor a vague "happened by chance," nor the size of the effect.
- Type I vs Type II error. A Type I error rejects a true $H_0$ (its probability is controlled by $\alpha$); a Type II error fails to reject a false $H_0$ (probability $\beta$). Lowering $\alpha$ cuts false alarms but can make true effects harder to detect, which is why sample size matters.
- Significance is not importance. A tiny effect can be statistically significant in a very large sample, and a clean-looking $p$-value never rescues a test whose independence, distribution, variance, or data-type assumptions are wrong.
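The claim that the significance level controls the Type I error rate can be checked by simulation. The sketch below draws many samples from a population where the null hypothesis is true and counts how often a left-tailed $z$-test rejects it anyway. The function name `type_i_error_rate`, its default parameters, and the normally distributed data are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def type_i_error_rate(mu0=500.0, sigma=6.0, n=25, alpha=0.05,
                      trials=20_000, seed=0):
    """Fraction of simulated samples that reject H0 although H0 is true."""
    rng = random.Random(seed)                     # seeded for repeatability
    critical_value = NormalDist().inv_cdf(alpha)  # left-tail cutoff
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Data generated with true mean mu0, so every rejection is a false alarm.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / standard_error
        if z < critical_value:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"observed false-alarm rate: {rate:.4f}")
```

The observed rate should land close to the chosen alpha, and rerunning with a smaller `alpha` should lower it, at the cost of making real shifts harder to detect.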
When Hypothesis Testing Is Used
It appears in science, manufacturing, medicine, surveys, A/B testing, and policy analysis, always to decide whether a sample gives enough evidence to question a default claim. Good testing is not only the calculation: it needs a sensible null hypothesis, a defensible design, and an interpretation that matches what the test can actually say.
Frequently Asked Questions
- What is the null hypothesis in hypothesis testing?
- The null hypothesis, written H0, is the default claim being tested. The alternative hypothesis is what you would support if the data gives enough evidence against it. The test asks a narrow question: if the null hypothesis were true, would data this extreme be unusual enough to make us doubt the claim?
- What does a p-value actually mean?
- A p-value is the probability, assuming the null hypothesis is true and the test assumptions hold, of getting a result at least as extreme as the one observed. A small p-value means the data would be unusual under the null model. It is not the probability that the null hypothesis is false, and it says nothing about effect size.
- What is the difference between rejecting and failing to reject the null hypothesis?
- Rejecting means the data is sufficiently inconsistent with the null model at your chosen significance level. Failing to reject means the sample did not provide strong enough evidence against it. Failing to reject is not the same as accepting the null hypothesis as true; the test simply could not rule it out.
- What are the usual steps of a hypothesis test?
- State the null and alternative hypotheses clearly, choose a significance level such as 0.05 and a test that matches the data and assumptions, compute a test statistic from the sample, convert it into a p-value or compare it with a critical value, then make the decision and interpret it in context.