Reach for a t-test when you need to decide whether a sample mean, or the difference between two sample means, is bigger than random variation alone would produce. The method fits when the outcome is numeric and the population standard deviation is unknown — the usual real-world case.

When this method applies

A t-test is a good fit when all of these are reasonably true:

  1. The outcome variable is numeric.
  2. Observations are independent within the chosen design, unless you intentionally use a paired setup.
  3. The question is about a mean or a mean difference.
  4. The sample is not so small and distorted by outliers or strong skew that the mean and standard deviation become misleading.

If the population standard deviation were known exactly, a textbook $z$-test would be the direct alternative. In practice $\sigma$ is usually unknown, which is why t-tests are so common. The test must match the design of the data: it is for mean-based questions, not categorical counts.

The procedure, step by step

Step 1 — Match the design. Decide whether you have one sample, two independent groups, or paired observations from the same units. This choice selects the formula:

  • One-sample: compare one sample mean to a benchmark $\mu_0$.
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
  • Two-sample (Welch): compare two independent group means without assuming equal variances.
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

Welch's degrees of freedom are not simply $n_1 + n_2 - 2$, so software usually handles that part.

  • Paired: for before-and-after or matched pairs, run the test on the pairwise differences, not the two raw columns.
$$t = \frac{\bar{d} - \mu_{d,0}}{s_d / \sqrt{n}}$$

Often the null value is $\mu_{d,0} = 0$, meaning the average change is zero.
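
As a rough illustration, the three designs can be sketched with Python's standard library alone. The function names and the sample data below are made up for this example, and the Welch degrees of freedom use the Welch-Satterthwaite approximation, which is an addition to the formulas listed here. Statistical packages provide tested equivalents.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t-test against the benchmark mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # s / sqrt(n)
    return (statistics.mean(sample) - mu0) / se, n - 1

def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t-test (no equal-variance assumption)."""
    va = statistics.variance(a) / len(a)  # s1^2 / n1
    vb = statistics.variance(b) / len(b)  # s2^2 / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def paired_t(before, after, mu_d0=0.0):
    """Return (t, df) for a paired t-test, computed on the pairwise differences."""
    diffs = [y - x for x, y in zip(before, after)]
    return one_sample_t(diffs, mu_d0)

# Made-up measurements, for illustration only
group_a = [5.1, 4.9, 5.6, 5.8, 5.0, 5.4]
group_b = [4.6, 4.8, 4.4, 5.0, 4.7]
print(one_sample_t(group_a, 5.0))
print(welch_t(group_a, group_b))
print(paired_t([10, 12, 9, 11], [12, 13, 11, 12]))
```

Note that the paired version reuses the one-sample function on the differences, which mirrors how the paired formula is just the one-sample formula applied to $d$.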

Step 2 — State the hypotheses. Write the null and alternative mean statements and decide one-sided or two-sided.

Step 3 — Compute the t statistic. Every version has the same shape,

$$t = \frac{\text{observed difference}}{\text{estimated standard error}}$$

so $t$ grows when the mean difference is large and shrinks when the data are noisy or the sample is small.
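
A small sketch can make that behaviour concrete. The helper `t_stat` and the numbers fed to it are hypothetical, chosen only to show the direction of each effect in the one-sample form.

```python
import math

def t_stat(diff, s, n):
    """Observed difference divided by the estimated standard error s / sqrt(n)."""
    return diff / (s / math.sqrt(n))

baseline = t_stat(diff=2.0, s=4.0, n=25)      # reference case
bigger_diff = t_stat(diff=4.0, s=4.0, n=25)   # larger mean difference -> larger t
noisier = t_stat(diff=2.0, s=10.0, n=25)      # noisier data -> smaller t
smaller_n = t_stat(diff=2.0, s=4.0, n=9)      # smaller sample -> smaller t

print(baseline, bigger_diff, noisier, smaller_n)
```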

Step 4 — Read the result. Under the null hypothesis the statistic follows a $t$ distribution, not a normal $z$ distribution. The $t$ distribution has heavier tails, especially for small samples, so it is more cautious about significance. Use the right degrees of freedom to get a p-value or compare with a critical value.
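
In practice software returns the p-value, but the idea can be sketched without any third-party library. The code below is one possible approach, not a reference implementation: it integrates the Student $t$ density numerically with Simpson's rule, and the function names `t_pdf` and `two_sided_p` are invented for this example. It also computes the p-value a normal distribution would give, to show the effect of the heavier tails.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| >= |t|); steps must be even for Simpson's rule."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3        # P(0 <= T <= |t|)
    return 1.0 - 2.0 * central     # probability in both tails combined

t_value, df = 2.5, 24              # illustrative inputs
p_t = two_sided_p(t_value, df)
p_normal = math.erfc(abs(t_value) / math.sqrt(2))  # two-sided p under a normal
print(f"t-distribution p = {p_t:.4f}, normal p = {p_normal:.4f}")
```

The $t$-based p-value comes out larger than the normal one for the same statistic, which is the caution described above.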

Step 5 — Interpret carefully. Check the assumptions and explain the conclusion in context, not just the arithmetic.

A full worked example: one-sample t-test

A packaging process should average 100 grams. A random sample of 25 packages gives

$$\bar{x} = 102, \quad s = 4$$

and you want to know whether the true mean differs from 100 grams. One sample against a target value means a one-sample t-test.

State the hypotheses:

$$H_0: \mu = 100 \qquad H_1: \mu \ne 100$$

Standard error:

$$\frac{s}{\sqrt{n}} = \frac{4}{\sqrt{25}} = \frac{4}{5} = 0.8$$

Test statistic:

$$t = \frac{102 - 100}{0.8} = 2.5$$

Degrees of freedom:

$$df = n - 1 = 24$$

For a two-sided test with $df = 24$, $t = 2.5$ exceeds the 5% critical value of about 2.064 and gives a p-value of about 0.02, which is below 0.05. The result is significant at the 5% level, so you reject $H_0$: the sample gives evidence that the process mean differs from 100 grams, provided the sample is reasonably independent and not badly distorted by outliers.
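
The arithmetic of this example can be checked from the summary statistics alone. The function name `one_sample_from_summary` is invented here, and the critical value 2.064 is the two-sided 5% value for 24 degrees of freedom as read from a standard t table, so the comparison applies only when $n = 25$.

```python
import math

# Two-sided 5% critical value for df = 24, taken from a standard t table
T_CRIT_DF24 = 2.064

def one_sample_from_summary(x_bar, mu0, s, n):
    """Return (standard error, t, df) from summary statistics."""
    se = s / math.sqrt(n)
    t = (x_bar - mu0) / se
    return se, t, n - 1

se, t, df = one_sample_from_summary(x_bar=102.0, mu0=100.0, s=4.0, n=25)
reject = abs(t) > T_CRIT_DF24
print(f"se={se:.2f}, t={t:.2f}, df={df}, reject H0 at 5%: {reject}")
# prints: se=0.80, t=2.50, df=24, reject H0 at 5%: True
```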

Where each step trips people up, and how to self-check

  • Design step: Picking the wrong version. If the same people, machines, or units are measured twice, the data are paired — an independent two-sample t-test is wrong.
  • Read step: Treating the $t$ value as if it followed a normal distribution; small samples need the heavier-tailed $t$ distribution.
  • Interpret step: Reading "not statistically significant" as "there is no difference." It usually means the sample lacked strong enough evidence against the null.
  • Data check: Skipping it. A tiny sample with one extreme outlier still produces a number, but the conclusion may not be trustworthy.
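
To see how much the design choice can matter, the sketch below runs both analyses on the same made-up before-and-after scores. The data are hypothetical and were picked so that units differ a lot from each other while each one changes by a similar amount.

```python
import math
import statistics

# Hypothetical before/after scores for the same six units (illustration only)
before = [62, 75, 58, 90, 70, 81]
after = [65, 79, 60, 94, 73, 86]

# Appropriate for this design: paired analysis on the pairwise differences
diffs = [a - b for a, b in zip(after, before)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Wrong for this design: treating the two columns as independent groups (Welch)
se_welch = math.sqrt(statistics.variance(after) / len(after)
                     + statistics.variance(before) / len(before))
t_welch = (statistics.mean(after) - statistics.mean(before)) / se_welch

print(f"paired t = {t_paired:.2f}, independent (Welch) t = {t_welch:.2f}")
```

Both statistics share the same numerator, a mean change of 3.5, but the independent analysis folds the large unit-to-unit spread into the standard error, so its $t$ is far smaller and the consistent change is hidden.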

Your turn

Change the example so the sample mean is 101 instead of 102, keeping $n = 25$ and $s = 4$. Recompute $t$ and decide whether the evidence still clears the 5% level. It is a clean way to see how the conclusion shifts as the mean moves toward the null. T-tests like this appear throughout experiments, quality control, medicine, psychology, education, and A/B-style comparisons whenever the outcome is numeric.

Frequently Asked Questions

What is a t-test used for?
A t-test helps you decide whether a sample mean, or the difference between two sample means, is larger than you would expect from random variation alone. You use it when the outcome is numeric and the population standard deviation is unknown, which is the usual real-world case. It is for mean-based questions, not categorical counts.

What is the difference between one-sample, two-sample, and paired t-tests?
A one-sample t-test compares one sample mean against a benchmark value, such as a target weight. A two-sample t-test compares the means of two independent groups, like two classes taught with different methods. A paired t-test handles before-and-after or matched-pair data, and it is run on the pairwise differences, not on the two raw columns.

When should you use Welch's t-test?
Use Welch's t-test when comparing two independent groups and you do not have a strong reason to assume the population variances are equal. It is usually the safer default. Its degrees of freedom are not simply the two sample sizes minus two, so statistical software typically handles that calculation for you.

Why use a t-distribution instead of a normal distribution?
When the population standard deviation is unknown and estimated from the sample, the test statistic follows a t distribution rather than a normal z distribution under the null hypothesis. The t distribution has heavier tails, especially for small samples, so it is more cautious about declaring a result statistically significant.

When is a t-test not appropriate?
A t-test is a poor fit when the outcome is categorical rather than numeric, when observations are not independent within the chosen design, or when very small samples contain strong skew or obvious outliers. The test must match the design of the data, so check these conditions before interpreting any t statistic.
