A t-test helps you decide whether a sample mean, or the difference between two sample means, is larger than you would expect from random variation alone. You use it when the outcome is numeric and the population standard deviation is unknown, which is the usual real-world case.

The key condition is that the test must match the design of the data. A t-test is for mean-based questions, not categorical counts, and very small samples need caution if they contain strong skew or obvious outliers.

What a t-test measures

The basic idea is always the same:

t = \frac{\text{observed difference}}{\text{estimated standard error}}

The statistic gets larger when the mean difference is large, and smaller when the data are noisy or the sample is small.

Under the null hypothesis, and if the conditions are reasonable, this statistic follows a t distribution rather than a normal z distribution. The t distribution has heavier tails, especially for small samples, so it is more cautious about declaring a result significant.

Which type of t-test should you use?

One-sample t-test

Use this when you have one sample and want to compare its mean with a benchmark value \mu_0.

t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}

Example: compare the average package weight in one sample against a target of 100 grams.
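As a rough sketch, the one-sample formula above can be computed directly from raw data with the standard library. The function name and the sample weights are made up for illustration, not taken from the example.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic for H0: the population mean equals mu0."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)   # sample standard deviation (n - 1 denominator)
    se = s / math.sqrt(n)          # estimated standard error of the mean
    return (xbar - mu0) / se

# Hypothetical package weights compared against a 100-gram target
weights = [101.2, 99.8, 102.5, 100.9, 101.7, 98.6, 103.1, 100.4]
print(round(one_sample_t(weights, 100.0), 3))
```

Note that `statistics.stdev` uses the n - 1 denominator, which is what the t formula expects.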

Two-sample t-test

Use this when you want to compare the means of two independent groups, such as two classes taught with different methods.

If you do not have a strong reason to assume equal population variances, Welch's t-test is usually the safer default:

t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}

The degrees of freedom for Welch's test are not simply n_1 + n_2 - 2, so software usually handles that part for you.
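A minimal sketch of Welch's statistic, including the Welch–Satterthwaite degrees of freedom that software computes for you. The function name and the two score lists are illustrative assumptions.

```python
import math
import statistics

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1 = statistics.variance(x) / n1   # s1^2 / n1
    v2 = statistics.variance(y) / n2   # s2^2 / n2
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical scores from two independently taught classes
a = [78.0, 82.5, 75.0, 88.0, 79.5, 84.0]
b = [72.0, 70.5, 76.0, 68.5, 74.0, 71.5, 69.0]
t, df = welch_t(a, b)
print(round(t, 3), round(df, 1))
```

When the two groups happen to have equal sizes and equal variances, the Welch degrees of freedom reduce to the familiar n_1 + n_2 - 2; otherwise they are smaller.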

Paired t-test

Use this for before-and-after data or matched pairs. The test is not run on the two raw columns separately. It is run on the pairwise differences.

t = \frac{\bar{d} - \mu_{d,0}}{s_d / \sqrt{n}}

In many paired problems, the null value is \mu_{d,0} = 0, meaning the average change is zero.

When a t-test is appropriate

A t-test is a good fit when all of these are reasonably true:

  1. The outcome variable is numeric.
  2. The observations are independent within the chosen design, unless you are intentionally using a paired setup.
  3. The question is about a mean or a mean difference.
  4. The sample is not so small and distorted by outliers or strong skew that the mean and standard deviation become misleading.

If the population standard deviation were known exactly, a textbook z-test would be the direct alternative. In practice, t-tests are common because \sigma is usually unknown.

Worked example: a one-sample t-test

Suppose a packaging process is supposed to average 100 grams. You take a random sample of 25 packages and find

\bar{x} = 102, \quad s = 4

You want to know whether the true mean differs from 100 grams.

Because this is one sample compared with a target value, the correct test is a one-sample t-test.

Start with the hypotheses:

H_0: \mu = 100 \qquad H_1: \mu \ne 100

The standard error is

\frac{s}{\sqrt{n}} = \frac{4}{\sqrt{25}} = \frac{4}{5} = 0.8

Now compute the test statistic:

t = \frac{102 - 100}{0.8} = 2.5

The degrees of freedom are

df = n - 1 = 24

For a two-sided test with df = 24, a value of t = 2.5 gives a p-value below 0.05. That means the result is statistically significant at the 5% level, so you reject H_0.

In context, the sample gives evidence that the process mean is different from 100 grams. That conclusion depends on the sample being reasonably independent and not badly distorted by outliers.
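The whole worked example fits in a few lines when you start from summary statistics. The 2.064 cutoff is the tabled two-sided 5% critical value for df = 24, hard-coded here as an assumption rather than computed.

```python
import math

# Summary statistics from the worked example
xbar, mu0, s, n = 102.0, 100.0, 4.0, 25

se = s / math.sqrt(n)   # 4 / 5 = 0.8
t = (xbar - mu0) / se   # 2.0 / 0.8 = 2.5
df = n - 1              # 24

# Two-sided 5% critical value for df = 24, taken from a t table
t_crit = 2.064
print(round(t, 3), df, abs(t) > t_crit)
```

A statistics library would give the exact p-value as well; this sketch only reproduces the reject/fail-to-reject decision.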

Common mistakes with t-tests

One common mistake is choosing the wrong version of the test. If the same people, machines, or units are measured twice, the data are paired, so an independent two-sample t-test is not appropriate.

Another mistake is reading "not statistically significant" as "there is no difference." Usually it means the sample did not provide strong enough evidence against the null hypothesis.

A third mistake is skipping the data check. With a tiny sample and one extreme outlier, the formula still produces a number, but the conclusion may not be trustworthy.

Where t-tests are used

T-tests are common in experiments, quality control, medicine, psychology, education, and A/B-style comparisons when the outcome is numeric. They are one of the standard entry points into statistical inference because they connect means, variability, uncertainty, and decision-making in one method.

Try a similar problem

Change the example so the sample mean is 101 instead of 102, while keeping n = 25 and s = 4. Recompute the t statistic and decide whether the evidence is still strong enough at the 5% level. That is a useful next step if you want to see how the conclusion changes as the sample mean moves closer to the null value.
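Once you have worked it by hand, a small helper like the one below (a hypothetical name, using the same summary-statistic formula and the same hard-coded 2.064 cutoff for df = 24) can check your arithmetic.

```python
import math

def one_sample_t_summary(xbar, mu0, s, n):
    """One-sample t statistic from summary statistics rather than raw data."""
    return (xbar - mu0) / (s / math.sqrt(n))

# The modified problem: sample mean 101 against the same target of 100
t = one_sample_t_summary(101.0, 100.0, 4.0, 25)
print(round(t, 3), abs(t) > 2.064)   # 2.064: two-sided 5% cutoff for df = 24
```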
