Linear regression is a way to describe how one variable changes with another using a best-fit straight line. In simple linear regression, with one input variable x and one output variable y, the model is

\hat{y} = b_0 + b_1 x

Here \hat{y} is the predicted value, b_1 is the slope, and b_0 is the intercept. The usual fitting method is ordinary least squares, which picks the line that minimizes the sum of squared residuals:

\sum_{i=1}^n \left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^n \left(y_i - (b_0 + b_1 x_i)\right)^2

If you only need the main idea, remember this: the slope tells you the model's predicted change in y for a one-unit increase in x, as long as a straight-line model is a reasonable fit.
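To make the objective concrete, here is a minimal Python sketch that evaluates this sum for a candidate line, using the four data points from the worked example later in this article; the second candidate line is an arbitrary comparison choice, not anything fitted.

```python
# A minimal sketch of the least-squares objective.

def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared vertical misses for the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]
ys = [2, 2, 4, 4]

# Compare two candidate lines; least squares prefers the smaller value.
print(sum_squared_residuals(xs, ys, 1.0, 0.8))  # 0.8, the least-squares line
print(sum_squared_residuals(xs, ys, 0.0, 1.0))  # 2.0, a worse candidate
```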

Linear Regression Equation: What It Tells You

The slope b_1 tells you the predicted change in y when x increases by 1, if a linear model is a reasonable description of the data. The intercept b_0 is the predicted value of y when x = 0.

The word "predicted" matters. A regression line usually does not pass through every point. It balances the errors across all points instead, so it summarizes the trend rather than matching every observation.

Linear Regression Formula For b_0 And b_1

For simple linear regression, if the xx values are not all the same, the least-squares coefficients can be written as

b_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}

and

b_0 = \bar{y} - b_1 \bar{x}

Here \bar{x} is the mean of the x values and \bar{y} is the mean of the y values. These formulas are for simple linear regression. If you have more than one input variable, the setup changes.
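As a sketch of how these formulas translate to code, the function below computes b_1 and b_0 directly from the means; the function name is an illustrative choice.

```python
# A direct translation of the closed-form least-squares formulas.

def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) for the least-squares line through (xs, ys).

    Assumes the x values are not all identical, so the
    denominator of the slope formula is nonzero.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: covariance-style numerator over the spread of x.
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    b1 = num / den
    # Intercept: forces the line through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1

# On the example data below this returns (1.0, 0.8):
# print(fit_simple_linear_regression([1, 2, 3, 4], [2, 2, 4, 4]))
```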

Why Least Squares Uses Squared Residuals

Think of the data points as a cloud on a scatter plot. Many straight lines could pass near that cloud. Linear regression chooses the line that keeps the vertical misses, called residuals, small overall.

Squaring the residuals does two useful things. It prevents positive and negative errors from canceling out, and it gives extra weight to large misses.
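A small numeric check makes the cancellation point concrete; the residual values here are made up for illustration.

```python
# Hypothetical residuals chosen to illustrate cancellation.
residuals = [2.0, -2.0, 1.0, -1.0]

# Raw residuals cancel out and hide the misses entirely.
print(sum(residuals))                  # 0.0

# Squared residuals cannot cancel, and large misses dominate.
print(sum(r ** 2 for r in residuals))  # 10.0
```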

Simple Linear Regression Example

Suppose the data points are (1, 2), (2, 2), (3, 4), and (4, 4). We will fit a simple linear regression line.

First find the means:

\bar{x} = \frac{1+2+3+4}{4} = 2.5

\bar{y} = \frac{2+2+4+4}{4} = 3

Now compute the slope:

b_1 = \frac{(-1.5)(-1) + (-0.5)(-1) + (0.5)(1) + (1.5)(1)}{(-1.5)^2 + (-0.5)^2 + (0.5)^2 + (1.5)^2}

b_1 = \frac{4}{5} = 0.8

Then compute the intercept:

b_0 = \bar{y} - b_1 \bar{x} = 3 - 0.8(2.5) = 1

So the regression equation is

\hat{y} = 1 + 0.8x

If x = 5, the model predicts

\hat{y} = 1 + 0.8(5) = 5
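As a cross-check of the hand computation, here is a minimal sketch using numpy's polyfit, a standard least-squares fit; the printed values should match the coefficients derived above.

```python
import numpy as np

xs = np.array([1, 2, 3, 4])
ys = np.array([2, 2, 4, 4])

# np.polyfit with degree 1 performs a least-squares line fit and
# returns the coefficients highest power first: (b1, b0).
b1, b0 = np.polyfit(xs, ys, 1)
print(b0, b1)        # 1.0 0.8 (up to floating-point noise)

# Prediction at x = 5, matching the hand computation.
print(b0 + b1 * 5)   # 5.0
```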

You can also check one residual. At x = 2, the predicted value is

\hat{y} = 1 + 0.8(2) = 2.6

The actual value is 2, so the residual is

y - \hat{y} = 2 - 2.6 = -0.6

That point lies 0.6 units below the regression line. One residual does not tell you whether the whole model is good, but it does show how regression measures error.
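Extending the same check to every point, this short sketch lists all four residuals for the fitted line.

```python
# Residuals of the fitted line y-hat = 1 + 0.8x at each data point.
points = [(1, 2), (2, 2), (3, 4), (4, 4)]

for x, y in points:
    y_hat = 1 + 0.8 * x
    print(x, round(y - y_hat, 2))
# Residuals: 0.2, -0.6, 0.6, -0.2. They sum to zero, as
# least-squares residuals always do when the model has an intercept.
```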

Common Linear Regression Mistakes

One mistake is assuming the line must pass through every point. Regression is about best fit, not perfect fit.

Another mistake is reading the slope as an exact rule for every data point. The slope is an average predicted change from the model.

A third mistake is treating regression as proof of causation. A strong linear pattern can support prediction or describe association, but it does not by itself explain why the variables move together.

It is also easy to overtrust predictions outside the observed data range. Extrapolation can fail even when the fitted line looks good inside the original range.

When To Use Linear Regression

Linear regression is used when a straight-line summary is useful and the relationship is at least roughly linear over the range you care about. Common uses include estimating price from size, score from study time, or output from input under stable conditions.

It is especially useful when you want an interpretable model. The slope, intercept, and residuals are simple enough to explain without hiding what the model is doing.

A Quick Check Before You Trust The Line

Before using a regression line, ask two questions. Does a scatter plot look roughly linear? Does the context make the slope meaningful rather than misleading? If either answer is no, a different model may be better.
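As one way to run the first check, here is a minimal matplotlib sketch that plots the example points against the fitted line; matplotlib is an assumption about tooling, and any plotting library would do.

```python
import matplotlib.pyplot as plt
import numpy as np

xs = np.array([1, 2, 3, 4])
ys = np.array([2, 2, 4, 4])

# Scatter the data, then overlay the fitted line y-hat = 1 + 0.8x.
plt.scatter(xs, ys, label="data")
grid = np.linspace(0, 5, 100)
plt.plot(grid, 1 + 0.8 * grid, label="fitted line")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
```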

Try A Similar Problem

Pick four points, sketch them, and fit a line by calculator or software. Then compare the predicted values to the actual ones. Looking at the residuals is often the fastest way to understand what the regression line is really doing.
