Linear regression is a way to describe how one variable changes with another using a best-fit straight line. In simple linear regression, with one input variable x and one output variable y, the model is

\hat{y} = b_0 + b_1 x

Here \hat{y} is the predicted value, b_1 is the slope, and b_0 is the intercept. The usual fitting method is ordinary least squares, which picks the line that minimizes the sum of squared residuals:

\sum_{i=1}^n \left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^n \left(y_i - (b_0 + b_1 x_i)\right)^2

If you only need the main idea, remember this: the slope tells you the model's predicted change in y for a one-unit increase in x, as long as a straight-line model is a reasonable fit.
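To make the objective concrete, here is a minimal Python sketch that evaluates this sum for a candidate line, using the four data points from the worked example later in this article; the second candidate line is an arbitrary comparison choice, not anything fitted.

```python
# A minimal sketch of the least-squares objective.

def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared vertical misses for the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]
ys = [2, 2, 4, 4]

# Compare two candidate lines; least squares prefers the smaller value.
print(sum_squared_residuals(xs, ys, 1.0, 0.8))  # 0.8, the least-squares line
print(sum_squared_residuals(xs, ys, 0.0, 1.0))  # 2.0, a worse candidate
```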

Linear Regression Equation: What It Tells You

The slope b_1 tells you the predicted change in y when x increases by 1, if a linear model is a reasonable description of the data. The intercept b_0 is the predicted value of y when x = 0.

The word "predicted" matters. A regression line usually does not pass through every point. It balances the errors across all points instead, so it summarizes the trend rather than matching every observation.

Linear Regression Formula For b_0 And b_1

For simple linear regression, if the xx values are not all the same, the least-squares coefficients can be written as

b_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}

and

b_0 = \bar{y} - b_1 \bar{x}

Here \bar{x} is the mean of the x values and \bar{y} is the mean of the y values. These formulas are for simple linear regression. If you have more than one input variable, the setup changes.
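As a sketch of how these formulas translate to code, the function below computes b_1 and b_0 directly from the means; the function name is an illustrative choice.

```python
# A direct translation of the closed-form least-squares formulas.

def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) for the least-squares line through (xs, ys).

    Assumes the x values are not all identical, so the
    denominator of the slope formula is nonzero.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: covariance-style numerator over the spread of x.
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    b1 = num / den
    # Intercept: forces the line through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1

# On the example data below this returns (1.0, 0.8):
# print(fit_simple_linear_regression([1, 2, 3, 4], [2, 2, 4, 4]))
```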

Why Least Squares Uses Squared Residuals

Think of the data points as a cloud on a scatter plot. Many straight lines could pass near that cloud. Linear regression chooses the line that keeps the vertical misses, called residuals, small overall.

Squaring the residuals does two useful things. It prevents positive and negative errors from canceling out, and it gives extra weight to large misses.
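A small numeric check makes the cancellation point concrete; the residual values here are made up for illustration.

```python
# Hypothetical residuals chosen to illustrate cancellation.
residuals = [2.0, -2.0, 1.0, -1.0]

# Raw residuals cancel out and hide the misses entirely.
print(sum(residuals))                  # 0.0

# Squared residuals cannot cancel, and large misses dominate.
print(sum(r ** 2 for r in residuals))  # 10.0
```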

Simple Linear Regression Example

Suppose the data points are (1, 2), (2, 2), (3, 4), and (4, 4). We will fit a simple linear regression line.

First find the means:

\bar{x} = \frac{1+2+3+4}{4} = 2.5

\bar{y} = \frac{2+2+4+4}{4} = 3

Now compute the slope:

b_1 = \frac{(-1.5)(-1) + (-0.5)(-1) + (0.5)(1) + (1.5)(1)}{(-1.5)^2 + (-0.5)^2 + (0.5)^2 + (1.5)^2}

b_1 = \frac{4}{5} = 0.8

Then compute the intercept:

b_0 = \bar{y} - b_1 \bar{x} = 3 - 0.8(2.5) = 1

So the regression equation is

\hat{y} = 1 + 0.8x

If x = 5, the model predicts

\hat{y} = 1 + 0.8(5) = 5
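As a cross-check of the hand computation, here is a minimal sketch using numpy's polyfit, a standard least-squares fit; the printed values should match the coefficients derived above.

```python
import numpy as np

xs = np.array([1, 2, 3, 4])
ys = np.array([2, 2, 4, 4])

# np.polyfit with degree 1 performs a least-squares line fit and
# returns the coefficients highest power first: (b1, b0).
b1, b0 = np.polyfit(xs, ys, 1)
print(b0, b1)        # 1.0 0.8 (up to floating-point noise)

# Prediction at x = 5, matching the hand computation.
print(b0 + b1 * 5)   # 5.0
```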

You can also check one residual. At x = 2, the predicted value is

\hat{y} = 1 + 0.8(2) = 2.6

The actual value is 2, so the residual is

y - \hat{y} = 2 - 2.6 = -0.6

That point lies 0.6 units below the regression line. One residual does not tell you whether the whole model is good, but it does show how regression measures error.
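Extending the same check to every point, this short sketch lists all four residuals for the fitted line.

```python
# Residuals of the fitted line y-hat = 1 + 0.8x at each data point.
points = [(1, 2), (2, 2), (3, 4), (4, 4)]

for x, y in points:
    y_hat = 1 + 0.8 * x
    print(x, round(y - y_hat, 2))
# Residuals: 0.2, -0.6, 0.6, -0.2. They sum to zero, as
# least-squares residuals always do when the model has an intercept.
```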

Common Linear Regression Mistakes

One mistake is assuming the line must pass through every point. Regression is about best fit, not perfect fit.

Another mistake is reading the slope as an exact rule for every data point. The slope is an average predicted change from the model.

A third mistake is treating regression as proof of causation. A strong linear pattern can support prediction or describe association, but it does not by itself explain why the variables move together.

It is also easy to overtrust predictions outside the observed data range. Extrapolation can fail even when the fitted line looks good inside the original range.

When To Use Linear Regression

Linear regression is used when a straight-line summary is useful and the relationship is at least roughly linear over the range you care about. Common uses include estimating price from size, score from study time, or output from input under stable conditions.

It is especially useful when you want an interpretable model. The slope, intercept, and residuals are simple enough to explain without hiding what the model is doing.

A Quick Check Before You Trust The Line

Before using a regression line, ask two questions. Does a scatter plot look roughly linear? Does the context make the slope meaningful rather than misleading? If either answer is no, a different model may be better.
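As one way to run the first check, here is a minimal matplotlib sketch that plots the example points against the fitted line; matplotlib is an assumption about tooling, and any plotting library would do.

```python
import matplotlib.pyplot as plt
import numpy as np

xs = np.array([1, 2, 3, 4])
ys = np.array([2, 2, 4, 4])

# Scatter the data, then overlay the fitted line y-hat = 1 + 0.8x.
plt.scatter(xs, ys, label="data")
grid = np.linspace(0, 5, 100)
plt.plot(grid, 1 + 0.8 * grid, label="fitted line")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
```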

Try A Similar Problem

Pick four points, sketch them, and fit a line by calculator or software. Then compare the predicted values to the actual ones. Looking at the residuals is often the fastest way to understand what the regression line is really doing.
