Logistic regression is a model for binary classification. It combines the input features into a linear score, sends that score through the sigmoid function, and produces a number between $0$ and $1$ that is interpreted, under the fitted model, as the estimated probability of the positive class.

Despite the name, logistic regression is usually used to decide between two classes such as pass/fail, spam/not spam, or default/no default. The word "regression" refers to the linear formula inside the model, not to predicting a continuous output.

Logistic regression formula at a glance

Binary logistic regression uses

$$p(y=1 \mid x) = \sigma(z), \qquad z = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n$$

with the sigmoid function

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

The linear part $z$ can be any real number. The sigmoid squeezes that value into $(0,1)$, which is why the output can be used as a probability estimate.
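A minimal sketch of those two formulas in plain Python; the names `sigmoid` and `predict_proba` are illustrative here, not from any particular library:

```python
import math

def sigmoid(z: float) -> float:
    """Map any real score z into the open interval (0, 1)."""
    # Note: for extremely negative z, math.exp(-z) can overflow;
    # real implementations use a numerically stable variant.
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(betas: list[float], x: list[float]) -> float:
    """Linear score z = beta_0 + beta_1*x_1 + ... + beta_n*x_n, then sigmoid."""
    z = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    return sigmoid(z)

# Hypothetical coefficients and feature values, just to show the call:
print(predict_proba([-1.0, 0.2, 0.5], [3.0, -2.0]))  # ~0.198
```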

Why the sigmoid function matters

If you used the raw linear score $z$ as a probability, you could get impossible values such as $1.7$ or $-0.4$. The sigmoid fixes that by mapping large negative scores close to $0$, large positive scores close to $1$, and scores near $0$ close to $0.5$.

That gives a practical reading:

  • if $z$ is very negative, the model leans toward class $0$
  • if $z$ is near $0$, the model is uncertain
  • if $z$ is very positive, the model leans toward class $1$

The curve is steepest near $z=0$. So a small change in the score can change the probability a lot near $0.5$, but much less when the probability is already close to $0$ or $1$.
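You can see that squashing numerically with a quick sketch (the `sigmoid` helper from the earlier snippet is repeated here so this runs on its own):

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

for z in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(f"z = {z:+.1f}  ->  sigmoid(z) = {sigmoid(z):.3f}")
# z = -6.0  ->  sigmoid(z) = 0.002
# z = -2.0  ->  sigmoid(z) = 0.119
# z = +0.0  ->  sigmoid(z) = 0.500
# z = +2.0  ->  sigmoid(z) = 0.881
# z = +6.0  ->  sigmoid(z) = 0.998
```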

Worked logistic regression example

Suppose a model uses one feature $x$ and has

$$z = -7 + 0.1x$$

You can think of $x$ as a test score and $y=1$ as "pass." The coefficients here are just an example to show the mechanics.

If $x = 65$, then

$$z = -7 + 0.1(65) = -0.5$$

So the predicted probability is

$$p(y=1 \mid x=65) = \sigma(-0.5) = \frac{1}{1 + e^{0.5}} \approx 0.378$$

If $x = 80$, then

$$z = -7 + 0.1(80) = 1$$

and

$$p(y=1 \mid x=80) = \sigma(1) = \frac{1}{1 + e^{-1}} \approx 0.731$$

So the same model gives about a $37.8\%$ chance of passing at $x=65$ and about a $73.1\%$ chance at $x=80$. The score rose by $1.5$, but the final output stayed between $0$ and $1$ because the sigmoid bends the result into a probability.

If you now choose a threshold of $0.5$, the first case is classified as class $0$ and the second as class $1$. That last step depends on the threshold. The probability estimate itself does not.

One useful shortcut: with a $0.5$ threshold, the class flips exactly when $z = 0$, because $\sigma(0) = 0.5$. For this model that happens at $x = 70$, since $-7 + 0.1(70) = 0$.
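The whole worked example fits in a few lines; a sketch that also shows the flip point at $x = 70$ (again repeating the `sigmoid` helper so it runs standalone):

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

for x in (65.0, 70.0, 80.0):
    z = -7.0 + 0.1 * x          # the example model's linear score
    print(f"x = {x:.0f}  z = {z:+.1f}  p = {sigmoid(z):.3f}")
# x = 65  z = -0.5  p = 0.378
# x = 70  z = +0.0  p = 0.500   <- class flips here with a 0.5 threshold
# x = 80  z = +1.0  p = 0.731
```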

How logistic regression becomes a classifier

The model output is a probability estimate. A classification rule is added afterward.

For example, with threshold $0.5$:

  • predict class $1$ if $p(y=1 \mid x) \ge 0.5$
  • predict class $0$ if $p(y=1 \mid x) < 0.5$

But $0.5$ is not always the right threshold. If false positives and false negatives have different costs, or if the classes are highly imbalanced, another threshold may work better.
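As a sketch, the decision rule is just one comparison on top of the probability estimate; the threshold is a free parameter you choose, not something the model learns:

```python
def classify(p: float, threshold: float = 0.5) -> int:
    """Turn an estimated probability into a hard 0/1 label."""
    return 1 if p >= threshold else 0

print(classify(0.731))        # 1 with the default 0.5 threshold
print(classify(0.731, 0.8))   # 0 with a stricter 0.8 threshold
```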

What the coefficients mean

The sign of a coefficient tells you the direction of the effect on the linear score $z$:

  • if $\beta_i > 0$, increasing $x_i$ raises $z$ and tends to increase $p(y=1 \mid x)$
  • if $\beta_i < 0$, increasing $x_i$ lowers $z$ and tends to decrease $p(y=1 \mid x)$

That part is straightforward. The subtle point is that the probability does not change linearly with the feature, because the sigmoid curve is not a straight line.

In standard logistic regression, the linear model is on the log-odds scale:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n$$

This means a one-unit increase in a feature $x_i$ changes the log-odds by exactly $\beta_i$ when the other features are held fixed (equivalently, it multiplies the odds by $e^{\beta_i}$). That is more precise than saying it changes the probability by a fixed amount.
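A quick numeric check of that statement, using the earlier example model $z = -7 + 0.1x$: each one-unit step in $x$ should add exactly $0.1$ to the log-odds, even though the probability steps are unequal.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p: float) -> float:
    return math.log(p / (1.0 - p))

for x in (60.0, 61.0, 62.0):
    p = sigmoid(-7.0 + 0.1 * x)
    print(f"x = {x:.0f}  p = {p:.3f}  log-odds = {log_odds(p):+.2f}")
# x = 60  p = 0.269  log-odds = -1.00
# x = 61  p = 0.289  log-odds = -0.90
# x = 62  p = 0.310  log-odds = -0.80
```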

Common logistic regression mistakes

Treating the output as a guaranteed class

A prediction like $0.73$ does not mean the event will happen. It means the model assigns about a $73\%$ estimated probability to the positive class for that input.

Assuming the threshold must be $0.5$

$0.5$ is common, but it is a choice, not a law. The best threshold depends on the application.

Thinking the probability changes linearly

The score $z$ is linear in the inputs, but the probability is not. A one-unit change in a feature can have a different effect near $p=0.5$ than near $p=0.95$.
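To make that concrete, here is a sketch applying the same score increase of $0.5$ starting near $p = 0.5$ and near $p \approx 0.95$:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

for z0 in (0.0, 3.0):
    p0, p1 = sigmoid(z0), sigmoid(z0 + 0.5)
    print(f"z: {z0:.1f} -> {z0 + 0.5:.1f}   p: {p0:.3f} -> {p1:.3f}   change = {p1 - p0:+.3f}")
# z: 0.0 -> 0.5   p: 0.500 -> 0.622   change = +0.122
# z: 3.0 -> 3.5   p: 0.953 -> 0.971   change = +0.018
```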

Forgetting the model is binary unless extended

Basic logistic regression handles two classes. Multi-class versions exist, but they are extensions, not the same binary setup written in a different way.

When logistic regression is used

Logistic regression is often used when the target is yes/no, such as spam detection, disease presence, customer churn, loan default, or pass/fail outcomes.

It remains popular because it is simple, fast, and reasonably interpretable. It is especially useful when you want a baseline classifier, when the dataset is not huge, or when you need estimated probabilities rather than only hard labels.

A simple way to picture it

Think of logistic regression as a two-step machine:

  1. Add up evidence with a linear score.
  2. Convert that score into a probability with the sigmoid.

That picture is enough to understand most introductory examples and to see why logistic regression sits between linear models and classification tasks.

Try a similar logistic regression problem

Pick a simple score such as

$$z = -3 + 0.5x$$

Compute $\sigma(z)$ for a few values of $x$, such as $2$, $6$, and $10$. Watch how the linear score changes steadily while the probability bends through an S-shaped curve. Then try a different threshold and see when the predicted class changes.
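If you want to verify your hand computations afterwards, a minimal sketch of the exercise:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

for x in (2.0, 6.0, 10.0):
    z = -3.0 + 0.5 * x
    p = sigmoid(z)
    label = 1 if p >= 0.5 else 0          # 0.5 threshold; try others too
    print(f"x = {x:4.1f}  z = {z:+.1f}  p = {p:.3f}  class = {label}")
```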
