Friday, December 19, 2025

The Machine Studying “Creation Calendar” Day 12: Logistic Regression in Excel


At this time’s mannequin is Logistic Regression.

If you happen to already know this mannequin, here’s a query for you:

Is Logistic Regression a regressor or a classifier?

Properly, this query is precisely like: Is a tomato a fruit or a vegetable?

From a botanist’s viewpoint, a tomato is a fruit, as a result of they take a look at construction: seeds, flowers, plant biology.

From a cook dinner’s viewpoint, a tomato is a vegetable, as a result of they take a look at style, how it’s utilized in a recipe, whether or not it goes in a salad or a dessert.

The identical object, two legitimate solutions, as a result of the standpoint is totally different.

Logistic Regression is precisely like that.

  • Within the Statistical / GLM perspective, it’s a regression. And there may be not the idea of “classification” on this framework anyway. There are gamma regression, logistic regression, Poisson regression…
  • Within the machine studying perspective, it’s used for classification. So it’s a classifier.

We’ll come again to this later.

For now, one factor is bound:

Logistic Regression may be very properly tailored when the goal variable is binary, and normally y is coded as 0 or 1.

However…

What’s a classifier for a weight-based mannequin?

So, y might be 0 or 1.

0 or 1, they’re numbers, proper?

So we are able to simply contemplate y as steady!

Sure, y = a x + b, with y = 0 or 1.

Why not?

Now, chances are you’ll ask: why this query, now? Why it was not requested earlier than.

Properly, for distance-based and tree-based fashions, a categorical y is really categorical.

When y is categorical, like purple, blue, inexperienced, or just 0 and 1:

  • In Okay-NN, you classify by neighbors of every class.
  • In centroid fashions, you examine with the centroid of every class.
  • In a choice tree, you compute class proportions at every node.

In all these fashions:

Class labels usually are not numbers.
They’re classes.
The algorithms by no means deal with them as values.

So classification is pure and speedy.

However for weight-based fashions, issues work otherwise.

In a weight-based mannequin, we all the time compute one thing like:

y = a x + b

or, later, a extra complicated perform with coefficients.

This implies:

The mannequin works with numbers all over the place.

So right here is the important thing thought:

If the mannequin does regression, then this identical mannequin can be utilized for binary classification.

Sure, we are able to use linear regression for binary classification!

Since binary labels are 0 and 1, they’re already numeric.

And on this particular case: we can apply Bizarre Least Squares (OLS) instantly on y = 0 and y = 1.

The mannequin will match a line, and we are able to use the identical closed-form method, as we are able to see under.

Logistic Regression in Excel – all photographs by creator

We are able to do the identical gradient descent, and it’ll completely work:

After which, to acquire the ultimate class prediction, we merely select a threshold.
It’s normally 0.5 (or 50 %), however relying on how strict you wish to be, you may decide one other worth.

  • If the anticipated y≥0.5, predict class 1
  • In any other case, class 0

It is a classifier.

And since the mannequin produces a numeric output, we are able to even establish the purpose the place: y=0.5.

This worth of x defines the choice frontier.

Within the earlier instance, this occurs at x=9.
At this threshold, we already noticed one misclassification.

However an issue seems as quickly as we introduce a degree with a giant worth of x.

For instance, suppose we add a degree with: x= 50 and y = 1.

As a result of linear regression tries to suit a straight line by way of all the information, this single giant worth of x pulls the road upward.
The choice frontier shifts from x= to roughly x=12.

And now, with this new boundary, we find yourself with two misclassifications.

This illustrates the principle problem:

A linear regression used as a classifier is extraordinarily delicate to excessive values of x. The choice frontier strikes dramatically, and the classification turns into unstable.

This is without doubt one of the causes we’d like a mannequin that doesn’t behave linearly without end. A mannequin that stays between 0 and 1, even when x turns into very giant.

And that is precisely what the logistic perform will give us.

How Logistic Regression works

We begin with : ax + b, similar to the linear regression.

Then we apply one perform referred to as sigmoid, or logistic perform.

As we are able to see within the screenshot under, the worth of p is then between 0 and 1, so that is excellent.

  • p(x) is the predicted chance that y = 1
  • 1 − p(x) is the anticipated chance that y = 0

For classification, we are able to merely say:

  • If p(x) ≥ 0.5, predict class 1
  • In any other case, predict class 0

From probability to log-loss

Now, the OLS Linear Regression tries to reduce the MSE (Imply Squared Error).

Logistic regression for a binary goal makes use of the Bernoulli probability. For every commentary i:

  • If yᵢ = 1, the chance of the information level is pᵢ
  • If yᵢ = 0, the chance of the information level is 1 − pᵢ

For the entire dataset, the chances are the product over all i. In apply, we take the logarithm, which turns the product right into a sum.

Within the GLM perspective, we attempt to maximize this log probability.

Within the machine studying perspective, we outline the loss because the adverse log probability and we reduce it. This provides the same old log-loss.

And it’s equal. We is not going to do the demonstration right here

Gradient Descent for Logistic Regression

Precept

Simply as we did for Linear Regression, we are able to additionally use Gradient Descent right here. The concept is all the time the identical:

  1. Begin from some preliminary values of a and b.
  2. Compute the loss and its gradient (derivatives) with respect to a and b.
  3. Transfer a and b a bit of bit within the course that reduces the loss.
  4. Repeat.

Nothing mysterious.
Simply the identical mechanical course of as earlier than.

Step 1. Gradient Calculation

For logistic regression, the gradients of the common log-loss comply with a quite simple construction.

That is merely the common residual.

We’ll simply give the end result under, for the method that we are able to implement in Excel. As you may see, it’s fairly easy on the finish, even when the log-loss method might be complicated at first look.

Excel can compute these two portions with easy SUMPRODUCT formulation.

Step 2. Parameter Replace

As soon as the gradients are identified, we replace the parameters.

This replace step is repeated at every iteration.
And iteration after iteration, the loss goes down, and the parameters converge to the optimum values.

We now have the entire image.
You could have seen the mannequin, the loss, the gradients, and the parameter updates.
And with the detailed view of every iteration in Excel, you may truly play with the mannequin: change a price, watch the curve transfer, and see the loss lower step-by-step.

It’s surprisingly satisfying to look at how all the pieces suits collectively so clearly.

What about multiclass classification?

For distance-based and tree-based fashions:

No problem in any respect.
They naturally deal with a number of courses as a result of they by no means interpret the labels as numbers.

However for weight-based fashions?

Right here we hit an issue.

If we write numbers for the category: 1, 2, 3, and many others.

Then the mannequin will interpret these numbers as actual numeric values.
Which results in issues:

  • the mannequin thinks class 3 is “larger” than class 1
  • the midpoint between class 1 and sophistication 3 is class 2
  • distances between courses turn into significant

However none of that is true in classification.

So:

For weight-based fashions, we can’t simply use y = 1, 2, 3 for multiclass classification.

This encoding is inaccurate.

We’ll see later tips on how to repair this.

Conclusion

Ranging from a easy binary dataset, we noticed how a weight-based mannequin can act as a classifier, why linear regression shortly reaches its limits, and the way the logistic perform solves these issues by retaining predictions between 0 and 1.

Then, by expressing the mannequin by way of probability and log-loss, we obtained a formulation that’s each mathematically sound and simple to implement.
And as soon as all the pieces is positioned in Excel, your entire studying course of turns into seen: the possibilities, the loss, the gradients, the updates, and at last the convergence of the parameters.

With the detailed iteration desk, you may truly see how the mannequin improves step-by-step.
You may change a price, modify the training price, or add a degree, and immediately observe how the curve and the loss react.
That is the actual worth of doing machine studying in a spreadsheet: nothing is hidden, and each calculation is clear.

By constructing logistic regression this manner, you not solely perceive the mannequin, you perceive why it’s educated.
And this instinct will stick with you as we transfer to extra superior fashions later within the Creation Calendar.

Related Articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Latest Articles

PHP Code Snippets Powered By : XYZScripts.com