Understanding Convolutional Neural Networks (CNNs) Through Excel

November 18, 2025


Deep learning is often seen as a black box. We know that it learns from data, but the question is how it actually learns.

In this article, we’ll build a tiny Convolutional Neural Network (CNN) directly in Excel to understand, step by step, how a CNN really works for images.

We will open this black box and watch each step happen right before our eyes. We will understand all the calculations that are the foundation of what we call “deep learning”.

This article is part of a series about implementing machine learning and deep learning algorithms in Excel. You can find all the Excel files at this Kofi link.

1. How Images are Seen by Machines

1.1 Two Ways to Detect Something in an Image

When we try to detect an object in a picture, like a cat, there are two main ways: the deterministic approach and the machine learning approach. Let’s see how these two approaches work for this example of recognizing a cat in a picture.

The deterministic approach means writing rules by hand.

For example, we can say that a cat has a round face, two triangular ears, a body, a tail, etc. So the developer does all the work of defining the rules.

Then the computer runs all these rules and gives a similarity score.

Deterministic approach to detect a cat in a picture – image by author

The machine learning approach means that we don’t write the rules ourselves.

Instead, we give the computer many examples: pictures with cats and pictures without cats. Then it learns by itself what makes a cat a cat.

Machine learning approach to detect a cat in a picture – image by author (cats are generated by AI)

That’s where things may become mysterious.

We usually say that the machine will figure it out by itself, but the real question is how.

In fact, we still have to tell the machine how to create these rules. And the rules have to be learnable. So the key point is: how can we define the kind of rules that will be used?

To understand how to define rules, we first have to understand what an image is.

1.2 Understanding What an Image Is

A cat is a complex shape, but we can take a simple and clear example: recognizing handwritten digits from the MNIST dataset.

First, what is an image?

A digital image can be seen as a grid of pixels. Each pixel is a number that shows how dark it is, from 0 for white (the background) to 255 for black (the ink), which is the convention used by MNIST.

In Excel, we can represent this grid with a table where each cell corresponds to one pixel.

MNIST handwritten digits – image from the MNIST dataset https://en.wikipedia.org/wiki/MNIST_database (CC BY-SA 3.0)

The original size of the digits is 28×28. But to keep things simple, we’ll use a 10×10 table. It’s small enough for quick calculations but still large enough to show the general shape.

So we’ll reduce the dimension.
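The article does not say how the 28×28 digits are reduced to 10×10. One simple possibility, shown here only as an assumption, is area averaging: each cell of the small grid takes the mean of the source pixels that fall inside its block. The function name `downsample` and the synthetic test image are hypothetical.

```python
def downsample(image, out_size=10):
    """Shrink a square grayscale image (list of rows) to out_size x out_size
    by averaging the source pixels that fall into each output cell."""
    n = len(image)
    result = []
    for i in range(out_size):
        # Source rows covered by output row i (always at least one row).
        r0 = i * n // out_size
        r1 = max((i + 1) * n // out_size, r0 + 1)
        row = []
        for j in range(out_size):
            c0 = j * n // out_size
            c1 = max((j + 1) * n // out_size, c0 + 1)
            block = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(round(sum(block) / len(block)))
        result.append(row)
    return result

# A synthetic 28x28 image: a vertical stroke 4 pixels wide (not real MNIST data).
big = [[255 if 12 <= c < 16 else 0 for c in range(28)] for r in range(28)]
small = downsample(big)
print(small[0])
```

Because 28 is not a multiple of 10, the blocks are 2 or 3 pixels wide, so a stroke edge can land in a block that is only partly inked and come out as an intermediate gray.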

For example, the handwritten number “1” can be represented by a 10×10 grid as below in Excel.

An image is a grid of numbers – image by author
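Outside Excel, the same idea can be sketched with a list of rows, one number per pixel. The pixel values below are made up for illustration and are not copied from the author’s spreadsheet.

```python
# A hand-made 10x10 "1": 0 is background, 255 is the darkest ink.
# These values are illustrative, not the author's actual grid.
digit_one = [[0] * 10 for _ in range(10)]
for row in range(1, 9):
    digit_one[row][5] = 255      # main vertical stroke
digit_one[2][4] = 128            # lighter pixel for the small flag near the top

# Print the grid the way it would appear as an Excel table.
for row in digit_one:
    print(" ".join(f"{value:3d}" for value in row))
```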

1.3 Before Deep Learning: Classic Machine Learning for Images

Before using CNNs or any deep learning method, we can already recognize simple images with classic machine learning algorithms such as logistic regression or decision trees.

In this approach, each pixel becomes one feature. For example, a 10×10 image has 100 pixels, so there are 100 features as input.

The algorithm then learns to associate patterns of pixel values with labels such as “0”, “1”, or “2”.
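To make “each pixel becomes one feature” concrete, here is a minimal sketch that flattens a 10×10 grid into a list of 100 values. The helper name `flatten` and the toy image are assumptions for illustration.

```python
def flatten(image):
    """Turn a grid of pixels (list of rows) into one flat feature list."""
    return [pixel for row in image for pixel in row]

# Toy 10x10 image with a single vertical stroke in column 5 (illustrative values).
image = [[255 if col == 5 else 0 for col in range(10)] for _ in range(10)]
features = flatten(image)

print(len(features))          # one feature per pixel
# The pixel at (row, col) ends up at index row * 10 + col.
print(features[3 * 10 + 5])
```

After flattening, two pixels that are vertical neighbors in the image sit 10 positions apart in the feature list, and nothing tells the model that they were adjacent.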

Classic ML for image recognition – image by author

In fact, with this simple machine learning approach, logistic regression can achieve quite good results, with an accuracy of around 90%.

This shows that classic models are able to learn useful information from raw pixel values.

However, they have a major limitation. They treat each pixel as an independent value, without considering its neighbors. As a result, they cannot capture spatial relationships between the pixels.

So intuitively, we know that the performance will not be good for complex images. This method does not scale.

Now, if you already know how classic machine learning works, you know that there is no magic. And in fact, you already know what to do: you have to improve the feature engineering step, transforming the features in order to get more meaningful information from the pixels.

2. Building a CNN Step by Step in Excel

2.1 From complex CNNs to a simple one in Excel

When we talk about Convolutional Neural Networks, we often see very deep and complex architectures, like VGG-16. With many layers, millions of parameters, and numerous operations, it looks very complex, and people say that it is impossible to understand exactly how it works.

The main idea behind the layers is: detecting patterns step by step.

With the example of handwritten digits, let’s ask a question: what could be the simplest possible CNN architecture?

First, for the hidden layers, before building many layers, let’s reduce the number. How many? Let’s do one. That’s right: only one.

As for the filters, what about their dimensions? In real CNN layers, we usually use 3×3 filters to detect small patterns. But let’s begin with big ones.

How big? 10×10!

Yes, why not?

This also means that you don’t have to slide the filter across the image. This way, we can directly compare the input image with the filter and see how well they match.
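Why a 10×10 filter needs no sliding can be checked by counting positions. Assuming no padding and a stride of 1 (both are assumptions; the article does not discuss them), a filter of width k fits in n - k + 1 positions along each axis of an image of width n. The function name `output_size` is hypothetical.

```python
def output_size(image_size, filter_size):
    """Number of positions a filter can take along one axis,
    assuming no padding and a stride of 1."""
    return image_size - filter_size + 1

print(output_size(10, 3))    # 8: a 3x3 filter gives an 8x8 map of scores
print(output_size(10, 10))   # 1: a 10x10 filter gives a single score
```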

This simple case is not about performance, but about clarity. It will show how CNNs detect patterns step by step.

Now, we have to define the number of filters. We will say 10, which is the minimum. Why? Because there are 10 digits, so we need at least 10 filters. And we’ll see how they can be found in the next section.

In the image below, you have the diagram of this simplest CNN architecture:

The simplest CNN architecture – image by author

2.2 Training the Filters (or Designing Them Ourselves)

In a real CNN, the filters are not written by hand. They are learned during training.

The neural network adjusts the values inside each filter to detect the patterns that best help to recognize the images.

In our simple Excel example, we will not train the filters.

Instead, we’ll create them ourselves to understand what they represent.

Since we already know the shapes of handwritten digits, we can design filters that look like each digit.

For example, we can draw a filter that matches the shape of 0, another for 1, and so on.

Another option is to take the average image of all examples for each digit and use that as the filter.

Each filter will then represent the “average shape” of a number.
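The averaging option can be sketched in a few lines: for each pixel position, take the mean over all example images of the same digit. The 3×3 examples below are made up to keep the arithmetic visible; the real filters are 10×10 and built from the training images. The function name `average_image` is hypothetical.

```python
def average_image(images):
    """Pixel-wise mean of a list of equally sized images (lists of rows)."""
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [sum(img[r][c] for img in images) / len(images) for c in range(cols)]
        for r in range(rows)
    ]

# Three tiny 3x3 "examples" of the same digit, invented for illustration.
examples = [
    [[0, 255, 0], [0, 255, 0], [0, 255, 0]],
    [[0, 255, 0], [0, 255, 0], [0, 0, 255]],
    [[255, 0, 0], [0, 255, 0], [0, 255, 0]],
]
filter_one = average_image(examples)
for row in filter_one:
    print(row)
```

Pixels that are inked in every example keep the full value, while pixels inked in only some examples get a weaker value, which is what gives the averaged filters their blurred look.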

This is where the frontier between human and machine becomes visible again. We can either let the machine discover the filters, or we can use our own knowledge to build them manually.

That’s right: machines don’t define the nature of the operations. Machine learning researchers define them. Machines are only good at running loops to find the optimal values for these defined rules. And in simple cases, humans are always better than machines.

So, if there are only 10 filters to define, we know that we can directly define the ten digits. So we know, intuitively, the nature of these filters. But there are other options, of course.

Now, to define the numerical values of these filters, we can directly use our knowledge. And we can also use the training dataset.

Below you can see the ten filters created by averaging all the images of each handwritten digit. Each one shows the typical pattern that defines a number.

Average values as filters – image by author

2.3 How a CNN Detects Patterns

Now that we have the filters, we have to compare the input image to these filters.

The central operation in a CNN is called cross-correlation. It is the key mechanism that allows the computer to match patterns in an image.

It works in two simple steps:

Multiply values/dot product: we take each pixel in the input image and multiply it by the pixel in the same position of the filter. This means that the filter “looks” at each pixel of the image and measures how similar it is to the pattern stored in the filter. Indeed, if the two values are both large, then the result is large.
Add results/sum: The products of these multiplications are then added together to produce a single number. This number expresses how strongly the input image matches the filter.
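The two steps above can be sketched as a short function. This is a minimal version for the case where the image and the filter have the same size; the 3×3 grids are made-up values, and the parameter is called `kernel` only to avoid shadowing Python’s built-in `filter`.

```python
def cross_correlate(image, kernel):
    """Full-size cross-correlation: multiply matching cells, then sum.
    Both arguments are grids (lists of rows) with the same shape."""
    total = 0
    for image_row, kernel_row in zip(image, kernel):
        for pixel, weight in zip(image_row, kernel_row):
            total += pixel * weight      # step 1: multiply, step 2: accumulate
    return total

# 3x3 toy grids (made-up values) to keep the arithmetic easy to follow.
image = [[0, 9, 0],
         [0, 9, 0],
         [0, 9, 0]]
vertical = [[0, 1, 0],
            [0, 1, 0],
            [0, 1, 0]]
horizontal = [[0, 0, 0],
              [1, 1, 1],
              [0, 0, 0]]

print(cross_correlate(image, vertical))    # 27: the stroke lines up with the filter
print(cross_correlate(image, horizontal))  # 9: only the center pixel overlaps
```

In a spreadsheet, the same multiply-and-sum over two ranges is what a function such as SUMPRODUCT computes; whether the author’s workbook uses it is an assumption.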

Example of cross-correlation for one picture – image by author

In our simplified architecture, the filter has the same size as the input image (10×10).

Because of this, the filter does not need to move across the image. Instead, the cross-correlation is applied once, comparing the whole image with the filter directly.

The resulting number represents how well the image matches the pattern inside the filter.

If the filter looks like the average shape of a handwritten “5”, a high value means that the image is probably a “5”.

By repeating this operation with all filters, one per digit, we can see which pattern gives the highest match.

2.4 Building a Simple CNN in Excel

We can now create a small CNN from end to end to see how the full process works in practice.

Input: A 10×10 matrix represents the image to classify.
Filters: We define ten filters of size 10×10, each one representing the average image of a handwritten digit from 0 to 9. These filters act as pattern detectors for each number.
Cross-correlation: Each filter is applied to the input image, producing a single score that measures how well the image matches that filter’s pattern.
Decision: The filter with the highest score gives the predicted digit. In deep learning frameworks, this step is often handled by a Softmax function, which converts all scores into probabilities.
In our simple Excel version, taking the maximum score is enough to determine which digit the image most likely represents.
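The four steps listed above can be put together in a short sketch. To keep it readable, it uses two made-up 3×3 filters in place of the ten 10×10 digit filters; the function names `classify` and `softmax` are choices made here, not taken from the author’s workbook.

```python
import math

def cross_correlate(image, kernel):
    """Multiply matching cells of two same-sized grids and sum the products."""
    return sum(p * w for img_row, k_row in zip(image, kernel)
               for p, w in zip(img_row, k_row))

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(image, filters):
    """Return (index of the best-matching filter, probabilities)."""
    scores = [cross_correlate(image, f) for f in filters]
    predicted = max(range(len(scores)), key=scores.__getitem__)
    return predicted, softmax(scores)

# Two invented 3x3 filters standing in for the ten 10x10 digit filters.
filters = [
    [[1, 1, 1], [1, 0, 1], [1, 1, 1]],   # label 0: a ring
    [[0, 1, 0], [0, 1, 0], [0, 1, 0]],   # label 1: a vertical stroke
]
image = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]

label, probabilities = classify(image, filters)
print(label, probabilities)
```

Softmax does not change which filter wins, since the largest score always gets the largest probability. That is why taking the maximum is enough when only the predicted digit is needed.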

Each 10×10 filter represents the average shape of a handwritten digit (0–9). The input image is compared with all filters using cross-correlation. The filter that produces the highest score, after normalization with Softmax, corresponds to the detected digit.

Cross-correlation of the input digit with ten average digit filters. The highest score, normalized by Softmax, identifies the input as “6” – image by author

2.5 Convolution or Cross-Correlation?

At this point, you might wonder why we call it a Convolutional Neural Network when the operation we described is actually cross-correlation.

The difference is subtle but simple:

Convolution means flipping the filter both horizontally and vertically before sliding it over the image.
Cross-correlation means applying the filter directly, without flipping.
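The difference can be seen on a tiny example. The sketch below, with made-up 2×2 values, implements convolution as cross-correlation with a flipped filter, for the same-size case used in this article.

```python
def flip(kernel):
    """Flip a grid horizontally and vertically (a 180-degree rotation)."""
    return [row[::-1] for row in kernel[::-1]]

def cross_correlate(image, kernel):
    """Multiply matching cells of two same-sized grids and sum the products."""
    return sum(p * w for img_row, k_row in zip(image, kernel)
               for p, w in zip(img_row, k_row))

def convolve(image, kernel):
    """Same-size convolution: cross-correlation with the flipped kernel."""
    return cross_correlate(image, flip(kernel))

image = [[1, 2],
         [3, 4]]
kernel = [[10, 20],
          [30, 40]]

print(flip(kernel))                    # [[40, 30], [20, 10]]
print(cross_correlate(image, kernel))  # 1*10 + 2*20 + 3*30 + 4*40 = 300
print(convolve(image, kernel))         # 1*40 + 2*30 + 3*20 + 4*10 = 200
```

When the filter values are learned rather than fixed, the network can simply learn the flipped version, so the two operations are interchangeable in practice. This is why many deep learning libraries compute cross-correlation under the name convolution.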

