What are Convolutional Neural Networks (CNN) in Computer Vision?

Introduction🔊

In this blog post, we’ll look at CNNs (Convolutional Neural Networks), which identify images by extracting features from the input image. Keep an eye out for the essential terms associated with CNNs. Please note that this blog skips the mathematics in order to stay accessible to everyone, especially high school students who are interested in learning machine learning.

What are CNNs❓

The above video from a LinkedIn post gives a meme-style picture of what CNNs are and how they are implemented.

CNNs take their name from convolution, a mathematical operation. It is a specific type of linear operation, and CNNs use it in place of general matrix multiplication in at least one of their layers.
The filter size (also called kernel size) is chosen by the user and is usually tuned by trial and error. Also, this blog gives an idea of choosing the ideal filter size/kernel size.
Padding adds a border of extra pixels (usually zeros) around the image before it is fed to the convolution layer. Without it, every convolution shrinks the output and the pixels at the edges are covered by fewer filter positions, so padding is especially helpful when the image is small.
The last step of the procedure is pooling, where we use either max pooling or average pooling to downsample the feature maps while keeping their most significant information.
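To make the padding idea concrete, here is a minimal sketch using NumPy. The 3 x 3 image and its values are made up for illustration, and a one-pixel border of zeros is just one common choice.

```python
import numpy as np

# A tiny 3x3 grayscale "image" (values are made up for illustration).
image = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# Add a one-pixel border of zeros on every side ("zero padding").
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

print(padded.shape)  # (5, 5)
print(padded)
```

Sliding a 3 x 3 filter over the padded 5 x 5 image with a stride of 1 gives a 3 x 3 output, the same size as the original image, and the corner pixels now get visited by more than one filter position.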

Note:

All things discussed above have been illustrated below to get a clear understanding of each of them.
Also, the convolution stride is taken to be 1 for the entire blog.

CNN — https://www.pinterest.com/pin/491244271853383345/

The above image shows all the terms used above. Let us summarize them in steps:

Convolution➕

First, a convolution layer slides its filters over the input image. The input appears as stacked layers because the image is colored: an RGB image has three channels, one each for red, green, and blue.
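If you are curious what "applying a filter" looks like in practice, here is a small sketch in NumPy with no padding and a stride of 1. The image, the filter values, and the helper name `convolve2d` are all made up for illustration; a real CNN learns its own filter values during training.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products.

    Like deep learning libraries, this does not flip the kernel, so strictly
    speaking it computes a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            output[i, j] = np.sum(window * kernel)
    return output

# Made-up 5x5 image: dark (0) on the left, bright (1) on the right.
image = np.array([
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
])

# A simple hand-written filter that responds to vertical edges.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
])

feature_map = convolve2d(image, kernel)
print(feature_map)
# [[0. 3. 3.]
#  [0. 3. 3.]
#  [0. 3. 3.]]
```

The output is 0 where the image is flat and 3 where the filter overlaps the dark-to-bright edge, which is the sense in which a filter "detects a feature".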

Pooling🎱

Pooling combines the filter’s output within each small window. Max pooling keeps the highest value in the window, while average pooling takes the mean (sum all the values in the window and divide by the number of elements). Pooling reduces noise, but outliers become a concern with max pooling: for values like { 4, 6, 10, 4656 }, max pooling keeps only 4656, which can be highly misleading.
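Here is a quick sketch of both pooling types in NumPy, using the four values from the example above as one 2 x 2 window. The larger 4 x 4 feature map and the helper name `pool2x2` are made up for illustration.

```python
import numpy as np

# The four values from the example, arranged as one 2x2 pooling window.
window = np.array([
    [4, 6],
    [10, 4656],
])

max_pooled = window.max()    # keeps only the largest value
avg_pooled = window.mean()   # sums the values and divides by their count

print(max_pooled)  # 4656
print(avg_pooled)  # 1169.0

def pool2x2(feature_map, reducer):
    """Apply `reducer` (np.max or np.mean) to each non-overlapping 2x2 window."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return reducer(blocks, axis=(1, 3))

# Made-up 4x4 feature map; the top-left window holds the outlier.
feature_map = np.array([
    [4,    6, 1, 2],
    [10, 4656, 3, 4],
    [5,    5, 8, 0],
    [5,    5, 0, 0],
])

print(pool2x2(feature_map, np.max))   # [[4656 4] [5 8]]
print(pool2x2(feature_map, np.mean))  # [[1169. 2.5] [5. 2.]]
```

Either way, the 4 x 4 map shrinks to 2 x 2. Notice how the single outlier dominates the top-left result in both cases, and completely so with max pooling.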

Output📤

Repeat the previous two processes (Convolution and Pooling) as necessary, and then add a flatten layer followed by one or more dense layers, the last of which produces the output.

Code Implementation⌨️

Now that we have a basic understanding, let’s get our hands dirty. You could go to this notebook or codedamn playground. We will focus on the Model Building section, as that is where the heart of the code lies. The model we are going to train on our dataset is as follows:

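import tensorflow as tf

model = tf.keras.models.Sequential([
    # First convolution block: 64 filters of size 3x3 on a 28x28 grayscale input
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Second convolution block
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Flatten the feature maps to 1D before the dense layers
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    # One output per class (10 classes)
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])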

The above block of code makes use of TensorFlow, an open-source ML library by Google. We make use of some of its available functions, such as Sequential, Conv2D, MaxPooling2D, and Dense.

In short, the code does the following:

The input of our first convolution layer is an image of size (28 x 28 x 1). A max pooling layer of size (2 x 2) is then added on top of the first convolution layer. We feed its output to a further convolution layer, which gives a shape of (11, 11, 64), and a second max pooling layer reduces this to (5, 5, 64). With a different filter size or stride, these values would change. The result is then flattened to a 1D vector and passed through the dense layers to get an output of 10 values, one per class.
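If you want to check these shapes by hand, the width (or height) after each layer can be worked out with a small helper. This is a sketch in plain Python; the helper name `output_size` and the layer list are my own shorthand for the model above, assuming no padding.

```python
def output_size(size, window, stride=1, padding=0):
    """Width (or height) after sliding a window of the given size over `size` pixels."""
    return (size + 2 * padding - window) // stride + 1

# (layer name, window size, stride) for the model described above.
layers = [
    ("conv 3x3", 3, 1),
    ("max pool 2x2", 2, 2),
    ("conv 3x3", 3, 1),
    ("max pool 2x2", 2, 2),
]

size = 28          # the input image is 28 x 28
trace = [size]
for name, window, stride in layers:
    size = output_size(size, window, stride)
    trace.append(size)
    print(f"{name:>12}: {size} x {size}")

flattened = size * size * 64   # 64 feature maps from the last convolution

print(trace)      # [28, 26, 13, 11, 5]
print(flattened)  # 1600
```

So the second convolution gives 11 x 11, the pooling after it gives 5 x 5, and flattening the 64 feature maps yields 1600 values for the first dense layer.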

In the model training step, the constructed model is fitted to the training images. To assess how well the model has performed, we then evaluate it on the test images by comparing the classes it predicts with the actual labels. If there are any ambiguities or areas for improvement, the model can be adjusted and the process repeated.
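The evaluation idea can be sketched without any deep learning library. The probabilities and labels below are made up (they are not real model results), and only 3 classes are used instead of 10 to keep the example short.

```python
import numpy as np

# Hypothetical softmax outputs for 4 test images over 3 classes.
# Each row sums to 1; the largest entry is the model's predicted class.
probabilities = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])

# The actual classes of those 4 images (also made up).
true_labels = np.array([0, 1, 2, 1])

# Pick the class with the highest probability for each image.
predicted_labels = np.argmax(probabilities, axis=1)

# Accuracy is the fraction of images classified correctly.
accuracy = np.mean(predicted_labels == true_labels)

print(predicted_labels)  # [0 1 2 0]
print(accuracy)          # 0.75
```

Three of the four images are classified correctly, so the accuracy is 0.75. The last image is the kind of mistake you would look at when deciding how to adjust the model.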

Conclusion🔚

This blog gives a brief understanding of CNNs, even for a beginner who is venturing into the field of Deep Learning (a subset of ML). Key takeaways from the blog are:

✅Understanding CNN

✅Allied terminologies

✅Implementation using CNN

✅Classify the image output