Convolutional Neural Networks Overview: The Heart of Deep Learning Algorithms

VinLab
4 min read · Feb 1, 2023


In this day and age, deep learning has attracted tremendous attention from various fields of technology, and neural networks are the heart of deep learning algorithms. Neural networks teach computers to perform tasks that come naturally to people. There are different types of neural networks in deep learning, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and artificial neural networks (ANNs). Among these, convolutional neural networks (CNNs), or ConvNets, are the model that has made a particularly significant contribution to computer vision and image analysis.


What is a convolutional neural network?

A convolutional neural network (CNN or ConvNet) is a type of neural network with one or more convolutional layers, designed to process data with a grid-like structure, such as images.

Common uses for convolutional neural networks

CNNs are mainly used for image processing, classification, and segmentation, as well as for other autocorrelated data. They are also used in astronomy to analyze data from radio telescopes and forecast the most plausible visual representation of the data; Professor Gerald Quon at the Quon-titative biology lab, for example, uses CNNs as generative models in single-cell genomics for disease identification.

How do convolutional neural networks work?

Convolutional neural networks have three main kinds of layers, which appear in this order:

  • Convolutional layer
  • Pooling layer
  • Fully-connected (FC) layer
Source: ResearchGate

Convolutional Layer

The convolutional layer is the first layer of a convolutional network. It may be followed by additional convolutional layers or pooling layers, and the fully-connected layer comes last. The CNN becomes more complex with each layer, detecting larger areas of the image; early layers emphasize basic elements such as colors and edges.

For instance, suppose the input is a 2D image, so it has two dimensions: height and width. We also have a feature detector, referred to as a kernel or filter, which slides across the image’s receptive fields and checks whether the feature is present at each location. This process is called convolution.
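To make this concrete, here is a minimal NumPy sketch of that sliding-window operation. The 8x8 image and the edge-detecting kernel are made-up examples; in a real CNN the kernel values are learned during training.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image and take a dot product at each
    position (valid padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Receptive field: the patch the kernel currently covers
            patch = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

image = np.random.rand(8, 8)           # a toy 8x8 grayscale image
kernel = np.array([[1, 0, -1],         # a hand-made vertical-edge detector
                   [1, 0, -1],
                   [1, 0, -1]])
print(convolve2d(image, kernel).shape)  # (6, 6)
```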

After each convolution operation, a CNN applies a Rectified Linear Unit (ReLU) transformation to the feature map, introducing nonlinearity to the model.
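In code, this ReLU step is simply an element-wise maximum with zero applied to the feature map; a tiny sketch:

```python
import numpy as np

def relu(feature_map):
    # Element-wise ReLU: negative activations become 0, positives pass through
    return np.maximum(0, feature_map)

print(relu(np.array([[-2.0, 1.5], [0.3, -0.7]])))  # [[0.  1.5] [0.3 0. ]]
```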

As mentioned above, the first convolutional layer may be followed by another convolutional layer. When this occurs, the structure of the CNN becomes hierarchical, because the later layers can see the pixels within the receptive fields of the earlier layers. For example, suppose we want to determine whether an image contains a tree. You can think of the tree as a sum of parts: each individual part (leaf, trunk, …) corresponds to a lower-level pattern in the neural net, and the combination of those parts represents a higher-level pattern, creating a feature hierarchy within the CNN.
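A rough PyTorch sketch of this stacking (the channel counts and the 32x32 RGB input are illustrative assumptions): each additional 3x3 layer sees a larger patch of the original image, so deeper layers can combine edge-like patterns into larger parts.

```python
import torch
import torch.nn as nn

# Two stacked 3x3 convolutions: a neuron in the second layer effectively
# sees a 5x5 region of the original image, so it can combine the low-level
# patterns detected by the first layer into larger parts.
features = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3),   # low-level: edges, colors
    nn.ReLU(),
    nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3),  # higher-level: combinations of edges
    nn.ReLU(),
)

x = torch.randn(1, 3, 32, 32)   # one toy 32x32 RGB image
print(features(x).shape)        # torch.Size([1, 32, 28, 28])
```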

Pooling Layer

Pooling layers are one of the building blocks of convolutional neural networks. Where convolutional layers extract features from images, pooling layers consolidate the features the network has learned. Their purpose is to gradually shrink the spatial dimensions of the representation, reducing the number of parameters and computations in the network.

Because the feature map records the exact locations of the features in the input, a convolutional layer may fail to identify an object that has shifted slightly within the image. Pooling layers provide a degree of “translational invariance”: the CNN can still recognize the features in the input even after it has been translated.

There are two main types of pooling:

Max pooling: selects the maximum value from each pooling window, retaining the most prominent features of the feature map and producing a sharper output.

Average pooling: takes the average of each pooling window, retaining the average value of the features in the feature map.
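The difference is easy to see on a small, made-up 4x4 feature map with a 2x2 window and stride 2 (a sketch, not a library implementation):

```python
import numpy as np

feature_map = np.array([[1, 3, 2, 0],
                        [5, 6, 1, 2],
                        [7, 2, 8, 4],
                        [3, 1, 0, 9]])

def pool2x2(fmap, reduce_fn):
    """Apply a 2x2, stride-2 pooling window with the given reduction."""
    h, w = fmap.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            out[i // 2, j // 2] = reduce_fn(fmap[i:i + 2, j:j + 2])
    return out

print(pool2x2(feature_map, np.max))   # [[6. 2.] [7. 9.]]
print(pool2x2(feature_map, np.mean))  # [[3.75 1.25] [3.25 5.25]]
```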

Source: ResearchGate

Fully-Connected Layer

The last layer of a convolutional neural network is the fully-connected layer (sometimes there is more than one). A neural network is a set of interdependent non-linear functions, and each individual function consists of a neuron (or perceptron). In a fully-connected layer, each neuron applies a linear transformation to the input vector through a weight matrix. The output from the convolutional layers represents high-level features in the data. While that output could be flattened and connected directly to the output layer, adding a fully-connected layer is a (usually) cheap way of learning non-linear combinations of these features.

Essentially, the convolutional layers provide a meaningful, low-dimensional, and somewhat invariant feature space, and the fully-connected layer learns a (possibly non-linear) function in that space.
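Putting the three layer types together, here is a minimal PyTorch sketch of a complete CNN; the 32x32 RGB input, the layer sizes, and the 10-class output are illustrative assumptions, not values from this article.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Convolution + ReLU + pooling: extract and condense spatial features
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 16x16 -> 8x8
        )
        # Fully-connected layers: learn non-linear combinations of the flattened features
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN()
logits = model(torch.randn(1, 3, 32, 32))  # one toy RGB image
print(logits.shape)                        # torch.Size([1, 10])
```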

Thanks for reading!

If you are looking for information about machine learning, artificial intelligence, or data in general or in the medical field, follow us to acquire more useful knowledge about these three topics.

Contact

Email: info@vinlab.io

Twitter: https://twitter.com/VinLab

YouTube: https://www.youtube.com/@Vinlab-MedicalImageAnnotation

Open source project: https://github.com/vinbigdata-medical/vindr-lab


VinLab

A Data Platform for Medical AI that enables building high-quality datasets and algorithms with lean process and advanced annotation features.