Convolutional neural networks (CNNs) are deep neural networks with the capability to classify and segment images. For these tasks they are typically trained with supervised learning on labelled examples. CNN architectures for classification and segmentation combine a variety of layers with specific purposes, such as convolutional layers, pooling layers, fully connected layers, and dropout layers. In this blog post, we will go over in detail how CNNs work for classification and segmentation problems.
Description of basic CNN architecture for Classification
The CNN architecture for classification includes convolutional layers, max-pooling layers, and fully connected layers. Convolution and max-pooling layers together perform feature extraction: convolution layers detect features, while max-pooling layers select the strongest ones. Max-pooling downsamples the feature maps, which discards fine high-resolution detail the task does not need and leaves smaller, more compact regions for the rest of the network to process. The output of the convolution and pooling layers is then fed into the fully connected layers, which produce the class scores. Examples of classification tasks where CNNs are used include image classification, object detection, and facial recognition. The diagram below represents the basic CNN architecture for image classification:
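The pipeline described above (convolution for feature detection, max-pooling for feature selection, fully connected layers for the class scores) can be sketched in plain NumPy. This is a minimal, illustrative forward pass with random weights, not a trained model; the layer sizes are assumptions chosen for the example.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max-pooling: keeps the strongest activation per window."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    return feature_map[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((28, 28))                     # toy grayscale input
kernel = rng.random((3, 3))                      # one 3x3 filter (random here; learned in practice)
features = np.maximum(conv2d(image, kernel), 0)  # convolution + ReLU: 28x28 -> 26x26
pooled = max_pool(features)                      # max-pooling downsamples: 26x26 -> 13x13
logits = pooled.reshape(-1) @ rng.random((13 * 13, 10))  # fully connected layer -> 10 class scores
print(features.shape, pooled.shape, logits.shape)        # (26, 26) (13, 13) (10,)
```

Note how pooling halves each spatial dimension, exactly the downsampling role described above; a real network would stack several such conv/pool stages before the fully connected classifier.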
Description of basic CNN architecture for Segmentation
Computer vision deals with images, and image segmentation is one of its most important steps. It involves dividing a visual input into segments to make image analysis easier. A segment is a set of one or more pixels, so segmentation groups pixels into larger components and eliminates the need to consider each pixel as an independent unit. In effect, it divides the image into manageable sections or “tiles”. One classic approach starts by defining small regions of the image that should not be divided further; these regions are called seeds, and the positions of the seeds define the tiles. The picture below can be used to understand image classification, object detection, and image segmentation. Notice how image segmentation can feed into image classification or object detection.
Image segmentation has two levels of granularity, as follows:
- Semantic segmentation: Semantic segmentation classifies image pixels into one or more classes that are semantically interpretable, i.e., that correspond to real-world objects. Every pixel is assigned a class label, and pixels with the same label are grouped into regions. These dense, per-pixel predictions differ from the region proposals used in object detectors, which are sparse candidate object patches: small regions of the image that are likely to contain an object.
- Instance segmentation: In instance segmentation, each individual instance of each object is identified. The difference from semantic segmentation is that semantic segmentation does not distinguish between instances of the same class. If there are three objects of the same type (say, bicycles) in an image, instance segmentation identifies each individual bicycle, while semantic segmentation labels all the bicycle pixels as a single “bicycle” region.
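The distinction between the two granularities can be made concrete with toy label maps. This is an illustrative sketch with a hand-built 6×6 image containing two objects of the same class; the shapes and labels are assumptions for the example.

```python
import numpy as np

# Semantic segmentation: every pixel gets a class id (0 = background, 1 = bicycle),
# so both bicycles share the same label and are indistinguishable.
semantic = np.zeros((6, 6), dtype=int)
semantic[1:3, 1:3] = 1   # first bicycle
semantic[3:5, 3:5] = 1   # second bicycle

# Instance segmentation: every pixel gets an instance id,
# so the two bicycles are separated even though they share a class.
instance = np.zeros((6, 6), dtype=int)
instance[1:3, 1:3] = 1   # bicycle #1
instance[3:5, 3:5] = 2   # bicycle #2

n_classes = len(np.unique(semantic)) - 1    # excludes background
n_instances = len(np.unique(instance)) - 1  # excludes background
print(n_classes, n_instances)               # 1 2
```

The semantic map recovers only one “bicycle” class, while the instance map recovers two distinct bicycles, mirroring the bicycle example above.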
CNN architectures for segmentation make use of encoder and decoder models. The encoder compresses the input into a compact representation that can be passed through the network, and the decoder expands that representation back to the input resolution. The encoder is typically a convolutional neural network, while the decoder is built from deconvolutional (transposed convolution) layers whose purpose is to produce the segmentation map. The picture below represents the basic CNN architecture for image segmentation:
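The encoder-decoder shape arithmetic can be sketched in NumPy. This is a minimal sketch, not a real network: max-pooling stands in for the encoder's learned downsampling, and nearest-neighbour upsampling stands in for the decoder's learned transposed convolutions; the 64×64 input size is an assumption.

```python
import numpy as np

def encode(x, size=2):
    """Encoder step: 2x2 max-pooling halves the spatial resolution."""
    h, w = x.shape
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

def decode(x, size=2):
    """Decoder step: nearest-neighbour upsampling doubles the resolution
    (a stand-in for a learned transposed convolution)."""
    return x.repeat(size, axis=0).repeat(size, axis=1)

x = np.random.default_rng(0).random((64, 64))  # toy single-channel input
code = encode(encode(x))                       # encoder: 64x64 -> 32x32 -> 16x16
mask = decode(decode(code))                    # decoder: 16x16 -> 32x32 -> 64x64
print(code.shape, mask.shape)                  # (16, 16) (64, 64)
```

The key property is that the decoder restores the encoder's lost resolution, so the output segmentation map has the same spatial size as the input image.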
Here is another view of encoder-decoder CNN architecture for image segmentation:
CNNs are a powerful tool for classification and segmentation, but they’re not the only way to do these tasks. The two primary architecture types differ in their output: classification CNNs assign one or more class labels to an image as a whole, given a set of real-world object categories, while segmentation CNNs classify each pixel, identifying regions in the image belonging to semantically interpretable object classes. In this article we covered some important points about CNNs, including how they work, what their typical architectures look like, and which applications use them most frequently. Do you want help with understanding more about CNN architectures? Let us know!