How to increase the receptive field in CNNs?
- Adding more layers (making the network deeper)
- Incorporating multi-scale scene context
- Inflating the size of the filter (a.k.a. dilated convolution)
- Pooling at different scales (a.k.a. spatial pyramid pooling)
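To make the first and third options concrete, here is a minimal sketch that computes the receptive field of a stack of convolution or pooling layers using the standard recurrence (each layer adds `(effective_kernel - 1) * jump` input pixels). The function name `receptive_field` and the `(kernel, stride, dilation)` tuple format are assumptions made for this illustration, not something taken from a particular library.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    layers: list of (kernel_size, stride, dilation) tuples, ordered
    from the input to the output. This tuple layout is an assumption
    chosen for this sketch.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # distance in input pixels between neighbouring units
    for kernel, stride, dilation in layers:
        # A dilated kernel covers dilation * (kernel - 1) + 1 positions.
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Three plain 3x3 convolutions: depth grows the receptive field linearly.
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
# Same depth and parameter count, but with dilation rates 1, 2, 4.
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

With the same number of layers and weights, the dilated stack sees a 15-pixel window instead of 7, which is why dilation is a cheap way to enlarge the receptive field.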
Semantic segmentation is one of the most widely used techniques for holistic scene understanding. Fully Convolutional Networks (FCN) adapted CNNs (which were initially used for image classification) to the task of semantic segmentation. Given an input image, semantic segmentation outputs a mask that assigns a pre-determined semantic category to each pixel in the image. FCN achieves this by downsampling the image features and then rapidly upsampling them to reconstruct the segmentation mask. However, this rapid upsampling leads to a loss of contextual information: recovering pixel-level fine details from overly coarse features (the input to the upsampling layer) is difficult.
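The loss of detail can be seen in a toy 1-D example. This is only an illustrative sketch, not the FCN architecture itself: average pooling stands in for the downsampling path, nearest-neighbour repetition stands in for the rapid upsampling, and the helper names `downsample` and `upsample` are made up for this example.

```python
def downsample(signal, factor):
    """Average-pool a 1-D signal with non-overlapping windows."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]


def upsample(signal, factor):
    """Nearest-neighbour upsampling: repeat every value `factor` times."""
    return [value for value in signal for _ in range(factor)]


# A 1-D "segmentation mask": one thin object (index 3) and one wide object.
mask = [0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

coarse = downsample(mask, 4)                        # 4x smaller feature map
restored = [1 if v >= 0.5 else 0 for v in upsample(coarse, 4)]

print(coarse)    # [0.25, 0.0, 1.0, 1.0]
print(restored)  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
```

The wide object survives the round trip, but the one-pixel object is gone: once it has been averaged into a coarse cell, a single upsampling step has no information left to put it back.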
How to upsample without losing contextual information?
- Learning to upsample while remembering the lost context (deconvolution with unpooling; pooling was originally used to filter out noisy activations)
- Using the inherent (built-in) flow of semantic information at different scales in CNNs (feature pyramids)
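The "remembering" in the first option can be sketched with max pooling that records where each maximum came from, so that unpooling can put the value back in its original position. This is a minimal 1-D illustration in plain Python; the function names `max_pool_with_indices` and `max_unpool` are assumptions for this example, and a real network would follow the unpooling step with learned deconvolution filters to densify the sparse output.

```python
def max_pool_with_indices(signal, size):
    """Max-pool non-overlapping windows and remember each max location."""
    pooled, indices = [], []
    for start in range(0, len(signal), size):
        window = signal[start:start + size]
        offset = max(range(len(window)), key=window.__getitem__)
        pooled.append(window[offset])
        indices.append(start + offset)  # position in the original signal
    return pooled, indices


def max_unpool(pooled, indices, length):
    """Place each pooled value back at its recorded position; rest is 0."""
    output = [0.0] * length
    for value, index in zip(pooled, indices):
        output[index] = value
    return output


activations = [0.1, 0.9, 0.2, 0.0, 0.3, 0.1, 0.0, 0.7]
pooled, switches = max_pool_with_indices(activations, 2)
restored = max_unpool(pooled, switches, len(activations))

print(pooled)    # [0.9, 0.2, 0.3, 0.7]
print(switches)  # [1, 2, 4, 7]
print(restored)  # [0.0, 0.9, 0.2, 0.0, 0.3, 0.0, 0.0, 0.7]
```

Unlike plain nearest-neighbour upsampling, the strong activations land exactly where they were in the input, which is the spatial context that would otherwise be thrown away by pooling.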
I will mention the related papers soon. :)