Semantic Segmentation of Image Using Deep Learning: Review
Abstract
Semantic segmentation is one of the most important and challenging problems in computer vision. It assigns a class label to every pixel in an image, which enables detailed scene understanding. The task is widely used in application areas including self-driving cars, medical diagnosis, and environmental monitoring. Semantic segmentation has come a long way, from early algorithms based on hand-crafted feature extraction to state-of-the-art deep learning methods. This paper presents the evolution of semantic segmentation and, specifically, how deep learning has changed the field. Conventional approaches such as edge detection and histogram analysis offered a basic level of understanding but were constrained by their reliance on hand-crafted features. Deep learning, by contrast, learns features directly from data and has produced very promising results across numerous tasks. Important architectures that have set benchmarks in the field include Fully Convolutional Networks (FCNs), U-Net, and DeepLab, which use convolutional layers, encoder-decoder architectures, and atrous convolutions, respectively, to improve segmentation accuracy. The article also reviews publicly available datasets, including Cityscapes, PASCAL VOC, and ISIC 2017, which are widely used to assess the performance of segmentation models. These datasets differ in complexity, resolution, and application domain, so they pose diverse problems to researchers. We also compare traditional and deep learning-based feature extraction methods, presenting the characteristics, advantages, disadvantages, and areas of application of each.
This survey aims to assist researchers and practitioners by presenting current state-of-the-art methodologies, discussing their potential for real-world application, and identifying directions for further research. Despite the advances of deep learning in semantic segmentation, numerous issues remain to be addressed, including efficiency, scalability, and domain-specific challenges. We hope this comprehensive review will benefit those who wish to learn more about current trends and to contribute to the field of semantic segmentation in the future.
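To make the per-pixel labeling task described in the abstract concrete, the following minimal Python sketch turns a small array of class scores into a label map and compares it with a ground-truth map. It is an illustration only, not a method from the paper: the function names `predict_labels` and `mean_iou`, the toy arrays, and the choice of mean intersection-over-union (a metric commonly reported on benchmarks such as Cityscapes and PASCAL VOC, but not named in the abstract) are assumptions made for this example.

```python
import numpy as np


def predict_labels(scores):
    """Turn class scores of shape (num_classes, H, W) into an (H, W) label map.

    Each pixel receives the index of its highest-scoring class.
    """
    return np.argmax(scores, axis=0)


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes (hypothetical helper).

    Classes that appear in neither map are skipped so they do not
    distort the average.
    """
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both prediction and ground truth
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious)) if ious else float("nan")


# Toy 2x2 image with two classes; the values are made up for illustration.
scores = np.array([
    [[0.9, 0.2], [0.8, 0.1]],  # scores for class 0
    [[0.1, 0.8], [0.2, 0.9]],  # scores for class 1
])
target = np.array([[0, 1], [1, 1]])  # ground-truth label per pixel

pred = predict_labels(scores)
print(pred)
print(mean_iou(pred, target, num_classes=2))
```

In a real pipeline the score array would come from a segmentation network such as an FCN, U-Net, or DeepLab model, and the metric would be accumulated over a whole dataset rather than a single image.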
