Semantic Segmentation of image Using Deep Learning:Review

Semantic segmentation is one of the most important and challenging problems in computer vision. It assigns a class label to each pixel in an image, which enables detailed scene understanding. The task is widely used in application areas including self-driving cars, medical diagnosis, and environmental monitoring. Semantic segmentation has come a long way from early algorithms based on hand-crafted feature extraction to state-of-the-art deep learning methods. This paper presents the evolution of semantic segmentation and, specifically, how deep learning has changed the field. Conventional approaches based on edge detection and histogram analysis offered a basic level of understanding but were constrained by their reliance on hand-crafted features. Deep learning, by contrast, learns features directly from data and has produced very promising results across numerous tasks. Important architectures that have set benchmarks in the field include Fully Convolutional Networks (FCNs), U-Net, and DeepLab, which use convolutional layers, encoder-decoder architectures, and atrous convolutions, respectively, to improve segmentation accuracy. The article also reviews publicly available datasets, including Cityscapes, PASCAL VOC, and ISIC 2017, that are widely used to assess the performance of segmentation models. These datasets differ in complexity, resolution, and application domain, so they pose diverse problems for researchers. We also compare traditional and deep learning based feature extraction methods, presenting the characteristics, advantages, disadvantages, and areas of application of each.
This survey aims to assist researchers and practitioners by presenting current state-of-the-art methodologies, discussing their potential for real-world application, and identifying directions for further research. Despite the advances of deep learning in semantic segmentation, numerous issues still need to be addressed, including efficiency, scalability, and domain-specific challenges. We hope this comprehensive review will benefit those who wish to learn about current trends and to contribute to the field of semantic segmentation in the future.
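
To make the per-pixel labeling idea concrete, the following is a minimal sketch, not taken from the paper. It assumes a model has already produced a score for every class at every pixel and simply picks the highest-scoring class per pixel. The function name `label_pixels`, the `(C, H, W)` array layout, and the toy score values are illustrative assumptions.

```python
import numpy as np


def label_pixels(scores: np.ndarray) -> np.ndarray:
    """Turn per-class scores of shape (C, H, W) into an (H, W) label map.

    Each output pixel holds the index of the class with the highest score
    at that location. The (C, H, W) layout is an assumption of this sketch.
    """
    return np.argmax(scores, axis=0)


# Hypothetical scores for 3 classes on a 2x2 image (values are made up).
scores = np.array([
    [[0.90, 0.10], [0.20, 0.30]],  # class 0
    [[0.05, 0.80], [0.10, 0.30]],  # class 1
    [[0.05, 0.10], [0.70, 0.40]],  # class 2
])

label_map = label_pixels(scores)
print(label_map)
```

In architectures such as FCN, U-Net, or DeepLab, the score array would come from the network's final layer; the argmax step that converts scores into a label map is the same regardless of which network produced them.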

How to Cite

sinjawi, M. (2026). Semantic Segmentation of image Using Deep Learning:Review. Al-Kitab Journal for Pure Sciences, 10(01), 01–14. https://doi.org/10.32441/kjps.2026.10.01.p1

Issue

Vol. 10 No. 01 (2026): Al-Kitab Journal for Pure Sciences

Section

Articles
