A method is proposed for semantic segmentation of RGB images captured by unmanned aerial vehicles (UAVs) to detect railway infrastructure elements, including tracks, level crossings, and surrounding vegetation. The study was conducted at the Łukasiewicz Research Network - Institute of Aviation, where a proprietary, manually annotated UAV RGB dataset was created. Five deep neural network architectures were trained and compared: DeepLabV3+, Feature Pyramid Network (FPN), LinkNet, Pyramid Attention Network (PAN), and X-Unet. These models were chosen for their distinct approaches to semantic segmentation and feature processing. Training was performed on a desktop computer with an NVIDIA GeForce RTX 3080 GPU, and the trained models were also tested on an NVIDIA Jetson AGX Orin to assess deployment feasibility under real-time conditions. Experimental results confirm the strong performance of the analyzed models in segmenting railway tracks and surrounding vegetation. FPN achieved the highest scores, followed by X-Unet, DeepLabV3+, LinkNet, and PAN. All models operated reliably on the NVIDIA Jetson AGX Orin edge platform. The proposed solution can support remote monitoring of railway infrastructure and vegetation, and it can be adapted to other applications by changing the training dataset and object categories. This research demonstrates the potential of deep learning as a tool for analyzing UAV RGB imagery in engineering and environmental contexts.
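The abstract ranks the models by their scores without naming the metric. As an illustration only, the sketch below computes per-class intersection over union (IoU) and its mean, a common way to score semantic segmentation masks; it is an assumption that a metric of this kind was used. The class ids, the `CLASSES` mapping, and the function names are hypothetical and do not come from the study.

```python
from typing import Dict, Iterable, Sequence

# Hypothetical label encoding: the abstract names these categories
# but does not say how they are encoded in the annotation masks.
CLASSES = {0: "background", 1: "track", 2: "level_crossing", 3: "vegetation"}

Mask = Sequence[Sequence[int]]  # 2D grid of integer class ids


def per_class_iou(pred: Mask, target: Mask, class_ids: Iterable[int]) -> Dict[int, float]:
    """Return IoU for each class that occurs in the prediction or the target."""
    ids = list(class_ids)
    inter = {c: 0 for c in ids}
    union = {c: 0 for c in ids}
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel counts once towards both intersection and union.
                if p in inter:
                    inter[p] += 1
                    union[p] += 1
            else:
                # Mismatch: the pixel enlarges the union of both classes involved.
                if p in union:
                    union[p] += 1
                if t in union:
                    union[t] += 1
    # Classes absent from both masks have an undefined IoU and are skipped.
    return {c: inter[c] / union[c] for c in ids if union[c] > 0}


def mean_iou(ious: Dict[int, float]) -> float:
    """Average the per-class IoU values; NaN if no class was present."""
    return sum(ious.values()) / len(ious) if ious else float("nan")


# Toy 2x3 masks: one "track" pixel is mislabelled as "vegetation".
target = [[0, 1, 1], [3, 3, 0]]
pred = [[0, 1, 3], [3, 3, 0]]
ious = per_class_iou(pred, target, CLASSES)
miou = mean_iou(ious)
```

In the toy example the level-crossing class does not appear in either mask, so it is left out of the mean rather than counted as zero. Whether absent classes are skipped or penalized is a design choice that changes the ranking of models on small test sets.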