This work introduces an efficient pedestrian attribute recognition system. The system is built on a novel processing pipeline that combines the best-performing attribute extraction model with an efficient attribute filtering algorithm based on human pose keypoints. The attribute extraction models are developed from several state-of-the-art deep networks, namely ResNet50, Swin Transformer, and ConvNeXt, via transfer learning: pre-trained models of these networks are fine-tuned on the Ensemble Pedestrian Attribute Recognition (EPAR) dataset. Several optimization techniques, including the Adam optimizer with Decoupled Weight Decay Regularization (AdamW), Random Erasing (RE), and weighted loss functions, are adopted to address data imbalance and challenging conditions such as partially visible and occluded bodies. Experimental evaluations are performed on EPAR, which contains 26993 images of 1477 person IDs, most of them captured in challenging conditions. The results show that ConvNeXt-v2-B outperforms the other networks: its mean accuracy (mA) reaches 85.57%, and it also scores highest on the other metrics. Adding AdamW or RE improves accuracy by 1-2%. The new loss functions mitigate the data imbalance, improving the accuracy of under-represented attributes by up to 14% in the best case. Most notably, applying the attribute filtering algorithm raises mA to 94.85%. Combining a state-of-the-art attribute extraction model and these optimization techniques with attribute filtering, on a large-scale and diverse dataset, is therefore a promising approach with high potential for practical applications.
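To make the mA metric and the idea of a weighted loss concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the label-based definition of mean accuracy commonly used in the pedestrian attribute recognition literature (the per-attribute average of true-positive rate and true-negative rate, averaged over attributes) and one common weighting scheme for binary cross-entropy, in which rare positive labels are up-weighted by the attribute's positive ratio. The function names `mean_accuracy` and `weighted_bce` and the parameter `pos_ratio` are hypothetical and do not come from the paper.

```python
import math


def mean_accuracy(y_true, y_pred):
    """Label-based mean accuracy (mA), assumed definition.

    For each attribute, average the true-positive rate and the
    true-negative rate, then average that value over all attributes.
    y_true and y_pred are lists of per-image 0/1 label lists.
    """
    n_attrs = len(y_true[0])
    per_attr = []
    for j in range(n_attrs):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 0)
        pos = sum(1 for t in y_true if t[j] == 1)
        neg = len(y_true) - pos
        tpr = tp / pos if pos else 0.0  # guard against attributes with no positives
        tnr = tn / neg if neg else 0.0  # guard against attributes with no negatives
        per_attr.append((tpr + tnr) / 2)
    return sum(per_attr) / n_attrs


def weighted_bce(y_true, probs, pos_ratio, eps=1e-7):
    """Binary cross-entropy with per-attribute imbalance weights.

    Assumed weighting (illustrative, not taken from the paper):
    positive terms are scaled by exp(1 - r_j) and negative terms by
    exp(r_j), where r_j is the fraction of training images that carry
    attribute j. Rare attributes therefore pay more for a missed positive.
    """
    total = 0.0
    for labels, ps in zip(y_true, probs):
        for y, p, r in zip(labels, ps, pos_ratio):
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            if y == 1:
                total -= math.exp(1.0 - r) * math.log(p)
            else:
                total -= math.exp(r) * math.log(1.0 - p)
    return total / len(y_true)


if __name__ == "__main__":
    # Toy data: 4 images, 2 attributes (values are made up for illustration).
    truth = [[1, 0], [0, 0], [0, 1], [0, 0]]
    preds = [[1, 0], [0, 0], [0, 0], [0, 1]]
    print("mA:", mean_accuracy(truth, preds))
    print("loss:", weighted_bce(truth, [[0.9, 0.1], [0.2, 0.1], [0.1, 0.3], [0.1, 0.6]], [0.25, 0.25]))
```

Because mA averages the positive and negative recall of each attribute, a model that always predicts the majority class for a rare attribute scores only 0.5 on that attribute. This is why the metric is informative on imbalanced attribute sets.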