Event detection system for the penitentiary institutions using multimodal data and deep networks


Piotr Bilski
Marcin Lewandowski
Adrian Bilski
Andrzej Buchowicz
Jacek Olejnik
Paweł Mazurek
Konrad Jędrzejewski


Keywords: deep learning, posture-based event detection, multimodal analysis
Abstract

This paper presents a distributed system for detecting unwanted events involving inmates in closed penitentiary facilities. The system processes a large number of data streams from IP cameras (up to 180) and performs event detection using deep neural networks. Both audio and video streams are processed to produce the classification outcome. An application-specific data set was prepared for training the neural models. Depending on the event type, 3DCNN and YOLO architectures were used. The system was thoroughly tested both in laboratory conditions and in an actual facility. The detection accuracy is at a satisfactory level, although problems with particular events have been reported and will be addressed in future work.


How to Cite
Bilski, P., Lewandowski, M., Bilski, A., Buchowicz, A., Olejnik, J., Mazurek, P., & Jędrzejewski, K. (2023). Event detection system for the penitentiary institutions using multimodal data and deep networks. Machine Graphics and Vision, 32(3/4), 233–254. https://doi.org/10.22630/MGV.2023.32.3.12
