A video-based fall detection using 3D sparse convolutional neural network in elderly care services

Fangping Fu


Keywords: 3D convolutional neural network, sparse convolution, fall detection, jitter buffer
Abstract

Falls have become one of the major risks facing the growing elderly population, which makes automatic fall detection systems for the elderly particularly important. In recent years, a large number of deep learning methods (such as CNNs) have been applied to this problem. This paper proposes a sparse convolution method, 3D Sparse Convolution, and a corresponding 3D Sparse Convolutional Neural Network (3D-SCNN), which performs convolution faster at approximately the same accuracy, thereby reducing computational complexity while maintaining high accuracy in video analysis and fall detection tasks. In addition, the preprocessing stage uses a dynamic key frame selection method that relies on jitter buffers to adjust frame selection according to current network conditions and buffer state. To ensure feature continuity, the selected frames are deliberately grouped into overlapping cubes, which are resized dynamically to adapt to network dynamics and buffer state. Experiments are conducted on the Multi-camera fall dataset and the UR fall dataset. The results show that the proposed method is more accurate than the three methods it is compared against and outperforms traditional 3D-CNN methods in both accuracy and loss.
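
The overlapping-cube idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the linear rule mapping buffer fill to cube length, and the default lengths of 8 and 16 frames are all assumptions made for illustration, since the abstract does not state the actual resizing rule.

```python
from typing import List


def cube_length(buffer_fill: float, min_len: int = 8, max_len: int = 16) -> int:
    """Hypothetical resizing rule (an assumption, not taken from the paper):
    a fuller jitter buffer allows a longer cube of frames."""
    fill = min(max(buffer_fill, 0.0), 1.0)  # clamp fill ratio to [0, 1]
    return min_len + round(fill * (max_len - min_len))


def overlapping_cubes(frames: List[int], length: int, overlap: int) -> List[List[int]]:
    """Group selected frames into cubes of `length` frames, where each cube
    shares its first `overlap` frames with the end of the previous cube."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must satisfy 0 <= overlap < length")
    stride = length - overlap
    return [frames[s:s + length]
            for s in range(0, len(frames) - length + 1, stride)]


# Usage: 20 selected frame indices, an empty buffer (shortest cube), overlap of 4.
frame_ids = list(range(20))
length = cube_length(buffer_fill=0.0)
cubes = overlapping_cubes(frame_ids, length, overlap=4)
```

Because consecutive cubes share frames, a motion that straddles a cube boundary still appears whole in at least one cube, which is one plausible reading of the "feature continuity" goal stated above.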

How to Cite
Fu, F. (2025). A video-based fall detection using 3D sparse convolutional neural network in elderly care services. Machine Graphics and Vision, 34(1), 53–74. https://doi.org/10.22630/MGV.2025.34.1.3