Intelligent extraction and layout optimization of digital media visual elements based on computer vision


Hebin Wu


Keywords: digital media, layout optimization, SAM, ViT, PWC-Net, MOFA
Abstract

In the field of digital media, intelligent extraction and layout optimization of visual elements face challenges such as inaccurate semantic understanding of elements and inefficient generation of layout strategies. This study proposes an extraction and layout optimization model that integrates visual semantic understanding with intelligent optimization strategies, based on a segmentation Vision Transformer and the Multi-Objective Firefly Algorithm. The model also uses an improved optical flow method to efficiently capture dynamic information during the design process. Experimental results show that the segmentation Vision Transformer algorithm achieves an extraction accuracy of 98.8±0.2% across different categories of visual elements. After 50 training iterations, the average Intersection-over-Union stabilizes at 0.95, and the harmonic mean of precision and recall (F1 score) reaches 98.17±0.38%. Evaluation of the integrated model shows 99% accuracy in extracting visually similar elements. After layout optimization with the model, the aesthetic score increases to 95.6 and the spatial occupancy rate improves to 97.2%. These results indicate that the proposed model effectively improves the accuracy of visual element extraction and the quality of layout optimization, substantially reducing the reliance of traditional methods on manual rules and providing an efficient, adaptive solution for automated digital media design.
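The extraction metrics reported above can be made concrete with a small example. The following is a minimal Python sketch, not the author's evaluation code, assuming that each extracted visual element is represented by an axis-aligned bounding box matched one-to-one with a ground-truth box, and that the harmonic-mean score is the standard F1 score (harmonic mean of precision and recall). The names `Box`, `iou`, `mean_iou`, and `f1_score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Assumed representation: axis-aligned box from (x0, y0) to (x1, y1), in pixels."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Clamp negative extents to zero so an empty intersection has area 0.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two boxes; 0.0 when they do not overlap."""
    inter = Box(max(a.x0, b.x0), max(a.y0, b.y0),
                min(a.x1, b.x1), min(a.y1, b.y1)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: list[Box], targets: list[Box]) -> float:
    """Average IoU over prediction/ground-truth pairs, assumed already matched by index."""
    pairs = list(zip(preds, targets))
    if not pairs:
        return 0.0
    return sum(iou(p, t) for p, t in pairs) / len(pairs)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(iou(Box(0, 0, 10, 10), Box(5, 0, 15, 10)))
    print(f1_score(tp=8, fp=2, fn=2))
```

For example, two 10×10 boxes offset horizontally by half their width overlap on 50 of 150 pixels of union, giving an IoU of 1/3, and 8 true positives with 2 false positives and 2 false negatives give precision and recall of 0.8 and therefore an F1 of 0.8. For mask-based segmentation such as SAM output, the same IoU formula applies with pixel counts of the mask intersection and union in place of box areas.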

Article Details

How to Cite
Wu, H. (2026). Intelligent extraction and layout optimization of digital media visual elements based on computer vision. Machine Graphics & Vision, 35(1), 25–49. https://doi.org/10.22630/MGV.2026.35.1.2
References

J. D. Blair, K. M. Gaynor, M. S. Palmer, and K. E. Marshall. A gentle introduction to computer vision-based specimen classification in ecological datasets. Journal of Animal Ecology 93(2):147-158, 2024. https://doi.org/10.1111/1365-2656.14042.

F. Chen, L. Chen, H. Han, S. Zhang, D. Zhang, et al. The ability of Segmenting Anything Model (SAM) to segment ultrasound images. BioScience Trends 17(3):211-218, 2023. https://doi.org/10.5582/bst.2023.01128.

J. Deng, A. Berg, S. Satheesh, H. Su, A. Khosla, et al. ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012). IMAGENET, 2012. https://www.image-net.org/challenges/LSVRC/2012/.

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, et al. ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248-255, 2009. https://doi.org/10.1109/CVPR.2009.5206848.

J.-H. Ha and H. Lee. A deep learning model for precipitation nowcasting using multiple optical flow algorithms. Weather and Forecasting 39(1):41-53, 2024. https://doi.org/10.1175/WAF-D-23-0104.1.

E. Hassan, M. Y. Shams, N. A. Hikal, and S. Elmougy. The effect of choosing optimizer algorithms to improve computer vision tasks: A comparative study. Multimedia Tools and Applications 82(11):16591-16633, 2023. https://doi.org/10.1007/s11042-022-13820-0.

S. Khoubani and M. H. Moradi. A deep learning phase-based solution in 2D echocardiography motion estimation. Physical and Engineering Sciences in Medicine 47(4):1691-1703, 2024. https://doi.org/10.1007/s13246-024-01481-2.

S. Kitada. huggingface-datasets_Magazine. GitHub, 2023. https://github.com/creative-graphic-design/huggingface-datasets_Magazine.

A. Krizhevsky. Learning Multiple Layers of Features from Tiny Images. Master's thesis, University of Toronto, 2009. https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf.

A. Krizhevsky, V. Nair, and G. Hinton. The CIFAR-10 dataset. Alex Krizhevsky's home page, 2009. https://www.cs.toronto.edu/~kriz/cifar.html.

M. Y. Landolsi, L. Hlaoua, and L. Ben Romdhane. Information extraction from electronic medical documents: state of the art and future research directions. Knowledge and Information Systems 65(2):463-516, 2023. https://doi.org/10.1007/s10115-022-01779-1.

C. Li, Y. Huang, W. Li, H. Liu, X. Liu, et al. Flaws can be applause: Unleashing potential of segmenting ambiguous objects in SAM. Advances in Neural Information Processing Systems 37:45578-45599, 2024. https://doi.org/10.52202/079017-1449.

J. Li, Z. Zhou, J. Yang, A. Pepe, C. Gsaxner, et al. MedShapeNet – a large-scale dataset of 3D medical shapes for computer vision. Biomedical Engineering / Biomedizinische Technik 70(1):71-90, 2025. https://doi.org/10.1515/bmt-2024-0396.

H. B. Mahajan, N. Uke, P. Pise, M. Shahade, V. G. Dixit, et al. Automatic robot manoeuvres detection using computer vision and deep learning techniques: a perspective of internet of robotics things (IoRT). Multimedia Tools and Applications 82(15):23251-23276, 2023. https://doi.org/10.1007/s11042-022-14253-5.

T. Onyejelem and A. Eric Msughter. Digital generative multimedia tool theory (DGMTT): A theoretical postulation. Journalism and Mass Communication 14(3):189-204, 2024. https://doi.org/10.17265/2160-6579/2024.03.004.

A. S. Ortega-Calvo, R. Morcillo-Jimenez, C. Fernandez-Basso, K. Gutiérrez-Batista, M. A. Vila, et al. AIMDP: An artificial intelligence modern data platform. Use case for Spanish national health service data silo. Future Generation Computer Systems 143:248-264, 2023. https://doi.org/10.1016/j.future.2023.02.002.

E. W. Prastyaningtyas, A. M. A. Ausat, L. F. Muhamad, M. I. Wanof, and S. Suherlan. The role of information technology in improving human resources career development. Jurnal Teknologi Dan Sistem Informasi Bisnis 5(3):266-275, 2023. https://doi.org/10.47233/jteksis.v5i3.870.

B. Rokh, H. Mirvaziri, and M. H. Olyaee. A new evolutionary optimization based on multi-objective firefly algorithm for mining numerical association rules. Soft Computing 28(9):6879-6892, 2024. https://doi.org/10.1007/s00500-023-09558-y.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, et al. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision 115(3):211-252, 2015. https://doi.org/10.1007/s11263-015-0816-y.

S. R. Shen, J. Balakrishnan, and C. H. Cheng. Dynamic content layout optimization for news website front pages. Journal of Modelling in Management 19(6):1907-1926, 2024. https://doi.org/10.1108/JM2-01-2024-0015.

K. Subramanian, F. Hajamohideen, V. Viswan, N. Shaffi, and M. Mahmud. Exploring intervention techniques for Alzheimer's disease: Conventional methods and the role of AI in advancing care. Artificial Intelligence and Applications 2(2):59-77, 2024. https://doi.org/10.47852/bonview42022497.

D. Sun, X. Yang, M.-Y. Liu, and J. Kautz. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8934-8943, 2018. https://doi.org/10.1109/CVPR.2018.00931.

K. Tan, J. Wu, H. Zhou, Y. Wang, and J. Chen. Integrating advanced computer vision and AI algorithms for autonomous driving systems. Journal of Theory and Practice of Engineering Science 4(1):41-48, 2024. https://doi.org/10.53469/jtpes.2024.04(01).06.

P. Tank. Social media post dataset. Kaggle Dataset, 2024. https://www.kaggle.com/datasets/prishatank/post-generator-dataset/data.

F. Tiago. ecommerce-product-dataset. GitHub Repository, 2025. https://github.com/octaprice/ecommerce-product-dataset.

D. Wang, J. Zhang, B. Du, M. Xu, L. Liu, et al. SAMRS: Scaling-up remote sensing segmentation dataset with segment anything model. Advances in Neural Information Processing Systems 36:8815-8827, 2023. https://doi.org/10.48550/arXiv.2305.02034.

Y. Xue and J. Williams. Inducing shifts in attentional and preattentive visual processing through brief training on novel grammatical morphemes: An event-related potential study. Language Learning 74(S1):185-223, 2024. https://doi.org/10.1111/lang.12642.

A. J. J. Yepes. PubLayNet. GitHub, 2025. https://github.com/ibm-aur-nlp/PubLayNet.

P. Zhang, J. Zheng, H. Lin, C. Liu, Z. Zhao, et al. Vehicle trajectory data mining for artificial intelligence and real-time traffic information extraction. IEEE Transactions on Intelligent Transportation Systems 24(11):13088-13098, 2023. https://doi.org/10.1109/TITS.2022.3178182.

X. Zheng, X. Qiao, Y. Cao, and R. W. Lau. Content-aware generative modeling of graphic design layouts. ACM Transactions on Graphics 38(4):133, 2019. https://doi.org/10.1145/3306346.3322971.

X. Zhong, J. Tang, and A. Jimeno Yepes. PubLayNet: Largest dataset ever for document layout analysis. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1015-1022, 2019. https://doi.org/10.1109/ICDAR.2019.00166.
