Fine-tuning stable diffusion for generating 2D floor plans using prompt templates

Ahmed Mostafa
Omar Amir
Ali M. Mohamed
Marwa O. Al Enany


Keywords: graph neural networks (GNNs), diffusion model, latent diffusion, floorplan representation
Abstract

Automated generation of 2D floor plans is important for architectural design and requires models that balance precision with adaptability to user-defined specifications. Diffusion models such as Stable Diffusion excel at generating high-quality images but lack an intrinsic understanding of structured layouts such as floor plans. Conversely, Graph Neural Networks (GNNs) are adept at encoding relational data, representing floor plan objects as nodes and their connections as edges, but they are neither generative nor capable of processing textual inputs. In this work, we fine-tune Stable Diffusion 1.5 on a custom dataset of floor plans, using structured prompt templates to constrain the model's creativity and guide it toward concise, error-tolerant outputs. We propose integrating the generative capabilities of diffusion models with the representational strengths of GNNs to overcome inherent limitations of diffusion models, such as their inability to explicitly encode spatial relationships. This integration could enable such models to comprehend and produce structured layouts more effectively. Although computational constraints limited our exploration of this hybrid architecture, our results demonstrate that prompt engineering and dataset preprocessing significantly improve output quality. This study highlights the potential of generative models in architectural tasks and lays the groundwork for integrating logical reasoning into diffusion-based architectures.
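
To illustrate what a structured prompt template might look like, the following is a minimal sketch in Python. It is not the template used in this work: the class `FloorPlanSpec`, the function `build_prompt`, the field names, and the fixed prefix string are all hypothetical and chosen only to show how a fixed vocabulary and ordering can narrow the space of prompts seen during fine-tuning and inference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloorPlanSpec:
    """Hypothetical user specification; field names are illustrative only."""
    bedrooms: int
    bathrooms: int
    extras: tuple[str, ...] = ()  # e.g. ("kitchen", "balcony")


def _count(n: int, noun: str) -> str:
    """Render a count with a naive singular/plural form."""
    return f"{n} {noun}" if n == 1 else f"{n} {noun}s"


def build_prompt(spec: FloorPlanSpec) -> str:
    """Fill a fixed template so every prompt has the same structure.

    The prefix wording is an assumption, not the wording from the paper.
    """
    if spec.bedrooms < 0 or spec.bathrooms < 0:
        raise ValueError("room counts must be non-negative")
    parts = [_count(spec.bedrooms, "bedroom"), _count(spec.bathrooms, "bathroom")]
    # Sort the optional rooms so equivalent specs map to one canonical prompt.
    parts.extend(sorted(spec.extras))
    return "2D floor plan, top-down view, " + ", ".join(parts)


if __name__ == "__main__":
    print(build_prompt(FloorPlanSpec(2, 1, ("kitchen", "balcony"))))
```

Sorting the optional rooms and fixing the prefix means that two equivalent specifications always yield the same string, which is one simple way to keep captions in a training set consistent.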

Article Details

How to Cite
Mostafa, A., Amir, O., Mohamed, A. M., & Al Enany, M. O. (2025). Fine-tuning stable diffusion for generating 2D floor plans using prompt templates. Machine Graphics & Vision, 34(3), 77–95. https://doi.org/10.22630/MGV.2025.34.3.4
