Diffusion model

In machine learning, diffusion models, also known as diffusion probabilistic models or score-based generative models, are a class of latent variable generative models. A diffusion model consists of three major components: the forward process, the reverse process, and the sampling procedure.[1] The goal of diffusion models is to learn a diffusion process for a given dataset, such that the process can generate new elements distributed similarly to the original dataset. A diffusion model treats data as generated by a diffusion process, whereby a new datum performs a random walk with drift through the space of all possible data.[2] A trained diffusion model can be sampled in many ways, some of which are more efficient than others but tend to produce lower-quality samples.
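
To make the "random walk with drift" picture concrete, the following is a minimal NumPy sketch of a forward (noising) process in the style of denoising diffusion probabilistic models. The linear variance schedule, step count, and the helper name `forward_sample` are illustrative assumptions, not values prescribed by any particular paper.

```python
# Minimal sketch of a DDPM-style forward (noising) process.
# Schedule and step count are illustrative assumptions.
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # linear variance schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)       # \bar{alpha}_t = prod_{s<=t} alpha_s

def forward_sample(x0, t, rng=np.random.default_rng()):
    """Sample x_t ~ q(x_t | x_0) in closed form: scaled datum plus Gaussian noise."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise

# Each step shrinks the signal slightly (the drift) and injects Gaussian noise
# (the random walk); by t = T-1 the sample is close to a standard normal.
x0 = np.ones(4)                       # toy "datum"
xt, _ = forward_sample(x0, T - 1)
```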

There are various equivalent formalisms, including Markov chains, denoising diffusion probabilistic models, noise-conditioned score networks, and stochastic differential equations.[3] They are typically trained using variational inference.[4] The model responsible for denoising is typically called the "backbone". The backbone may be of any architecture, but U-nets and transformers are typical.
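
A worked equation illustrates the equivalence. A concrete instance of the Markov-chain formalism is the forward chain of denoising diffusion probabilistic models,[4] whose continuous-time limit is the variance-preserving stochastic differential equation studied in the score-based framework;[2] the notation below follows those papers.

```latex
% DDPM forward chain and its closed-form marginal:
q(x_t \mid x_{t-1}) = \mathcal{N}\!\bigl(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\bigr),
\qquad
q(x_t \mid x_0) = \mathcal{N}\!\bigl(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t) I\bigr),
\quad
\bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s).
% Continuous-time (variance-preserving SDE) view:
\mathrm{d}x = -\tfrac{1}{2}\,\beta(t)\,x\,\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}W_t .
```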

As of 2024, diffusion models are mainly used for computer vision tasks, including image denoising, inpainting, super-resolution, and image generation. These typically involve training a neural network to sequentially denoise images blurred with Gaussian noise.[2][5] The model is trained to reverse the process of adding noise to an image. After training to convergence, it can be used for image generation by starting from an image of pure random noise and having the network iteratively denoise it. Diffusion-based image generators have seen widespread commercial interest, such as Stable Diffusion and DALL-E. These models typically combine diffusion models with other components, such as text encoders and cross-attention modules, to allow text-conditioned generation.[6]
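
The following PyTorch sketch shows the standard "predict the noise" training step and the ancestral sampling loop described above, under stated assumptions: a toy two-dimensional dataset, a tiny MLP in place of the U-net or transformer backbone, and the linear schedule from the sketch above. The names `training_step` and `sample` are illustrative, not library APIs.

```python
# Hedged sketch of DDPM-style training and sampling; schedule, data
# dimensionality, and the MLP backbone are placeholder assumptions.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

# Tiny stand-in backbone: input is a 2-D datum plus a normalized timestep,
# output is the predicted noise. Real systems use U-nets or transformers.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def training_step(x0):                       # x0: (batch, 2) clean data
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    ab = alpha_bars[t].unsqueeze(-1)
    xt = ab.sqrt() * x0 + (1 - ab).sqrt() * eps   # closed-form forward sample
    inp = torch.cat([xt, t.float().unsqueeze(-1) / T], dim=-1)
    loss = torch.nn.functional.mse_loss(model(inp), eps)  # predict the noise
    opt.zero_grad(); loss.backward(); opt.step()
    return loss

@torch.no_grad()
def sample(n):
    x = torch.randn(n, 2)                    # start from pure noise
    for t in reversed(range(T)):             # iteratively denoise
        inp = torch.cat([x, torch.full((n, 1), t / T)], dim=-1)
        eps_hat = model(inp)
        mean = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        x = mean + (betas[t].sqrt() * torch.randn_like(x) if t > 0 else 0)
    return x
```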

Diffusion models have also found applications in natural language processing (NLP),[7] particularly in areas like text generation[8][9] and summarization.[10]

  1. ^ Chang, Ziyi; Koulieris, George Alex; Shum, Hubert P. H. (2023). "On the Design Fundamentals of Diffusion Models: A Survey". arXiv:2306.04542 [cs.LG].
  2. ^ a b Song, Yang; Sohl-Dickstein, Jascha; Kingma, Diederik P.; Kumar, Abhishek; Ermon, Stefano; Poole, Ben (2021-02-10). "Score-Based Generative Modeling through Stochastic Differential Equations". arXiv:2011.13456 [cs.LG].
  3. ^ Croitoru, Florinel-Alin; Hondru, Vlad; Ionescu, Radu Tudor; Shah, Mubarak (2023). "Diffusion Models in Vision: A Survey". IEEE Transactions on Pattern Analysis and Machine Intelligence. 45 (9): 10850–10869. arXiv:2209.04747. doi:10.1109/TPAMI.2023.3261988. PMID 37030794. S2CID 252199918.
  4. ^ Ho, Jonathan; Jain, Ajay; Abbeel, Pieter (2020). "Denoising Diffusion Probabilistic Models". Advances in Neural Information Processing Systems. 33: 6840–6851. arXiv:2006.11239 [cs.LG].
  5. ^ Gu, Shuyang; Chen, Dong; Bao, Jianmin; Wen, Fang; Zhang, Bo; Chen, Dongdong; Yuan, Lu; Guo, Baining (2021). "Vector Quantized Diffusion Model for Text-to-Image Synthesis". arXiv:2111.14822 [cs.CV].
  6. ^ Ramesh, Aditya; Dhariwal, Prafulla; Nichol, Alex; Chu, Casey; Chen, Mark (2022). "Hierarchical Text-Conditional Image Generation with CLIP Latents". arXiv:2204.06125 [cs.CV].
  7. ^ Li, Yifan; Zhou, Kun; Zhao, Wayne Xin; Wen, Ji-Rong (August 2023). "Diffusion Models for Non-autoregressive Text Generation: A Survey". Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization. pp. 6692–6701. arXiv:2303.06574. doi:10.24963/ijcai.2023/750. ISBN 978-1-956792-03-4.
  8. ^ Han, Xiaochuang; Kumar, Sachin; Tsvetkov, Yulia (2023). "SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control". Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics: 11575–11596. arXiv:2210.17432. doi:10.18653/v1/2023.acl-long.647.
  9. ^ Xu, Weijie; Hu, Wenxiang; Wu, Fanyou; Sengamedu, Srinivasan (2023). "DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM". Findings of the Association for Computational Linguistics: EMNLP 2023. Stroudsburg, PA, USA: Association for Computational Linguistics: 9040–9057. arXiv:2310.15296. doi:10.18653/v1/2023.findings-emnlp.606.
  10. ^ Zhang, Haopeng; Liu, Xiao; Zhang, Jiawei (2023). "DiffuSum: Generation Enhanced Extractive Summarization with Diffusion". Findings of the Association for Computational Linguistics: ACL 2023. Stroudsburg, PA, USA: Association for Computational Linguistics: 13089–13100. arXiv:2305.01735. doi:10.18653/v1/2023.findings-acl.828.
