Contrastive Language-Image Pre-training (CLIP) is a technique for jointly training a pair of neural network models, an image encoder and a text encoder, with a contrastive objective so that matching image-text pairs are mapped to nearby points in a shared embedding space.[1]
This method has enabled broad applications across multiple domains, including cross-modal retrieval,[2] text-to-image generation,[3] and aesthetic ranking.[4]
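The following is a minimal sketch, not the reference implementation, of the kind of symmetric contrastive (InfoNCE-style) objective described above, written in PyTorch. The function name, the fixed temperature value, and the tensor names are illustrative assumptions; in the original paper the temperature is a learned parameter rather than a constant.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features, text_features, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched image-text pairs.

    image_features, text_features: tensors of shape (batch_size, embed_dim)
    produced by the image and text encoders; row i of each is a matched pair.
    Note: the fixed temperature here is an illustrative assumption.
    """
    # L2-normalize so the dot product equals cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Pairwise similarity matrix between all images and all texts in the batch.
    logits = image_features @ text_features.t() / temperature

    # The matching text for image i sits at index i (and vice versa).
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```

Minimizing this loss pulls the embeddings of each matched image-text pair together while pushing apart the embeddings of mismatched pairs within the same batch, which is what allows the trained encoders to be reused for the retrieval, generation, and ranking applications listed above.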
^Radford, Alec; Kim, Jong Wook; Hallacy, Chris; Ramesh, Aditya; Goh, Gabriel; Agarwal, Sandhini; Sastry, Girish; Askell, Amanda; Mishkin, Pamela; Clark, Jack; Krueger, Gretchen; Sutskever, Ilya (2021-07-01). Learning Transferable Visual Models From Natural Language Supervision. Proceedings of the 38th International Conference on Machine Learning. PMLR. pp. 8748–8763.
^"Stable Diffusion Repository on GitHub". CompVis - Machine Vision and Learning Research Group, LMU Munich. 17 September 2022. Archived from the original on January 18, 2023. Retrieved 17 September 2022.