Contrastive Language-Image Pre-training

CLIP
Developer(s): OpenAI
Initial release: January 5, 2021
Repository: github.com/OpenAI/CLIP
Written in: Python
License: MIT License
Website: openai.com/research/clip

Contrastive Language-Image Pre-training (CLIP) is a technique for training a pair of neural network models, one for image understanding and one for text understanding, using a contrastive objective: matching image-text pairs are pulled together in a shared embedding space while mismatched pairs are pushed apart.[1] This method has enabled broad applications across multiple domains, including cross-modal retrieval,[2] text-to-image generation,[3] and aesthetic ranking.[4]
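Concretely, the training objective is a symmetric cross-entropy over the matrix of pairwise similarities between image and text embeddings in a batch. The following is a minimal PyTorch sketch of such a loss, not the reference implementation: the function name clip_contrastive_loss, the fixed temperature of 0.07, and the random tensors standing in for encoder outputs are illustrative assumptions.

import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features, text_features, temperature=0.07):
    # image_features, text_features: (N, d) embeddings for N matched pairs.
    # L2-normalize so the dot product is cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # (N, N) matrix of scaled pairwise similarities; the diagonal
    # holds the matched image-text pairs.
    logits = image_features @ text_features.t() / temperature

    # Each image should match its own caption, and vice versa.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_images = F.cross_entropy(logits, targets)      # image -> text
    loss_texts = F.cross_entropy(logits.t(), targets)   # text -> image
    return (loss_images + loss_texts) / 2

# Toy usage: random embeddings stand in for encoder outputs.
img = torch.randn(8, 512)
txt = torch.randn(8, 512)
print(clip_contrastive_loss(img, txt).item())

In practice each row of logits is treated as a classification over the batch, so the batch size sets the number of negatives per positive pair; CLIP-style training typically uses very large batches for this reason.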

  1. ^ Radford, Alec; Kim, Jong Wook; Hallacy, Chris; Ramesh, Aditya; Goh, Gabriel; Agarwal, Sandhini; Sastry, Girish; Askell, Amanda; Mishkin, Pamela; Clark, Jack; Krueger, Gretchen; Sutskever, Ilya (2021-07-01). Learning Transferable Visual Models From Natural Language Supervision. Proceedings of the 38th International Conference on Machine Learning. PMLR. pp. 8748–8763.
  2. ^ Hendriksen, Mariya; Bleeker, Maurits; Vakulenko, Svitlana; van Noord, Nanne; Kuiper, Ernst; de Rijke, Maarten (2022). Hagen, Matthias; Verberne, Suzan; Macdonald, Craig; Seifert, Christin; Balog, Krisztian; Nørvåg, Kjetil; Setty, Vinay (eds.). "Extending CLIP for Category-to-Image Retrieval in E-Commerce". Advances in Information Retrieval. Cham: Springer International Publishing. pp. 289–303. doi:10.1007/978-3-030-99736-6_20. ISBN 978-3-030-99736-6.
  3. ^ "Stable Diffusion Repository on GitHub". CompVis - Machine Vision and Learning Research Group, LMU Munich. 17 September 2022. Archived from the original on January 18, 2023. Retrieved 17 September 2022.
  4. ^ LAION-AI/aesthetic-predictor. LAION AI. 2024-09-06. Retrieved 2024-09-08.
