BERT (language model)

Bidirectional Encoder Representations from Transformers (BERT)
Original author(s): Google AI
Initial release: October 31, 2018
Repository: https://github.com/google-research/bert
Type: Language model
License: Apache 2.0
Website: arxiv.org/abs/1810.04805

Bidirectional Encoder Representations from Transformers (BERT) is a language model introduced in October 2018 by researchers at Google.[1][2] It learns to represent text as a sequence of vectors via self-supervised learning and uses the encoder-only transformer architecture. BERT was notable for its dramatic improvement over previous state-of-the-art models and as an early example of a large language model. As of 2020, BERT was a ubiquitous baseline in natural language processing (NLP) experiments.[3]
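As an illustration of how such contextual vectors are typically obtained from a pretrained BERT encoder, the following is a minimal sketch using the Hugging Face Transformers library rather than the original Google repository; the bert-base-uncased checkpoint name and the specific library calls are assumptions about one common setup, not part of the original release.

    # Minimal sketch: contextual token vectors from a pretrained BERT encoder.
    # Assumes the Hugging Face "transformers" and "torch" packages are installed.
    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    # Tokenize a sentence; the tokenizer adds the special [CLS] and [SEP] tokens.
    inputs = tokenizer("BERT represents text as a sequence of vectors.", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # One contextual vector per token, each of dimension 768 for the base model.
    print(outputs.last_hidden_state.shape)  # torch.Size([1, sequence_length, 768])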

BERT is trained by masked token prediction and next sentence prediction. Through this training process, BERT learns contextual, latent representations of tokens, similar to ELMo and GPT-2.[4] It has found application in many natural language processing tasks, such as coreference resolution and polysemy resolution.[5] It is an evolutionary step over ELMo, and it spawned the study of "BERTology", which attempts to interpret what is learned by BERT.[3]
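As a rough sketch of masked token prediction, the example below hides one word behind the special [MASK] token and asks a pretrained BERT to recover it; it again assumes the Hugging Face Transformers library and an illustrative checkpoint name rather than describing the original training code.

    # Minimal sketch of masked token prediction with a pretrained BERT.
    # Assumes the Hugging Face "transformers" and "torch" packages are installed.
    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")

    # Replace one word with the special [MASK] token.
    inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Locate the masked position and take the most probable token there.
    mask_position = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    predicted_id = int(logits[0, mask_position].argmax())
    print(tokenizer.decode([predicted_id]))  # typically "paris" for the uncased model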

BERT was originally implemented in the English language at two model sizes, BERT-BASE (110 million parameters) and BERT-LARGE (340 million parameters). Both were trained on the Toronto BookCorpus[6] (800M words) and English Wikipedia (2,500M words). The weights were released on GitHub.[7] On March 11, 2020, 24 smaller models were released, the smallest being BERT-TINY with just 4 million parameters.[7]
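To make the reported sizes concrete, the sketch below constructs an untrained encoder with the BERT-BASE hyperparameters reported in the original paper (12 layers, hidden size 768, 12 attention heads) and counts its parameters; the use of Hugging Face's BertConfig and BertModel here is an assumption made purely for illustration.

    # Minimal sketch: an untrained encoder with BERT-BASE hyperparameters.
    from transformers import BertConfig, BertModel

    config = BertConfig(
        vocab_size=30522,         # WordPiece vocabulary of the released uncased English models
        hidden_size=768,
        num_hidden_layers=12,
        num_attention_heads=12,
        intermediate_size=3072,   # feed-forward size, 4 x hidden_size
    )
    model = BertModel(config)     # randomly initialized, i.e. untrained

    # Total parameter count is on the order of 110 million.
    print(sum(p.numel() for p in model.parameters()))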

  1. ^ Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (October 11, 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2 [cs.CL].
  2. ^ "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google AI Blog. November 2, 2018. Retrieved November 27, 2019.
  3. ^ a b Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (2020). "A Primer in BERTology: What We Know About How BERT Works". Transactions of the Association for Computational Linguistics. 8: 842–866. arXiv:2002.12327. doi:10.1162/tacl_a_00349. S2CID 211532403.
  4. ^ Ethayarajh, Kawin (September 1, 2019). "How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings". arXiv:1909.00512. doi:10.48550/arXiv.1909.00512. Retrieved August 5, 2024.
  5. ^ Anderson, Dawn (November 5, 2019). "A deep dive into BERT: How BERT launched a rocket into natural language understanding". Search Engine Land. Retrieved August 6, 2024.
  6. ^ Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015). "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books". pp. 19–27. arXiv:1506.06724 [cs.CV].
  7. ^ a b "BERT". GitHub. Retrieved March 28, 2023.
