Part of a series on |
Machine learning and data mining |
---|
A transformer is a deep learning architecture developed by Google and based on the multi-head attention mechanism, proposed in a 2017 paper "Attention Is All You Need".[1] Text is converted to numerical representations called tokens, and each token is converted into a vector via looking up from a word embedding table.[1] At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism allowing the signal for key tokens to be amplified and less important tokens to be diminished. The transformer paper, published in 2017, is based on the softmax-based attention mechanism proposed by Bahdanau et. al. in 2014 for machine translation,[2][3] and the Fast Weight Controller, similar to a transformer, proposed in 1992.[4][5][6]
Transformers have the advantage of having no recurrent units, and therefore require less training time than earlier recurrent neural architectures such as long short-term memory (LSTM).[7] Later variations have been widely adopted for training large language models (LLM) on large (language) datasets, such as the Wikipedia corpus and Common Crawl.[8]
This architecture is now used not only in natural language processing and computer vision,[citation needed] but also in audio,[9] multi-modal processing and robotics.[10] It has also led to the development of pre-trained systems, such as generative pre-trained transformers (GPTs)[11] and BERT[12] (Bidirectional Encoder Representations from Transformers).
2017_Attention_Is_All_You_Need
was invoked but never defined (see the help page).inventors
was invoked but never defined (see the help page).inventconfirm
was invoked but never defined (see the help page).transform1992
was invoked but never defined (see the help page).schlag2021
was invoked but never defined (see the help page).fastlinear2020
was invoked but never defined (see the help page).© MMXXIII Rich X Search. We shall prevail. All rights reserved. Rich X Search