Transformer model

== What is a transformer model? ==
A transformer model is a type of deep learning model introduced in the 2017 paper "Attention Is All You Need" by Ashish Vaswani and colleagues at Google Brain and the University of Toronto, and it is now used in applications such as training LLMs. <ref>https://www.ibm.com/topics/transformer-model</ref><ref>https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)</ref>


Transformer models can translate text and speech in near-real-time. OpenAI’s popular ChatGPT text generation tool uses transformer architectures for prediction, summarization, question answering and more, because they allow the model to focus on the most relevant segments of input text. The “GPT” seen in the tool’s various versions (e.g. GPT-2, GPT-3) stands for “generative pre-trained transformer.” Text-based generative AI tools such as ChatGPT benefit from transformer models because they can more readily predict the next word in a sequence of text, based on large, complex data sets.
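The idea of "predicting the next word in a sequence" can be sketched with a deliberately tiny toy model. This is not how GPT works internally (a real GPT learns these statistics with a trained transformer over vast text, not simple counts); the corpus and function names here are invented for illustration only.

```python
from collections import Counter, defaultdict

# Toy corpus; a real GPT model learns next-word statistics with a
# transformer trained on vast amounts of text, not bigram counts.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows each word in the corpus.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the toy corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" (follows "the" twice; "mat"/"fish" once each)
```

A language model does the same job probabilistically, ranking every word in its vocabulary by how likely it is to come next given the full preceding context, not just the previous word.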

Revision as of 13:54, 18 June 2024


Transformers have the advantage of having no recurrent units, so they require less training time than earlier recurrent neural architectures such as long short-term memory (LSTM), and later transformer variants have been widely adopted for training large language models (LLMs) on large (language) datasets. Because a transformer does not rely on recurrent neural networks (RNNs) or convolutional neural networks (CNNs), it processes all positions of an input sequence in parallel, making training and inference highly efficient; recurrent models, by contrast, must process tokens one after another, so they cannot be sped up simply by adding more GPUs.
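The parallelism described above can be illustrated with a minimal NumPy sketch of scaled dot-product attention, the core operation of the transformer: a single matrix multiplication relates every position to every other position at once, with no token-by-token loop. The shapes and random inputs here are arbitrary toy values, not from any real model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attend over all positions at once: no recurrence, just matrix math."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted sum of value vectors

# Toy example: 4 tokens, each with an 8-dimensional representation.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one output vector per input position
```

An LSTM would have to walk through the 4 positions sequentially; here all 4 outputs are produced by the same few matrix operations, which is why transformer training parallelizes well across GPUs.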

References