Transformer model

== What is a transformer model? ==
A transformer model is a type of deep learning model introduced in the 2017 paper "Attention Is All You Need" by Ashish Vaswani and colleagues at Google Brain and the University of Toronto, and it is now used in applications such as training LLMs. <ref>https://www.ibm.com/topics/transformer-model</ref><ref>https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)</ref>


Transformer models can translate text and speech in near-real-time. OpenAI’s popular ChatGPT text generation tool uses transformer architectures for prediction, summarization, question answering and more, because they allow the model to focus on the most relevant segments of input text. The “GPT” seen in the tool’s various versions (e.g. GPT-2, GPT-3) stands for “generative pre-trained transformer.” Text-based generative AI tools such as ChatGPT benefit from transformer models because they can more readily predict the next word in a sequence of text, based on large, complex data sets.
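The idea of "predicting the next word in a sequence" can be sketched with a deliberately tiny toy model. This is not how GPT works internally (a real GPT learns these statistics with a trained transformer over vast text, not simple counts); the corpus and function names here are invented for illustration only.

```python
from collections import Counter, defaultdict

# Toy corpus; a real GPT model learns next-word statistics with a
# transformer trained on vast amounts of text, not bigram counts.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows each word in the corpus.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the toy corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" (follows "the" twice; "mat"/"fish" once each)
```

A language model does the same job probabilistically, ranking every word in its vocabulary by how likely it is to come next given the full preceding context, not just the previous word.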

Revision as of 13:54, 18 June 2024


Transformers have the advantage of having no recurrent units, so they require less training time than earlier recurrent neural architectures such as long short-term memory (LSTM), and later transformer variants have been widely adopted for training large language models (LLMs) on large (language) datasets. Because a transformer does not rely on recurrent neural networks (RNNs) or convolutional neural networks (CNNs), it processes all positions of an input sequence in parallel, making training and inference highly efficient; recurrent models, by contrast, must process tokens one after another, so they cannot be sped up simply by adding more GPUs.
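The parallelism described above can be illustrated with a minimal NumPy sketch of scaled dot-product attention, the core operation of the transformer: a single matrix multiplication relates every position to every other position at once, with no token-by-token loop. The shapes and random inputs here are arbitrary toy values, not from any real model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attend over all positions at once: no recurrence, just matrix math."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted sum of value vectors

# Toy example: 4 tokens, each with an 8-dimensional representation.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one output vector per input position
```

An LSTM would have to walk through the 4 positions sequentially; here all 4 outputs are produced by the same few matrix operations, which is why transformer training parallelizes well across GPUs.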

References