Transformer model
What is a transformer model?
A transformer model is a type of deep learning model introduced in the 2017 paper "Attention Is All You Need" by Ashish Vaswani and colleagues at Google Brain, together with a group from the University of Toronto. Transformers are now used in applications such as training large language models (LLMs). [1][2]
Transformers have the advantage of having no recurrent units (as in recurrent neural networks, RNNs) and no convolutions (as in convolutional neural networks, CNNs), and thus require less training time than recurrent architectures such as long short-term memory (LSTM). Later variants of the architecture have been widely adopted for training large language models (LLMs) on large language datasets.
Transformers process input sequences in parallel, making them highly efficient for both training and inference and enabling tasks such as translating text and speech in near-real-time. OpenAI's popular ChatGPT text generation tool uses transformer architectures for prediction, summarization, question answering, and more, because attention allows the model to focus on the most relevant segments of the input text. The "GPT" in the tool's various versions (e.g., GPT-2, GPT-3) stands for "generative pre-trained transformer." Text-based generative AI tools such as ChatGPT benefit from transformer models because they can more readily predict the next word in a sequence of text, based on large, complex datasets.
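As a concrete illustration of how attention lets a model weigh the most relevant input positions, the sketch below implements scaled dot-product attention, the core operation of the transformer. This is a minimal NumPy sketch; the function and variable names are illustrative, not taken from any production library.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal sketch: Q, K have shape (seq_len, d_k); V has shape (seq_len, d_v)."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled to stabilize the softmax.
    scores = Q @ K.T / np.sqrt(d_k)                     # (seq_len, seq_len)
    # Softmax over keys: each row becomes a distribution of attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted average of the value vectors.
    return weights @ V                                   # (seq_len, d_v)

# Toy usage (illustrative): self-attention over 4 tokens with 8-dim embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

Because every row of the attention matrix is computed independently, all sequence positions can be processed at once, which is what removes the step-by-step dependency that slows down recurrent architectures.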
This page describes the Transformer architecture and how it works.