Attention Is All You Need: Transformer (Simple explanation of the transformer model)

Paper 1: Attention Is All You Need
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." Advances in neural information processing systems 30 (2017).

In recent years, the field of artificial intelligence has experienced a paradigm shift with the introduction of transformer-based models. Transformers have been pivotal in the development of more capable Natural Language Processing (NLP) models, and their use has extended to other areas of machine learning.

The transformer is an architecture that relies on the concept of attention, a technique that assigns weights to different parts of an input sequence so that the model can better capture the underlying context. This allows transformers to perform machine translation, text generation, and many other NLP tasks.
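
To make the idea of "weights over the input" concrete, here is a minimal sketch of scaled dot-product attention, the form of attention used in the paper, written in plain Python. The function names (softmax, attend) and the toy vectors are assumptions made up for illustration; a real model learns its query, key and value vectors and works on whole matrices at once.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Weight each value by how well its key matches the query."""
    scale = math.sqrt(len(query))  # scale dot products by sqrt of the key size
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The output is the weighted sum of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy 2-dimensional vectors for a three-token input (made-up numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, output = attend(query, keys, values)
print("attention weights:", weights)
print("weighted output:", output)
```

In this toy case the first and third tokens receive equal weight, and more than the second, because their keys line up with the query. That is the sense in which attention gives different weights to different parts of the input.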

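In addition, transformers process all positions of an input sequence in parallel rather than one step at a time, making them more efficient and scalable than traditional sequential models such as recurrent neural networks (RNNs) and long short-term memory networks (LSTMs).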

I will be dissecting the paper ‘Attention Is All You Need’ and its significance in artificial intelligence. All you need will be here, so Attention!

Model Architecture
Encoder
Decoder
Attention
FeedForward Networks (FFN)
Layer Normalization
Positional Encoding