BERT explained: Training, Inference, BERT vs GPT/LLaMA, Fine-tuning, [CLS] token

Umar Jamil

Full explanation of the BERT model, including a comparison with other language models like LLaMA and GPT. I cover topics like: training, inference, fine-tuning, Masked Language Model (MLM), Next Sentence Prediction (NSP), the [CLS] token, sentence embeddings, text classification, question answering, and the self-attention mechanism. Everything is visually explained step by step.

I also review the background knowledge needed to understand BERT, starting from an introduction to large language models (LLMs) and the attention mechanism.
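
As a rough illustration of the Masked Language Model (MLM) pre-training objective listed among the topics above, here is a minimal Python sketch. It assumes the scheme described in the BERT paper: about 15% of tokens are selected, and of those 80% become [MASK], 10% become a random token, and 10% stay unchanged. The function name mask_tokens, the toy vocabulary, and the word-level tokens are illustrative assumptions, not code from the video or the slides.

```python
import random

MASK_TOKEN = "[MASK]"
SPECIAL_TOKENS = ("[CLS]", "[SEP]")


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels) for a toy MLM training example.

    labels[i] holds the original token at positions the model must
    predict, and None everywhere else.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        # Special tokens are never selected for prediction.
        if token in SPECIAL_TOKENS:
            continue
        # Select roughly mask_prob of the remaining tokens.
        if rng.random() >= mask_prob:
            continue
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK_TOKEN          # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)   # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return masked, labels


# Hypothetical toy sentence and vocabulary, for illustration only.
tokens = ["[CLS]", "the", "cat", "sat", "on", "the", "mat", "[SEP]"]
vocab = ["the", "cat", "sat", "on", "mat", "dog"]
masked, labels = mask_tokens(tokens, vocab, seed=0)
print(masked)
print(labels)
```

During pre-training the loss would be computed only at the positions where labels is not None; a real implementation would operate on WordPiece token IDs rather than whole words.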

Slides PDF: https://github.com/hkproj/bertfroms...
BERT paper: https://arxiv.org/abs/1810.04805

Chapters
00:00 Introduction
02:00 Language Models
03:10 Training (Language Models)
07:23 Inference (Language Models)
09:15 Transformer architecture (Encoder)
10:28 Input Embeddings
14:17 Positional Encoding
17:14 Self-Attention and causal mask
29:14 BERT (overview)
32:08 BERT vs GPT/LLaMA
34:25 Left context and right context
36:36 BERT pre-training
37:05 Masked Language Model
45:01 [CLS] token
48:26 BERT fine-tuning
49:00 Text classification
50:50 Question answering

posted by Waingeripdari79