MAMBA from Scratch: Neural Nets Better and Faster than Transformers

Algorithmic Simplicity

Mamba is a new neural network architecture that came out this year, and it performs better than transformers at language modelling! This is probably the most exciting development in AI since transformers were introduced in 2017. In this video I explain how to derive Mamba from the perspective of linear RNNs. And don't worry, there's no state space model theory needed!
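
For the curious, here is a minimal sketch (my own illustration, not code from the video) of the core idea behind the "Linear Recurrent Neural Networks" and "Parallelizing Linear RNNs" chapters: a linear recurrence h_t = a_t * h_{t-1} + b_t can be evaluated with an associative scan instead of a sequential loop, which is what lets these models train as fast as transformers. The function names and the choice of JAX's associative_scan are assumptions for the sake of the example.

import jax
import jax.numpy as jnp

def combine(left, right):
    # Composing two linear updates: applying (a1, b1) then (a2, b2) to h gives
    # a2*(a1*h + b1) + b2 = (a2*a1)*h + (a2*b1 + b2), so the operation is associative.
    a1, b1 = left
    a2, b2 = right
    return a1 * a2, a2 * b1 + b2

def linear_rnn_parallel(a, b):
    # a, b: (seq_len, hidden) arrays; diagonal recurrence per channel.
    # Returns h with h[t] = a[t] * h[t-1] + b[t] (taking h[-1] = 0),
    # computed in O(log seq_len) parallel depth.
    _, h = jax.lax.associative_scan(combine, (a, b))
    return h

# Sanity check against the plain sequential recurrence.
key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
a = jax.random.uniform(key_a, (8, 4))   # decay factors in [0, 1) for stability
b = jax.random.normal(key_b, (8, 4))    # per-step inputs
h_par = linear_rnn_parallel(a, b)

h, hs = jnp.zeros(4), []
for t in range(8):
    h = a[t] * h + b[t]
    hs.append(h)
assert jnp.allclose(h_par, jnp.stack(hs), atol=1e-5)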

Mamba paper: https://openreview.net/forum?id=AL1fq...
Linear RNN paper: https://openreview.net/forum?id=M3Yd3...

#mamba
#deeplearning
#largelanguagemodels

00:00 Intro
01:33 Recurrent Neural Networks
05:24 Linear Recurrent Neural Networks
06:57 Parallelizing Linear RNNs
15:33 Vanishing and Exploding Gradients
19:08 Stable initialization
21:53 State Space Models
24:33 Mamba
25:26 The High Performance Memory Trick
27:35 The Mamba Drama
