
Efficient Fine-Tuning for Llama-v2-7b on a Single GPU

DeepLearningAI

The first problem you’re likely to encounter when fine-tuning an LLM is the “out of memory” error. It’s even more of a challenge when fine-tuning the 7B-parameter Llama-2 model, which requires more memory than smaller models. In this talk, Piero Molino and Travis Addair from the open-source Ludwig project show you how to tackle this problem.
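To see why the out-of-memory error is so common, here is a back-of-the-envelope memory estimate for naive full fine-tuning of a 7B-parameter model with mixed-precision Adam. The per-parameter byte counts are illustrative assumptions (fp16 weights and gradients, fp32 optimizer moments and master weights), not measurements from the workshop.

```python
def full_finetune_gb(n_params: float) -> float:
    """Rough GPU memory needed for full fine-tuning, in GB.

    Assumed per-parameter costs (a common mixed-precision setup):
      2 bytes  fp16 weights
      2 bytes  fp16 gradients
      8 bytes  Adam first + second moments in fp32
      4 bytes  fp32 master copy of the weights
    Activations and framework overhead are ignored.
    """
    bytes_per_param = 2 + 2 + 8 + 4
    return n_params * bytes_per_param / 1e9

# A 7B model under these assumptions needs on the order of 112 GB --
# far beyond the 16 GB of a single T4, which is why quantization and
# parameter-efficient methods are needed.
print(f"{full_finetune_gb(7e9):.0f} GB")
```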

The good news is that, with an optimized LLM training framework like Ludwig.ai, you can bring the memory overhead back down to a reasonable level, even when training on a single GPU.
In this hands-on workshop, we’ll discuss the unique challenges of fine-tuning LLMs and show you how to tackle them with open-source tools through a live demo.
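As a rough illustration of the declarative approach the demo uses, a Ludwig fine-tuning job is driven by a YAML config along these lines. This is a hypothetical sketch, not the notebook’s actual config: the exact field names and defaults depend on the Ludwig version, and the model name, feature names, and hyperparameters here are assumptions.

```yaml
# Hypothetical Ludwig-style config for QLoRA fine-tuning (field names
# may differ by version; check the Ludwig docs for your release).
model_type: llm
base_model: meta-llama/Llama-2-7b-hf

quantization:
  bits: 4          # 4-bit base weights (the "Q" in QLoRA)

adapter:
  type: lora       # train small low-rank adapters, freeze the base model

input_features:
  - name: prompt
    type: text
output_features:
  - name: response
    type: text

trainer:
  type: finetune
  learning_rate: 0.0001
  epochs: 3
```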

By the end of this session, attendees will understand:
How to fine-tune LLMs like Llama-2-7b on a single GPU
Techniques like parameter-efficient fine-tuning and quantization, and how they can help
How to train a 7B-parameter model on a single T4 GPU (QLoRA)
How to deploy tuned models like Llama-2 to production
Continued training with RLHF
How to use RAG to do question answering with trained LLMs
This session will equip ML engineers to unlock the capabilities of LLMs like Llama-2 for their own projects.
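The core idea behind the parameter-efficient tuning mentioned above (LoRA, and QLoRA when combined with a quantized base model) can be sketched in a few lines of NumPy: the pretrained weight matrix is frozen, and only a low-rank update is trained. The dimensions and rank below are illustrative assumptions, not Llama-2’s actual layer sizes.

```python
import numpy as np

# LoRA sketch: effective weight is W + B @ A, where W is frozen and
# only the small matrices A and B are trained.
d, k, r = 1024, 1024, 8               # layer dims and LoRA rank (assumed)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, k))       # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01  # trainable, small random init
B = np.zeros((d, r))                  # trainable, zero init: the adapter
                                      # contributes nothing at step 0

def forward(x: np.ndarray) -> np.ndarray:
    # Only A and B would receive gradients during training.
    return x @ (W + B @ A).T

full_params = d * k                   # 1,048,576 in the full weight
lora_params = r * (d + k)             # 16,384 trainable adapter params
print(f"trainable: {lora_params} of {full_params} "
      f"({100 * lora_params / full_params:.1f}%)")
```

With rank 8, the adapter trains under 2% of the parameters of the full matrix, which is what makes single-GPU fine-tuning of a 7B model feasible.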

This event is inspired by DeepLearning.AI’s GenAI short courses, created in collaboration with AI companies across the globe. Our courses help you learn new skills, tools, and concepts efficiently within 1 hour.

https://www.deeplearning.ai/shortcou...

Here is the link to the notebook used in the workshop:
https://pbase.ai/FineTuneLlama

Speakers:

Piero Molino, Co-founder and CEO of Predibase

  / pieromolino  

Travis Addair, Co-founder and CTO of Predibase

  / travisaddair  
