Create Training Data for Finetuning LLMs

APC Mastery Path

Mastering LLM Finetuning: From PDFs to JSONL Files

Welcome to APC Mastery Path! In this tutorial, we walk through creating training data for finetuning Large Language Models (LLMs): extracting text from PDFs with the `marker-pdf` Python library, cleaning the resulting Markdown, and converting it into JSONL (JSON Lines) format, one JSON record per line, ready for LLM finetuning.
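
As a rough illustration of the cleaning and JSONL steps described above, here is a minimal Python sketch. It assumes marker has already produced a Markdown string; the function names (`clean_markdown`, `split_into_chunks`, `to_jsonl_lines`, `write_jsonl`), the regex patterns, the 2,000-character chunk size and the single `text` field per record are all illustrative assumptions, not the exact method used in the video.

```python
import json
import re
from pathlib import Path

# Illustrative patterns for common conversion debris (assumptions; tune them
# to what your own PDFs actually produce).
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\([^)]*\)")  # Markdown image references
HTML_TAG = re.compile(r"<[^>]+>")  # stray inline HTML such as <span>; also drops <autolinks>
EXTRA_BLANKS = re.compile(r"\n{3,}")  # three or more consecutive newlines


def clean_markdown(md: str) -> str:
    """Drop image links and HTML tags, trim trailing spaces, collapse blank runs."""
    md = IMAGE_LINK.sub("", md)
    md = HTML_TAG.sub("", md)
    md = "\n".join(line.rstrip() for line in md.splitlines())
    md = EXTRA_BLANKS.sub("\n\n", md)
    return md.strip()


def split_into_chunks(md: str, max_chars: int = 2000) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for para in md.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def to_jsonl_lines(chunks: list[str]) -> list[str]:
    """Serialise each chunk as a one-line JSON object (newlines are escaped)."""
    return [json.dumps({"text": chunk}, ensure_ascii=False) for chunk in chunks]


def write_jsonl(chunks: list[str], out_path: Path) -> None:
    """Write the chunks to out_path, one JSON object per line."""
    with out_path.open("w", encoding="utf-8") as f:
        for line in to_jsonl_lines(chunks):
            f.write(line + "\n")
```

For example, `write_jsonl(split_into_chunks(clean_markdown(Path("report.md").read_text(encoding="utf-8"))), Path("report.jsonl"))` would turn one converted file (the name `report.md` is hypothetical) into a JSONL file. Splitting on blank lines keeps paragraphs intact, which usually gives cleaner training samples than cutting text at a fixed character offset.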


Agenda:
00:08 Intro
00:59 Part 1: Main concept of the solution
01:50 Part 2: Marker PDF Package Overview & Installation
05:46 Part 2.1: Single File Conversion
08:29 Part 2.2: Multiple File Conversion & Conversion to JSONL format
15:03 Part 3: Finetuning LLMs using extracted data
22:01 Outro
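
For the multiple-file step (Part 2.2), a sketch along these lines could gather every Markdown file produced by marker into one JSONL training file and sanity-check it before finetuning. The folder layout (any `.md` files somewhere under one output directory), the 200-character minimum paragraph length, the extra `source` field and all function names are assumptions for illustration; check which record fields your finetuning notebook actually expects.

```python
import json
from pathlib import Path
from typing import Iterable


def records_from_markdown(text: str, source: str, min_chars: int = 200) -> list[dict]:
    """Split one Markdown document into paragraph-level records.

    Paragraphs shorter than min_chars (headings, captions, page furniture)
    are skipped; each record keeps its source file name for traceability.
    """
    records = []
    for para in text.split("\n\n"):
        para = para.strip()
        if len(para) >= min_chars:
            records.append({"text": para, "source": source})
    return records


def build_records(md_dir: Path, min_chars: int = 200) -> list[dict]:
    """Collect records from every .md file found (recursively) under md_dir."""
    records = []
    for md_file in sorted(md_dir.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        records.extend(records_from_markdown(text, md_file.name, min_chars))
    return records


def write_jsonl(records: list[dict], out_path: Path) -> None:
    """Write one JSON object per line, keeping non-ASCII characters readable."""
    with out_path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def count_valid_lines(lines: Iterable[str]) -> int:
    """Count records, raising ValueError on a line without a non-empty 'text'.

    Lines that are not valid JSON raise json.JSONDecodeError (a ValueError subclass).
    """
    count = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        obj = json.loads(line)
        text = obj.get("text") if isinstance(obj, dict) else None
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"line {lineno}: missing or empty 'text' field")
        count += 1
    return count
```

A run might call `build_records(Path("marker_output"))`, pass the result to `write_jsonl(records, Path("train.jsonl"))`, and then hand an open file handle for `train.jsonl` to `count_valid_lines` to confirm every line is usable (both paths are hypothetical). Validating the file before training is cheap and catches empty or malformed records early.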

At APC Mastery Path, we offer bespoke mentoring and teaching packages to RICS APC candidates. Enhance your APC journey with our expert guidance and tailored support.

Don’t forget to subscribe, like, and share! Let’s embark on this LLM finetuning journey together! ✨

General Links & Resources:
⚫Our Website: www.apcmasterypath.co.uk
⚫All APC Mastery Path Blogposts: https://www.apcmasterypath.co.uk/blog...
⚫Personal LinkedIn Page: / mohamedashour0727
⚫APC Mastery Path LinkedIn Page: / apcmasterypath

Useful videos:
⚫Finetune your LLMs on custom datasets using Unsloth: "Finetune Your LLM on Custom Datasets ..."
⚫Deploy Open WebUI with Zero Coding Skills: "Unlocking Local AI: Deploy Open WebUI..."

Prerequisites & Dependencies:
⚫NVIDIA CUDA Toolkit v12.1: https://developer.nvidia.com/cuda12...
⚫Windows Subsystem for Linux (WSL): https://learn.microsoft.com/en-us/win...
⚫Anaconda for Linux: https://repo.anaconda.com/archive/Ana...
⚫PyTorch: https://pytorch.org/
⚫Ollama: www.ollama.com/download
⚫Docker: https://desktop.docker.com/win/main/a...
⚫Open WebUI on GitHub: https://github.com/open-webui/open-webui
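
Before starting, a quick check like the following can show which of the prerequisites above are visible from your WSL shell. `check_prerequisites` is a hypothetical helper, not part of any listed tool: it only looks for the `nvcc`, `conda`, `ollama` and `docker` commands on `PATH`, checks whether the `torch` and `marker` (installed by `marker-pdf`) Python packages can be found, and, if PyTorch is installed, asks it whether a CUDA GPU is usable. A tool installed outside `PATH` (for example, conda before `conda init`) will be reported as missing.

```python
import importlib.util
import shutil


def check_prerequisites() -> dict[str, bool]:
    """Report which of the listed tools are reachable from this environment."""
    status = {
        # Command-line tools: looked up on PATH only
        "nvcc (CUDA Toolkit)": shutil.which("nvcc") is not None,
        "conda (Anaconda)": shutil.which("conda") is not None,
        "ollama": shutil.which("ollama") is not None,
        "docker": shutil.which("docker") is not None,
        # Python packages: located without importing them
        "torch": importlib.util.find_spec("torch") is not None,
        "marker (marker-pdf)": importlib.util.find_spec("marker") is not None,
    }
    if status["torch"]:
        import torch  # imported lazily, only when it is installed

        status["torch CUDA available"] = torch.cuda.is_available()
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```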

GitHub & Hugging Face repositories:
⚫Unsloth available LLMs: https://huggingface.co/unsloth
⚫Marker PDF on GitHub: https://github.com/VikParuchuri/marker
⚫Unsloth GitHub Repository: https://github.com/unslothai/unsloth?...

#LLM #MachineLearning #DataScience #AI #Python #PDFConversion #JSONL #MarkerPDF #FineTuning #APCMasteryPath #RICSAPC #Mentoring #Education #TechTutorials

posted by moulinaventbw