K-means Clustering From Scratch In Python [Machine Learning Tutorial]

Dataquest

In this project, we'll build a k-means clustering algorithm from scratch. Clustering is an unsupervised machine learning technique that groups similar data points so you can find patterns in your data. K-means is one of the most popular clustering algorithms.

We'll write our algorithm using Python and pandas. We'll then compare it to the reference implementation from scikit-learn.

You can find the full project code here: https://github.com/dataquestio/projec...

You can download the data here: https://www.kaggle.com/datasets/stefa...

Project Steps
Write out pseudocode for the algorithm
Code the k-means algorithm
Plot the clusters from the algorithm
Compare performance to the scikit-learn algorithm
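
The coding step above can be sketched roughly as follows. This is a minimal illustration, not the code from the video: the function names (scale, get_labels, update_centroids, kmeans), the 1 to 10 scaling range, the choice to start from k randomly sampled rows, and the toy data are all assumptions made for this example. Plotting and the scikit-learn comparison are left out.

```python
import numpy as np
import pandas as pd


def scale(data: pd.DataFrame) -> pd.DataFrame:
    """Min-max scale each column to [1, 10] (range chosen arbitrarily for this sketch)."""
    return (data - data.min()) / (data.max() - data.min()) * 9 + 1


def get_labels(data: pd.DataFrame, centroids: pd.DataFrame) -> pd.Series:
    """Assign each row to the nearest centroid by Euclidean distance."""
    points = data.to_numpy(dtype=float)
    cents = centroids.to_numpy(dtype=float)
    # distances[i, j] = distance from point i to centroid j
    distances = np.linalg.norm(points[:, None, :] - cents[None, :, :], axis=2)
    return pd.Series(distances.argmin(axis=1), index=data.index)


def update_centroids(data: pd.DataFrame, labels: pd.Series,
                     centroids: pd.DataFrame) -> pd.DataFrame:
    """Move each centroid to the mean of its assigned points."""
    means = data.groupby(labels.to_numpy()).mean()
    # A cluster with no points keeps its previous centroid.
    return means.reindex(centroids.index).fillna(centroids)


def kmeans(data: pd.DataFrame, k: int, max_iter: int = 100, seed: int = 0):
    """Alternate labeling and centroid updates until the centroids stop moving."""
    # Assumption: start from k randomly sampled rows of the data.
    centroids = data.sample(n=k, random_state=seed).reset_index(drop=True)
    centroids = centroids.astype(float)
    for _ in range(max_iter):
        labels = get_labels(data, centroids)
        new_centroids = update_centroids(data, labels, centroids)
        if np.allclose(new_centroids.to_numpy(), centroids.to_numpy()):
            break
        centroids = new_centroids
    return get_labels(data, centroids), centroids


# Toy data made up for this example: two blobs of 50 points each.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))]),
    columns=["x", "y"],
)
scaled = scale(toy)
labels, centroids = kmeans(scaled, k=2)
print(labels.value_counts())
print(centroids)
```

Stopping when the centroids no longer change keeps the loop short on easy data; max_iter is only a safety cap for cases that converge slowly.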

Chapters

00:00 Intro
00:37 k-means overview
02:51 Loading in and cleaning FIFA data
06:11 Scaling the data
10:31 Initialize random centroids
14:20 Finding cluster labels for each data point
19:29 Update centroid values
23:30 Plotting k-means iterations
28:24 Pulling the algorithm together
35:25 Comparing our implementation to scikit-learn
37:56 Conclusion and next steps


Join 1M+ Dataquest learners today!
Master data skills and change your life.
Sign up for free: https://bit.ly/3O8MDef
