In this project, we'll build a k-means clustering algorithm from scratch. Clustering is an unsupervised machine learning technique that groups similar data points together, which helps you find patterns in unlabeled data. K-means is one of the most popular forms of clustering.
We'll create our algorithm using Python and pandas. We'll then compare it to the reference implementation from scikit-learn.
You can find the full project code here: https://github.com/dataquestio/projec...
You can download the data here: https://www.kaggle.com/datasets/stefa...
Project Steps
Write out pseudocode for the algorithm
Code the k-means algorithm
Plot the clusters from the algorithm
Compare performance to the scikit-learn algorithm
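
As a rough companion to the steps above, here is a minimal sketch of the k-means loop: pick starting centroids, label each point with its nearest centroid, move each centroid to the mean of its points, and repeat until the labels stop changing. This is not the code from the video. It uses plain Python lists instead of pandas so it runs without any dependencies, and the function names (assign_labels, update_centroids, kmeans), the choice of random data points as starting centroids, and the toy data are all assumptions made for illustration.

```python
import math
import random


def assign_labels(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    updated = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            updated.append(old)
            continue
        updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return updated


def kmeans(points, k, max_iterations=100, seed=0):
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = assign_labels(points, centroids)
        if new_labels == labels:
            break  # labels stopped changing, so the clusters are stable
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return labels, centroids


# Toy data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Which group ends up labeled 0 or 1 depends on the random starting centroids, so only the grouping itself is meaningful. On real data, the features would normally be scaled first, as the chapter list suggests.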
Chapters
00:00 Intro
00:37 K-means overview
02:51 Loading in and cleaning FIFA data
06:11 Scaling the data
10:31 Initialize random centroids
14:20 Finding cluster labels for each data point
19:29 Update centroid values
23:30 Plotting k-means iterations
28:24 Pulling the algorithm together
35:25 Comparing our implementation to scikit-learn
37:56 Conclusion and next steps