System Design for Recommendations and Search // Eugene Yan // MLOps Meetup #78

MLOps.community

Join us at our first in-person conference on June 25, all about AI Quality: https://www.aiqualityconference.com/

MLOps Community Meetup #78! Last Wednesday we talked to Eugene Yan, an Applied Scientist at Amazon.

//Abstract
What does system design for industrial recommendations and search look like? In this talk, Eugene Yan shares how it's often split into:
Latency-constrained online vs. less-demanding offline environments, and
Fast but coarse candidate retrieval vs. slower but more precise ranking

We'll also see examples of system design from companies such as Alibaba, Facebook, JD, DoorDash, LinkedIn, and maybe do a quick walkthrough on how to implement a candidate retrieval MVP.
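
As a rough illustration of the retrieval vs. ranking split described above, here is a minimal sketch in plain Python. It is not taken from the talk: the item names, embeddings, popularity values, and ranking weights are all made up, and a brute-force cosine search stands in for the approximate nearest neighbor index a real system would use.

```python
import math

# Hypothetical toy catalog: item id -> embedding (made-up numbers).
ITEM_EMBEDDINGS = {
    "item_a": [0.9, 0.1],
    "item_b": [0.8, 0.3],
    "item_c": [0.1, 0.9],
    "item_d": [0.2, 0.8],
}
# Hypothetical extra feature that only the ranking stage looks at.
ITEM_POPULARITY = {"item_a": 0.2, "item_b": 0.9, "item_c": 0.5, "item_d": 0.1}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def retrieve(query_embedding, k):
    """Stage 1 (fast but coarse): keep the k items most similar to the query.

    Brute-force search is an assumption made for brevity; it stands in
    for an approximate nearest neighbor index.
    """
    by_similarity = sorted(
        ITEM_EMBEDDINGS,
        key=lambda item: cosine(query_embedding, ITEM_EMBEDDINGS[item]),
        reverse=True,
    )
    return by_similarity[:k]


def rank(query_embedding, candidates):
    """Stage 2 (slower but more precise): rescore candidates with more features.

    The logistic form and the weights (2.0, 1.5, -1.0) are hand-picked
    placeholders, not learned parameters.
    """
    def score(item):
        similarity = cosine(query_embedding, ITEM_EMBEDDINGS[item])
        z = 2.0 * similarity + 1.5 * ITEM_POPULARITY[item] - 1.0
        return 1.0 / (1.0 + math.exp(-z))

    return sorted(candidates, key=score, reverse=True)


def recommend(query_embedding, k=2):
    """Retrieve k candidates, then rank only those candidates."""
    return rank(query_embedding, retrieve(query_embedding, k))
```

With the query embedding [1.0, 0.0], retrieval keeps item_a and item_b, and the ranking stage then places item_b first because of its higher popularity feature. The point of the split is that the expensive scoring only ever runs on the short candidate list.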

//Bio
Eugene Yan designs, builds, and operates machine learning systems that serve customers at scale. He's currently an Applied Scientist at Amazon. Previously, he led the data science teams at Lazada (acquired by Alibaba) and uCare.ai. He writes & speaks about data science, data/ML systems, and career growth at eugeneyan.com and tweets at @eugeneyan.

// Relevant links
eugeneyan.com
applyingml.com
https://www.oreilly.com/library/view/...

✌Connect With Us ✌
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, Feature Store, Machine Learning Monitoring and Blogs: https://mlops.community/

Connect with Demetrios on LinkedIn:   / dpbrinkm  
Connect with Eugene on   / eugeneyan  

Timestamps:
[00:10] System Design for Recommendations and Search
[01:37] Why: Batch vs. Realtime
[02:05] Batch
Recommender (key-value DB)
Recommendations refreshed periodically
[02:21] Realtime
Recommender (REST/gRPC)
Recommendations generated in realtime
[02:37] Batch benefits
Precomputed
Decouple compute from serving
Lower operational load
[03:25] Realtime benefits
Responsive to time-sensitive context
Reduce cost on non-visiting users
[06:50] Focus on realtime aka on-demand
[07:00] Offline vs Online aspect
[07:11] Offline aspect
Host batch processes such as training, index/graph building
Load data into feature stores
[07:23] Online aspect
Uses artifacts from the offline environment to serve requests
Candidate retrieval and ranking
[07:40] Retrieval
Fast but coarse
Searches millions of items to get hundreds of candidates
Approx NN. Graphs, etc.
[08:05] Ranking
Slower but more precise
Ranks hundreds of candidates
Adds more features
Classification or learning to rank
[08:49] Online Retrieval
[09:37] Offline Ranking
[10:50] Online Retrieval
[11:15] Offline Retrieval
[12:25] How: Industry Examples
[12:45] Building item embeddings for candidate retrieval (Alibaba)
[15:31] Building a graph network for ranking (Alibaba)
[17:06] Building embeddings for retrieval in search (Facebook)
[19:10] Building graphs for query expansion and retrieval (DoorDash)
[22:32] Unnecessary realtime overengineering
[25:05] Realtime timely decision
[26:27] How: Industry Examples (Retrieval)
[26:43] Collaborative Filtering
[30:32] Candidate Retrieval at YouTube (via penultimate embedding)
[32:06] Candidate Retrieval at Instagram (via word2vec)
[33:53] How: Industry Examples (Ranking)
[33:56] Ranking at Google (via sigmoid)
[35:00] Ranking at YouTube (via weighted logistic regression)
[35:31] Ranking at Alibaba (via Transformer)
[36:16] How: Building an MVP
[36:22] Training: Self-supervised Representation Learning
[37:20] Ranking: Logistic Regression
[37:21] Retrieval: Approximate nearest neighbors
[38:40] Ranking: Logistic Regression
[39:00] Serving: Multiple instances + Load Balancer (or SageMaker)
[39:38] From two-stage to four-stage
[41:54] Further reading
[43:44] Applied ML page
[52:52] Keeping the habit
[55:26] Recommended books for machine learning
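
The batch vs. realtime trade-off listed at [02:05] to [03:25] can be sketched as follows. This is an illustrative toy, not code from the talk: the function names, the `related_items` mapping, and the sample data are all hypothetical, and a plain dict stands in for the key-value DB.

```python
def recommend_on_demand(history, related_items, limit=3):
    """Realtime path: compute recommendations from the latest history at request time."""
    seen = set(history)
    recs = []
    for item in reversed(history):  # most recently viewed items first
        for candidate in related_items.get(item, []):
            if candidate not in seen and candidate not in recs:
                recs.append(candidate)
    return recs[:limit]


def build_batch_table(user_histories, related_items):
    """Batch path: precompute recommendations for every known user.

    The returned dict stands in for a key-value DB that a periodic job refreshes.
    """
    return {
        user: recommend_on_demand(history, related_items)
        for user, history in user_histories.items()
    }


# Hypothetical data for illustration only.
related_items = {"phone": ["case", "charger"], "laptop": ["mouse", "bag"]}
batch_table = build_batch_table({"u1": ["phone"]}, related_items)

# The user then views a laptop. The batch table is stale until the next refresh,
# while the on-demand path sees the new context immediately.
stale = batch_table["u1"]
fresh = recommend_on_demand(["phone", "laptop"], related_items)
```

Here `stale` is `["case", "charger"]` and `fresh` is `["mouse", "bag", "case"]`. Serving from the table is a cheap lookup decoupled from compute, but it was also computed for users who may never visit; the on-demand path pays compute per request in exchange for reacting to time-sensitive context.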