Buy real YouTube subscribers. Best price and warranty.
Get Free YouTube Subscribers, Views and Likes

A problem so hard even Google relies on Random Chance

Follow
Breaking Taps

Head to https://brilliant.org/BreakingTaps/ to get a 30day free trial. The first 200 people will get 20% off their annual subscription.

Watch this video ad free on Nebula: https://nebula.tv/videos/breakingtaps...



Today we're looking at HyperLogLog, an algorithm that leverages random chance to count the number of distinct items are in a dataset. It does this by tracking the longest run of zeros in a binary sequence, and uses that as an estimate of cardinality.

HLL is a probabilistic algorithm, meaning it's a guess rather than true answer. But due to some clever tricks it is usually within 2% of the correct value, and can do it both quickly and in a memoryefficient manner. A 512kb datastructure can accurately process trillions of items and terrabytes of data, which is pretty impressive!

When I made this video, I didn't realize that another #SoME3 was in progress. But a bunch of viewers suggested I enter the video, so I guess this is will be part of the event!



Patreon if that's your jam:   / breakingtaps  

Twitter:   / breakingtaps  
Instagram:   / breakingtaps  
Discord:   / discord  



Journal papers:

Flajolet, Philippe, et al. "Hyperloglog: the analysis of a nearoptimal cardinality estimation algorithm." _Discrete Mathematics and Theoretical Computer Science_. Discrete Mathematics and Theoretical Computer Science, 2007. https://algo.inria.fr/flajolet/Public...

Heule, Stefan, Marc Nunkesser, and Alexander Hall. "Hyperloglog in practice: Algorithmic engineering of a state of the art cardinality estimation algorithm." _Proceedings of the 16th International Conference on Extending Database Technology_. 2013. https://static.googleusercontent.com/...

Earlier work:

Durand, Marianne, and Philippe Flajolet. "Loglog counting of large cardinalities." _AlgorithmsESA 2003: 11th Annual European Symposium, Budapest, Hungary, September 1619, 2003. Proceedings 11_. Springer Berlin Heidelberg, 2003.

Flajolet, Philippe, and G. Nigel Martin. "Probabilistic counting algorithms for data base applications." Journal of computer and system sciences 31.2 (1985): 182209.

Articles:

https://towardsdatascience.com/hyperl...
https://engineering.fb.com/2018/12/13...

posted by lemurjev87