Head to https://brilliant.org/BreakingTaps/ to get a 30day free trial. The first 200 people will get 20% off their annual subscription.
Watch this video ad free on Nebula: https://nebula.tv/videos/breakingtaps...
Today we're looking at HyperLogLog, an algorithm that leverages random chance to count the number of distinct items are in a dataset. It does this by tracking the longest run of zeros in a binary sequence, and uses that as an estimate of cardinality.
HLL is a probabilistic algorithm, meaning it's a guess rather than true answer. But due to some clever tricks it is usually within 2% of the correct value, and can do it both quickly and in a memoryefficient manner. A 512kb datastructure can accurately process trillions of items and terrabytes of data, which is pretty impressive!
When I made this video, I didn't realize that another #SoME3 was in progress. But a bunch of viewers suggested I enter the video, so I guess this is will be part of the event!
Patreon if that's your jam: / breakingtaps
Twitter: / breakingtaps
Instagram: / breakingtaps
Discord: / discord
Journal papers:
Flajolet, Philippe, et al. "Hyperloglog: the analysis of a nearoptimal cardinality estimation algorithm." _Discrete Mathematics and Theoretical Computer Science_. Discrete Mathematics and Theoretical Computer Science, 2007. https://algo.inria.fr/flajolet/Public...
Heule, Stefan, Marc Nunkesser, and Alexander Hall. "Hyperloglog in practice: Algorithmic engineering of a state of the art cardinality estimation algorithm." _Proceedings of the 16th International Conference on Extending Database Technology_. 2013. https://static.googleusercontent.com/...
Earlier work:
Durand, Marianne, and Philippe Flajolet. "Loglog counting of large cardinalities." _AlgorithmsESA 2003: 11th Annual European Symposium, Budapest, Hungary, September 1619, 2003. Proceedings 11_. Springer Berlin Heidelberg, 2003.
Flajolet, Philippe, and G. Nigel Martin. "Probabilistic counting algorithms for data base applications." Journal of computer and system sciences 31.2 (1985): 182209.
Articles:
https://towardsdatascience.com/hyperl...
https://engineering.fb.com/2018/12/13...