Spark, Dask, DuckDB, Polars: TPC-H Benchmarks at Scale

Coiled

We run the common TPC-H benchmark suite at 10 GB, 100 GB, 1 TB, and 10 TB scale, both on the cloud and on a local machine, and compare performance for common large dataframe libraries.

No tool does universally well. We look at common bottlenecks and compare performance between the different systems.

This talk was originally given at PyData NYC 2023. These results are preliminary and come from only a couple of weeks of exploration.

00:00 Introduction
01:58 Background
13:30 Charts
20:00 Analysis
30:12 Deployment

Learn More:
Latest TPC-H results and more details: https://docs.coiled.io/blog/tpch.html
Performance improvements for Dask DataFrame: https://docs.coiled.io/blog/daskdata...

posted by aznrycboy8j