I did large-scale molecular data engineering with DuckDB and Polars (https://github.com/scikit-fingerprints/MolPILE_dataset and https://arxiv.org/abs/2509.18353, for those interested). Both were amazing, but for large-scale JOINs only DuckDB avoided running out of memory. It was a real pleasure to use, too.

I could also JOIN local CSV datasets, a Postgres database, and even Excel files from chemists, all inside a Jupyter Notebook with really seamless Python integration. This also means one can easily do the heavy lifting in DuckDB, export the result to a Polars DataFrame, pass it to Plotly, and get an interactive plot. Neat stuff.