The Other optimizations page is under construction.
Besides predicate and projection pushdown,
Polars performs several other optimizations.
One important topic is optional caching and parallelization. It is easy to imagine two
DataFrame computations that both lead to a scan of the same file.
Polars may cache the scanned file so that the same file is not scanned twice. However, if you want to, you
can override this behavior and force Polars to read the file again for each computation. This could be faster
because the scans can be done in parallel.
If we look at the previous query, we see that the join operation has two inputs: a
computation path with data/reddit.csv as root and a second path rooted at another file.
Polars can observe that there are no dependencies between the two paths and
will read both files in parallel. If other operations are done before the join (e.g.
groupby, filters, etc.), they are also executed in parallel.
Polars also applies expression simplifications. The impact of these optimizations is smaller than that of predicate and projection pushdown, but they likely add up. You can track this issue to see the latest status of those.