Understanding a query plan

For any lazy query Polars has both:

  • a non-optimized plan with the set of steps code as we provided it and
  • an optimized plan with changes made by the query optimizer

We can understand both the non-optimized and optimized query plans with visualization and by printing them as text.

Non-optimized query plan

Graphviz visualization

First we visualise the non-optimized plan by setting optimized=False.

import polars as pl

from ..paths import DATA_DIR

q1 = (
    pl.scan_csv(f"{DATA_DIR}/reddit.csv")
    .with_columns(pl.col("name").str.to_uppercase())
    .filter(pl.col("comment_karma") > 0)
)

q1.show_graph(optimized=False)

The query plan visualisation should be read from bottom to top. In the visualisation:

  • each box corresponds to a stage in the query plan
  • the sigma stands for SELECTION and indicates any filter conditions
  • the pi stands for PROJECTION and indicates choosing a subset of columns

Printed query plan

We can also print the non-optimized plan with describe_plan

q1.describe_plan()
{{#include ../outputs/lazy_api/q1_plan.txt}}

The printed plan should also be read from bottom to top. This non-optimized plan is roughly to:

  • read from the data/reddit.csv file
  • read all 6 columns (where the * wildcard in PROJECT */6 COLUMNS means take all columns)
  • transform the name column to uppercase
  • apply a filter on the comment_karma column

Optimized query plan

Now we visualise the optimized plan with show_graph.

q1.show_graph()

We can also print the optimized plan with describe_optimized_plan

q1.describe_optimized_plan()
{{#include ../outputs/lazy_api/q1_opt_plan.txt}}

The optimized plan is to:

  • read the data from the Reddit CSV
  • apply the filter on the comment_karma column while the CSV is being read line-by-line
  • transform the name column to uppercase

In this case the query optimizer has identified that the filter can be applied while the CSV is read from disk rather than writing the whole file to disk and then applying it. This optimization is called Predicate Pushdown.