Create a string representation of the query plan
Description
The query plan is read from bottom to top. When optimized = FALSE, the query is shown as it was written by the user. This is not what Polars actually runs: before execution, Polars applies optimizations, and the optimized plan is what $explain() displays by default. One classic example is predicate pushdown, which applies the filter as early as possible (i.e. at the bottom of the plan).
Usage
<LazyFrame>$explain(
...,
format = "plain",
optimized = TRUE,
type_coercion = TRUE,
predicate_pushdown = TRUE,
projection_pushdown = TRUE,
simplify_expression = TRUE,
slice_pushdown = TRUE,
comm_subplan_elim = TRUE,
comm_subexpr_elim = TRUE,
cluster_with_columns = TRUE,
streaming = FALSE
)
Arguments
| Argument | Description |
|---|---|
| ... | Ignored. |
| format | The format to use for displaying the logical plan. Must be either "plain" (default) or "tree". |
| optimized | Logical. Return an optimized query plan. If TRUE (default), the subsequent optimization flags control which optimizations run. |
| type_coercion | Logical. Coerce types such that operations succeed and run on minimal required memory. |
| predicate_pushdown | Logical. Apply filters as early as possible, at the scan level. |
| projection_pushdown | Logical. Select only the columns that are needed at the scan level. |
| simplify_expression | Logical. Apply various optimizations, such as constant folding and replacing expensive operations with faster alternatives. |
| slice_pushdown | Logical. Only load the required slice from the scan level. Don't materialize sliced outputs (e.g. join$head(10)). |
| comm_subplan_elim | Logical. Try to cache branching subplans that occur on self-joins or unions. |
| comm_subexpr_elim | Logical. Cache and reuse common subexpressions. |
| cluster_with_columns | Logical. Combine sequential independent calls to with_columns(). |
| streaming | Logical. Run parts of the query in a streaming fashion (this is in an alpha state). |
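Each optimization flag can be switched off on its own to see what it contributes to the final plan. The sketch below assumes the same iris query used in the Examples and compares the default plan with one where only predicate_pushdown is disabled. In the second plan the FILTER node should stay above SORT BY instead of being folded into the scan's SELECTION. The variable names are illustrative only.

```r
library("polars")

# Same query as in the Examples: sort by species, then drop "setosa"
lazy_query = as_polars_lf(iris)$sort("Species")$filter(pl$col("Species") != "setosa")

# Default: all optimizations enabled, so the filter is pushed down into the scan
plan_default = lazy_query$explain()

# Only predicate pushdown disabled; every other flag keeps its default value
plan_no_pushdown = lazy_query$explain(predicate_pushdown = FALSE)

cat(plan_default, "\n\n")
cat(plan_no_pushdown, "\n")
```

Toggling one flag at a time like this isolates the effect of a single optimization, whereas optimized = FALSE shows the plan with none of them applied.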
Value
A character value containing the query plan.
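Because the result is a single string with embedded line breaks, it is best printed with cat() or writeLines(), or split into lines when the plan needs to be inspected programmatically. A minimal sketch, assuming the iris query from the Examples; strsplit() and writeLines() are base R, and the variable names are illustrative.

```r
library("polars")

lazy_query = as_polars_lf(iris)$sort("Species")$filter(pl$col("Species") != "setosa")
plan = lazy_query$explain()

# One character value; the plan's lines are separated by "\n"
class(plan)
length(plan)

# Split into individual lines, e.g. to search for a particular node
plan_lines = strsplit(plan, "\n", fixed = TRUE)[[1]]
writeLines(plan_lines)
```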
Examples
library("polars")
lazy_frame = as_polars_lf(iris)
# Prepare your query
lazy_query = lazy_frame$sort("Species")$filter(pl$col("Species") != "setosa")
# This is the query that was written by the user, without any optimizations
# (use cat() for better printing)
lazy_query$explain(optimized = FALSE) |> cat()
#> FILTER [(col("Species")) != (String(setosa))] FROM
#> SORT BY [col("Species")]
#> DF ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]; PROJECT */5 COLUMNS; SELECTION: None
# This is the query after `polars` optimizes it: instead of sorting first and
# then filtering, it is faster to filter first and then sort the rest.
lazy_query$explain() |> cat()
#> SORT BY [col("Species")]
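#> DF ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]; PROJECT */5 COLUMNS; SELECTION: [(col("Species")) != (String(setosa))]
# The same optimized plan can also be displayed as a tree
lazy_query$explain(format = "tree") |> cat()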
#> 0 1
#> ┌───────────────────────────────────────────────────────────────────────────────────────────
#> │
#> │ ╭─────────╮
#> 0 │ │ SORT BY │
#> │ ╰────┬┬───╯
#> │ ││
#> │ │╰────────────────────────────────────────────╮
#> │ │ │
#> │ ╭───────┴────────╮ ╭─────────────────────────────────┴─────────────────────────────────╮
#> │ │ expression: │ │ DF ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"] │
#> 1 │ │ col("Species") │ │ PROJECT */5 COLUMNS │
#> │ ╰────────────────╯ ╰─────────────────────────────────┬─────────────────────────────────╯
#> │ │
#> │ │
#> │ │
#> │ ╭───────────────────┴────────────────────╮
#> │ │ SELECTION: │
#> 2 │ │ [(col("Species")) != (String(setosa))] │
#> │ ╰────────────────────────────────────────╯
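Projection pushdown can be inspected the same way. This sketch is an illustration, not part of the original examples: it selects two columns, keeps the first 10 rows, and compares the default plan with one where projection_pushdown = FALSE. With pushdown enabled, the DF scan should report that only the selected columns are projected rather than all five. The exact wording of the plan text varies between Polars versions, so no output is reproduced here.

```r
library("polars")

# Keep two columns and the first 10 rows
lazy_query = as_polars_lf(iris)$select("Species", "Sepal.Length")$head(10)

# Default: the column selection is pushed into the scan
plan_pushdown = lazy_query$explain()

# Projection pushdown disabled: the scan reads all columns first
plan_no_pushdown = lazy_query$explain(projection_pushdown = FALSE)

cat(plan_pushdown, "\n\n")
cat(plan_no_pushdown, "\n")
```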