Collect and profile a lazy query
Description
Runs the query and returns a list containing the materialized DataFrame and a DataFrame with profiling information for each executed node.
Usage
<LazyFrame>$profile(
...,
show_plot = FALSE,
truncate_nodes = 0,
engine = c("auto", "in-memory", "streaming"),
optimizations = pl$QueryOptFlags(),
type_coercion = deprecated(),
predicate_pushdown = deprecated(),
projection_pushdown = deprecated(),
simplify_expression = deprecated(),
slice_pushdown = deprecated(),
comm_subplan_elim = deprecated(),
comm_subexpr_elim = deprecated(),
cluster_with_columns = deprecated(),
collapse_joins = deprecated(),
no_optimization = deprecated()
)
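The `engine` argument selects which engine runs the query while it is profiled. The minimal sketch below profiles the query from Examples once with the in-memory engine and once with the streaming engine, then compares only the materialized results. The node names and timings in the second DataFrame can differ between engines. It assumes the installed polars build supports profiling with the streaming engine, which the Usage signature above suggests but this sketch does not verify.

```r
library("polars")

lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1
)
query <- lf$group_by("a", .maintain_order = TRUE)$agg(pl$all()$sum())$sort("a")

# Profile the same query with two different engines
in_mem <- query$profile(engine = "in-memory")
streamed <- query$profile(engine = "streaming")

# The materialized results (element 1) should agree; the timing tables need not
all.equal(as.data.frame(in_mem[[1]]), as.data.frame(streamed[[1]]))
```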
Arguments
Details
The units of the timings are microseconds.
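Because `start` and `end` are given in microseconds, the time spent in a node is `end - start`. The sketch below adds that difference to the timing DataFrame and also converts it to milliseconds. The column names `duration_us` and `duration_ms` are illustrative only; they are not part of the profiling output.

```r
library("polars")

lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1
)

# Element 2 of the profile() list holds the per-node timings
timings <- lf$group_by("a", .maintain_order = TRUE)$agg(
  pl$all()$sum()
)$sort("a")$profile()[[2]]

# Per-node duration: end - start is in microseconds, / 1000 gives milliseconds
timings <- timings$with_columns(
  duration_us = pl$col("end") - pl$col("start"),
  duration_ms = (pl$col("end") - pl$col("start")) / 1000
)
timings
```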
Value
A list of two DataFrames: the first holds the collected result, the
second the timings of each executed node. If show_plot = TRUE,
the plot is also stored in the list.
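The returned list is unnamed, so its elements are taken by position. As a minimal sketch using the query from Examples, the first element should match what `$collect()` returns for the same query, and the second has the `node`, `start` and `end` columns.

```r
library("polars")

lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1
)
query <- lf$group_by("a", .maintain_order = TRUE)$agg(pl$all()$sum())$sort("a")

res <- query$profile()
result <- res[[1]]   # materialized DataFrame
timings <- res[[2]]  # one row per executed node: node, start, end

# The profiled result should equal a plain collect of the same query
all.equal(as.data.frame(result), as.data.frame(query$collect()))
```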
See Also
- $collect(): regular collect.
- $sink_parquet(): streams the query to a Parquet file.
- $sink_ipc(): streams the query to an Arrow IPC file.
Examples
library("polars")
lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1
)
lf$group_by("a", .maintain_order = TRUE)$agg(
  pl$all()$sum()
)$sort("a")$profile()
#> [[1]]
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ a ┆ b ┆ c │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ i32 ┆ i32 │
#> ╞═════╪═════╪═════╡
#> │ a ┆ 4 ┆ 10 │
#> │ b ┆ 11 ┆ 10 │
#> │ c ┆ 6 ┆ 1 │
#> └─────┴─────┴─────┘
#>
#> [[2]]
#> shape: (2, 3)
#> ┌──────────────┬───────┬──────┐
#> │ node ┆ start ┆ end │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ u64 ┆ u64 │
#> ╞══════════════╪═══════╪══════╡
#> │ optimization ┆ 0 ┆ 6004 │
#> │ sort(a) ┆ 6004 ┆ 6713 │
#> └──────────────┴───────┴──────┘