Collect and profile a lazy query
Description
Runs the query and returns a list containing the materialized DataFrame and a second DataFrame with profiling information for each node that was executed.
Usage
<LazyFrame>$profile(
  ...,
  type_coercion = TRUE,
  `_type_check` = TRUE,
  predicate_pushdown = TRUE,
  projection_pushdown = TRUE,
  simplify_expression = TRUE,
  slice_pushdown = TRUE,
  comm_subplan_elim = TRUE,
  comm_subexpr_elim = TRUE,
  cluster_with_columns = TRUE,
  no_optimization = FALSE,
  `_check_order` = TRUE,
  show_plot = FALSE,
  truncate_nodes = 0,
  collapse_joins = deprecated()
)
Arguments
Details
The units of the timings are microseconds.
Value
List of two DataFrames: one with the collected result, the other with the timings of each step. If show_plot = TRUE, then the plot is also stored in the list.
See Also
- $collect(): regular collect.
- $sink_parquet(): streams the query to a Parquet file.
- $sink_ipc(): streams the query to an Arrow file.
Examples
library("polars")
lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1,
)
lf$group_by("a", .maintain_order = TRUE)$agg(
  pl$all()$sum()
)$sort("a")$profile()
#> [[1]]
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ a   ┆ b   ┆ c   │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ i32 ┆ i32 │
#> ╞═════╪═════╪═════╡
#> │ a   ┆ 4   ┆ 10  │
#> │ b   ┆ 11  ┆ 10  │
#> │ c   ┆ 6   ┆ 1   │
#> └─────┴─────┴─────┘
#>
#> [[2]]
#> shape: (3, 3)
#> ┌─────────────────────────┬───────┬───────┐
#> │ node                    ┆ start ┆ end   │
#> │ ---                     ┆ ---   ┆ ---   │
#> │ str                     ┆ u64   ┆ u64   │
#> ╞═════════════════════════╪═══════╪═══════╡
#> │ optimization            ┆ 0     ┆ 2151  │
#> │ group_by_partitioned(a) ┆ 2151  ┆ 9774  │
#> │ sort(a)                 ┆ 9858  ┆ 10401 │
#> └─────────────────────────┴───────┴───────┘
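
Because start and end are reported in microseconds, the time spent in each node can be derived by subtracting the two columns. The following is a minimal sketch rather than part of the official examples. It assumes the returned list is accessed by position, as the `[[1]]` and `[[2]]` in the printed output above suggest, and `duration_us` is a column name introduced here purely for illustration.

```r
library("polars")

lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1
)

res <- lf$group_by("a", .maintain_order = TRUE)$agg(
  pl$all()$sum()
)$sort("a")$profile()

# Assumption: elements are taken by position, matching the printed output.
result <- res[[1]]   # materialized DataFrame
timings <- res[[2]]  # one row per executed node: node, start, end

# Elapsed time per node in microseconds; `duration_us` is an illustrative name.
timings_us <- timings$with_columns(
  duration_us = pl$col("end") - pl$col("start")
)
timings_us
```

The actual timings vary from run to run, so the numbers will not match the output shown above.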