
Collect and profile a lazy query


Description

Run the query and return a list containing the materialized DataFrame and a DataFrame with profiling information for each executed node.

Usage

<LazyFrame>$profile(
  ...,
  type_coercion = TRUE,
  `_type_check` = TRUE,
  predicate_pushdown = TRUE,
  projection_pushdown = TRUE,
  simplify_expression = TRUE,
  slice_pushdown = TRUE,
  comm_subplan_elim = TRUE,
  comm_subexpr_elim = TRUE,
  cluster_with_columns = TRUE,
  collapse_joins = TRUE,
  no_optimization = FALSE,
  `_check_order` = TRUE,
  show_plot = FALSE,
  truncate_nodes = 0
)

Arguments

... These dots are for future extensions and must be empty.
type_coercion A logical. Whether to apply type coercion optimization.
predicate_pushdown A logical. Whether to apply predicate pushdown optimization.
projection_pushdown A logical. Whether to apply projection pushdown optimization.
simplify_expression A logical. Whether to apply expression simplification optimization.
slice_pushdown A logical. Whether to apply slice pushdown optimization.
comm_subplan_elim A logical. Whether to try to cache branching subplans that occur on self-joins or unions.
comm_subexpr_elim A logical. Whether to try to cache common subexpressions.
cluster_with_columns A logical. Whether to combine sequential independent calls to with_columns.
collapse_joins A logical. Whether to collapse a join and filters into a faster join.
no_optimization A logical. If TRUE, turn off (certain) optimizations.
_check_order, _type_check For internal use only.
show_plot A logical. Whether to show a Gantt chart of the profiling result.
truncate_nodes Truncate the label lengths in the Gantt chart to this number of characters. If 0 (default), do not truncate.
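The optimization flags affect how the query is planned, not what it returns. The sketch below assumes the same toy query as in the Examples section: it profiles the query once with the defaults and once with `no_optimization = TRUE`, checks that the collected results agree, and prints the executed nodes of each run, which may differ. The variable names are illustrative, and it assumes `as.data.frame()` can convert a polars DataFrame to an R data frame for the comparison.

```r
library("polars")

lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1
)
query <- lf$group_by("a", .maintain_order = TRUE)$agg(pl$all()$sum())$sort("a")

# Profile with the default optimizations and with (certain) optimizations off
res_default <- query$profile()
res_plain <- query$profile(no_optimization = TRUE)

# The optimization flags should not change the collected result
# (assumes as.data.frame() converts a polars DataFrame)
same_result <- identical(
  as.data.frame(res_default[[1]]),
  as.data.frame(res_plain[[1]])
)
print(same_result)

# Compare which nodes were executed in each run
print(res_default[[2]]$select("node"))
print(res_plain[[2]]$select("node"))
```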

Details

The units of the timings are microseconds.
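The timings table records `start` and `end` for each node rather than a duration. A minimal sketch, assuming the toy query from the Examples section, derives per-node durations from those columns. The column names `duration_us` and `duration_ms` are hypothetical additions for illustration.

```r
library("polars")

lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1
)
timings <- lf$group_by("a", .maintain_order = TRUE)$agg(
  pl$all()$sum()
)$sort("a")$profile()[[2]]

# start and end are in microseconds, so their difference is the node's duration
durations <- timings$with_columns(
  (pl$col("end") - pl$col("start"))$alias("duration_us")
)$with_columns(
  # Convert microseconds to milliseconds for readability
  (pl$col("duration_us") / 1000)$alias("duration_ms")
)
print(durations)
```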

Value

A list of two DataFrames: the first with the collected result, the second with the timings of each executed node. If show_plot = TRUE, the plot is also stored in the list.
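The two list elements can be unpacked by position. The sketch below, assuming the toy query from the Examples section and the default show_plot = FALSE, checks that the first element matches what $collect() returns for the same query. The names `query`, `result` and `timings` are illustrative, and the comparison assumes `as.data.frame()` can convert a polars DataFrame.

```r
library("polars")

lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1
)
query <- lf$group_by("a", .maintain_order = TRUE)$agg(pl$all()$sum())$sort("a")

res <- query$profile()
result <- res[[1]]   # the collected DataFrame
timings <- res[[2]]  # one row per executed node, with node, start and end

# The first element should match a regular collect of the same query
matches_collect <- identical(
  as.data.frame(result),
  as.data.frame(query$collect())
)
print(matches_collect)
print(timings)
```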

See Also

  • $collect() performs a regular collect.
  • $sink_parquet() streams the query to a Parquet file.
  • $sink_ipc() streams the query to an Arrow IPC file.

Examples

library("polars")

lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1,
)

lf$group_by("a", .maintain_order = TRUE)$agg(
  pl$all()$sum()
)$sort("a")$profile()
#> [[1]]
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ a   ┆ b   ┆ c   │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ i32 ┆ i32 │
#> ╞═════╪═════╪═════╡
#> │ a   ┆ 4   ┆ 10  │
#> │ b   ┆ 11  ┆ 10  │
#> │ c   ┆ 6   ┆ 1   │
#> └─────┴─────┴─────┘
#> 
#> [[2]]
#> shape: (3, 3)
#> ┌─────────────────────────┬───────┬──────┐
#> │ node                    ┆ start ┆ end  │
#> │ ---                     ┆ ---   ┆ ---  │
#> │ str                     ┆ u64   ┆ u64  │
#> ╞═════════════════════════╪═══════╪══════╡
#> │ optimization            ┆ 0     ┆ 1962 │
#> │ group_by_partitioned(a) ┆ 1962  ┆ 7333 │
#> │ sort(a)                 ┆ 7423  ┆ 8258 │
#> └─────────────────────────┴───────┴──────┘