Collect and profile a lazy query

Description

Runs the query and returns a list containing the materialized DataFrame and a second DataFrame with profiling information for each node that was executed.

Usage

<LazyFrame>$profile(
  ...,
  show_plot = FALSE,
  truncate_nodes = 0,
  engine = c("auto", "in-memory", "streaming"),
  optimizations = pl$QueryOptFlags(),
  type_coercion = deprecated(),
  predicate_pushdown = deprecated(),
  projection_pushdown = deprecated(),
  simplify_expression = deprecated(),
  slice_pushdown = deprecated(),
  comm_subplan_elim = deprecated(),
  comm_subexpr_elim = deprecated(),
  cluster_with_columns = deprecated(),
  collapse_joins = deprecated(),
  no_optimization = deprecated()
)

Arguments

... These dots are for future extensions and must be empty.
show_plot Show a Gantt chart of the profiling result
truncate_nodes Truncate the label lengths in the Gantt chart to this number of characters. If 0 (default), do not truncate.
engine The engine name to use for processing the query. One of the following:
  • “auto” (default): Select the engine automatically. The “in-memory” engine will be selected for most cases.
  • “in-memory”: Use the in-memory engine.
  • “streaming”: [Experimental] Use the (new) streaming engine.
optimizations [Experimental] A QueryOptFlags object indicating which optimization passes to run during query optimization.
type_coercion [Deprecated] Use the type_coercion property of a QueryOptFlags object, then pass that to the optimizations argument instead.
predicate_pushdown [Deprecated] Use the predicate_pushdown property of a QueryOptFlags object, then pass that to the optimizations argument instead.
projection_pushdown [Deprecated] Use the projection_pushdown property of a QueryOptFlags object, then pass that to the optimizations argument instead.
simplify_expression [Deprecated] Use the simplify_expression property of a QueryOptFlags object, then pass that to the optimizations argument instead.
slice_pushdown [Deprecated] Use the slice_pushdown property of a QueryOptFlags object, then pass that to the optimizations argument instead.
comm_subplan_elim [Deprecated] Use the comm_subplan_elim property of a QueryOptFlags object, then pass that to the optimizations argument instead.
comm_subexpr_elim [Deprecated] Use the comm_subexpr_elim property of a QueryOptFlags object, then pass that to the optimizations argument instead.
cluster_with_columns [Deprecated] Use the cluster_with_columns property of a QueryOptFlags object, then pass that to the optimizations argument instead.
collapse_joins [Deprecated] Use the predicate_pushdown property of a QueryOptFlags object, then pass that to the optimizations argument instead.
no_optimization [Deprecated] Use the optimizations argument with pl$QueryOptFlags()$no_optimizations() instead.
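
As a rough illustration of the engine and optimizations arguments, the sketch below profiles the same query three ways. The query itself (the `lf` and `query` objects and their columns) is made up for this example, and the `pl$QueryOptFlags()$no_optimizations()` form is taken from the no_optimization entry above, so it assumes that call returns an object that can be passed straight to optimizations.

```r
library(polars)

# Hypothetical query, used only for illustration
lf <- pl$LazyFrame(a = c("a", "b", "a"), b = 1:3)
query <- lf$filter(pl$col("b") > 1)$select(pl$col("a"))

# Default: engine = "auto" with the default optimization flags
prof_default <- query$profile()

# Explicitly request the in-memory engine
prof_in_memory <- query$profile(engine = "in-memory")

# Turn off optimizations through the optimizations argument
# (replaces the deprecated no_optimization argument)
prof_no_opt <- query$profile(
  optimizations = pl$QueryOptFlags()$no_optimizations()
)
```

Comparing the second element of each result is one way to see how the chosen flags change which nodes run and how long they take.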

Details

The units of the timings are microseconds.

Value

List of two DataFrames: one with the collected result, the other with the timings of each step. If show_plot = TRUE, then the plot is also stored in the list.
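
A minimal sketch of working with the returned list, assuming the two DataFrames are accessed by position (the printed output in the Examples section shows them as `[[1]]` and `[[2]]`). The `duration_us` column name and the small input frame are introduced here for illustration only.

```r
library(polars)

# Hypothetical input, used only for illustration
lf <- pl$LazyFrame(a = c("a", "b", "a"), b = 1:3)

res <- lf$group_by("a")$agg(pl$col("b")$sum())$profile()

result <- res[[1]]   # the materialized DataFrame
timings <- res[[2]]  # one row per executed node: node, start, end

# start and end are in microseconds, so their difference is the
# time spent in each node, also in microseconds
timings$with_columns(
  duration_us = pl$col("end") - pl$col("start")
)
```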

See Also

  • $collect() - regular collect.
  • $sink_parquet() - streams the query to a Parquet file.
  • $sink_ipc() - streams the query to an Arrow IPC file.

Examples

library("polars")

lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1,
)

lf$group_by("a", .maintain_order = TRUE)$agg(
  pl$all()$sum()
)$sort("a")$profile()
#> [[1]]
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ a   ┆ b   ┆ c   │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ i32 ┆ i32 │
#> ╞═════╪═════╪═════╡
#> │ a   ┆ 4   ┆ 10  │
#> │ b   ┆ 11  ┆ 10  │
#> │ c   ┆ 6   ┆ 1   │
#> └─────┴─────┴─────┘
#> 
#> [[2]]
#> shape: (2, 3)
#> ┌──────────────┬───────┬──────┐
#> │ node         ┆ start ┆ end  │
#> │ ---          ┆ ---   ┆ ---  │
#> │ str          ┆ u64   ┆ u64  │
#> ╞══════════════╪═══════╪══════╡
#> │ optimization ┆ 0     ┆ 6004 │
#> │ sort(a)      ┆ 6004  ┆ 6713 │
#> └──────────────┴───────┴──────┘