Materialize this LazyFrame into a DataFrame

Description

By default, all query optimizations are enabled. Individual optimizations may be disabled by setting the corresponding parameter to FALSE.

Usage

<LazyFrame>$collect(
  ...,
  engine = c("auto", "in-memory", "streaming"),
  optimizations = pl$QueryOptFlags(),
  type_coercion = deprecated(),
  predicate_pushdown = deprecated(),
  projection_pushdown = deprecated(),
  simplify_expression = deprecated(),
  slice_pushdown = deprecated(),
  comm_subplan_elim = deprecated(),
  comm_subexpr_elim = deprecated(),
  cluster_with_columns = deprecated(),
  collapse_joins = deprecated(),
  no_optimization = deprecated()
)

Arguments

... These dots are for future extensions and must be empty.
engine The name of the engine used to process the query. One of the following:
  • “auto” (default): Select the engine automatically. The “in-memory” engine will be selected for most cases.
  • “in-memory”: Use the in-memory engine.
  • “streaming”: [Experimental] Use the (new) streaming engine.
optimizations [Experimental] A QueryOptFlags object specifying which optimization passes to run during query optimization.
type_coercion [Deprecated] Use the type_coercion property of a QueryOptFlags object, then pass that to the optimizations argument instead.
predicate_pushdown [Deprecated] Use the predicate_pushdown property of a QueryOptFlags object, then pass that to the optimizations argument instead.
projection_pushdown [Deprecated] Use the projection_pushdown property of a QueryOptFlags object, then pass that to the optimizations argument instead.
simplify_expression [Deprecated] Use the simplify_expression property of a QueryOptFlags object, then pass that to the optimizations argument instead.
slice_pushdown [Deprecated] Use the slice_pushdown property of a QueryOptFlags object, then pass that to the optimizations argument instead.
comm_subplan_elim [Deprecated] Use the comm_subplan_elim property of a QueryOptFlags object, then pass that to the optimizations argument instead.
comm_subexpr_elim [Deprecated] Use the comm_subexpr_elim property of a QueryOptFlags object, then pass that to the optimizations argument instead.
cluster_with_columns [Deprecated] Use the cluster_with_columns property of a QueryOptFlags object, then pass that to the optimizations argument instead.
collapse_joins [Deprecated] Use the predicate_pushdown property of a QueryOptFlags object, then pass that to the optimizations argument instead.
no_optimization [Deprecated] Use the optimizations argument with pl$QueryOptFlags()$no_optimizations() instead.

Value

A polars DataFrame

See Also

  • $profile() - same as $collect() but also returns a table with each operation profiled.
  • $sink_parquet() - streams the query to a Parquet file.
  • $sink_ipc() - streams the query to an Arrow IPC file.

Examples

library("polars")

lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1,
)
lf$group_by("a")$agg(pl$all()$sum())$collect()
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ a   ┆ b   ┆ c   │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ i32 ┆ i32 │
#> ╞═════╪═════╪═════╡
#> │ c   ┆ 6   ┆ 1   │
#> │ b   ┆ 11  ┆ 10  │
#> │ a   ┆ 4   ┆ 10  │
#> └─────┴─────┴─────┘
# Collect in streaming mode
lf$group_by("a")$agg(pl$all()$sum())$collect(
  engine = "streaming"
)
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ a   ┆ b   ┆ c   │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ i32 ┆ i32 │
#> ╞═════╪═════╪═════╡
#> │ c   ┆ 6   ┆ 1   │
#> │ a   ┆ 4   ┆ 10  │
#> │ b   ┆ 11  ┆ 10  │
#> └─────┴─────┴─────┘
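
The deprecated per-optimization arguments are replaced by the optimizations argument. A minimal sketch of that replacement, using only the pl$QueryOptFlags()$no_optimizations() call named in the no_optimization entry above; the small frame is made up for illustration, and output is omitted because it was not run here:

```r
library("polars")

# Illustrative data only (not taken from the examples above)
lf <- pl$LazyFrame(
  a = c("a", "b", "a"),
  b = 1:3
)

# Collect with all optimization passes turned off, as the
# `no_optimization` entry recommends in place of the deprecated argument
lf$select(pl$col("b")$sum())$collect(
  optimizations = pl$QueryOptFlags()$no_optimizations()
)
```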