Materialize this LazyFrame into a DataFrame
Description
By default, all query optimizations are enabled. To disable individual
optimizations, set the corresponding property of a QueryOptFlags object to
FALSE and pass that object to the optimizations argument.
Usage
<LazyFrame>$collect(
...,
engine = c("auto", "in-memory", "streaming"),
optimizations = pl$QueryOptFlags(),
type_coercion = deprecated(),
predicate_pushdown = deprecated(),
projection_pushdown = deprecated(),
simplify_expression = deprecated(),
slice_pushdown = deprecated(),
comm_subplan_elim = deprecated(),
comm_subexpr_elim = deprecated(),
cluster_with_columns = deprecated(),
collapse_joins = deprecated(),
no_optimization = deprecated()
)
Arguments
…
|
These dots are for future extensions and must be empty.
|
engine
|
The name of the engine used to process the query. One of the following:
-
“auto” (default): Select the engine automatically. The
“in-memory” engine is selected in most cases.
-
“in-memory”: Use the in-memory engine.
-
“streaming”: Use the (new) streaming engine.
|
optimizations
|
A QueryOptFlags object specifying which optimization passes to run
during query optimization.
|
type_coercion
|
Use the type_coercion property of a QueryOptFlags object,
then pass that to the optimizations argument instead.
|
predicate_pushdown
|
Use the predicate_pushdown property of a QueryOptFlags
object, then pass that to the optimizations argument
instead.
|
projection_pushdown
|
Use the projection_pushdown property of a QueryOptFlags
object, then pass that to the optimizations argument
instead.
|
simplify_expression
|
Use the simplify_expression property of a QueryOptFlags
object, then pass that to the optimizations argument
instead.
|
slice_pushdown
|
Use the slice_pushdown property of a QueryOptFlags object,
then pass that to the optimizations argument instead.
|
comm_subplan_elim
|
Use the comm_subplan_elim property of a QueryOptFlags
object, then pass that to the optimizations argument
instead.
|
comm_subexpr_elim
|
Use the comm_subexpr_elim property of a QueryOptFlags
object, then pass that to the optimizations argument
instead.
|
cluster_with_columns
|
Use the cluster_with_columns property of a QueryOptFlags
object, then pass that to the optimizations argument
instead.
|
collapse_joins
|
Use the predicate_pushdown property of a QueryOptFlags
object, then pass that to the optimizations argument
instead.
|
no_optimization
|
Use the optimizations argument with
pl$QueryOptFlags()$no_optimizations() instead.
|
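The optimizations argument described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes that pl$QueryOptFlags() accepts the optimization names as constructor arguments (for example predicate_pushdown = FALSE), which this page does not state explicitly. Disabling a pass should only change how the query is executed, not the result.

```r
library("polars")

lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6
)

# Assumption: optimization flags can be set through the constructor.
opts <- pl$QueryOptFlags(predicate_pushdown = FALSE)

query <- lf$filter(pl$col("b") > 2)$select("a", "b")

# Same query, collected with and without predicate pushdown.
res_default <- query$collect()
res_custom <- query$collect(optimizations = opts)
```

Both results should contain the same rows; only the optimized plan differs.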
Value
A polars DataFrame
See Also
-
$profile() - same as
$collect() but also returns a
table with each operation profiled.
-
$sink_parquet() - streams the query result to a Parquet file.
-
$sink_ipc() - streams the query result to an Arrow IPC file.
Examples
library("polars")
lf <- pl$LazyFrame(
a = c("a", "b", "a", "b", "b", "c"),
b = 1:6,
c = 6:1,
)
lf$group_by("a")$agg(pl$all()$sum())$collect()
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ a ┆ b ┆ c │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ i32 ┆ i32 │
#> ╞═════╪═════╪═════╡
#> │ c ┆ 6 ┆ 1 │
#> │ b ┆ 11 ┆ 10 │
#> │ a ┆ 4 ┆ 10 │
#> └─────┴─────┴─────┘
# Collect in streaming mode
lf$group_by("a")$agg(pl$all()$sum())$collect(
engine = "streaming"
)
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ a ┆ b ┆ c │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ i32 ┆ i32 │
#> ╞═════╪═════╪═════╡
#> │ c ┆ 6 ┆ 1 │
#> │ a ┆ 4 ┆ 10 │
#> │ b ┆ 11 ┆ 10 │
#> └─────┴─────┴─────┘
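The two outputs above list the groups in different orders, because group_by() does not guarantee row order by default. A minimal sketch for getting a deterministic result, assuming the LazyFrame $sort() method, is to sort by the grouping key before collecting:

```r
library("polars")

lf <- pl$LazyFrame(
  a = c("a", "b", "a", "b", "b", "c"),
  b = 1:6,
  c = 6:1
)

# Sort by the grouping key so the row order no longer depends on the engine.
sorted <- lf$group_by("a")$agg(pl$all()$sum())$sort("a")$collect(
  engine = "streaming"
)

# Convert to a base R data.frame for inspection.
out <- as.data.frame(sorted)
```

With the sort in place, the rows come back in the order a, b, c regardless of which engine is selected.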