Collect a query into a DataFrame
Description
$collect() performs the query on the LazyFrame and returns a DataFrame.
Usage
<LazyFrame>$collect(
...,
type_coercion = TRUE,
predicate_pushdown = TRUE,
projection_pushdown = TRUE,
simplify_expression = TRUE,
slice_pushdown = TRUE,
comm_subplan_elim = TRUE,
comm_subexpr_elim = TRUE,
cluster_with_columns = TRUE,
streaming = FALSE,
no_optimization = FALSE,
collect_in_background = FALSE
)
Arguments
| Argument | Description |
| --- | --- |
| … | Ignored. |
| type_coercion | Logical. Coerce types such that operations succeed and run on minimal required memory. |
| predicate_pushdown | Logical. Apply filters as early as possible, at the scan level. |
| projection_pushdown | Logical. Select only the columns that are needed at the scan level. |
| simplify_expression | Logical. Apply various optimizations, such as constant folding and replacing expensive operations with faster alternatives. |
| slice_pushdown | Logical. Only load the required slice from the scan level. Don’t materialize sliced outputs (e.g. join$head(10)). |
| comm_subplan_elim | Logical. Try to cache branching subplans that occur on self-joins or unions. |
| comm_subexpr_elim | Logical. Cache and reuse common subexpressions. |
| cluster_with_columns | Logical. Combine sequential independent calls to with_columns(). |
| streaming | Logical. Run parts of the query in a streaming fashion (this is in an alpha state). |
| no_optimization | Logical. Sets the following parameters to FALSE: predicate_pushdown, projection_pushdown, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns. |
| collect_in_background | Logical. Detach this query from the R session. Computation starts in the background, and the call returns a handle that can later be converted into the resulting DataFrame. Useful in interactive mode to avoid locking the R session. |
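As a rough illustration of how the optimization flags are passed, the sketch below collects the same lazy query twice, once with the defaults and once with no_optimization = TRUE. It assumes the polars R package is installed in a version matching the signature above; the iris data and the variable names are chosen only for illustration.

```r
library(polars)

# Build a lazy query on the built-in iris data (nothing is executed yet).
lf <- pl$LazyFrame(iris)$filter(pl$col("Species") == "setosa")

# Default call: the optimizations listed above are enabled.
optimized <- lf$collect()

# Switch off the pushdown and caching optimizations in one go.
unoptimized <- lf$collect(no_optimization = TRUE)

# The flags change how the query is executed, not what it computes,
# so both results are expected to hold the same data.
identical(as.data.frame(optimized), as.data.frame(unoptimized))
```

Individual flags can be toggled the same way, for example lf$collect(predicate_pushdown = FALSE), which can help when checking whether a specific optimization affects a query.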
Details
Note: use $fetch(n) if you want to run your query on the first n rows only. This can be a huge time saver when debugging queries.
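A minimal sketch of that debugging workflow, assuming the polars R package is installed in a version that still provides $fetch(); the query and the variable names are illustrative only:

```r
library(polars)

# A lazy query on the built-in iris data.
lf <- pl$LazyFrame(iris)$filter(pl$col("Sepal.Length") > 5)

# Quick check: run the query on the first 10 input rows only.
# The filter is applied to those rows, so fewer than 10 may come back.
preview <- lf$fetch(10)
preview$height

# Full run once the query looks right.
full <- lf$collect()
full$height
```

Because the row limit applies to the input rather than to the output, $fetch() is a way to sanity-check a query and should not be read as a sample of the final result.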
Value
A DataFrame
See Also
- $fetch(): fast limited query check.
- $profile(): same as $collect(), but also returns a table with each operation profiled.
- $collect_in_background(): non-blocking collect that returns a future handle. Can also be used via $collect(collect_in_background = TRUE).
- $sink_parquet(): streams the query to a Parquet file.
- $sink_ipc(): streams the query to an Arrow file.
Examples
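The call that produced the printed result is not shown on this page. One query consistent with that output (50 setosa rows, 5 columns) is sketched here, assuming the polars R package is installed and the built-in iris data is the input; the variable name is illustrative.

```r
library(polars)

# Keep only the setosa rows of iris, then execute the lazy query.
setosa <- pl$LazyFrame(iris)$filter(pl$col("Species") == "setosa")$collect()

# Printing the DataFrame shows its shape, column types, and first and last rows.
setosa
```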
#> shape: (50, 5)
#> ┌──────────────┬─────────────┬──────────────┬─────────────┬─────────┐
#> │ Sepal.Length ┆ Sepal.Width ┆ Petal.Length ┆ Petal.Width ┆ Species │
#> │ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---     │
#> │ f64          ┆ f64         ┆ f64          ┆ f64         ┆ cat     │
#> ╞══════════════╪═════════════╪══════════════╪═════════════╪═════════╡
#> │ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa  │
#> │ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ setosa  │
#> │ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ setosa  │
#> │ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ setosa  │
#> │ 5.0          ┆ 3.6         ┆ 1.4          ┆ 0.2         ┆ setosa  │
#> │ …            ┆ …           ┆ …            ┆ …           ┆ …       │
#> │ 4.8          ┆ 3.0         ┆ 1.4          ┆ 0.3         ┆ setosa  │
#> │ 5.1          ┆ 3.8         ┆ 1.6          ┆ 0.2         ┆ setosa  │
#> │ 4.6          ┆ 3.2         ┆ 1.4          ┆ 0.2         ┆ setosa  │
#> │ 5.3          ┆ 3.7         ┆ 1.5          ┆ 0.2         ┆ setosa  │
#> │ 5.0          ┆ 3.3         ┆ 1.4          ┆ 0.2         ┆ setosa  │
#> └──────────────┴─────────────┴──────────────┴─────────────┴─────────┘