Fetch n rows of a LazyFrame
Description
This is similar to $collect(), but limits the number of rows to collect. It is mostly useful to check that a query works as expected.
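A minimal sketch of that workflow, assuming the polars R package is installed: first preview a lazy query on a few input rows with $fetch(), then run it in full with $collect(). The derived column name double_length is purely illustrative and not part of the package.

```r
library(polars)

# Build a lazy query; nothing is computed yet.
lf <- as_polars_lf(iris)$select(
  pl$col("Sepal.Length"),
  (pl$col("Sepal.Length") * 2)$alias("double_length")  # hypothetical column name
)

# Quick check: run the query on (at most) the first 5 input rows.
preview <- lf$fetch(5)
print(preview)

# Once the output looks right, run the query on all rows.
full <- lf$collect()
full$height
```

Because the preview only touches a handful of rows, it is cheap to rerun while the query is still being written.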
Usage
<LazyFrame>$fetch(
  n_rows = 500,
  ...,
  type_coercion = TRUE,
  predicate_pushdown = TRUE,
  projection_pushdown = TRUE,
  simplify_expression = TRUE,
  slice_pushdown = TRUE,
  comm_subplan_elim = TRUE,
  comm_subexpr_elim = TRUE,
  cluster_with_columns = TRUE,
  streaming = FALSE,
  no_optimization = FALSE
)
Arguments
n_rows
    Integer. Maximum number of rows to fetch.
...
    Ignored.
type_coercion
    Logical. Coerce types such that operations succeed and run on minimal required memory.
predicate_pushdown
    Logical. Apply filters as early as possible, at scan level.
projection_pushdown
    Logical. Select only the columns that are needed at the scan level.
simplify_expression
    Logical. Various optimizations, such as constant folding and replacing expensive operations with faster alternatives.
slice_pushdown
    Logical. Only load the required slice from the scan level. Don’t materialize sliced outputs (e.g. join$head(10)).
comm_subplan_elim
    Logical. Try to cache branching subplans that occur on self-joins or unions.
comm_subexpr_elim
    Logical. Cache and reuse common subexpressions.
cluster_with_columns
    Logical. Combine sequential independent calls to with_columns().
streaming
    Logical. Run parts of the query in a streaming fashion (this is in an alpha state).
no_optimization
    Logical. Sets the following parameters to FALSE: predicate_pushdown, projection_pushdown, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns.
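The optimization flags are ordinary named arguments. As a sketch (assuming the polars R package is installed; the selected columns are arbitrary), the same query can be fetched with the defaults, with every optimization switched off through no_optimization = TRUE, or with individual passes disabled by name. Since these arguments come after `...` in the signature, R only matches them by their full name.

```r
library(polars)

lf <- as_polars_lf(iris)$select(pl$col("Sepal.Length"), pl$col("Species"))

# Default: the optimizations listed under Arguments are enabled.
optimized <- lf$fetch(5)

# Turn the optimizations off in one go ...
unoptimized <- lf$fetch(5, no_optimization = TRUE)

# ... or switch off individual passes; these must be passed by name.
no_pushdown <- lf$fetch(5, predicate_pushdown = FALSE, projection_pushdown = FALSE)
```

Fetching with and without optimizations can help narrow down whether an unexpected preview comes from the query itself or from how it was optimized.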
Details
$fetch() does not guarantee the final number of rows in the output DataFrame. It only guarantees that at most n_rows rows of the input are used at the beginning of the query. Filters, join operations, and a lower number of rows available in the scanned file influence the final number of rows, and operations that add rows (such as appending values) can even make the output longer than n_rows.
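A sketch of the effect of a filter, assuming the polars R package is installed (the threshold 7 is arbitrary). Because the limit of $fetch() applies to the rows read at the start of the query rather than to its result, a filtered $fetch(10) can return fewer than 10 rows. If the goal is instead the first 10 rows of the query result, one option is to end the lazy query with $head() and then call $collect().

```r
library(polars)

lf <- as_polars_lf(iris)$filter(pl$col("Sepal.Length") > 7)

# The limit applies to the input rows, so the filter may leave
# fewer than 10 rows here (possibly none).
fetched <- lf$fetch(10)

# $head() is part of the query itself: it keeps up to 10 rows of the
# filtered result, and $collect() runs the whole query.
first_ten <- lf$head(10)$collect()

fetched$height
first_ten$height
```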
Value
A DataFrame. As explained under Details, its number of rows is not guaranteed to match n_rows.
See Also
- $collect(): regular collect.
- $profile(): same as $collect(), but also returns a table with each operation profiled.
- $collect_in_background(): non-blocking collect that returns a future handle. Can also be used via $collect(collect_in_background = TRUE).
- $sink_parquet(): streams the query to a Parquet file.
- $sink_ipc(): streams the query to an Arrow IPC file.
Examples
# fetch 3 rows
as_polars_lf(iris)$fetch(3)
#> shape: (3, 5)
#> ┌──────────────┬─────────────┬──────────────┬─────────────┬─────────┐
#> │ Sepal.Length ┆ Sepal.Width ┆ Petal.Length ┆ Petal.Width ┆ Species │
#> │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
#> │ f64 ┆ f64 ┆ f64 ┆ f64 ┆ cat │
#> ╞══════════════╪═════════════╪══════════════╪═════════════╪═════════╡
#> │ 5.1 ┆ 3.5 ┆ 1.4 ┆ 0.2 ┆ setosa │
#> │ 4.9 ┆ 3.0 ┆ 1.4 ┆ 0.2 ┆ setosa │
#> │ 4.7 ┆ 3.2 ┆ 1.3 ┆ 0.2 ┆ setosa │
#> └──────────────┴─────────────┴──────────────┴─────────────┴─────────┘
# this fetch-query returns 4 rows, because we started with 3 and appended one
# row in the query (see section 'Details')
as_polars_lf(iris)$
  select(pl$col("Species")$append("flora gigantica, alien"))$
  fetch(3)
#> shape: (4, 1)
#> ┌────────────────────────┐
#> │ Species │
#> │ --- │
#> │ str │
#> ╞════════════════════════╡
#> │ setosa │
#> │ setosa │
#> │ setosa │
#> │ flora gigantica, alien │
#> └────────────────────────┘