polars.scan_pyarrow_dataset

polars.scan_pyarrow_dataset(source: pa.dataset.Dataset, *, allow_pyarrow_filter: bool = True) → LazyFrame

Scan a pyarrow dataset.

This is useful for reading cloud-hosted or partitioned datasets.

Parameters:
source

Pyarrow dataset to scan.

allow_pyarrow_filter

Allow predicates to be pushed down to pyarrow. This can lead to different results when comparisons involve null values, because pyarrow handles them differently than polars does.

Warning

This API is experimental and may change without it being considered a breaking change.

Examples

>>> import polars as pl
>>> import pyarrow.dataset as ds
>>> dset = ds.dataset("s3://my-partitioned-folder/", format="ipc")
>>> (
...     pl.scan_pyarrow_dataset(dset)
...     .filter("bools")
...     .select(["bools", "floats", "date"])
...     .collect()
... )
shape: (1, 3)
┌───────┬────────┬────────────┐
│ bools ┆ floats ┆ date       │
│ ---   ┆ ---    ┆ ---        │
│ bool  ┆ f64    ┆ date       │
╞═══════╪════════╪════════════╡
│ true  ┆ 2.0    ┆ 1970-05-04 │
└───────┴────────┴────────────┘