polars.scan_ds

polars.scan_ds(ds: pa.dataset.dataset) → LazyFrame

Scan a pyarrow dataset.

This is useful for reading cloud-hosted or partitioned datasets.

Parameters:
ds

Pyarrow dataset to scan.

Warning

This API is experimental and may change without it being considered a breaking change.

Examples

>>> import polars as pl
>>> import pyarrow.dataset as ds
>>> dset = ds.dataset("s3://my-partitioned-folder/", format="ipc")
>>> out = (
...     pl.scan_ds(dset)
...     .filter("bools")
...     .select(["bools", "floats", "date"])
...     .collect()
... )
>>> out
shape: (1, 3)
┌───────┬────────┬────────────┐
│ bools ┆ floats ┆ date       │
│ ---   ┆ ---    ┆ ---        │
│ bool  ┆ f64    ┆ date       │
╞═══════╪════════╪════════════╡
│ true  ┆ 2.0    ┆ 1970-05-04 │
└───────┴────────┴────────────┘