polars.scan_parquet

polars.scan_parquet(file: Union[str, pathlib.Path], n_rows: Optional[int] = None, cache: bool = True, parallel: bool = True, rechunk: bool = True, row_count_name: Optional[str] = None, row_count_offset: int = 0, **kwargs: Any) → polars.internals.lazy_frame.LazyFrame

Lazily read from a parquet file or multiple files via glob patterns.

This allows the query optimizer to push down predicates and projections to the scan level, thereby potentially reducing memory overhead.

Parameters
file

Path to a file, or a glob pattern matching multiple files.

n_rows

Stop reading from the parquet file after reading n_rows rows.

cache

Cache the result after reading.

parallel

Read the parquet file in parallel. The single-threaded reader consumes less memory.

rechunk

When reading multiple files via a glob pattern, rechunk the final DataFrame into contiguous memory chunks.

row_count_name

If not None, insert a row count column with the given name into the DataFrame.

row_count_offset

Offset at which to start the row count column (only used if row_count_name is set).