polars.read_parquet

polars.read_parquet(source: Union[str, pathlib.Path, BinaryIO, _io.BytesIO, bytes], columns: Optional[Union[List[int], List[str]]] = None, n_rows: Optional[int] = None, use_pyarrow: bool = False, memory_map: bool = True, storage_options: Optional[Dict] = None, parallel: bool = True, row_count_name: Optional[str] = None, row_count_offset: int = 0, **kwargs: Any) → polars.internals.frame.DataFrame

Read a parquet file into a DataFrame.
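
A minimal usage sketch (data.parquet is a hypothetical local file):

>>> import polars as pl
>>> df = pl.read_parquet("data.parquet")
>>> df_first5 = pl.read_parquet("data.parquet", n_rows=5)  # stop after 5 rows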

Parameters
source

Path to a file or a file-like object. If the path is a directory, a partition-aware scan of that directory is performed. If fsspec is installed, it will be used to open remote files.

columns

Columns to select. Accepts a list of column indices (starting at zero) or a list of column names.
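
For example, assuming a hypothetical file whose first two columns are named "a" and "b", these two calls select the same data:

>>> import polars as pl
>>> df = pl.read_parquet("data.parquet", columns=["a", "b"])  # by name
>>> df = pl.read_parquet("data.parquet", columns=[0, 1])  # by index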

n_rows

Stop reading from the parquet file after reading n_rows rows. Only valid when use_pyarrow=False.

use_pyarrow

Use pyarrow instead of the Rust-native parquet reader. The pyarrow reader is more stable.

memory_map

Memory-map the underlying file. This will likely increase performance. Only used when use_pyarrow=True.

storage_options

Extra options that make sense for fsspec.open() or a particular storage connection, e.g. host, port, username, password, etc.
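
As a sketch, reading from an S3 path (hypothetical bucket and placeholder credentials; assumes fsspec with an S3 backend such as s3fs is installed):

>>> import polars as pl
>>> df = pl.read_parquet(
...     "s3://my-bucket/data.parquet",
...     storage_options={"key": "<access-key>", "secret": "<secret-key>"},
... )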

parallel

Read the parquet file in parallel. The single-threaded reader consumes less memory.

row_count_name

If not None, this will insert a row count column with the given name into the DataFrame.

row_count_offset

Offset at which to start the row count column (only used if row_count_name is set).
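
For example, to insert a row count column named "row_nr" (the name is illustrative) that starts counting at 1 instead of 0:

>>> import polars as pl
>>> df = pl.read_parquet(
...     "data.parquet",
...     row_count_name="row_nr",
...     row_count_offset=1,
... )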

**kwargs

Keyword arguments passed to [pyarrow.parquet.read_table](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.read_table.html).
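
As a sketch, forwarding a filters argument to pyarrow.parquet.read_table (assumes use_pyarrow=True so the kwargs reach the pyarrow reader, and a hypothetical column "year"):

>>> import polars as pl
>>> df = pl.read_parquet(
...     "data.parquet",
...     use_pyarrow=True,
...     filters=[("year", ">=", 2020)],  # passed through to pyarrow.parquet.read_table
... )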

Returns
DataFrame