polars.read_parquet
- polars.read_parquet(source: Union[str, pathlib.Path, BinaryIO, _io.BytesIO, bytes], columns: Optional[Union[List[int], List[str]]] = None, n_rows: Optional[int] = None, use_pyarrow: bool = False, memory_map: bool = True, storage_options: Optional[Dict] = None, parallel: bool = True, row_count_name: Optional[str] = None, row_count_offset: int = 0, **kwargs: Any) → polars.internals.frame.DataFrame
Read into a DataFrame from a parquet file.
- Parameters
- source
  Path to a file or a file-like object. If the path is a directory, it is scanned as a partition-aware dataset. If `fsspec` is installed, it will be used to open remote files.
- columns
  Columns to select. Accepts a list of column indices (starting at zero) or a list of column names.
- n_rows
  Stop reading from the parquet file after reading `n_rows`. Only valid when `use_pyarrow=False`.
- use_pyarrow
  Use pyarrow instead of the Rust-native parquet reader. The pyarrow reader is more stable.
- memory_map
  Memory-map the underlying file. This will likely increase performance. Only used when `use_pyarrow=True`.
- storage_options
  Extra options that make sense for `fsspec.open()` or a particular storage connection, e.g. host, port, username, password, etc.
- parallel
  Read the parquet file in parallel. The single-threaded reader consumes less memory.
- row_count_name
  If not None, insert a row count column with the given name into the DataFrame.
- row_count_offset
  Offset at which to start the row count column (only used if `row_count_name` is set).
- **kwargs
  kwargs for [pyarrow.parquet.read_table](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.read_table.html)
- Returns
- DataFrame