polars.DataFrame.write_parquet
- DataFrame.write_parquet(file: str | Path | BytesIO, *, compression: ParquetCompression = 'zstd', compression_level: int | None = None, statistics: bool = False, row_group_size: int | None = None, use_pyarrow: bool = True, pyarrow_options: dict[str, object] | None = None) → None
Write to an Apache Parquet file.
- Parameters:
- file
File path or BytesIO object to which the data should be written.
- compression{‘lz4’, ‘uncompressed’, ‘snappy’, ‘gzip’, ‘lzo’, ‘brotli’, ‘zstd’}
Choose “zstd” for good compression performance. Choose “lz4” for fast compression/decompression. Choose “snappy” for stronger backwards-compatibility guarantees when dealing with older Parquet readers.
- compression_level
The level of compression to use. Higher compression means smaller files on disk.
“gzip” : min-level: 0, max-level: 10.
“brotli” : min-level: 0, max-level: 11.
“zstd” : min-level: 1, max-level: 22.
- statistics
Write statistics to the parquet headers. This requires extra compute.
- row_group_size
Size of the row groups in number of rows. If None (default), the chunks of the DataFrame are used. Writing in smaller chunks may reduce memory pressure and improve writing speeds. If None and use_pyarrow=True, the row group size will be the minimum of the DataFrame size and 64 * 1024 * 1024.
- use_pyarrow
Use the C++ (pyarrow) Parquet implementation instead of the Rust-native implementation. At the moment the C++ implementation supports more features.
- pyarrow_options
Arguments passed to pyarrow.parquet.write_table.
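As a hedged sketch of how the pyarrow backend might be driven (the sample frame, the output file name, and the coerce_timestamps/allow_truncated_timestamps settings are illustrative choices, not defaults of this method):

>>> from datetime import datetime
>>> import polars as pl
>>> tdf = pl.DataFrame(
...     {
...         "id": [1, 2, 3],
...         "ts": [datetime(2021, 1, 1), datetime(2021, 1, 2), datetime(2021, 1, 3)],
...     }
... )
>>> # Extra keyword arguments in pyarrow_options are forwarded to
>>> # pyarrow.parquet.write_table; here timestamps are coerced to
>>> # millisecond precision (illustrative, not a default).
>>> tdf.write_parquet(
...     "pyarrow_backed.parquet",
...     use_pyarrow=True,
...     pyarrow_options={"coerce_timestamps": "ms", "allow_truncated_timestamps": True},
... )

Any keyword argument accepted by pyarrow.parquet.write_table can be forwarded in the same way.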
Examples
>>> import pathlib
>>>
>>> df = pl.DataFrame(
...     {
...         "foo": [1, 2, 3, 4, 5],
...         "bar": [6, 7, 8, 9, 10],
...         "ham": ["a", "b", "c", "d", "e"],
...     }
... )
>>> path: pathlib.Path = dirpath / "new_file.parquet"
>>> df.write_parquet(path)
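As a further sketch, reusing df and dirpath from the example above, the Rust-native writer's tuning knobs might be combined as follows; the output name, compression level, and row group size are arbitrary illustrative values, not recommendations:

>>> # Heavier zstd compression, column statistics, and explicit row groups
>>> # of at most 100_000 rows, written with the Rust-native writer.
>>> df.write_parquet(
...     dirpath / "tuned.parquet",
...     compression="zstd",
...     compression_level=10,
...     statistics=True,
...     row_group_size=100_000,
...     use_pyarrow=False,
... )
>>> pl.read_parquet(dirpath / "tuned.parquet").shape
(5, 3)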