Write to parquet file
Description
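Write the DataFrame to a Parquet file, or to a directory of Parquet files (a hive-partitioned dataset) when partition_by is specified.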
Usage
<DataFrame>$write_parquet(
file,
...,
compression = "zstd",
compression_level = 3,
statistics = TRUE,
row_group_size = NULL,
data_page_size = NULL,
partition_by = NULL,
partition_chunk_size_bytes = 4294967296
)
Arguments
file
File path to which the result should be written. This should be a path to a directory if writing a partitioned dataset.
...
Ignored.
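compression
String. The compression method. One of:
- "lz4": fast compression/decompression.
- "uncompressed"
- "snappy": guarantees that the Parquet file will be compatible with older Parquet readers.
- "gzip"
- "lzo"
- "brotli"
- "zstd" (default): good compression performance.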
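compression_level
NULL or Integer. The level of compression to use. Only used if compression is one of "gzip", "brotli", or "zstd". Higher levels mean smaller files on disk, usually at the cost of slower writes:
- "gzip": min level 0, max level 10.
- "brotli": min level 0, max level 11.
- "zstd": min level 1, max level 22.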
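statistics
Whether statistics should be written to the Parquet headers. Possible values:
- TRUE: enable the default set of statistics (default).
- FALSE: disable all statistics.
- "full": calculate and try to write all statistics.
- A named list where all values must be TRUE or FALSE, e.g. list(min = TRUE, max = FALSE). Available statistics are "min", "max", "distinct_count", and "null_count".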
row_group_size
NULL or Integer. Size of the row groups in number of rows. If NULL (default), the chunks of the DataFrame are used. Writing in smaller chunks may reduce memory pressure and improve writing speeds.
data_page_size
NULL or Integer. Size of the data page in bytes. If NULL (default), it is set to 1024^2 bytes (about 1 MB).
partition_by
Column(s) to partition by. If specified, a partitioned dataset is written to the directory given in file.
partition_chunk_size_bytes
Approximate size, in bytes, at which DataFrames are split within a single partition when writing. The default, 4294967296, is 2^32 bytes (4 GiB). Note that this is calculated from the size of the DataFrame in memory; the size of the output file may differ depending on the file format and compression.
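A minimal sketch combining several of the options above, assuming the polars R package is installed and that pl$read_parquet() (not documented on this page) is available in your version to read the file back. The path comes from base R tempfile() instead of withr, and the option values are arbitrary choices for illustration.

```r
library("polars")

dat = as_polars_df(mtcars)

# Temporary destination (base R, so withr is not required)
dest = tempfile(fileext = ".parquet")

# Non-default options: gzip at a higher level, statistics disabled,
# and row groups of at most 8 rows
dat$write_parquet(
  dest,
  compression = "gzip",
  compression_level = 9,
  statistics = FALSE,
  row_group_size = 8
)

# Read the file back (assumes pl$read_parquet() exists in your polars version)
back = pl$read_parquet(dest)

# The data should survive the round trip unchanged
all.equal(as.data.frame(back), as.data.frame(dat))
```

These options change how the file is encoded on disk, not the data it contains, so the round trip should compare equal whichever compression method is chosen.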
Value
Invisibly returns the input DataFrame.
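Because the return value is invisible, calling write_parquet() at the console prints nothing, but the DataFrame can still be captured and used. A small sketch, assuming polars is installed and using a base R temporary file:

```r
library("polars")

dat = as_polars_df(mtcars)
dest = tempfile(fileext = ".parquet")

# Prints nothing: the input DataFrame is returned invisibly
dat$write_parquet(dest)

# Capturing the result gives back the DataFrame that was written
res = dat$write_parquet(dest)
res$height  # 32 rows, as in mtcars
```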
Examples
library("polars")
dat = as_polars_df(mtcars)
# write data to a single parquet file
destination = withr::local_tempfile(fileext = ".parquet")
dat$write_parquet(destination)
# write data to folder with a hive-partitioned structure
dest_folder = withr::local_tempdir()
dat$write_parquet(dest_folder, partition_by = c("gear", "cyl"))
list.files(dest_folder, recursive = TRUE)
#> [1] "gear=3.0/cyl=4.0/00000000.parquet" "gear=3.0/cyl=6.0/00000000.parquet"
#> [3] "gear=3.0/cyl=8.0/00000000.parquet" "gear=4.0/cyl=4.0/00000000.parquet"
#> [5] "gear=4.0/cyl=6.0/00000000.parquet" "gear=5.0/cyl=4.0/00000000.parquet"
#> [7] "gear=5.0/cyl=6.0/00000000.parquet" "gear=5.0/cyl=8.0/00000000.parquet"
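The listing above contains one file per (gear, cyl) combination. A quick check of that relationship, assuming (as in the output above) that each partition is written as a single file because it is far smaller than partition_chunk_size_bytes. The folder is created with base R instead of withr.

```r
library("polars")

dat = as_polars_df(mtcars)

# Create an empty temporary folder with base R
dest_folder = tempfile("parquet_parts_")
dir.create(dest_folder)

dat$write_parquet(dest_folder, partition_by = c("gear", "cyl"))

# Number of distinct (gear, cyl) combinations in the input
n_groups = nrow(unique(mtcars[c("gear", "cyl")]))

# Each partition is far below the default 4 GiB chunk size, so we expect
# exactly one Parquet file per partition directory
n_files = length(list.files(dest_folder, pattern = "\\.parquet$", recursive = TRUE))
c(n_groups = n_groups, n_files = n_files)
```

A partition whose in-memory size exceeded partition_chunk_size_bytes would presumably be split across several files in its directory, so the two counts would then differ.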