Write to a Parquet file

Description

Write to a Parquet file

Usage

<DataFrame>$write_parquet(
  file,
  ...,
  compression = "zstd",
  compression_level = 3,
  statistics = TRUE,
  row_group_size = NULL,
  data_page_size = NULL,
  partition_by = NULL,
  partition_chunk_size_bytes = 4294967296
)

Arguments

file File path to which the result should be written. This should be a path to a directory if writing a partitioned dataset.
... Ignored.
compression String. The compression method. One of:
  • "lz4": fast compression/decompression.
  • "uncompressed"
  • "snappy": this guarantees that the parquet file will be compatible with older parquet readers.
  • "gzip"
  • "lzo"
  • "brotli"
  • "zstd": good compression performance.
compression_level NULL or Integer. The level of compression to use. Only used if compression is one of "gzip", "brotli", or "zstd". Higher compression levels mean smaller files on disk (see the sketch at the end of this section):
  • "gzip": min-level: 0, max-level: 10.
  • "brotli": min-level: 0, max-level: 11.
  • "zstd": min-level: 1, max-level: 22.
statistics Whether statistics should be written to the Parquet headers. Possible values:
  • TRUE: enable default set of statistics (default)
  • FALSE: disable all statistics
  • "full": calculate and write all available statistics.
  • A named list where all values must be TRUE or FALSE, e.g. list(min = TRUE, max = FALSE). Available statistics are "min", "max", "distinct_count", "null_count".
row_group_size NULL or Integer. Size of the row groups in number of rows. If NULL (default), the chunks of the DataFrame are used. Writing in smaller chunks may reduce memory pressure and improve writing speeds.
data_page_size Size of the data page in bytes. If NULL (default), it is set to 1024^2 bytes (~1 MB).
partition_by Column(s) to partition by. A partitioned dataset will be written if this is specified.
partition_chunk_size_bytes Approximate size at which to split DataFrames within a single partition when writing. Note that this is calculated from the in-memory size of the DataFrame; the size of the output file may differ depending on the file format and compression.
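
For instance, these arguments compose as in the minimal sketch below; the compression level, statistics choice, and row-group size are arbitrary illustrations, not tuning advice:

dat = pl$DataFrame(mtcars)
dat$write_parquet(
  tempfile(fileext = ".parquet"),
  compression = "zstd",
  compression_level = 10,   # within zstd's 1-22 range
  statistics = "full",      # write every available statistic
  row_group_size = 16       # smaller row groups than the default chunking
)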

Value

Invisibly returns the input DataFrame.

Examples

library("polars")

dat = pl$DataFrame(mtcars)

# write data to a single parquet file
destination = withr::local_tempfile(fileext = ".parquet")
dat$write_parquet(destination)
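
# a variant of the write above: gzip at an explicit level plus a named
# list of statistics (the values here are illustrative, not recommendations)
dest_gzip = withr::local_tempfile(fileext = ".parquet")
dat$write_parquet(
  dest_gzip,
  compression = "gzip",
  compression_level = 6,
  statistics = list(min = TRUE, max = TRUE, distinct_count = FALSE, null_count = TRUE)
)

# read the file back to check the round trip
pl$read_parquet(dest_gzip)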

# write data to folder with a hive-partitioned structure
dest_folder = withr::local_tempdir()
dat$write_parquet(dest_folder, partition_by = c("gear", "cyl"))
list.files(dest_folder, recursive = TRUE)
#> [1] "gear=3.0/cyl=4.0/00000000.parquet" "gear=3.0/cyl=6.0/00000000.parquet"
#> [3] "gear=3.0/cyl=8.0/00000000.parquet" "gear=4.0/cyl=4.0/00000000.parquet"
#> [5] "gear=4.0/cyl=6.0/00000000.parquet" "gear=5.0/cyl=4.0/00000000.parquet"
#> [7] "gear=5.0/cyl=6.0/00000000.parquet" "gear=5.0/cyl=8.0/00000000.parquet"
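
# sketch of scanning the partitioned dataset back; this assumes
# pl$scan_parquet() accepts glob patterns and infers hive partitions,
# as its Python counterpart does
pl$scan_parquet(file.path(dest_folder, "**", "*.parquet"))$collect()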