Convert to a data.frame

Description

Equivalent to as_polars_df(x, ...)$to_data_frame(...).

Usage

## S3 method for class 'RPolarsDataFrame'
as.data.frame(x, ..., int64_conversion = polars_options()$int64_conversion)

## S3 method for class 'RPolarsLazyFrame'
as.data.frame(
  x,
  ...,
  n_rows = Inf,
  type_coercion = TRUE,
  predicate_pushdown = TRUE,
  projection_pushdown = TRUE,
  simplify_expression = TRUE,
  slice_pushdown = TRUE,
  comm_subplan_elim = TRUE,
  comm_subexpr_elim = TRUE,
  cluster_with_columns = TRUE,
  streaming = FALSE,
  no_optimization = FALSE,
  collect_in_background = FALSE
)
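As a minimal sketch of both methods, assuming the polars package is installed and attached, the snippet below converts an eager DataFrame and a LazyFrame, and also spells out the equivalent call from the description. The column names a and b are made up for illustration.

```r
library(polars)

# Eager DataFrame: converted directly to a base R data.frame.
df = pl$DataFrame(a = 1:3, b = c("x", "y", "z"))
res_eager = as.data.frame(df)

# The spelling given in the description; expected to yield the same result.
res_explicit = as_polars_df(df)$to_data_frame()

# LazyFrame: the query is collected before the conversion happens.
lf = df$lazy()$select(pl$col("a") * 2)
res_lazy = as.data.frame(lf)
```

The lazy method accepts the optimization arguments listed in the usage above; with the defaults, all rows are fetched.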

Arguments

x An object to convert to a data.frame.
... Additional arguments passed to methods.
int64_conversion How should Int64 values be handled when converting a polars object to R?
  • "double" (default) converts the integer values to double.
  • "bit64" uses bit64::as.integer64() to do the conversion (requires the package bit64 to be attached).
  • "string" converts Int64 values to character.
n_rows Number of rows to fetch. Defaults to Inf, meaning all rows.
type_coercion Logical. Coerce types such that operations succeed and run on minimal required memory.
predicate_pushdown Logical. Applies filters as early as possible at scan level.
projection_pushdown Logical. Select only the columns that are needed at the scan level.
simplify_expression Logical. Various optimizations, such as constant folding and replacing expensive operations with faster alternatives.
slice_pushdown Logical. Only load the required slice from the scan level. Don’t materialize sliced outputs (e.g. join$head(10)).
comm_subplan_elim Logical. Will try to cache branching subplans that occur on self-joins or unions.
comm_subexpr_elim Logical. Common subexpressions will be cached and reused.
cluster_with_columns Logical. Combine sequential independent calls to with_columns().
streaming Logical. Run parts of the query in a streaming fashion (this is in an alpha state).
no_optimization Logical. Sets the following parameters to FALSE: predicate_pushdown, projection_pushdown, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns.
collect_in_background Logical. Detach this query from the R session and start the computation in the background. Returns a handle that can later be converted into the resulting DataFrame. Useful in interactive mode to avoid blocking the R session.
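The following sketch illustrates how int64_conversion changes the R type of an Int64 column, and how a lazy-only argument such as no_optimization is passed. It assumes the polars package is attached and that the session uses the default option value ("double"); the column name big is hypothetical.

```r
library(polars)

# Build an Int64 column by casting; `big` is a made-up column name.
df = pl$DataFrame(big = c(1, 2, 3))$with_columns(pl$col("big")$cast(pl$Int64))

# Default conversion ("double" unless the option was changed in this session).
as_double = as.data.frame(df)

# Convert Int64 values to character instead.
as_string = as.data.frame(df, int64_conversion = "string")

# Inspect the resulting R column types.
class(as_double$big)
class(as_string$big)

# Lazy-only arguments are passed the same way, e.g. turning optimizations off.
unoptimized = as.data.frame(df$lazy(), no_optimization = TRUE)
```

The "string" conversion avoids losing precision for integers too large to represent exactly as a double; "bit64" is not shown here because it needs the bit64 package to be attached.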

Conversion to R data types considerations

When a Polars object such as a DataFrame is converted to an R object, for example via the as.data.frame() generic function, each Polars type is converted to an R type. In some cases the conversion is not appropriate and an error occurs. This is particularly likely for a Datetime type without a time zone. Such a column is converted to R's POSIXct type, which is interpreted in the time zone of the running R session (check it with Sys.timezone()). If the data contains times that are ambiguous or do not exist in that time zone, the conversion fails. To avoid this, either change the session time zone with Sys.setenv(TZ = "UTC") before converting, or call the $dt$replace_time_zone() method on the Datetime column to set the time zone explicitly before conversion.

# Due to daylight savings, clocks were turned forward 1 hour on Sunday, March 8, 2020, 2:00:00 am
# so this particular date-time doesn't exist
non_existent_time = as_polars_series("2020-03-08 02:00:00")$str$strptime(pl$Datetime(), "%F %T")

withr::with_timezone(
  "America/New_York",
  {
    tryCatch(
      # This causes an error due to the time zone (the `TZ` env var is affected).
      as.vector(non_existent_time),
      error = function(e) e
    )
  }
)
#> <error: in to_r: ComputeError(ErrString("datetime '2020-03-08 02:00:00' is non-existent in time zone 'America/New_York'. You may be able to use `non_existent='null'` to return `null` in this case.")) When calling: devtools::document()>

withr::with_timezone(
  "America/New_York",
  {
    # This is safe.
    as.vector(non_existent_time$dt$replace_time_zone("UTC"))
  }
)
#> [1] "2020-03-08 02:00:00 UTC"
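The other workaround mentioned above, changing the session time zone, might look like the sketch below. It assumes the polars package is attached and that the conversion picks up the TZ environment variable, as the comment in the first example indicates; the variable names naive, old_tz and converted are illustrative only.

```r
library(polars)

# A Datetime without a time zone that does not exist in America/New_York.
naive = as_polars_series("2020-03-08 02:00:00")$str$strptime(pl$Datetime(), "%F %T")

# Remember the current TZ setting so it can be restored afterwards.
old_tz = Sys.getenv("TZ", unset = NA)

# UTC has no daylight saving transitions, so every wall-clock time exists.
Sys.setenv(TZ = "UTC")
converted = as.vector(naive)

# Restore the previous setting (unset it if it was not set before).
if (is.na(old_tz)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old_tz)
```

Unlike $dt$replace_time_zone(), this changes the time zone for the whole R session while it is in effect, so restoring the previous value afterwards is advisable.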

See Also

  • as_polars_df()
  • $to_data_frame()