Inner workings of the Series-class
Description
The Series-class is simply two environments, holding respectively the public
and the private methods/function calls to the polars Rust side. An instantiated
Series-object is an externalptr to a low-level Rust polars Series object. The
pointer address is the only state held by the Series object on the R side; any
other state resides on the Rust side. The S3 method .DollarNames.RPolarsSeries
exposes all public $foobar()-methods that are callable on the object. Most
methods return another Series-class instance or similar, which allows for method
chaining. For lack of a better name, this class system could be called
"environment classes". It is the same class system that extendr provides, except
that here there is both a public and a private set of methods. For
implementation reasons, the private methods are external and must be called via
.pr$Series$methodname(). Also, all private methods must take self as an explicit
argument, so they are pure functions. Making the private methods pure functions
avoided, or at least simplified, self-referential complications.
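The "environment class" idea described above can be sketched in plain base R. This is a toy model under stated assumptions, not the polars implementation. The names `toy_series`, `.toy_private` and `ToySeries` are hypothetical. The object holds an R list instead of an externalptr to Rust. Injecting `self` through a custom `$` method is just one plausible way to bind public methods to their object.

```r
# Hypothetical constructor: the only "state" is one list field, standing in
# for the externalptr that a real RPolarsSeries holds.
toy_series <- function(x) structure(list(data = x), class = "ToySeries")

# Private methods: pure functions that take `self` as an explicit argument.
# .subset2() is used so that accessing the field does not re-dispatch to `$`.
.toy_private <- new.env()
.toy_private$len <- function(self) length(.subset2(self, "data"))
.toy_private$values <- function(self) .subset2(self, "data")
.toy_private$add <- function(self, other) toy_series(.subset2(self, "data") + other)

# Public methods: thin wrappers that forward `self` to the private functions.
ToySeries <- new.env()
ToySeries$len <- function() .toy_private$len(self)
ToySeries$values <- function() .toy_private$values(self)
ToySeries$add <- function(other) .toy_private$add(self, other)

# `$` looks up a public method and binds `self` into the method's closure.
`$.ToySeries` <- function(x, name) {
  method <- get0(name, envir = ToySeries, inherits = FALSE)
  if (!is.function(method)) stop("no public method named '", name, "'")
  env <- new.env(parent = environment(method))
  env$self <- x
  environment(method) <- env
  method
}

# Tab-completion hook, analogous in spirit to .DollarNames.RPolarsSeries.
.DollarNames.ToySeries <- function(x, pattern = "") {
  grep(pattern, ls(ToySeries), value = TRUE)
}

s <- toy_series(c(1, 2, 3))
s$len()                      # 3
s$add(1)$values()            # 2 3 4, via method chaining
utils::.DollarNames(s, "")   # "add" "len" "values"
```

Because every public method returns either a plain value or a new `ToySeries`, calls chain just as described for Series. The private functions can also be called directly, for example `.toy_private$len(s)`.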
Details
See the source code in R/Series_frame.R for how public methods are derived from
private methods. See extendr-wrappers.R for the extendr-auto-generated methods.
These are moved to .pr and converted into pure external functions in
after-wrappers.R. In zzz.R (named zzz so that it is the last file sourced), the
extendr-methods are removed and replaced by any function prefixed Series_.
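The prefix convention in the last step can be illustrated with a small base-R sketch. It clears a method environment and refills it with every function named `Series_<something>`, storing each one under `<something>`. This is an assumption-based illustration of the naming rule only, not the code in zzz.R. The helper `replace_with_prefixed` and the example functions are hypothetical.

```r
# Hypothetical helper: empty `target_env`, then copy every function from
# `source_env` whose name starts with `prefix`, with the prefix stripped.
replace_with_prefixed <- function(target_env, prefix, source_env) {
  # Drop previous contents (stand-in for the extendr-generated methods).
  rm(list = ls(target_env, all.names = TRUE), envir = target_env)
  pattern <- paste0("^", prefix)
  for (fn_name in ls(source_env, pattern = pattern)) {
    fn <- get(fn_name, envir = source_env)
    if (is.function(fn)) {
      assign(sub(pattern, "", fn_name), fn, envir = target_env)
    }
  }
  invisible(target_env)
}

# Source environment: two prefixed functions and one unrelated helper.
src <- new.env()
src$Series_len <- function(self) length(self)
src$Series_head <- function(self, n = 2) utils::head(self, n)
src$helper <- function() NULL

# Method environment that still holds an auto-generated placeholder.
series_methods <- new.env()
series_methods$autogenerated_len <- function(self) stop("should be replaced")

replace_with_prefixed(series_methods, "Series_", src)
ls(series_methods)  # "head" "len"
```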
Active bindings
dtype
$dtype returns the data type of the Series.
flags
$flags returns a named list with flag names and their values.
Flags are used internally to avoid unnecessary computations, such as sorting a
variable that is already known to be sorted. The number of flags depends on the
column type: columns of type array and list have the flags SORTED_ASC,
SORTED_DESC, and FAST_EXPLODE, while other column types only have the first two.
- SORTED_ASC is set to TRUE when a column is sorted in increasing order, so that
  this information can be used later on to avoid re-sorting it.
- SORTED_DESC is similar but applies to sorting in decreasing order.
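The way a sortedness flag saves work can be shown with a toy model in base R. The helpers `flagged` and `sort_asc` are hypothetical, and the counter `sort_calls` exists only to make the skipped work visible. Polars keeps these flags on the Rust side, so this sketch only mirrors the logic, not the implementation.

```r
# Hypothetical: a vector carried together with SORTED_ASC / SORTED_DESC flags,
# shaped like the named list returned by $flags.
flagged <- function(x, sorted_asc = FALSE, sorted_desc = FALSE) {
  list(data = x, flags = list(SORTED_ASC = sorted_asc, SORTED_DESC = sorted_desc))
}

sort_calls <- 0L  # counts how often real sorting work was done

sort_asc <- function(s) {
  # The flag says the data are already in increasing order: skip the sort.
  if (isTRUE(s$flags$SORTED_ASC)) return(s)
  sort_calls <<- sort_calls + 1L
  flagged(sort(s$data), sorted_asc = TRUE)
}

s1 <- sort_asc(flagged(c(3, 1, 2)))  # sorts and sets SORTED_ASC = TRUE
s2 <- sort_asc(s1)                   # reuses the flag, no second sort
sort_calls  # 1
```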
name
$name returns the name of the Series.
shape
$shape returns a numeric vector of length two: the length of the Series and its
width (always 1).
Expression methods
Series stores most of the Expr methods.
Some of these are stored in sub-namespaces.
arr
$arr stores all array-related methods.
bin
$bin stores all binary-related methods.
cat
$cat stores all categorical-related methods.
dt
$dt stores all temporal-related methods.
list
$list stores all list-related methods.
str
$str stores all string-related methods.
struct
$struct stores all struct-related methods and active bindings.
Active bindings specific to Series:
- $struct$fields: Returns a character vector of the fields in the struct.
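In the method listing of the Examples, sub-namespaces such as str are marked as "property function". One way to get that property-like behaviour in base R is an active binding that builds a namespace of closures capturing the parent object on every access. The sketch below assumes this design for illustration only. `toy_text_series`, `toy_str_namespace`, `upper` and `n_chars` are hypothetical names, and none of them claims to describe how polars implements `$str`.

```r
# Hypothetical namespace factory: closures that capture the parent object.
toy_str_namespace <- function(self) {
  list(
    upper = function() toupper(self$data),
    n_chars = function() nchar(self$data)
  )
}

toy_text_series <- function(x) {
  self <- new.env()
  self$data <- x
  # `str` is an active binding: the function runs on every access to obj$str.
  makeActiveBinding("str", function() toy_str_namespace(self), self)
  self
}

s <- toy_text_series(c("a", "bc"))
s$str$upper()    # "A" "BC"
s$str$n_chars()  # 1 2
```

Because the namespace is rebuilt on each access, it always sees the object's current data. Reassigning `s$data` is enough for later `s$str` calls to pick up the new values.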
Considerations for conversion to R data types
When converting Polars objects, such as DataFrames, to R objects, for example
via the as.data.frame() generic function, each type in the Polars object is
converted to an R type. In some cases, an error may occur because the conversion
is not appropriate. In particular, there is a high possibility of an error when
converting a Datetime type without a time zone. A Datetime type without a time
zone in Polars is converted to the POSIXct type in R, which takes into account
the time zone in which the R session is running (this can be checked with the
Sys.timezone() function). In this case, if the data include times that are
ambiguous or non-existent in that time zone (for example around daylight saving
time transitions), a conversion error will occur. In such cases, change the
session time zone using Sys.setenv(TZ = "UTC") and then perform the conversion,
or use the $dt$replace_time_zone() method on the Datetime column to explicitly
specify the time zone before conversion.
# Due to daylight saving time, clocks were turned forward 1 hour on Sunday, March 8, 2020, 2:00:00 am
# so this particular date-time doesn't exist
non_existent_time = as_polars_series("2020-03-08 02:00:00")$str$strptime(pl$Datetime(), "%F %T")

withr::with_timezone(
  "America/New_York",
  {
    tryCatch(
      # This causes an error due to the time zone (the `TZ` env var is affected).
      as.vector(non_existent_time),
      error = function(e) e
    )
  }
)
#> <error: in to_r: ComputeError(ErrString("datetime '2020-03-08 02:00:00' is non-existent in time zone 'America/New_York'. You may be able to use `non_existent='null'` to return `null` in this case.")) When calling: devtools::document()>

withr::with_timezone(
  "America/New_York",
  {
    # This is safe.
    as.vector(non_existent_time$dt$replace_time_zone("UTC"))
  }
)
#> [1] "2020-03-08 02:00:00 UTC"
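If you would rather not depend on withr, the Sys.setenv(TZ = "UTC") approach can be wrapped so that the previous setting is restored afterwards. Below is a minimal base-R sketch. `with_utc` is a hypothetical helper and is not part of polars or withr.

```r
# Hypothetical helper: evaluate `expr` with TZ set to "UTC", then restore the
# previous TZ value (or unset it if it was not set before).
with_utc <- function(expr) {
  old_tz <- Sys.getenv("TZ", unset = NA)
  Sys.setenv(TZ = "UTC")
  on.exit(
    if (is.na(old_tz)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old_tz),
    add = TRUE
  )
  expr  # lazily evaluated here, i.e. after TZ has been changed
}

with_utc(Sys.getenv("TZ"))  # "UTC"
```

With the Series from the example above, `with_utc(as.vector(non_existent_time))` should then perform the conversion in UTC, where 2020-03-08 02:00:00 exists.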
Examples
#> [1] 4 1
#> $SORTED_ASC
#> [1] TRUE
#>
#> $SORTED_DESC
#> [1] FALSE
#> polars Series: shape: (4,)
#> Series: '' [f64]
#> [
#> 0.540302
#> -0.416147
#> -0.989992
#> 0.540302
#> ]
#> polars Series: shape: (3,)
#> Series: '' [i32]
#> [
#> 3
#> 1
#> null
#> ]
#> polars Series: shape: (1,)
#> Series: '' [str]
#> [
#> "1.0-2.0"
#> ]
#> polars Series: shape: (7,)
#> Series: '' [date]
#> [
#> 2024-02-18
#> 2024-02-19
#> 2024-02-20
#> 2024-02-21
#> 2024-02-22
#> 2024-02-23
#> 2024-02-24
#> ]
#> polars Series: shape: (7,)
#> Series: '' [i8]
#> [
#> 18
#> 19
#> 20
#> 21
#> 22
#> 23
#> 24
#> ]
# Other active bindings in subnamespaces
as_polars_series(data.frame(a = 1:2, b = 3:4))$struct$fields
#> [1] "a" "b"
#>
#>
#> RPolarsSeries class methods, access via object$method() ( environment ):
#>
#> RPolarsSeries ( environment ):
#> [ abs ; function ]
#> [ add ; function ]
#> [ alias ; function ]
#> [ all ; function ]
#> [ and ; function ]
#> [ any ; function ]
#> [ append ; function ]
#> [ approx_n_unique ; function ]
#> [ arccos ; function ]
#> [ arccosh ; function ]
#> [ arcsin ; function ]
#> [ arcsinh ; function ]
#> [ arctan ; function ]
#> [ arctanh ; function ]
#> [ arg_max ; function ]
#> [ arg_min ; function ]
#> [ arg_sort ; function ]
#> [ arg_unique ; function ]
#> [ arr ; property function ]
#> [ backward_fill ; function ]
#> [ bin ; property function ]
#> [ bottom_k ; function ]
#> [ cast ; function ]
#> [ cat ; property function ]
#> [ ceil ; function ]
#> [ chunk_lengths ; function ]
#> [ clear ; function ]
#> [ clip ; function ]
#> [ clone ; function ]
#> [ cos ; function ]
#> [ cosh ; function ]
#> [ count ; function ]
#> [ cum_count ; function ]
#> [ cum_max ; function ]
#> [ cum_min ; function ]
#> [ cum_prod ; function ]
#> [ cum_sum ; function ]
#> [ cumulative_eval ; function ]
#> [ cut ; function ]
#> [ diff ; function ]
#> [ div ; function ]
#> [ dot ; function ]
#> [ drop_nans ; function ]
#> [ drop_nulls ; function ]
#> [ dt ; property function ]
#> [ dtype ; property function ]
#> [ entropy ; function ]
#> [ eq ; function ]
#> [ eq_missing ; function ]
#> [ equals ; function ]
#> [ ewm_mean ; function ]
#> [ ewm_std ; function ]
#> [ ewm_var ; function ]
#> [ exp ; function ]
#> [ explode ; function ]
#> [ extend_constant ; function ]
#> [ fill_nan ; function ]
#> [ fill_null ; function ]
#> [ filter ; function ]
#> [ first ; function ]
#> [ flags ; property function ]
#> [ flatten ; function ]
#> [ floor ; function ]
#> [ floor_div ; function ]
#> [ forward_fill ; function ]
#> [ gather ; function ]
#> [ gather_every ; function ]
#> [ gt ; function ]
#> [ gt_eq ; function ]
#> [ has_nulls ; function ]
#> [ hash ; function ]
#> [ head ; function ]
#> [ implode ; function ]
#> [ interpolate ; function ]
#> [ is_between ; function ]
#> [ is_duplicated ; function ]
#> [ is_finite ; function ]
#> [ is_first_distinct ; function ]
#> [ is_in ; function ]
#> [ is_infinite ; function ]
#> [ is_last_distinct ; function ]
#> [ is_nan ; function ]
#> [ is_not_nan ; function ]
#> [ is_not_null ; function ]
#> [ is_null ; function ]
#> [ is_numeric ; function ]
#> [ is_sorted ; function ]
#> [ is_unique ; function ]
#> [ item ; function ]
#> [ kurtosis ; function ]
#> [ last ; function ]
#> [ len ; function ]
#> [ limit ; function ]
#> [ list ; property function ]
#> [ log ; function ]
#> [ log10 ; function ]
#> [ lower_bound ; function ]
#> [ lt ; function ]
#> [ lt_eq ; function ]
#> [ map_batches ; function ]
#> [ map_elements ; function ]
#> [ max ; function ]
#> [ mean ; function ]
#> [ median ; function ]
#> [ min ; function ]
#> [ mod ; function ]
#> [ mode ; function ]
#> [ mul ; function ]
#> [ n_chunks ; function ]
#> [ n_unique ; function ]
#> [ name ; property function ]
#> [ nan_max ; function ]
#> [ nan_min ; function ]
#> [ neq ; function ]
#> [ neq_missing ; function ]
#> [ not ; function ]
#> [ null_count ; function ]
#> [ or ; function ]
#> [ pct_change ; function ]
#> [ peak_max ; function ]
#> [ peak_min ; function ]
#> [ pow ; function ]
#> [ print ; function ]
#> [ product ; function ]
#> [ qcut ; function ]
#> [ quantile ; function ]
#> [ rank ; function ]
#> [ rechunk ; function ]
#> [ reinterpret ; function ]
#> [ rename ; function ]
#> [ rep ; function ]
#> [ repeat_by ; function ]
#> [ replace ; function ]
#> [ replace_strict ; function ]
#> [ reshape ; function ]
#> [ reverse ; function ]
#> [ rle ; function ]
#> [ rle_id ; function ]
#> [ rolling_max ; function ]
#> [ rolling_max_by ; function ]
#> [ rolling_mean ; function ]
#> [ rolling_mean_by ; function ]
#> [ rolling_median ; function ]
#> [ rolling_median_by ; function ]
#> [ rolling_min ; function ]
#> [ rolling_min_by ; function ]
#> [ rolling_quantile ; function ]
#> [ rolling_quantile_by ; function ]
#> [ rolling_skew ; function ]
#> [ rolling_std ; function ]
#> [ rolling_std_by ; function ]
#> [ rolling_sum ; function ]
#> [ rolling_sum_by ; function ]
#> [ rolling_var ; function ]
#> [ rolling_var_by ; function ]
#> [ round ; function ]
#> [ sample ; function ]
#> [ search_sorted ; function ]
#> [ set_sorted ; function ]
#> [ shape ; property function ]
#> [ shift ; function ]
#> [ shrink_dtype ; function ]
#> [ shuffle ; function ]
#> [ sign ; function ]
#> [ sin ; function ]
#> [ sinh ; function ]
#> [ skew ; function ]
#> [ slice ; function ]
#> [ sort ; function ]
#> [ sort_by ; function ]
#> [ sqrt ; function ]
#> [ std ; function ]
#> [ str ; property function ]
#> [ struct ; property function ]
#> [ sub ; function ]
#> [ sum ; function ]
#> [ tail ; function ]
#> [ tan ; function ]
#> [ tanh ; function ]
#> [ to_frame ; function ]
#> [ to_list ; function ]
#> [ to_lit ; function ]
#> [ to_physical ; function ]
#> [ to_r ; function ]
#> [ to_vector ; function ]
#> [ top_k ; function ]
#> [ unique ; function ]
#> [ unique_counts ; function ]
#> [ upper_bound ; function ]
#> [ value_counts ; function ]
#> [ var ; function ]
#> [ xor ; function ]