
Apply a rolling rank based on another column

Description

[Experimental]

Given a by column <t_0, t_1, …, t_n>, closed = “right” (the default) means the windows will be:

  • (t_0 - window_size, t_0]
  • (t_1 - window_size, t_1]
  • …
  • (t_n - window_size, t_n]
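
To make the window definition concrete, the following base R sketch recomputes a right-closed rolling rank by hand. It only illustrates the semantics described above and is not how polars computes the result. The helper name rolling_rank_right() is hypothetical, and the sketch assumes the "average" tie method and a numeric by column (hours, say).

```r
# Hypothetical helper: rank of x[i] among the values whose `by` falls in
# the right-closed window (by[i] - window_size, by[i]].
rolling_rank_right <- function(x, by, window_size) {
  vapply(seq_along(x), function(i) {
    in_window <- by > by[i] - window_size & by <= by[i]
    window_values <- x[in_window]
    # rank() uses the "average" tie method by default; pick the rank that
    # belongs to row i itself.
    rank(window_values)[which(in_window) == i]
  }, numeric(1))
}

by <- 0:5                 # e.g. hours since the first observation
x <- c(3, 1, 2, 5, 4, 4)
rolling_rank_right(x, by, window_size = 2)
#> [1] 1.0 1.0 2.0 2.0 1.0 1.5
```

The first row has only itself in its window, so its rank is 1. The last row ties with its predecessor, so both share the average rank 1.5.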

Usage

<Expr>$rolling_rank_by(
  by,
  window_size,
  method = c("average", "min", "max", "dense", "random"),
  ...,
  seed = NULL,
  min_samples = 1,
  closed = c("right", "both", "left", "none")
)

Arguments

by Should be DateTime, Date, UInt64, UInt32, Int64, or Int32 data type after conversion by as_polars_expr(). Note that the integer ones require using “i” in window_size. Accepts expression input. Strings are parsed as column names.
window_size The length of the window. Can be a dynamic temporal size indicated by a timedelta or the following string language:
  • 1ns (1 nanosecond)
  • 1us (1 microsecond)
  • 1ms (1 millisecond)
  • 1s (1 second)
  • 1m (1 minute)
  • 1h (1 hour)
  • 1d (1 calendar day)
  • 1w (1 calendar week)
  • 1mo (1 calendar month)
  • 1q (1 calendar quarter)
  • 1y (1 calendar year)
Or combine them: “3d12h4m25s” (3 days, 12 hours, 4 minutes, and 25 seconds).
By "calendar day", we mean the corresponding time on the next day (which may not be 24 hours, due to daylight saving time). Similarly for "calendar week", "calendar month", "calendar quarter", and "calendar year".
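
As a rough illustration of how a combined string breaks down into units, here is a base R sketch. It is not the parser polars uses. The helper name split_duration() is hypothetical, and the accepted units are simply those listed above plus the “i” suffix mentioned for integer by columns.

```r
# Hypothetical helper: split a duration string into counts named by unit.
# Longer unit names come first in the pattern so that "1mo" is read as
# one month rather than "1m" followed by a stray "o".
split_duration <- function(s) {
  pattern <- "[0-9]+(ns|us|ms|mo|s|m|h|d|w|q|y|i)"
  tokens <- regmatches(s, gregexpr(pattern, s, perl = TRUE))[[1]]
  counts <- as.integer(sub("[a-z]+$", "", tokens))
  names(counts) <- sub("^[0-9]+", "", tokens)
  counts
}

split_duration("3d12h4m25s")
# named integer vector: d = 3, h = 12, m = 4, s = 25
```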
method The method used to assign ranks to tied elements. Must be one of the following:
  • “average” (default): The average of the ranks that would have been assigned to all the tied values is assigned to each value.
  • “min”: The minimum of the ranks that would have been assigned to all the tied values is assigned to each value. (This is also referred to as "competition" ranking.)
  • “max”: The maximum of the ranks that would have been assigned to all the tied values is assigned to each value.
  • “dense”: Like “min”, but the next highest element is assigned the rank immediately after the one assigned to the tied elements, so the ranks have no gaps.
  • “random”: Choose a random rank for each value in a tie.
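
The tie-breaking rules can be tried out on a small vector with base R's rank(). This is a sketch of the ranking rules themselves, not of the polars implementation. Base R has no "dense" option, so it is emulated here with match(), and R's set.seed() is unrelated to the seed argument of this method.

```r
x <- c(10, 20, 20, 30)

rank(x, ties.method = "average")  # 1.0 2.5 2.5 4.0
rank(x, ties.method = "min")      # 1 2 2 4
rank(x, ties.method = "max")      # 1 3 3 4

# Emulate "dense": position of each value among the sorted unique values,
# so the rank after a tie follows immediately.
match(x, sort(unique(x)))         # 1 2 2 3

# "random": the two tied values receive ranks 2 and 3 in random order.
set.seed(1)
rank(x, ties.method = "random")
```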
... These dots are for future extensions and must be empty.
seed Random seed used when method = “random”. If NULL (default), a random seed is generated for each rolling rank operation.
min_samples The number of values in the window that should be non-null before computing a result. The default is 1.
closed Define which sides of the interval are closed (inclusive). Default is “right”.
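
The effect of closed on window membership can be sketched in base R as follows. The helper name in_window() is hypothetical, and the sketch assumes that each setting only changes which endpoints of the interval from t - window_size to t are included.

```r
# Hypothetical helper: which `by` values belong to the window ending at `t`
# under each setting of `closed`.
in_window <- function(by, t, window_size, closed = "right") {
  lower <- t - window_size
  switch(closed,
    right = by > lower & by <= t,   # (lower, t]
    both  = by >= lower & by <= t,  # [lower, t]
    left  = by >= lower & by < t,   # [lower, t)
    none  = by > lower & by < t,    # (lower, t)
    stop("unknown `closed` value: ", closed)
  )
}

by <- 0:5
members <- lapply(
  c(right = "right", both = "both", left = "left", none = "none"),
  function(closed) by[in_window(by, t = 4, window_size = 2, closed = closed)]
)
members
# right: 3 4; both: 2 3 4; left: 2 3; none: 3
```

Under this reading, “left” and “none” leave the current row (here t = 4) out of its own window.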

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling(), which can cache the window size computation.

Value

A polars expression

Examples

library("polars")

df_temporal <- pl$select(
  index = 0:24,
  date = pl$datetime_range(
    as.POSIXct("2001-01-01"),
    as.POSIXct("2001-01-02"),
    "1h"
  )
)

# Compute the rolling rank with the temporal windows closed on the right
# (default)
df_temporal$with_columns(
  rolling_row_rank = pl$col("index")$rolling_rank_by(
    "date",
    window_size = "2h"
  )
)
#> shape: (25, 3)
#> ┌───────┬─────────────────────┬──────────────────┐
#> │ index ┆ date                ┆ rolling_row_rank │
#> │ ---   ┆ ---                 ┆ ---              │
#> │ i32   ┆ datetime[ms]        ┆ f64              │
#> ╞═══════╪═════════════════════╪══════════════════╡
#> │ 0     ┆ 2001-01-01 00:00:00 ┆ 1.0              │
#> │ 1     ┆ 2001-01-01 01:00:00 ┆ 2.0              │
#> │ 2     ┆ 2001-01-01 02:00:00 ┆ 2.0              │
#> │ 3     ┆ 2001-01-01 03:00:00 ┆ 2.0              │
#> │ 4     ┆ 2001-01-01 04:00:00 ┆ 2.0              │
#> │ …     ┆ …                   ┆ …                │
#> │ 20    ┆ 2001-01-01 20:00:00 ┆ 2.0              │
#> │ 21    ┆ 2001-01-01 21:00:00 ┆ 2.0              │
#> │ 22    ┆ 2001-01-01 22:00:00 ┆ 2.0              │
#> │ 23    ┆ 2001-01-01 23:00:00 ┆ 2.0              │
#> │ 24    ┆ 2001-01-02 00:00:00 ┆ 2.0              │
#> └───────┴─────────────────────┴──────────────────┘

# Compute the rolling rank with the temporal windows closed on both sides
df_temporal$with_columns(
  rolling_row_rank = pl$col("index")$rolling_rank_by(
    "date",
    window_size = "2h",
    closed = "both"
  )
)
#> shape: (25, 3)
#> ┌───────┬─────────────────────┬──────────────────┐
#> │ index ┆ date                ┆ rolling_row_rank │
#> │ ---   ┆ ---                 ┆ ---              │
#> │ i32   ┆ datetime[ms]        ┆ f64              │
#> ╞═══════╪═════════════════════╪══════════════════╡
#> │ 0     ┆ 2001-01-01 00:00:00 ┆ 1.0              │
#> │ 1     ┆ 2001-01-01 01:00:00 ┆ 2.0              │
#> │ 2     ┆ 2001-01-01 02:00:00 ┆ 3.0              │
#> │ 3     ┆ 2001-01-01 03:00:00 ┆ 3.0              │
#> │ 4     ┆ 2001-01-01 04:00:00 ┆ 3.0              │
#> │ …     ┆ …                   ┆ …                │
#> │ 20    ┆ 2001-01-01 20:00:00 ┆ 3.0              │
#> │ 21    ┆ 2001-01-01 21:00:00 ┆ 3.0              │
#> │ 22    ┆ 2001-01-01 22:00:00 ┆ 3.0              │
#> │ 23    ┆ 2001-01-01 23:00:00 ┆ 3.0              │
#> │ 24    ┆ 2001-01-02 00:00:00 ┆ 3.0              │
#> └───────┴─────────────────────┴──────────────────┘