Polars R Package

qcut

Bin continuous values into discrete categories based on their quantiles

Description

Bin continuous values into discrete categories based on their quantiles. The break points are computed from the data at the requested quantile probabilities.

Usage

<Expr>$qcut(
  quantiles,
  ...,
  labels = NULL,
  left_closed = FALSE,
  allow_duplicates = FALSE,
  include_breaks = FALSE
)

Arguments

`quantiles`	Either a vector of quantile probabilities between 0 and 1 or a positive integer determining the number of bins with uniform probability.
`...`	Ignored.
`labels`	Names of the categories. The number of labels must equal the number of cut points plus one, that is, the number of bins.
`left_closed`	Set the intervals to be left-closed instead of right-closed.
`allow_duplicates`	If set to `TRUE`, duplicate values in the resulting quantiles are dropped instead of raising an error. Duplicates can occur even with unique probabilities, depending on the data.
`include_breaks`	Include a column with the right endpoint of the bin each observation falls in. This will change the data type of the output from a `Categorical` to a `Struct`.

Value

Expr of data type `Categorical` if `include_breaks` is `FALSE`, and of data type `Struct` if `include_breaks` is `TRUE`.

See Also

$cut()

Examples

library("polars")

df = pl$DataFrame(foo = c(-2, -1, 0, 1, 2))

# Divide a column into three categories according to pre-defined quantile
# probabilities
df$with_columns(
  qcut = pl$col("foo")$qcut(c(0.25, 0.75), labels = c("a", "b", "c"))
)

#> shape: (5, 2)
#> ┌──────┬──────┐
#> │ foo  ┆ qcut │
#> │ ---  ┆ ---  │
#> │ f64  ┆ cat  │
#> ╞══════╪══════╡
#> │ -2.0 ┆ a    │
#> │ -1.0 ┆ a    │
#> │ 0.0  ┆ b    │
#> │ 1.0  ┆ b    │
#> │ 2.0  ┆ c    │
#> └──────┴──────┘

# Divide a column into two categories using uniform quantile probabilities.
df$with_columns(
  qcut = pl$col("foo")$qcut(2, labels = c("low", "high"), left_closed = TRUE)
)

#> shape: (5, 2)
#> ┌──────┬──────┐
#> │ foo  ┆ qcut │
#> │ ---  ┆ ---  │
#> │ f64  ┆ cat  │
#> ╞══════╪══════╡
#> │ -2.0 ┆ low  │
#> │ -1.0 ┆ low  │
#> │ 0.0  ┆ high │
#> │ 1.0  ┆ high │
#> │ 2.0  ┆ high │
#> └──────┴──────┘
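
As a further illustration of the two forms accepted by `quantiles`, the sketch below passes a bin count and the matching vector of probabilities side by side. The column names `by_count` and `by_probs` are made up for this example, and the output is not shown because the exact category labels were not verified here.

```r
library("polars")

df = pl$DataFrame(foo = c(-2, -1, 0, 1, 2))

# A single positive integer asks for that many bins of uniform probability;
# four bins correspond to cut points at probabilities 0.25, 0.5 and 0.75.
out = df$with_columns(
  by_count = pl$col("foo")$qcut(4),
  by_probs = pl$col("foo")$qcut(c(0.25, 0.5, 0.75))
)
out
```

Both new columns should describe the same binning for this data, since the integer form is shorthand for evenly spaced probabilities.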

# Add both the category and the breakpoint
df$with_columns(
  qcut = pl$col("foo")$qcut(c(0.25, 0.75), include_breaks = TRUE)
)$unnest("qcut")

#> shape: (5, 3)
#> ┌──────┬────────────┬────────────┐
#> │ foo  ┆ breakpoint ┆ category   │
#> │ ---  ┆ ---        ┆ ---        │
#> │ f64  ┆ f64        ┆ cat        │
#> ╞══════╪════════════╪════════════╡
#> │ -2.0 ┆ -1.0       ┆ (-inf, -1] │
#> │ -1.0 ┆ -1.0       ┆ (-inf, -1] │
#> │ 0.0  ┆ 1.0        ┆ (-1, 1]    │
#> │ 1.0  ┆ 1.0        ┆ (-1, 1]    │
#> │ 2.0  ┆ inf        ┆ (1, inf]   │
#> └──────┴────────────┴────────────┘
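
The `allow_duplicates` argument is not exercised above, so here is a minimal sketch of when it matters. The data is chosen for illustration so that two distinct probabilities land on the same value; the output is omitted because it was not verified here.

```r
library("polars")

# Most values are identical, so the 0.25 and 0.5 quantiles are both 1.
df_dup = pl$DataFrame(foo = c(1, 1, 1, 1, 2))

# With the default allow_duplicates = FALSE, the repeated break point is
# expected to raise an error. Setting it to TRUE drops the duplicate instead.
out = df_dup$with_columns(
  qcut = pl$col("foo")$qcut(c(0.25, 0.5), allow_duplicates = TRUE)
)
out
```

Because the duplicate break point is dropped, the result has fewer bins than the number of probabilities would suggest, which also affects how many `labels` can be supplied.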