polars.Series.qcut

Series.qcut(
quantiles: Sequence[float] | int,
*,
labels: Sequence[str] | None = None,
left_closed: bool = False,
allow_duplicates: bool = False,
include_breaks: bool = False,
break_point_label: str = 'break_point',
category_label: str = 'category',
as_series: Literal[True] = True,
) -> Series
Series.qcut(
quantiles: Sequence[float] | int,
*,
labels: Sequence[str] | None = None,
left_closed: bool = False,
allow_duplicates: bool = False,
include_breaks: bool = False,
break_point_label: str = 'break_point',
category_label: str = 'category',
as_series: Literal[False],
) -> DataFrame
Series.qcut(
quantiles: Sequence[float] | int,
*,
labels: Sequence[str] | None = None,
left_closed: bool = False,
allow_duplicates: bool = False,
include_breaks: bool = False,
break_point_label: str = 'break_point',
category_label: str = 'category',
as_series: bool,
) -> Series | DataFrame

Bin continuous values into discrete categories based on their quantiles.

Parameters:
quantiles

Either a list of quantile probabilities between 0 and 1 or a positive integer determining the number of bins with uniform probability.

labels

Names of the categories. The number of labels must be equal to the number of cut points plus one.

left_closed

Set the intervals to be left-closed instead of right-closed.

allow_duplicates

If set to `True`, duplicates in the resulting quantiles are dropped, rather than raising a `DuplicateError`. This can happen even with unique probabilities, depending on the data.

include_breaks

Include a column with the right endpoint of the bin each observation falls in. This will change the data type of the output from a `Categorical` to a `Struct`.

break_point_label

Name of the breakpoint column. Only used if `include_breaks` is set to `True`.

Deprecated since version 0.19.0: This parameter will be removed. Use `Series.struct.rename_fields` to rename the field instead.

category_label

Name of the category column. Only used if `include_breaks` is set to `True`.

Deprecated since version 0.19.0: This parameter will be removed. Use `Series.struct.rename_fields` to rename the field instead.

as_series

If set to `False`, return a DataFrame containing the original values, the breakpoints, and the categories.

Deprecated since version 0.19.0: This parameter will be removed. The same behavior can be achieved by setting `include_breaks=True`, unnesting the resulting struct Series, and adding the result to the original Series.

Returns:
Series

Series of data type `Categorical` if `include_breaks` is set to `False` (default), otherwise a Series of data type `Struct`.

Warning

This functionality is experimental and may change without it being considered a breaking change.

Examples

Divide a column into three categories according to pre-defined quantile probabilities.

```python
>>> import polars as pl
>>> s = pl.Series("foo", [-2, -1, 0, 1, 2])
>>> s.qcut([0.25, 0.75], labels=["a", "b", "c"])
shape: (5,)
Series: 'foo' [cat]
[
"a"
"a"
"b"
"b"
"c"
]
```

Divide a column into two categories using uniform quantile probabilities.

```python
>>> s.qcut(2, labels=["low", "high"], left_closed=True)
shape: (5,)
Series: 'foo' [cat]
[
"low"
"low"
"high"
"high"
"high"
]
```

Create a DataFrame with the breakpoint and category for each value.

```python
>>> cut = s.qcut([0.25, 0.75], include_breaks=True).alias("cut")
>>> s.to_frame().with_columns(cut).unnest("cut")
shape: (5, 3)
┌─────┬─────────────┬────────────┐
│ foo ┆ break_point ┆ category   │
│ --- ┆ ---         ┆ ---        │
│ i64 ┆ f64         ┆ cat        │
╞═════╪═════════════╪════════════╡
│ -2  ┆ -1.0        ┆ (-inf, -1] │
│ -1  ┆ -1.0        ┆ (-inf, -1] │
│ 0   ┆ 1.0         ┆ (-1, 1]    │
│ 1   ┆ 1.0         ┆ (-1, 1]    │
│ 2   ┆ inf         ┆ (1, inf]   │
└─────┴─────────────┴────────────┘
```