polars.Series.cut#

Series.cut(
breaks: Sequence[float],
*,
labels: Sequence[str] | None = None,
break_point_label: str = 'break_point',
category_label: str = 'category',
left_closed: bool = False,
include_breaks: bool = False,
as_series: bool = True,
) Series | DataFrame[source]#

Bin continuous values into discrete categories.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Parameters:
breaks

List of unique cut points.

labels

Names of the categories. The number of labels must be equal to the number of cut points plus one.

break_point_label

Name of the breakpoint column. Only used if include_breaks is set to True.

Deprecated since version 0.19.0: This parameter will be removed. Use Series.struct.rename_fields to rename the field instead.

category_label

Name of the category column. Only used if include_breaks is set to True.

Deprecated since version 0.19.0: This parameter will be removed. Use Series.struct.rename_fields to rename the field instead.

left_closed

Set the intervals to be left-closed instead of right-closed.

include_breaks

Include a column with the right endpoint of the bin each observation falls in. This will change the data type of the output from a Categorical to a Struct.

as_series

If set to False, return a DataFrame containing the original values, the breakpoints, and the categories.

Deprecated since version 0.19.0: This parameter will be removed. The same behavior can be achieved by setting include_breaks=True, unnesting the resulting struct Series, and adding the result to the original Series.

Returns:
Series

Series of data type Categorical if include_breaks is set to False (default), otherwise a Series of data type Struct.

See also

qcut

Examples

Divide the column into three categories.

>>> s = pl.Series("foo", [-2, -1, 0, 1, 2])
>>> s.cut([-1, 1], labels=["a", "b", "c"])
shape: (5,)
Series: 'foo' [cat]
[
        "a"
        "a"
        "b"
        "b"
        "c"
]

Create a DataFrame with the breakpoint and category for each value.

>>> cut = s.cut([-1, 1], include_breaks=True).alias("cut")
>>> s.to_frame().with_columns(cut).unnest("cut")
shape: (5, 3)
┌─────┬─────────────┬────────────┐
│ foo ┆ break_point ┆ category   │
│ --- ┆ ---         ┆ ---        │
│ i64 ┆ f64         ┆ cat        │
╞═════╪═════════════╪════════════╡
│ -2  ┆ -1.0        ┆ (-inf, -1] │
│ -1  ┆ -1.0        ┆ (-inf, -1] │
│ 0   ┆ 1.0         ┆ (-1, 1]    │
│ 1   ┆ 1.0         ┆ (-1, 1]    │
│ 2   ┆ inf         ┆ (1, inf]   │
└─────┴─────────────┴────────────┘