Expressions

This page gives an overview of all public polars expressions.

class polars.Expr[source]

Expressions that can be used in various contexts.

Methods:

abs

Compute absolute values.

agg_groups

Get the group indexes of the group by operation.

alias

Rename the output of an expression.

all

Check if all boolean values in a Boolean column are True.

any

Check if any boolean value in a Boolean column is True.

append

Append expressions.

apply

Apply a custom/user-defined function (UDF) in a GroupBy or Projection context.

arccos

Compute the element-wise value for the inverse cosine.

arccosh

Compute the element-wise value for the inverse hyperbolic cosine.

arcsin

Compute the element-wise value for the inverse sine.

arcsinh

Compute the element-wise value for the inverse hyperbolic sine.

arctan

Compute the element-wise value for the inverse tangent.

arctanh

Compute the element-wise value for the inverse hyperbolic tangent.

arg_max

Get the index of the maximal value.

arg_min

Get the index of the minimal value.

arg_sort

Get the index values that would sort this column.

arg_unique

Get index of first unique value.

argsort

Get the index values that would sort this column.

backward_fill

Fill missing values with the next value to be seen.

cast

Cast between data types.

ceil

Rounds up to the nearest integer value.

clip

Clip (limit) the values in an array to a min and max boundary.

clip_max

Clip (limit) the values in an array to a max boundary.

clip_min

Clip (limit) the values in an array to a min boundary.

cos

Compute the element-wise value for the cosine.

cosh

Compute the element-wise value for the hyperbolic cosine.

count

Count the number of values in this expression.

cumcount

Get an array with the cumulative count computed at every element.

cummax

Get an array with the cumulative max computed at every element.

cummin

Get an array with the cumulative min computed at every element.

cumprod

Get an array with the cumulative product computed at every element.

cumsum

Get an array with the cumulative sum computed at every element.

cumulative_eval

Run an expression over a sliding window that grows by one slot every iteration.

diff

Calculate the n-th discrete difference.

dot

Compute the dot/inner product between two Expressions.

drop_nans

Drop floating point NaN values.

drop_nulls

Drop null values.

entropy

Computes the entropy.

ewm_mean

Exponentially-weighted moving average.

ewm_std

Exponentially-weighted moving standard deviation.

ewm_var

Exponentially-weighted moving variance.

exclude

Exclude certain columns from a wildcard/regex selection.

exp

Compute the exponential, element-wise.

explode

Explode a list or utf8 Series.

extend_constant

Extend the Series with given number of values.

fill_nan

Fill floating point NaN value with a fill value.

fill_null

Fill null values using the specified value or strategy.

filter

Filter a single column.

first

Get the first value.

flatten

Alias for explode().

floor

Rounds down to the nearest integer value.

forward_fill

Fill missing values with the last value seen.

hash

Hash the elements in the selection.

head

Get the first n rows.

inspect

Print the value that this expression evaluates to and pass on the value.

interpolate

Fill nulls with linear interpolation over missing values.

is_between

Check if this expression is between start and end.

is_duplicated

Get mask of duplicated values.

is_finite

Returns a boolean Series indicating which values are finite.

is_first

Get a mask of the first unique value.

is_in

Check if elements of this expression are present in the other Series.

is_infinite

Returns a boolean Series indicating which values are infinite.

is_nan

Returns a boolean Series indicating which values are NaN.

is_not

Negate a boolean expression.

is_not_nan

Returns a boolean Series indicating which values are not NaN.

is_not_null

Returns a boolean Series indicating which values are not null.

is_null

Returns a boolean Series indicating which values are null.

is_unique

Get mask of unique values.

keep_name

Keep the original root name of the expression.

kurtosis

Compute the kurtosis (Fisher or Pearson) of a dataset.

last

Get the last value.

len

Count the number of values in this expression.

limit

Get the first n rows.

list

Aggregate to list.

log

Compute the logarithm to a given base.

log10

Compute the base 10 logarithm of the input array, element-wise.

lower_bound

Calculate the lower bound.

map

Apply a custom python function to a Series or sequence of Series.

map_alias

Rename the output of an expression by mapping a function over the root name.

max

Get maximum value.

mean

Get mean value.

median

Get median value using linear interpolation.

min

Get minimum value.

mode

Compute the most occurring value(s).

n_unique

Count unique values.

nan_max

Get maximum value, but propagate/poison encountered NaN values.

nan_min

Get minimum value, but propagate/poison encountered NaN values.

null_count

Count null values.

over

Apply window function over a subgroup.

pct_change

Computes percentage change between values.

pow

Raise expression to the power of exponent.

prefix

Add a prefix to the root column name of the expression.

product

Compute the product of an expression.

quantile

Get quantile value.

rank

Assign ranks to data, dealing with ties appropriately.

rechunk

Create a single chunk of memory for this Series.

reinterpret

Reinterpret the underlying bits as a signed/unsigned integer.

repeat_by

Repeat the elements in this Series as specified in the given expression.

reshape

Reshape this Expr to a flat Series or a Series of Lists.

reverse

Reverse the selection.

rolling_apply

Apply a custom rolling window function.

rolling_max

Apply a rolling max (moving max) over the values in this array.

rolling_mean

Apply a rolling mean (moving mean) over the values in this array.

rolling_median

Compute a rolling median.

rolling_min

Apply a rolling min (moving min) over the values in this array.

rolling_quantile

Compute a rolling quantile.

rolling_skew

Compute a rolling skew.

rolling_std

Compute a rolling standard deviation.

rolling_sum

Apply a rolling sum (moving sum) over the values in this array.

rolling_var

Compute a rolling variance.

round

Round underlying floating point data by decimals digits.

sample

Sample from this expression.

search_sorted

Find indices where elements should be inserted to maintain order.

set_sorted

Flags the expression as 'sorted'.

shift

Shift the values by a given period.

shift_and_fill

Shift the values by a given period and fill the resulting null values.

shrink_dtype

Shrink numeric columns to the minimal required datatype.

shuffle

Shuffle the contents of this expr.

sign

Compute the element-wise indication of the sign.

sin

Compute the element-wise value for the sine.

sinh

Compute the element-wise value for the hyperbolic sine.

skew

Compute the sample skewness of a data set.

slice

Get a slice of this expression.

sort

Sort this column.

sort_by

Sort this column by the ordering of another column, or multiple other columns.

sqrt

Compute the square root of the elements.

std

Get standard deviation.

suffix

Add a suffix to the root column name of the expression.

sum

Get sum value.

tail

Get the last n rows.

take

Take values by index.

take_every

Take every nth value in the Series and return as a new Series.

tan

Compute the element-wise value for the tangent.

tanh

Compute the element-wise value for the hyperbolic tangent.

to_physical

Cast to physical representation of the logical dtype.

top_k

Return the k largest elements.

unique

Get unique values of this expression.

unique_counts

Return a count of the unique values in the order of appearance.

upper_bound

Calculate the upper bound.

value_counts

Count all unique values and create a struct mapping value to count.

var

Get variance.

where

Filter a single column.

abs() Expr[source]

Compute absolute values.

Same as abs(expr).

Examples

>>> df = pl.DataFrame(
...     {
...         "A": [-1.0, 0.0, 1.0, 2.0],
...     }
... )
>>> df.select(pl.col("A").abs())
shape: (4, 1)
┌─────┐
│ A   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
├╌╌╌╌╌┤
│ 0.0 │
├╌╌╌╌╌┤
│ 1.0 │
├╌╌╌╌╌┤
│ 2.0 │
└─────┘
agg_groups() Expr[source]

Get the group indexes of the group by operation.

Should be used in aggregation context only.

Examples

>>> df = pl.DataFrame(
...     {
...         "group": [
...             "one",
...             "one",
...             "one",
...             "two",
...             "two",
...             "two",
...         ],
...         "value": [94, 95, 96, 97, 97, 99],
...     }
... )
>>> df.groupby("group", maintain_order=True).agg(pl.col("value").agg_groups())
shape: (2, 2)
┌───────┬───────────┐
│ group ┆ value     │
│ ---   ┆ ---       │
│ str   ┆ list[u32] │
╞═══════╪═══════════╡
│ one   ┆ [0, 1, 2] │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ two   ┆ [3, 4, 5] │
└───────┴───────────┘
alias(name: str) Expr[source]

Rename the output of an expression.

Parameters:
name

New name.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": ["a", "b", None],
...     }
... )
>>> df
shape: (3, 2)
┌─────┬──────┐
│ a   ┆ b    │
│ --- ┆ ---  │
│ i64 ┆ str  │
╞═════╪══════╡
│ 1   ┆ a    │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2   ┆ b    │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3   ┆ null │
└─────┴──────┘
>>> df.select(
...     [
...         pl.col("a").alias("bar"),
...         pl.col("b").alias("foo"),
...     ]
... )
shape: (3, 2)
┌─────┬──────┐
│ bar ┆ foo  │
│ --- ┆ ---  │
│ i64 ┆ str  │
╞═════╪══════╡
│ 1   ┆ a    │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2   ┆ b    │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3   ┆ null │
└─────┴──────┘
all() Expr[source]

Check if all boolean values in a Boolean column are True.

This method is an expression, not to be confused with polars.all(), which is a function that selects all columns.

Returns:
Boolean literal

Examples

>>> df = pl.DataFrame(
...     {"TT": [True, True], "TF": [True, False], "FF": [False, False]}
... )
>>> df.select(pl.col("*").all())
shape: (1, 3)
┌──────┬───────┬───────┐
│ TT   ┆ TF    ┆ FF    │
│ ---  ┆ ---   ┆ ---   │
│ bool ┆ bool  ┆ bool  │
╞══════╪═══════╪═══════╡
│ true ┆ false ┆ false │
└──────┴───────┴───────┘
any() Expr[source]

Check if any boolean value in a Boolean column is True.

Returns:
Boolean literal

Examples

>>> df = pl.DataFrame({"TF": [True, False], "FF": [False, False]})
>>> df.select(pl.all().any())
shape: (1, 2)
┌──────┬───────┐
│ TF   ┆ FF    │
│ ---  ┆ ---   │
│ bool ┆ bool  │
╞══════╪═══════╡
│ true ┆ false │
└──────┴───────┘
append(other: Expr, upcast: bool = True) Expr[source]

Append expressions.

This is done by adding the chunks of other to this Series.

Parameters:
other

Expression to append.

upcast

Cast both Series to the same supertype.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [8, 9, 10],
...         "b": [None, 4, 4],
...     }
... )
>>> df.select(pl.all().head(1).append(pl.all().tail(1)))
shape: (2, 2)
┌─────┬──────┐
│ a   ┆ b    │
│ --- ┆ ---  │
│ i64 ┆ i64  │
╞═════╪══════╡
│ 8   ┆ null │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 10  ┆ 4    │
└─────┴──────┘
apply(f: Union[Callable[[Series], Series], Callable[[Any], Any]], return_dtype: Optional[Union[Type[DataType], DataType]] = None) Expr[source]

Apply a custom/user-defined function (UDF) in a GroupBy or Projection context.

Depending on the context it has the following behavior:

  • Selection

    Expects f to be of type Callable[[Any], Any]. Applies a python function over each individual value in the column.

  • GroupBy

    Expects f to be of type Callable[[Series], Series]. Applies a python function over each group.

Implementing logic using a Python function is almost always significantly slower and more memory intensive than implementing the same logic using the native expression API because:

  • The native expression engine runs in Rust; UDFs run in Python.

  • Use of Python UDFs forces the DataFrame to be materialized in memory.

  • Polars-native expressions can be parallelised (UDFs cannot).

  • Polars-native expressions can be logically optimised (UDFs cannot).

Wherever possible you should strongly prefer the native expression API to achieve the best performance.

Parameters:
f

Lambda/ function to apply.

return_dtype

Dtype of the output Series. If not set, polars will assume that the dtype remains unchanged.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3, 1],
...         "b": ["a", "b", "c", "c"],
...     }
... )

In a selection context, the function is applied by row.

>>> (
...     df.with_column(
...         pl.col("a").apply(lambda x: x * 2).alias("a_times_2"),
...     )
... )
shape: (4, 3)
┌─────┬─────┬───────────┐
│ a   ┆ b   ┆ a_times_2 │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ str ┆ i64       │
╞═════╪═════╪═══════════╡
│ 1   ┆ a   ┆ 2         │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2   ┆ b   ┆ 4         │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 3   ┆ c   ┆ 6         │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1   ┆ c   ┆ 2         │
└─────┴─────┴───────────┘

It is better to implement this with an expression:

>>> (
...     df.with_column(
...         (pl.col("a") * 2).alias("a_times_2"),
...     )
... )  

In a GroupBy context the function is applied by group:

>>> (
...     df.lazy()
...     .groupby("b", maintain_order=True)
...     .agg(
...         [
...             pl.col("a").apply(lambda x: x.sum()),
...         ]
...     )
...     .collect()
... )
shape: (3, 2)
┌─────┬─────┐
│ b   ┆ a   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 1   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ b   ┆ 2   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ c   ┆ 4   │
└─────┴─────┘

It is better to implement this with an expression:

>>> (
...     df.groupby("b", maintain_order=True).agg(
...         pl.col("a").sum(),
...     )
... )  
arccos() Expr[source]

Compute the element-wise value for the inverse cosine.

Returns:
Series of dtype Float64

Examples

>>> df = pl.DataFrame({"a": [0.0]})
>>> df.select(pl.col("a").arccos())
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 1.570796 │
└──────────┘
arccosh() Expr[source]

Compute the element-wise value for the inverse hyperbolic cosine.

Returns:
Series of dtype Float64

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").arccosh())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 0.0 │
└─────┘
arcsin() Expr[source]

Compute the element-wise value for the inverse sine.

Returns:
Series of dtype Float64

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").arcsin())
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 1.570796 │
└──────────┘
arcsinh() Expr[source]

Compute the element-wise value for the inverse hyperbolic sine.

Returns:
Series of dtype Float64

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").arcsinh())
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 0.881374 │
└──────────┘
arctan() Expr[source]

Compute the element-wise value for the inverse tangent.

Returns:
Series of dtype Float64

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").arctan())
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 0.785398 │
└──────────┘
arctanh() Expr[source]

Compute the element-wise value for the inverse hyperbolic tangent.

Returns:
Series of dtype Float64

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").arctanh())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ inf │
└─────┘
arg_max() Expr[source]

Get the index of the maximal value.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [20, 10, 30],
...     }
... )
>>> df.select(pl.col("a").arg_max())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ u32 │
╞═════╡
│ 2   │
└─────┘
arg_min() Expr[source]

Get the index of the minimal value.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [20, 10, 30],
...     }
... )
>>> df.select(pl.col("a").arg_min())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ u32 │
╞═════╡
│ 1   │
└─────┘
arg_sort(reverse: bool = False, nulls_last: bool = False) Expr[source]

Get the index values that would sort this column.

Parameters:
reverse

Sort in reverse (descending) order.

nulls_last

Place null values last instead of first.

Returns:
Expr

Series of dtype UInt32.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [20, 10, 30],
...     }
... )
>>> df.select(pl.col("a").arg_sort())
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ u32 │
╞═════╡
│ 1   │
├╌╌╌╌╌┤
│ 0   │
├╌╌╌╌╌┤
│ 2   │
└─────┘
arg_unique() Expr[source]

Get index of first unique value.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [8, 9, 10],
...         "b": [None, 4, 4],
...     }
... )
>>> df.select(pl.col("a").arg_unique())
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ u32 │
╞═════╡
│ 0   │
├╌╌╌╌╌┤
│ 1   │
├╌╌╌╌╌┤
│ 2   │
└─────┘
>>> df.select(pl.col("b").arg_unique())
shape: (2, 1)
┌─────┐
│ b   │
│ --- │
│ u32 │
╞═════╡
│ 0   │
├╌╌╌╌╌┤
│ 1   │
└─────┘
argsort(reverse: bool = False, nulls_last: bool = False) Expr[source]

Get the index values that would sort this column.

Alias for Expr.arg_sort().

Parameters:
reverse

Sort in reverse (descending) order.

nulls_last

Place null values last instead of first.

Returns:
Expr

Series of dtype UInt32.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [20, 10, 30],
...     }
... )
>>> df.select(pl.col("a").argsort())
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ u32 │
╞═════╡
│ 1   │
├╌╌╌╌╌┤
│ 0   │
├╌╌╌╌╌┤
│ 2   │
└─────┘
backward_fill(limit: int | None = None) Expr[source]

Fill missing values with the next value to be seen.

Parameters:
limit

The number of consecutive null values to backward fill.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, None],
...         "b": [4, None, 6],
...     }
... )
>>> df.select(pl.all().backward_fill())
shape: (3, 2)
┌──────┬─────┐
│ a    ┆ b   │
│ ---  ┆ --- │
│ i64  ┆ i64 │
╞══════╪═════╡
│ 1    ┆ 4   │
├╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 2    ┆ 6   │
├╌╌╌╌╌╌┼╌╌╌╌╌┤
│ null ┆ 6   │
└──────┴─────┘
cast(dtype: Union[Type[DataType], DataType, type[Any]], strict: bool = True) Expr[source]

Cast between data types.

Parameters:
dtype

DataType to cast to.

strict

Throw an error if a cast could not be done. For instance, due to an overflow.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": ["4", "5", "6"],
...     }
... )
>>> df.with_columns(
...     [
...         pl.col("a").cast(pl.Float64),
...         pl.col("b").cast(pl.Int32),
...     ]
... )
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ i32 │
╞═════╪═════╡
│ 1.0 ┆ 4   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2.0 ┆ 5   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 3.0 ┆ 6   │
└─────┴─────┘
ceil() Expr[source]

Rounds up to the nearest integer value.

Only works on floating point Series.

Examples

>>> df = pl.DataFrame({"a": [0.3, 0.5, 1.0, 1.1]})
>>> df.select(pl.col("a").ceil())
shape: (4, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
├╌╌╌╌╌┤
│ 1.0 │
├╌╌╌╌╌┤
│ 1.0 │
├╌╌╌╌╌┤
│ 2.0 │
└─────┘
clip(min_val: int | float, max_val: int | float) Expr[source]

Clip (limit) the values in an array to a min and max boundary.

Only works for numerical types.

If you want to clip other dtypes, consider writing a “when, then, otherwise” expression. See when() for more information.

Parameters:
min_val

Minimum value.

max_val

Maximum value.

Examples

>>> df = pl.DataFrame({"foo": [-50, 5, None, 50]})
>>> df.with_column(pl.col("foo").clip(1, 10).alias("foo_clipped"))
shape: (4, 2)
┌──────┬─────────────┐
│ foo  ┆ foo_clipped │
│ ---  ┆ ---         │
│ i64  ┆ i64         │
╞══════╪═════════════╡
│ -50  ┆ 1           │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 5    ┆ 5           │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ null ┆ null        │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 50   ┆ 10          │
└──────┴─────────────┘
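
For non-numeric dtypes the note above points to a “when, then, otherwise” expression; a minimal sketch of the numeric equivalent (output omitted):

>>> df.with_column(
...     pl.when(pl.col("foo") < 1)
...     .then(1)
...     .otherwise(pl.when(pl.col("foo") > 10).then(10).otherwise(pl.col("foo")))
...     .alias("foo_clipped")
... )  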
clip_max(max_val: int | float) Expr[source]

Clip (limit) the values in an array to a max boundary.

Only works for numerical types.

If you want to clip other dtypes, consider writing a “when, then, otherwise” expression. See when() for more information.

Parameters:
max_val

Maximum value.

Examples

>>> df = pl.DataFrame({"foo": [-50, 5, None, 50]})
>>> df.with_column(pl.col("foo").clip_max(0).alias("foo_clipped"))
shape: (4, 2)
┌──────┬─────────────┐
│ foo  ┆ foo_clipped │
│ ---  ┆ ---         │
│ i64  ┆ i64         │
╞══════╪═════════════╡
│ -50  ┆ -50         │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 5    ┆ 0           │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ null ┆ null        │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 50   ┆ 0           │
└──────┴─────────────┘
clip_min(min_val: int | float) Expr[source]

Clip (limit) the values in an array to a min boundary.

Only works for numerical types.

If you want to clip other dtypes, consider writing a “when, then, otherwise” expression. See when() for more information.

Parameters:
min_val

Minimum value.

Examples

>>> df = pl.DataFrame({"foo": [-50, 5, None, 50]})
>>> df.with_column(pl.col("foo").clip_min(0).alias("foo_clipped"))
shape: (4, 2)
┌──────┬─────────────┐
│ foo  ┆ foo_clipped │
│ ---  ┆ ---         │
│ i64  ┆ i64         │
╞══════╪═════════════╡
│ -50  ┆ 0           │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 5    ┆ 5           │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ null ┆ null        │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 50   ┆ 50          │
└──────┴─────────────┘
cos() Expr[source]

Compute the element-wise value for the cosine.

Returns:
Series of dtype Float64

Examples

>>> df = pl.DataFrame({"a": [0.0]})
>>> df.select(pl.col("a").cos())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
└─────┘
cosh() Expr[source]

Compute the element-wise value for the hyperbolic cosine.

Returns:
Series of dtype Float64

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").cosh())
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 1.543081 │
└──────────┘
count() Expr[source]

Count the number of values in this expression.

Examples

>>> df = pl.DataFrame({"a": [8, 9, 10], "b": [None, 4, 4]})
>>> df.select(pl.all().count())  # null values are included in the count
shape: (1, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ u32 ┆ u32 │
╞═════╪═════╡
│ 3   ┆ 3   │
└─────┴─────┘
cumcount(reverse: bool = False) Expr[source]

Get an array with the cumulative count computed at every element.

Counts from 0 up to len - 1.

Parameters:
reverse

Reverse the operation.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3, 4]})
>>> df.select(
...     [
...         pl.col("a").cumcount(),
...         pl.col("a").cumcount(reverse=True).alias("a_reverse"),
...     ]
... )
shape: (4, 2)
┌─────┬───────────┐
│ a   ┆ a_reverse │
│ --- ┆ ---       │
│ u32 ┆ u32       │
╞═════╪═══════════╡
│ 0   ┆ 3         │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1   ┆ 2         │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2   ┆ 1         │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 3   ┆ 0         │
└─────┴───────────┘
cummax(reverse: bool = False) Expr[source]

Get an array with the cumulative max computed at every element.

Parameters:
reverse

Reverse the operation.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3, 4]})
>>> df.select(
...     [
...         pl.col("a").cummax(),
...         pl.col("a").cummax(reverse=True).alias("a_reverse"),
...     ]
... )
shape: (4, 2)
┌─────┬───────────┐
│ a   ┆ a_reverse │
│ --- ┆ ---       │
│ i64 ┆ i64       │
╞═════╪═══════════╡
│ 1   ┆ 4         │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2   ┆ 4         │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 3   ┆ 4         │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 4   ┆ 4         │
└─────┴───────────┘
cummin(reverse: bool = False) Expr[source]

Get an array with the cumulative min computed at every element.

Parameters:
reverse

Reverse the operation.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3, 4]})
>>> df.select(
...     [
...         pl.col("a").cummin(),
...         pl.col("a").cummin(reverse=True).alias("a_reverse"),
...     ]
... )
shape: (4, 2)
┌─────┬───────────┐
│ a   ┆ a_reverse │
│ --- ┆ ---       │
│ i64 ┆ i64       │
╞═════╪═══════════╡
│ 1   ┆ 1         │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1   ┆ 2         │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1   ┆ 3         │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1   ┆ 4         │
└─────┴───────────┘
cumprod(reverse: bool = False) Expr[source]

Get an array with the cumulative product computed at every element.

Parameters:
reverse

Reverse the operation.

Notes

Dtypes in {Int8, UInt8, Int16, UInt16} are cast to Int64 before multiplying to prevent overflow issues.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3, 4]})
>>> df.select(
...     [
...         pl.col("a").cumprod(),
...         pl.col("a").cumprod(reverse=True).alias("a_reverse"),
...     ]
... )
shape: (4, 2)
┌─────┬───────────┐
│ a   ┆ a_reverse │
│ --- ┆ ---       │
│ i64 ┆ i64       │
╞═════╪═══════════╡
│ 1   ┆ 24        │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2   ┆ 24        │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 6   ┆ 12        │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 24  ┆ 4         │
└─────┴───────────┘
cumsum(reverse: bool = False) Expr[source]

Get an array with the cumulative sum computed at every element.

Parameters:
reverse

Reverse the operation.

Notes

Dtypes in {Int8, UInt8, Int16, UInt16} are cast to Int64 before summing to prevent overflow issues.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3, 4]})
>>> df.select(
...     [
...         pl.col("a").cumsum(),
...         pl.col("a").cumsum(reverse=True).alias("a_reverse"),
...     ]
... )
shape: (4, 2)
┌─────┬───────────┐
│ a   ┆ a_reverse │
│ --- ┆ ---       │
│ i64 ┆ i64       │
╞═════╪═══════════╡
│ 1   ┆ 10        │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 3   ┆ 9         │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 6   ┆ 7         │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 10  ┆ 4         │
└─────┴───────────┘
cumulative_eval(expr: Expr, min_periods: int = 1, parallel: bool = False) Expr[source]

Run an expression over a sliding window that grows by one slot every iteration.

Parameters:
expr

Expression to evaluate

min_periods

Number of valid values there should be in the window before the expression is evaluated. valid values = length - null_count

parallel

Run in parallel. Don’t do this in a groupby or another operation that already has much parallelization.

Warning

This functionality is experimental and may change without it being considered a breaking change.

This can be really slow as it can have O(n^2) complexity. Don’t use this for operations that visit all elements.

Examples

>>> df = pl.DataFrame({"values": [1, 2, 3, 4, 5]})
>>> df.select(
...     [
...         pl.col("values").cumulative_eval(
...             pl.element().first() - pl.element().last() ** 2
...         )
...     ]
... )
shape: (5, 1)
┌────────┐
│ values │
│ ---    │
│ f64    │
╞════════╡
│ 0.0    │
├╌╌╌╌╌╌╌╌┤
│ -3.0   │
├╌╌╌╌╌╌╌╌┤
│ -8.0   │
├╌╌╌╌╌╌╌╌┤
│ -15.0  │
├╌╌╌╌╌╌╌╌┤
│ -24.0  │
└────────┘
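
Since the window at step k contains [1, …, k], the expression evaluates to first() - last() ** 2 = 1 - k², giving 0, -3, -8, -15 and -24.
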
diff(n: int = 1, null_behavior: NullBehavior = 'ignore') Expr[source]

Calculate the n-th discrete difference.

Parameters:
n

Number of slots to shift.

null_behavior : {‘ignore’, ‘drop’}

How to handle null values.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [20, 10, 30],
...     }
... )
>>> df.select(pl.col("a").diff())
shape: (3, 1)
┌──────┐
│ a    │
│ ---  │
│ i64  │
╞══════╡
│ null │
├╌╌╌╌╌╌┤
│ -10  │
├╌╌╌╌╌╌┤
│ 20   │
└──────┘
dot(other: Expr | str) Expr[source]

Compute the dot/inner product between two Expressions.

Parameters:
other

Expression to compute dot product with.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 3, 5],
...         "b": [2, 4, 6],
...     }
... )
>>> df.select(pl.col("a").dot(pl.col("b")))
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 44  │
└─────┘
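
Here the result is 1*2 + 3*4 + 5*6 = 44.
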
drop_nans() Expr[source]

Drop floating point NaN values.

Warning

Note that NaN values are not null values! To drop null values, use drop_nulls().

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [8, 9, 10, 11],
...         "b": [None, 4.0, 4.0, float("nan")],
...     }
... )
>>> df.select(pl.col("b").drop_nans())
shape: (3, 1)
┌──────┐
│ b    │
│ ---  │
│ f64  │
╞══════╡
│ null │
├╌╌╌╌╌╌┤
│ 4.0  │
├╌╌╌╌╌╌┤
│ 4.0  │
└──────┘
drop_nulls() Expr[source]

Drop null values.

Warning

Note that null values are not floating point NaN values! To drop NaN values, use drop_nans().

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [8, 9, 10, 11],
...         "b": [None, 4.0, 4.0, float("nan")],
...     }
... )
>>> df.select(pl.col("b").drop_nulls())
shape: (3, 1)
┌─────┐
│ b   │
│ --- │
│ f64 │
╞═════╡
│ 4.0 │
├╌╌╌╌╌┤
│ 4.0 │
├╌╌╌╌╌┤
│ NaN │
└─────┘
entropy(base: float = 2.718281828459045, normalize: bool = True) Expr[source]

Computes the entropy.

Uses the formula -sum(pk * log(pk)), where pk are discrete probabilities.

Parameters:
base

Base of the logarithm; defaults to e.

normalize

Normalize pk if it doesn’t sum to 1.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.select(pl.col("a").entropy(base=2))
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 1.459148 │
└──────────┘
>>> df.select(pl.col("a").entropy(base=2, normalize=False))
shape: (1, 1)
┌───────────┐
│ a         │
│ ---       │
│ f64       │
╞═══════════╡
│ -6.754888 │
└───────────┘
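
With normalize=True the values are first scaled to probabilities, pk = [1/6, 2/6, 3/6]; a quick manual check of the formula using only the standard library:

>>> import math
>>> pk = [1 / 6, 2 / 6, 3 / 6]
>>> round(-sum(p * math.log(p, 2) for p in pk), 6)
1.459148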
ewm_mean(com: float | None = None, span: float | None = None, half_life: float | None = None, alpha: float | None = None, adjust: bool = True, min_periods: int = 1) Expr[source]

Exponentially-weighted moving average.

Parameters:
com

Specify decay in terms of center of mass, \(\gamma\), with

\[\alpha = \frac{1}{1 + \gamma} \; \forall \; \gamma \geq 0\]
span

Specify decay in terms of span, \(\theta\), with

\[\alpha = \frac{2}{\theta + 1} \; \forall \; \theta \geq 1\]
half_life

Specify decay in terms of half-life, \(\lambda\), with

\[\alpha = 1 - \exp \left\{ \frac{ -\ln(2) }{ \lambda } \right\} \; \forall \; \lambda > 0\]
alpha

Specify smoothing factor alpha directly, \(0 < \alpha \leq 1\).

adjust

Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings

  • When adjust=True the EW function is calculated using weights \(w_i = (1 - \alpha)^i\)

  • When adjust=False the EW function is calculated recursively by

    \[\begin{split}y_0 &= x_0 \\ y_t &= (1 - \alpha)y_{t - 1} + \alpha x_t\end{split}\]
min_periods

Minimum number of observations in window required to have a value (otherwise result is null).

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.select(pl.col("a").ewm_mean(com=1))
shape: (3, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 1.0      │
├╌╌╌╌╌╌╌╌╌╌┤
│ 1.666667 │
├╌╌╌╌╌╌╌╌╌╌┤
│ 2.428571 │
└──────────┘
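
With com=1 the smoothing factor is alpha = 1 / (1 + 1) = 0.5, and with adjust=True each output is the weighted average with weights (1 - alpha)^i; a manual check of the values above:

>>> alpha = 0.5
>>> x = [1, 2, 3]
>>> [
...     round(
...         sum((1 - alpha) ** i * x[t - i] for i in range(t + 1))
...         / sum((1 - alpha) ** i for i in range(t + 1)),
...         6,
...     )
...     for t in range(len(x))
... ]
[1.0, 1.666667, 2.428571]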
ewm_std(com: float | None = None, span: float | None = None, half_life: float | None = None, alpha: float | None = None, adjust: bool = True, bias: bool = False, min_periods: int = 1) Expr[source]

Exponentially-weighted moving standard deviation.

Parameters:
com

Specify decay in terms of center of mass, \(\gamma\), with

\[\alpha = \frac{1}{1 + \gamma} \; \forall \; \gamma \geq 0\]
span

Specify decay in terms of span, \(\theta\), with

\[\alpha = \frac{2}{\theta + 1} \; \forall \; \theta \geq 1\]
half_life

Specify decay in terms of half-life, \(\lambda\), with

\[\alpha = 1 - \exp \left\{ \frac{ -\ln(2) }{ \lambda } \right\} \; \forall \; \lambda > 0\]
alpha

Specify smoothing factor alpha directly, \(0 < \alpha \leq 1\).

adjust

Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings

  • When adjust=True the EW function is calculated using weights \(w_i = (1 - \alpha)^i\)

  • When adjust=False the EW function is calculated recursively by

    \[\begin{split}y_0 &= x_0 \\ y_t &= (1 - \alpha)y_{t - 1} + \alpha x_t\end{split}\]
bias

When bias=False, apply a correction to make the estimate statistically unbiased.

min_periods

Minimum number of observations in window required to have a value (otherwise result is null).

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.select(pl.col("a").ewm_std(com=1))
shape: (3, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 0.0      │
├╌╌╌╌╌╌╌╌╌╌┤
│ 0.707107 │
├╌╌╌╌╌╌╌╌╌╌┤
│ 0.963624 │
└──────────┘
ewm_var(com: float | None = None, span: float | None = None, half_life: float | None = None, alpha: float | None = None, adjust: bool = True, bias: bool = False, min_periods: int = 1) Expr[source]

Exponentially-weighted moving variance.

Parameters:
com

Specify decay in terms of center of mass, \(\gamma\), with

\[\alpha = \frac{1}{1 + \gamma} \; \forall \; \gamma \geq 0\]
span

Specify decay in terms of span, \(\theta\), with

\[\alpha = \frac{2}{\theta + 1} \; \forall \; \theta \geq 1\]
half_life

Specify decay in terms of half-life, \(\lambda\), with

\[\alpha = 1 - \exp \left\{ \frac{ -\ln(2) }{ \lambda } \right\} \; \forall \; \lambda > 0\]
alpha

Specify smoothing factor alpha directly, \(0 < \alpha \leq 1\).

adjust

Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings

  • When adjust=True the EW function is calculated using weights \(w_i = (1 - \alpha)^i\)

  • When adjust=False the EW function is calculated recursively by

    \[\begin{split}y_0 &= x_0 \\ y_t &= (1 - \alpha)y_{t - 1} + \alpha x_t\end{split}\]
bias

When bias=False, apply a correction to make the estimate statistically unbiased.

min_periods

Minimum number of observations in window required to have a value (otherwise result is null).

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.select(pl.col("a").ewm_var(com=1))
shape: (3, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 0.0      │
├╌╌╌╌╌╌╌╌╌╌┤
│ 0.5      │
├╌╌╌╌╌╌╌╌╌╌┤
│ 0.928571 │
└──────────┘
exclude(columns: Union[str, Sequence[str], DataType, type[DataType], Sequence[DataType | type[DataType]]]) Expr[source]

Exclude certain columns from a wildcard/regex selection.

You may also use regexes in the exclude list. They must start with ^ and end with $.

Parameters:
columns

Column(s) to exclude from selection. This can be:

  • a column name, or multiple column names

  • a regular expression starting with ^ and ending with $

  • a dtype or multiple dtypes

Examples

>>> df = pl.DataFrame(
...     {
...         "aa": [1, 2, 3],
...         "ba": ["a", "b", None],
...         "cc": [None, 2.5, 1.5],
...     }
... )
>>> df
shape: (3, 3)
┌─────┬──────┬──────┐
│ aa  ┆ ba   ┆ cc   │
│ --- ┆ ---  ┆ ---  │
│ i64 ┆ str  ┆ f64  │
╞═════╪══════╪══════╡
│ 1   ┆ a    ┆ null │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2   ┆ b    ┆ 2.5  │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3   ┆ null ┆ 1.5  │
└─────┴──────┴──────┘

Exclude by column name(s):

>>> df.select(pl.all().exclude("ba"))
shape: (3, 2)
┌─────┬──────┐
│ aa  ┆ cc   │
│ --- ┆ ---  │
│ i64 ┆ f64  │
╞═════╪══════╡
│ 1   ┆ null │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2   ┆ 2.5  │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3   ┆ 1.5  │
└─────┴──────┘

Exclude by regex, e.g. removing all columns whose names end with the letter “a”:

>>> df.select(pl.all().exclude("^.*a$"))
shape: (3, 1)
┌──────┐
│ cc   │
│ ---  │
│ f64  │
╞══════╡
│ null │
├╌╌╌╌╌╌┤
│ 2.5  │
├╌╌╌╌╌╌┤
│ 1.5  │
└──────┘

Exclude by dtype(s), e.g. removing all columns of type Int64 or Float64:

>>> df.select(pl.all().exclude([pl.Int64, pl.Float64]))
shape: (3, 1)
┌──────┐
│ ba   │
│ ---  │
│ str  │
╞══════╡
│ a    │
├╌╌╌╌╌╌┤
│ b    │
├╌╌╌╌╌╌┤
│ null │
└──────┘
exp() Expr[source]

Compute the exponential, element-wise.

Examples

>>> df = pl.DataFrame({"values": [1.0, 2.0, 4.0]})
>>> df.select(pl.col("values").exp())
shape: (3, 1)
┌──────────┐
│ values   │
│ ---      │
│ f64      │
╞══════════╡
│ 2.718282 │
├╌╌╌╌╌╌╌╌╌╌┤
│ 7.389056 │
├╌╌╌╌╌╌╌╌╌╌┤
│ 54.59815 │
└──────────┘
explode() Expr[source]

Explode a list or utf8 Series.

This means that every item is expanded to a new row.

Returns:
Exploded Series of same dtype

Examples

>>> df = pl.DataFrame({"b": [[1, 2, 3], [4, 5, 6]]})
>>> df.select(pl.col("b").explode())
shape: (6, 1)
┌─────┐
│ b   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
├╌╌╌╌╌┤
│ 2   │
├╌╌╌╌╌┤
│ 3   │
├╌╌╌╌╌┤
│ 4   │
├╌╌╌╌╌┤
│ 5   │
├╌╌╌╌╌┤
│ 6   │
└─────┘
extend_constant(value: int | float | str | bool | None, n: int) Expr[source]

Extend the Series with given number of values.

Parameters:
value

The value to extend the Series with. This value may be None to fill with nulls.

n

The number of values to extend.

Examples

>>> df = pl.DataFrame({"values": [1, 2, 3]})
>>> df.select(pl.col("values").extend_constant(99, n=2))
shape: (5, 1)
┌────────┐
│ values │
│ ---    │
│ i64    │
╞════════╡
│ 1      │
├╌╌╌╌╌╌╌╌┤
│ 2      │
├╌╌╌╌╌╌╌╌┤
│ 3      │
├╌╌╌╌╌╌╌╌┤
│ 99     │
├╌╌╌╌╌╌╌╌┤
│ 99     │
└────────┘
fill_nan(fill_value: int | float | Expr | None) Expr[source]

Fill floating point NaN value with a fill value.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1.0, None, float("nan")],
...         "b": [4.0, float("nan"), 6],
...     }
... )
>>> df.fill_nan("zero")
shape: (3, 2)
┌──────┬──────┐
│ a    ┆ b    │
│ ---  ┆ ---  │
│ str  ┆ str  │
╞══════╪══════╡
│ 1.0  ┆ 4.0  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ null ┆ zero │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ zero ┆ 6.0  │
└──────┴──────┘
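
Filling with the string “zero” casts both columns to their common supertype (shown above as str). A more typical numeric fill, which leaves null values untouched (output omitted):

>>> df.with_column(pl.col("a").fill_nan(0.0))  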
fill_null(value: Any | None = None, strategy: FillNullStrategy | None = None, limit: int | None = None) Expr[source]

Fill null values using the specified value or strategy.

To interpolate over null values see interpolate.

Parameters:
value

Value used to fill null values.

strategy : {None, ‘forward’, ‘backward’, ‘min’, ‘max’, ‘mean’, ‘zero’, ‘one’}

Strategy used to fill null values.

limit

Number of consecutive null values to fill when using the ‘forward’ or ‘backward’ strategy.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, None],
...         "b": [4, None, 6],
...     }
... )
>>> df.fill_null(strategy="zero")
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 0   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 0   ┆ 6   │
└─────┴─────┘
>>> df.fill_null(99)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 99  │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 99  ┆ 6   │
└─────┴─────┘
>>> df.fill_null(strategy="forward")
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 4   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 6   │
└─────┴─────┘
filter(predicate: Expr) Expr[source]

Filter a single column.

Mostly useful in an aggregation context. If you want to filter on a DataFrame level, use LazyFrame.filter.

Parameters:
predicate

Boolean expression.

Examples

>>> df = pl.DataFrame(
...     {
...         "group_col": ["g1", "g1", "g2"],
...         "b": [1, 2, 3],
...     }
... )
>>> (
...     df.groupby("group_col").agg(
...         [
...             pl.col("b").filter(pl.col("b") < 2).sum().alias("lt"),
...             pl.col("b").filter(pl.col("b") >= 2).sum().alias("gte"),
...         ]
...     )
... ).sort("group_col")
shape: (2, 3)
┌───────────┬──────┬─────┐
│ group_col ┆ lt   ┆ gte │
│ ---       ┆ ---  ┆ --- │
│ str       ┆ i64  ┆ i64 │
╞═══════════╪══════╪═════╡
│ g1        ┆ 1    ┆ 2   │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┤
│ g2        ┆ null ┆ 3   │
└───────────┴──────┴─────┘
first() Expr[source]

Get the first value.

Examples

>>> df = pl.DataFrame({"a": [1, 1, 2]})
>>> df.select(pl.col("a").first())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
└─────┘
flatten() Expr[source]

Alias for explode().

Explode a list or utf8 Series. This means that every item is expanded to a new row.

Returns:
Exploded Series of same dtype

Examples

The following example turns each character into a separate row:

>>> df = pl.DataFrame({"foo": ["hello", "world"]})
>>> (df.select(pl.col("foo").flatten()))
shape: (10, 1)
┌─────┐
│ foo │
│ --- │
│ str │
╞═════╡
│ h   │
├╌╌╌╌╌┤
│ e   │
├╌╌╌╌╌┤
│ l   │
├╌╌╌╌╌┤
│ l   │
├╌╌╌╌╌┤
│ ... │
├╌╌╌╌╌┤
│ o   │
├╌╌╌╌╌┤
│ r   │
├╌╌╌╌╌┤
│ l   │
├╌╌╌╌╌┤
│ d   │
└─────┘

This example turns each word into a separate row:

>>> df = pl.DataFrame({"foo": ["hello world"]})
>>> (df.select(pl.col("foo").str.split(by=" ").flatten()))
shape: (2, 1)
┌───────┐
│ foo   │
│ ---   │
│ str   │
╞═══════╡
│ hello │
├╌╌╌╌╌╌╌┤
│ world │
└───────┘
floor() Expr[source]

Rounds down to the nearest integer value.

Only works on floating point Series.

Examples

>>> df = pl.DataFrame({"a": [0.3, 0.5, 1.0, 1.1]})
>>> df.select(pl.col("a").floor())
shape: (4, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 0.0 │
├╌╌╌╌╌┤
│ 0.0 │
├╌╌╌╌╌┤
│ 1.0 │
├╌╌╌╌╌┤
│ 1.0 │
└─────┘
forward_fill(limit: int | None = None) Expr[source]

Fill missing values with the last value seen.

Parameters:
limit

The number of consecutive null values to forward fill.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, None],
...         "b": [4, None, 6],
...     }
... )
>>> df.select(pl.all().forward_fill())
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 4   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 6   │
└─────┴─────┘
hash(seed: int = 0, seed_1: int | None = None, seed_2: int | None = None, seed_3: int | None = None) Expr[source]

Hash the elements in the selection.

The hash value is of type UInt64.

Parameters:
seed

Random seed parameter. Defaults to 0.

seed_1

Random seed parameter. Defaults to seed if not set.

seed_2

Random seed parameter. Defaults to seed if not set.

seed_3

Random seed parameter. Defaults to seed if not set.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, None],
...         "b": ["x", None, "z"],
...     }
... )
>>> df.with_column(pl.all().hash(10, 20, 30, 40))
shape: (3, 2)
┌──────────────────────┬──────────────────────┐
│ a                    ┆ b                    │
│ ---                  ┆ ---                  │
│ u64                  ┆ u64                  │
╞══════════════════════╪══════════════════════╡
│ 9774092659964970114  ┆ 13614470193936745724 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1101441246220388612  ┆ 11638928888656214026 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 11638928888656214026 ┆ 13382926553367784577 │
└──────────────────────┴──────────────────────┘
head(n: int = 10) Expr[source]

Get the first n rows.

Parameters:
n

Number of rows to return.

Examples

>>> df = pl.DataFrame({"foo": [1, 2, 3, 4, 5, 6, 7]})
>>> df.head(3)
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 1   │
├╌╌╌╌╌┤
│ 2   │
├╌╌╌╌╌┤
│ 3   │
└─────┘
inspect(fmt: str = '{}') Expr[source]

Print the value that this expression evaluates to and pass on the value.

Examples

>>> df = pl.DataFrame({"foo": [1, 1, 2]})
>>> df.select(pl.col("foo").cumsum().inspect("value is: {}").alias("bar"))
value is: shape: (3,)
Series: 'foo' [i64]
[
    1
    2
    4
]
shape: (3, 1)
┌─────┐
│ bar │
│ --- │
│ i64 │
╞═════╡
│ 1   │
├╌╌╌╌╌┤
│ 2   │
├╌╌╌╌╌┤
│ 4   │
└─────┘
interpolate(method: InterpolationMethod = 'linear') Expr[source]

Fill nulls with linear interpolation over missing values.

Can also be used to regrid data to a new grid - see examples below.

Parameters:
method : {‘linear’, ‘nearest’}

Interpolation method

Examples

>>> # Fill nulls with linear interpolation
>>> df = pl.DataFrame(
...     {
...         "a": [1, None, 3],
...         "b": [1.0, float("nan"), 3.0],
...     }
... )
>>> df.select(pl.all().interpolate())
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1   ┆ 1.0 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ NaN │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 3   ┆ 3.0 │
└─────┴─────┘
>>> df_original_grid = pl.DataFrame(
...     {
...         "grid_points": [1, 3, 10],
...         "values": [2.0, 6.0, 20.0],
...     }
... )  # Interpolate from this to the new grid
>>> df_new_grid = pl.DataFrame({"grid_points": range(1, 11)})
>>> (
...     df_new_grid.join(
...         df_original_grid, on="grid_points", how="left"
...     ).with_column(pl.col("values").interpolate())
... )
shape: (10, 2)
┌─────────────┬────────┐
│ grid_points ┆ values │
│ ---         ┆ ---    │
│ i64         ┆ f64    │
╞═════════════╪════════╡
│ 1           ┆ 2.0    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2           ┆ 4.0    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3           ┆ 6.0    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 4           ┆ 8.0    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ ...         ┆ ...    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 7           ┆ 14.0   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 8           ┆ 16.0   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 9           ┆ 18.0   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 10          ┆ 20.0   │
└─────────────┴────────┘
is_between(start: Expr | datetime | date | int | float, end: Expr | datetime | date | int | float, include_bounds: bool | tuple[bool, bool] = False) Expr[source]

Check if this expression is between start and end.

Parameters:
start

Lower bound as primitive type or datetime.

end

Upper bound as primitive type or datetime.

include_bounds

  • False: Exclude both start and end (default).

  • True: Include both start and end.

  • (False, False): Exclude start and exclude end.

  • (True, True): Include start and include end.

  • (False, True): Exclude start and include end.

  • (True, False): Include start and exclude end.

Returns:
Expr that evaluates to a Boolean Series.

Examples

>>> df = pl.DataFrame({"num": [1, 2, 3, 4, 5]})
>>> df.with_column(pl.col("num").is_between(2, 4))
shape: (5, 2)
┌─────┬────────────┐
│ num ┆ is_between │
│ --- ┆ ---        │
│ i64 ┆ bool       │
╞═════╪════════════╡
│ 1   ┆ false      │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2   ┆ false      │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3   ┆ true       │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4   ┆ false      │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 5   ┆ false      │
└─────┴────────────┘
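
A sketch that includes both bounds (output omitted):

>>> df.with_column(pl.col("num").is_between(2, 4, include_bounds=True))  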
is_duplicated() Expr[source]

Get mask of duplicated values.

Examples

>>> df = pl.DataFrame({"a": [1, 1, 2]})
>>> (df.select(pl.col("a").is_duplicated()))
shape: (3, 1)
┌───────┐
│ a     │
│ ---   │
│ bool  │
╞═══════╡
│ true  │
├╌╌╌╌╌╌╌┤
│ true  │
├╌╌╌╌╌╌╌┤
│ false │
└───────┘
is_finite() Expr[source]

Returns a boolean Series indicating which values are finite.

Returns:
out

Series of type Boolean

Examples

>>> df = pl.DataFrame(
...     {
...         "A": [1.0, 2],
...         "B": [3.0, float("inf")],
...     }
... )
>>> df.select(pl.all().is_finite())
shape: (2, 2)
┌──────┬───────┐
│ A    ┆ B     │
│ ---  ┆ ---   │
│ bool ┆ bool  │
╞══════╪═══════╡
│ true ┆ true  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ true ┆ false │
└──────┴───────┘
is_first() Expr[source]

Get a mask of the first unique value.

Returns:
Boolean Series

Examples

>>> df = pl.DataFrame(
...     {
...         "num": [1, 2, 3, 1, 5],
...     }
... )
>>> (df.with_column(pl.col("num").is_first().alias("is_first")))
shape: (5, 2)
┌─────┬──────────┐
│ num ┆ is_first │
│ --- ┆ ---      │
│ i64 ┆ bool     │
╞═════╪══════════╡
│ 1   ┆ true     │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 2   ┆ true     │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 3   ┆ true     │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 1   ┆ false    │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 5   ┆ true     │
└─────┴──────────┘
is_in(other: Union[Expr, Sequence[Any], str, Series]) Expr[source]

Check if elements of this expression are present in the other Series.

Parameters:
other

Series or sequence of primitive type.

Returns:
Expr that evaluates to a Boolean Series.

Examples

>>> df = pl.DataFrame(
...     {"sets": [[1, 2, 3], [1, 2], [9, 10]], "optional_members": [1, 2, 3]}
... )
>>> (df.select([pl.col("optional_members").is_in("sets").alias("contains")]))
shape: (3, 1)
┌──────────┐
│ contains │
│ ---      │
│ bool     │
╞══════════╡
│ true     │
├╌╌╌╌╌╌╌╌╌╌┤
│ true     │
├╌╌╌╌╌╌╌╌╌╌┤
│ false    │
└──────────┘
is_infinite() Expr[source]

Returns a boolean Series indicating which values are infinite.

Returns:
out

Series of type Boolean

Examples

>>> df = pl.DataFrame(
...     {
...         "A": [1.0, 2],
...         "B": [3.0, float("inf")],
...     }
... )
>>> df.select(pl.all().is_infinite())
shape: (2, 2)
┌───────┬───────┐
│ A     ┆ B     │
│ ---   ┆ ---   │
│ bool  ┆ bool  │
╞═══════╪═══════╡
│ false ┆ false │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ false ┆ true  │
└───────┴───────┘
is_nan() Expr[source]

Returns a boolean Series indicating which values are NaN.

Notes

Floating point NaN (Not a Number) should not be confused with missing data represented as Null/None.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, None, 1, 5],
...         "b": [1.0, 2.0, float("nan"), 1.0, 5.0],
...     }
... )
>>> df.with_column(pl.col(pl.Float64).is_nan().suffix("_isnan"))
shape: (5, 3)
┌──────┬─────┬─────────┐
│ a    ┆ b   ┆ b_isnan │
│ ---  ┆ --- ┆ ---     │
│ i64  ┆ f64 ┆ bool    │
╞══════╪═════╪═════════╡
│ 1    ┆ 1.0 ┆ false   │
├╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 2    ┆ 2.0 ┆ false   │
├╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ null ┆ NaN ┆ true    │
├╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 1    ┆ 1.0 ┆ false   │
├╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 5    ┆ 5.0 ┆ false   │
└──────┴─────┴─────────┘
is_not() Expr[source]

Negate a boolean expression.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [True, False, False],
...         "b": ["a", "b", None],
...     }
... )
>>> df
shape: (3, 2)
┌───────┬──────┐
│ a     ┆ b    │
│ ---   ┆ ---  │
│ bool  ┆ str  │
╞═══════╪══════╡
│ true  ┆ a    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ false ┆ b    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ false ┆ null │
└───────┴──────┘
>>> df.select(pl.col("a").is_not())
shape: (3, 1)
┌───────┐
│ a     │
│ ---   │
│ bool  │
╞═══════╡
│ false │
├╌╌╌╌╌╌╌┤
│ true  │
├╌╌╌╌╌╌╌┤
│ true  │
└───────┘
is_not_nan() Expr[source]

Returns a boolean Series indicating which values are not NaN.

Notes

Floating point NaN (Not a Number) should not be confused with missing data represented as Null/None.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, None, 1, 5],
...         "b": [1.0, 2.0, float("nan"), 1.0, 5.0],
...     }
... )
>>> df.with_column(pl.col(pl.Float64).is_not_nan().suffix("_is_not_nan"))
shape: (5, 3)
┌──────┬─────┬──────────────┐
│ a    ┆ b   ┆ b_is_not_nan │
│ ---  ┆ --- ┆ ---          │
│ i64  ┆ f64 ┆ bool         │
╞══════╪═════╪══════════════╡
│ 1    ┆ 1.0 ┆ true         │
├╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2    ┆ 2.0 ┆ true         │
├╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ null ┆ NaN ┆ false        │
├╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1    ┆ 1.0 ┆ true         │
├╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 5    ┆ 5.0 ┆ true         │
└──────┴─────┴──────────────┘
is_not_null() Expr[source]

Returns a boolean Series indicating which values are not null.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, None, 1, 5],
...         "b": [1.0, 2.0, float("nan"), 1.0, 5.0],
...     }
... )
>>> df.with_column(pl.all().is_not_null().suffix("_not_null"))  # nan != null
shape: (5, 4)
┌──────┬─────┬────────────┬────────────┐
│ a    ┆ b   ┆ a_not_null ┆ b_not_null │
│ ---  ┆ --- ┆ ---        ┆ ---        │
│ i64  ┆ f64 ┆ bool       ┆ bool       │
╞══════╪═════╪════════════╪════════════╡
│ 1    ┆ 1.0 ┆ true       ┆ true       │
├╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2    ┆ 2.0 ┆ true       ┆ true       │
├╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ null ┆ NaN ┆ false      ┆ true       │
├╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1    ┆ 1.0 ┆ true       ┆ true       │
├╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 5    ┆ 5.0 ┆ true       ┆ true       │
└──────┴─────┴────────────┴────────────┘
is_null() Expr[source]

Returns a boolean Series indicating which values are null.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, None, 1, 5],
...         "b": [1.0, 2.0, float("nan"), 1.0, 5.0],
...     }
... )
>>> df.with_column(pl.all().is_null().suffix("_isnull"))  # nan != null
shape: (5, 4)
┌──────┬─────┬──────────┬──────────┐
│ a    ┆ b   ┆ a_isnull ┆ b_isnull │
│ ---  ┆ --- ┆ ---      ┆ ---      │
│ i64  ┆ f64 ┆ bool     ┆ bool     │
╞══════╪═════╪══════════╪══════════╡
│ 1    ┆ 1.0 ┆ false    ┆ false    │
├╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 2    ┆ 2.0 ┆ false    ┆ false    │
├╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ null ┆ NaN ┆ true     ┆ false    │
├╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 1    ┆ 1.0 ┆ false    ┆ false    │
├╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 5    ┆ 5.0 ┆ false    ┆ false    │
└──────┴─────┴──────────┴──────────┘
is_unique() Expr[source]

Get mask of unique values.

Examples

>>> df = pl.DataFrame({"a": [1, 1, 2]})
>>> (df.select(pl.col("a").is_unique()))
shape: (3, 1)
┌───────┐
│ a     │
│ ---   │
│ bool  │
╞═══════╡
│ false │
├╌╌╌╌╌╌╌┤
│ false │
├╌╌╌╌╌╌╌┤
│ true  │
└───────┘
keep_name() Expr[source]

Keep the original root name of the expression.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2],
...         "b": [3, 4],
...     }
... )

Keep original column name to undo an alias operation.

>>> df.with_columns([(pl.col("a") * 9).alias("c").keep_name()])
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 9   ┆ 3   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 18  ┆ 4   │
└─────┴─────┘

Prevent “DuplicateError: Column with name: ‘literal’ has more than one occurrences” errors.

>>> df.select([(pl.lit(10) / pl.all()).keep_name()])
shape: (2, 2)
┌──────┬──────────┐
│ a    ┆ b        │
│ ---  ┆ ---      │
│ f64  ┆ f64      │
╞══════╪══════════╡
│ 10.0 ┆ 3.333333 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 5.0  ┆ 2.5      │
└──────┴──────────┘
kurtosis(fisher: bool = True, bias: bool = True) Expr[source]

Compute the kurtosis (Fisher or Pearson) of a dataset.

Kurtosis is the fourth central moment divided by the square of the variance. If Fisher’s definition is used, then 3.0 is subtracted from the result to give 0.0 for a normal distribution. If bias is False, then the kurtosis is calculated using k statistics to eliminate bias coming from biased moment estimators.

See scipy.stats for more information.

Parameters:
fisher : bool, optional

If True, Fisher’s definition is used (normal ==> 0.0). If False, Pearson’s definition is used (normal ==> 3.0).

bias : bool, optional

If False, the calculations are corrected for statistical bias.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3, 2, 1]})
>>> df.select(pl.col("a").kurtosis())
shape: (1, 1)
┌───────────┐
│ a         │
│ ---       │
│ f64       │
╞═══════════╡
│ -1.153061 │
└───────────┘
last() Expr[source]

Get the last value.

Examples

>>> df = pl.DataFrame({"a": [1, 1, 2]})
>>> df.select(pl.col("a").last())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 2   │
└─────┘
len() Expr[source]

Count the number of values in this expression.

Alias for count().

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [8, 9, 10],
...         "b": [None, 4, 4],
...     }
... )
>>> df.select(pl.all().len())  # null values are included in the count
shape: (1, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ u32 ┆ u32 │
╞═════╪═════╡
│ 3   ┆ 3   │
└─────┴─────┘
limit(n: int = 10) Expr[source]

Get the first n rows.

Alias for Expr.head().

Parameters:
n

Number of rows to return.
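
Examples

A minimal usage sketch, mirroring head() above:

>>> df = pl.DataFrame({"foo": [1, 2, 3, 4, 5, 6, 7]})
>>> df.select(pl.col("foo").limit(3))
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 1   │
├╌╌╌╌╌┤
│ 2   │
├╌╌╌╌╌┤
│ 3   │
└─────┘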

list() Expr[source]

Aggregate to list.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": [4, 5, 6],
...     }
... )
>>> df.select(pl.all().list())
shape: (1, 2)
┌───────────┬───────────┐
│ a         ┆ b         │
│ ---       ┆ ---       │
│ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╡
│ [1, 2, 3] ┆ [4, 5, 6] │
└───────────┴───────────┘
log(base: float = 2.718281828459045) Expr[source]

Compute the logarithm to a given base.

Parameters:
base

Base of the logarithm; defaults to e.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.select(pl.col("a").log(base=2))
shape: (3, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 0.0      │
├╌╌╌╌╌╌╌╌╌╌┤
│ 1.0      │
├╌╌╌╌╌╌╌╌╌╌┤
│ 1.584963 │
└──────────┘
log10() Expr[source]

Compute the base 10 logarithm of the input array, element-wise.

Examples

>>> df = pl.DataFrame({"values": [1.0, 2.0, 4.0]})
>>> df.select(pl.col("values").log10())
shape: (3, 1)
┌─────────┐
│ values  │
│ ---     │
│ f64     │
╞═════════╡
│ 0.0     │
├╌╌╌╌╌╌╌╌╌┤
│ 0.30103 │
├╌╌╌╌╌╌╌╌╌┤
│ 0.60206 │
└─────────┘
lower_bound() Expr[source]

Calculate the lower bound.

Returns a unit Series with the lowest value possible for the dtype of this expression.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3, 2, 1]})
>>> df.select(pl.col("a").lower_bound())
shape: (1, 1)
┌──────────────────────┐
│ a                    │
│ ---                  │
│ i64                  │
╞══════════════════════╡
│ -9223372036854775808 │
└──────────────────────┘
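
The value shown is -2^63, the minimum of the Int64 dtype.
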
map(f: Callable[[Series], Series | Any], return_dtype: Optional[Union[Type[DataType], DataType]] = None, agg_list: bool = False) Expr[source]

Apply a custom python function to a Series or sequence of Series.

The output of this custom function must be a Series. If you want to apply a custom function elementwise over single values, see apply(). A use case for map is when you want to transform an expression with a third-party library.

Read more in the book.

Parameters:
f

Lambda/ function to apply.

return_dtype

Dtype of the output Series.

agg_list

Aggregate list

Examples

>>> df = pl.DataFrame(
...     {
...         "sine": [0.0, 1.0, 0.0, -1.0],
...         "cosine": [1.0, 0.0, -1.0, 0.0],
...     }
... )
>>> (df.select(pl.all().map(lambda x: x.to_numpy().argmax())))
shape: (1, 2)
┌──────┬────────┐
│ sine ┆ cosine │
│ ---  ┆ ---    │
│ i64  ┆ i64    │
╞══════╪════════╡
│ 1    ┆ 0      │
└──────┴────────┘
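
Note that, unlike apply() in a selection context, map passes each whole column to the UDF as one Series, so the argmax here yields a single index per column.
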
map_alias(f: Callable[[str], str]) Expr[source]

Rename the output of an expression by mapping a function over the root name.

Parameters:
f

Function that maps root name to new name.

Examples

>>> df = pl.DataFrame(
...     {
...         "A": [1, 2],
...         "B": [3, 4],
...     }
... )
>>> df.select(
...     pl.all().reverse().map_alias(lambda colName: colName + "_reverse")
... )
shape: (2, 2)
┌───────────┬───────────┐
│ A_reverse ┆ B_reverse │
│ ---       ┆ ---       │
│ i64       ┆ i64       │
╞═══════════╪═══════════╡
│ 2         ┆ 4         │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1         ┆ 3         │
└───────────┴───────────┘
max() Expr[source]

Get maximum value.

Examples

>>> df = pl.DataFrame({"a": [-1, float("nan"), 1]})
>>> df.select(pl.col("a").max())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
└─────┘
mean() Expr[source]

Get mean value.

Examples

>>> df = pl.DataFrame({"a": [-1, 0, 1]})
>>> df.select(pl.col("a").mean())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 0.0 │
└─────┘
median() Expr[source]

Get median value using linear interpolation.

Examples

>>> df = pl.DataFrame({"a": [-1, 0, 1]})
>>> df.select(pl.col("a").median())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 0.0 │
└─────┘
min() Expr[source]

Get minimum value.

Examples

>>> df = pl.DataFrame({"a": [-1, float("nan"), 1]})
>>> df.select(pl.col("a").min())
shape: (1, 1)
┌──────┐
│ a    │
│ ---  │
│ f64  │
╞══════╡
│ -1.0 │
└──────┘
mode() Expr[source]

Compute the most occurring value(s).

Can return multiple values.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 1, 2, 3],
...         "b": [1, 1, 2, 2],
...     }
... )
>>> df.select(pl.all().mode())  
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 1   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 1   ┆ 2   │
└─────┴─────┘
n_unique() Expr[source]

Count unique values.

Examples

>>> df = pl.DataFrame({"a": [1, 1, 2]})
>>> df.select(pl.col("a").n_unique())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ u32 │
╞═════╡
│ 2   │
└─────┘
nan_max() Expr[source]

Get maximum value, but propagate/poison encountered NaN values.

This differs from numpy’s nanmax: numpy’s default max propagates NaN values (so nanmax exists to ignore them), whereas polars’ default max already ignores them (so nan_max exists to propagate them).

Examples

>>> df = pl.DataFrame({"a": [0, float("nan")]})
>>> df.select(pl.col("a").nan_max())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ NaN │
└─────┘
nan_min() Expr[source]

Get minimum value, but propagate/poison encountered NaN values.

This differs from numpy’s nanmin: numpy’s default min propagates NaN values (so nanmin exists to ignore them), whereas polars’ default min already ignores them (so nan_min exists to propagate them).

Examples

>>> df = pl.DataFrame({"a": [0, float("nan")]})
>>> df.select(pl.col("a").nan_min())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ NaN │
└─────┘
null_count() Expr[source]

Count null values.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [None, 1, None],
...         "b": [1, 2, 3],
...     }
... )
>>> df.select(pl.all().null_count())
shape: (1, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ u32 ┆ u32 │
╞═════╪═════╡
│ 2   ┆ 0   │
└─────┴─────┘
over(expr: str | Expr | list[Expr | str]) Expr[source]

Apply a window function over a subgroup.

This is similar to a groupby + aggregation + self-join, or to window functions in Postgres.

Parameters:
expr

Column(s) to group by.

Examples

>>> df = pl.DataFrame(
...     {
...         "groups": ["g1", "g1", "g2"],
...         "values": [1, 2, 3],
...     }
... )
>>> (
...     df.with_column(
...         pl.col("values").max().over("groups").alias("max_by_group")
...     )
... )
shape: (3, 3)
┌────────┬────────┬──────────────┐
│ groups ┆ values ┆ max_by_group │
│ ---    ┆ ---    ┆ ---          │
│ str    ┆ i64    ┆ i64          │
╞════════╪════════╪══════════════╡
│ g1     ┆ 1      ┆ 2            │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ g1     ┆ 2      ┆ 2            │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ g2     ┆ 3      ┆ 3            │
└────────┴────────┴──────────────┘
>>> df = pl.DataFrame(
...     {
...         "groups": [1, 1, 2, 2, 1, 2, 3, 3, 1],
...         "values": [1, 2, 3, 4, 5, 6, 7, 8, 8],
...     }
... )
>>> (
...     df.lazy()
...     .select(
...         [
...             pl.col("groups").sum().over("groups"),
...         ]
...     )
...     .collect()
... )
shape: (9, 1)
┌────────┐
│ groups │
│ ---    │
│ i64    │
╞════════╡
│ 4      │
├╌╌╌╌╌╌╌╌┤
│ 4      │
├╌╌╌╌╌╌╌╌┤
│ 6      │
├╌╌╌╌╌╌╌╌┤
│ 6      │
├╌╌╌╌╌╌╌╌┤
│ ...    │
├╌╌╌╌╌╌╌╌┤
│ 6      │
├╌╌╌╌╌╌╌╌┤
│ 6      │
├╌╌╌╌╌╌╌╌┤
│ 6      │
├╌╌╌╌╌╌╌╌┤
│ 4      │
└────────┘
pct_change(n: int = 1) Expr[source]

Computes percentage change between values.

Percentage change (as a fraction) between the current element and the most recent non-null element at least n period(s) before it.

Computes the change from the previous row by default.

Parameters:
n

Periods to shift for forming the percent change.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [10, 11, 12, None, 12],
...     }
... )
>>> df.with_column(pl.col("a").pct_change().alias("pct_change"))
shape: (5, 2)
┌──────┬────────────┐
│ a    ┆ pct_change │
│ ---  ┆ ---        │
│ i64  ┆ f64        │
╞══════╪════════════╡
│ 10   ┆ null       │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 11   ┆ 0.1        │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 12   ┆ 0.090909   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ null ┆ 0.0        │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 12   ┆ 0.0        │
└──────┴────────────┘
pow(exponent: int | float | Series | Expr) Expr[source]

Raise expression to the power of exponent.

Examples

>>> df = pl.DataFrame({"foo": [1, 2, 3, 4]})
>>> df.select(pl.col("foo").pow(3))
shape: (4, 1)
┌──────┐
│ foo  │
│ ---  │
│ f64  │
╞══════╡
│ 1.0  │
├╌╌╌╌╌╌┤
│ 8.0  │
├╌╌╌╌╌╌┤
│ 27.0 │
├╌╌╌╌╌╌┤
│ 64.0 │
└──────┘
prefix(prefix: str) Expr[source]

Add a prefix to the root column name of the expression.

Examples

>>> df = pl.DataFrame(
...     {
...         "A": [1, 2, 3, 4, 5],
...         "fruits": ["banana", "banana", "apple", "apple", "banana"],
...         "B": [5, 4, 3, 2, 1],
...         "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
...     }
... )
>>> df
shape: (5, 4)
┌─────┬────────┬─────┬────────┐
│ A   ┆ fruits ┆ B   ┆ cars   │
│ --- ┆ ---    ┆ --- ┆ ---    │
│ i64 ┆ str    ┆ i64 ┆ str    │
╞═════╪════════╪═════╪════════╡
│ 1   ┆ banana ┆ 5   ┆ beetle │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2   ┆ banana ┆ 4   ┆ audi   │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3   ┆ apple  ┆ 3   ┆ beetle │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 4   ┆ apple  ┆ 2   ┆ beetle │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 5   ┆ banana ┆ 1   ┆ beetle │
└─────┴────────┴─────┴────────┘
>>> df.select(
...     [
...         pl.all(),
...         pl.all().reverse().prefix("reverse_"),
...     ]
... )
shape: (5, 8)
┌─────┬────────┬─────┬────────┬───────────┬────────────────┬───────────┬──────────────┐
│ A   ┆ fruits ┆ B   ┆ cars   ┆ reverse_A ┆ reverse_fruits ┆ reverse_B ┆ reverse_cars │
│ --- ┆ ---    ┆ --- ┆ ---    ┆ ---       ┆ ---            ┆ ---       ┆ ---          │
│ i64 ┆ str    ┆ i64 ┆ str    ┆ i64       ┆ str            ┆ i64       ┆ str          │
╞═════╪════════╪═════╪════════╪═══════════╪════════════════╪═══════════╪══════════════╡
│ 1   ┆ banana ┆ 5   ┆ beetle ┆ 5         ┆ banana         ┆ 1         ┆ beetle       │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2   ┆ banana ┆ 4   ┆ audi   ┆ 4         ┆ apple          ┆ 2         ┆ beetle       │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3   ┆ apple  ┆ 3   ┆ beetle ┆ 3         ┆ apple          ┆ 3         ┆ beetle       │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4   ┆ apple  ┆ 2   ┆ beetle ┆ 2         ┆ banana         ┆ 4         ┆ audi         │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 5   ┆ banana ┆ 1   ┆ beetle ┆ 1         ┆ banana         ┆ 5         ┆ beetle       │
└─────┴────────┴─────┴────────┴───────────┴────────────────┴───────────┴──────────────┘
product() Expr[source]

Compute the product of an expression.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.select(pl.col("a").product())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 6   │
└─────┘
quantile(quantile: float, interpolation: RollingInterpolationMethod = 'nearest') Expr[source]

Get quantile value.

Parameters:
quantile

Quantile between 0.0 and 1.0.

interpolation : {‘nearest’, ‘higher’, ‘lower’, ‘midpoint’, ‘linear’}

Interpolation method.

Examples

>>> df = pl.DataFrame({"a": [0, 1, 2, 3, 4, 5]})
>>> (df.select(pl.col("a").quantile(0.3)))
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
└─────┘
>>> (df.select(pl.col("a").quantile(0.3, interpolation="higher")))
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 2.0 │
└─────┘
>>> (df.select(pl.col("a").quantile(0.3, interpolation="lower")))
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
└─────┘
>>> (df.select(pl.col("a").quantile(0.3, interpolation="midpoint")))
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.5 │
└─────┘
>>> (df.select(pl.col("a").quantile(0.3, interpolation="linear")))
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.5 │
└─────┘
rank(method: RankMethod = 'average', reverse: bool = False) Expr[source]

Assign ranks to data, dealing with ties appropriately.

Parameters:
method : {‘average’, ‘min’, ‘max’, ‘dense’, ‘ordinal’, ‘random’}

The method used to assign ranks to tied elements. The following methods are available (default is ‘average’):

  • ‘average’ : The average of the ranks that would have been assigned to all the tied values is assigned to each value.

  • ‘min’ : The minimum of the ranks that would have been assigned to all the tied values is assigned to each value. (This is also referred to as “competition” ranking.)

  • ‘max’ : The maximum of the ranks that would have been assigned to all the tied values is assigned to each value.

  • ‘dense’ : Like ‘min’, but the rank of the next highest element is assigned the rank immediately after those assigned to the tied elements.

  • ‘ordinal’ : All values are given a distinct rank, corresponding to the order that the values occur in the Series.

  • ‘random’ : Like ‘ordinal’, but the rank for ties is not dependent on the order that the values occur in the Series.

reverse

Reverse the operation.

Examples

The ‘average’ method:

>>> df = pl.DataFrame({"a": [3, 6, 1, 1, 6]})
>>> df.select(pl.col("a").rank())
shape: (5, 1)
┌─────┐
│ a   │
│ --- │
│ f32 │
╞═════╡
│ 3.0 │
├╌╌╌╌╌┤
│ 4.5 │
├╌╌╌╌╌┤
│ 1.5 │
├╌╌╌╌╌┤
│ 1.5 │
├╌╌╌╌╌┤
│ 4.5 │
└─────┘

The ‘ordinal’ method:

>>> df = pl.DataFrame({"a": [3, 6, 1, 1, 6]})
>>> df.select(pl.col("a").rank("ordinal"))
shape: (5, 1)
┌─────┐
│ a   │
│ --- │
│ u32 │
╞═════╡
│ 3   │
├╌╌╌╌╌┤
│ 4   │
├╌╌╌╌╌┤
│ 1   │
├╌╌╌╌╌┤
│ 2   │
├╌╌╌╌╌┤
│ 5   │
└─────┘
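
The ‘dense’ method (a sketch; the ranks below are worked out by hand from the definition above):

>>> df.select(pl.col("a").rank("dense"))
shape: (5, 1)
┌─────┐
│ a   │
│ --- │
│ u32 │
╞═════╡
│ 2   │
├╌╌╌╌╌┤
│ 3   │
├╌╌╌╌╌┤
│ 1   │
├╌╌╌╌╌┤
│ 1   │
├╌╌╌╌╌┤
│ 3   │
└─────┘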
rechunk() Expr[source]

Create a single chunk of memory for this Series.

Examples

>>> df = pl.DataFrame({"a": [1, 1, 2]})
>>> # Create a Series with 3 nulls, append column a then rechunk
>>> (df.select(pl.repeat(None, 3).append(pl.col("a")).rechunk()))
shape: (6, 1)
┌─────────┐
│ literal │
│ ---     │
│ i64     │
╞═════════╡
│ null    │
├╌╌╌╌╌╌╌╌╌┤
│ null    │
├╌╌╌╌╌╌╌╌╌┤
│ null    │
├╌╌╌╌╌╌╌╌╌┤
│ 1       │
├╌╌╌╌╌╌╌╌╌┤
│ 1       │
├╌╌╌╌╌╌╌╌╌┤
│ 2       │
└─────────┘
reinterpret(signed: bool = True) Expr[source]

Reinterpret the underlying bits as a signed/unsigned integer.

This operation is only allowed for 64-bit integers. For integers of a lower bit width, you can safely use the cast operation instead.

Parameters:
signed

If True, reinterpret as pl.Int64. Otherwise, reinterpret as pl.UInt64.

Examples

>>> s = pl.Series("a", [1, 1, 2], dtype=pl.UInt64)
>>> df = pl.DataFrame([s])
>>> df.select(
...     [
...         pl.col("a").reinterpret(signed=True).alias("reinterpreted"),
...         pl.col("a").alias("original"),
...     ]
... )
shape: (3, 2)
┌───────────────┬──────────┐
│ reinterpreted ┆ original │
│ ---           ┆ ---      │
│ i64           ┆ u64      │
╞═══════════════╪══════════╡
│ 1             ┆ 1        │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 1             ┆ 1        │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 2             ┆ 2        │
└───────────────┴──────────┘
repeat_by(by: Expr | str) Expr[source]

Repeat the elements in this Series as specified in the given expression.

The repeated elements are expanded into a List.

Parameters:
by

Numeric column that determines how often the values will be repeated. The column will be coerced to UInt32. Give this dtype to make the coercion a no-op.

Returns:
Series of type List

Examples

>>> df = pl.DataFrame(
...     {
...         "a": ["x", "y", "z"],
...         "n": [1, 2, 3],
...     }
... )
>>> df.select(pl.col("a").repeat_by("n"))
shape: (3, 1)
┌─────────────────┐
│ a               │
│ ---             │
│ list[str]       │
╞═════════════════╡
│ ["x"]           │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ["y", "y"]      │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ["z", "z", "z"] │
└─────────────────┘
reshape(dims: tuple[int, ...]) Expr[source]

Reshape this Expr to a flat Series or a Series of Lists.

Parameters:
dims

Tuple of the dimension sizes. If a -1 is used in any of the dimensions, that dimension is inferred.

Returns:
Expr

If a single dimension is given, results in a flat Series of shape (len,). If multiple dimensions are given, results in a Series of Lists with shape (rows, cols).

Examples

>>> df = pl.DataFrame({"foo": [1, 2, 3, 4, 5, 6, 7, 8, 9]})
>>> df.select(pl.col("foo").reshape((3, 3)))
shape: (3, 1)
┌───────────┐
│ foo       │
│ ---       │
│ list[i64] │
╞═══════════╡
│ [1, 2, 3] │
├╌╌╌╌╌╌╌╌╌╌╌┤
│ [4, 5, 6] │
├╌╌╌╌╌╌╌╌╌╌╌┤
│ [7, 8, 9] │
└───────────┘
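
Using -1 to infer a dimension (a sketch; per the parameter description, this is expected to produce the same result as above):

>>> df.select(pl.col("foo").reshape((-1, 3)))
shape: (3, 1)
┌───────────┐
│ foo       │
│ ---       │
│ list[i64] │
╞═══════════╡
│ [1, 2, 3] │
├╌╌╌╌╌╌╌╌╌╌╌┤
│ [4, 5, 6] │
├╌╌╌╌╌╌╌╌╌╌╌┤
│ [7, 8, 9] │
└───────────┘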
reverse() Expr[source]

Reverse the selection.

Examples

>>> df = pl.DataFrame(
...     {
...         "A": [1, 2, 3, 4, 5],
...         "fruits": ["banana", "banana", "apple", "apple", "banana"],
...         "B": [5, 4, 3, 2, 1],
...         "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
...     }
... )
>>> df.select(
...     [
...         pl.all(),
...         pl.all().reverse().suffix("_reverse"),
...     ]
... )
shape: (5, 8)
┌─────┬────────┬─────┬────────┬───────────┬────────────────┬───────────┬──────────────┐
│ A   ┆ fruits ┆ B   ┆ cars   ┆ A_reverse ┆ fruits_reverse ┆ B_reverse ┆ cars_reverse │
│ --- ┆ ---    ┆ --- ┆ ---    ┆ ---       ┆ ---            ┆ ---       ┆ ---          │
│ i64 ┆ str    ┆ i64 ┆ str    ┆ i64       ┆ str            ┆ i64       ┆ str          │
╞═════╪════════╪═════╪════════╪═══════════╪════════════════╪═══════════╪══════════════╡
│ 1   ┆ banana ┆ 5   ┆ beetle ┆ 5         ┆ banana         ┆ 1         ┆ beetle       │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2   ┆ banana ┆ 4   ┆ audi   ┆ 4         ┆ apple          ┆ 2         ┆ beetle       │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3   ┆ apple  ┆ 3   ┆ beetle ┆ 3         ┆ apple          ┆ 3         ┆ beetle       │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4   ┆ apple  ┆ 2   ┆ beetle ┆ 2         ┆ banana         ┆ 4         ┆ audi         │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 5   ┆ banana ┆ 1   ┆ beetle ┆ 1         ┆ banana         ┆ 5         ┆ beetle       │
└─────┴────────┴─────┴────────┴───────────┴────────────────┴───────────┴──────────────┘
rolling_apply(function: Callable[[Series], Any], window_size: int, weights: list[float] | None = None, min_periods: int | None = None, center: bool = False) Expr[source]

Apply a custom rolling window function.

Prefer the specific rolling window functions over this one, as they are faster.

Prefer:

  • rolling_min

  • rolling_max

  • rolling_mean

  • rolling_sum

Parameters:
function

Aggregation function

window_size

The length of the window.

weights

An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.

min_periods

The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.

center

Set the labels at the center of the window

Examples

>>> df = pl.DataFrame(
...     {
...         "A": [1.0, 2.0, 9.0, 2.0, 13.0],
...     }
... )
>>> df.select(
...     [
...         pl.col("A").rolling_apply(lambda s: s.std(), window_size=3),
...     ]
... )
shape: (5, 1)
┌──────────┐
│ A        │
│ ---      │
│ f64      │
╞══════════╡
│ null     │
├╌╌╌╌╌╌╌╌╌╌┤
│ null     │
├╌╌╌╌╌╌╌╌╌╌┤
│ 4.358899 │
├╌╌╌╌╌╌╌╌╌╌┤
│ 4.041452 │
├╌╌╌╌╌╌╌╌╌╌┤
│ 5.567764 │
└──────────┘
rolling_max(window_size: int | timedelta | str, weights: list[float] | None = None, min_periods: int | None = None, center: bool = False, by: str | None = None, closed: ClosedWindow = 'left') Expr[source]

Apply a rolling max (moving max) over the values in this array.

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weight vector. The resulting values will be aggregated to their maximum.

Parameters:
window_size

The length of the window. Can be a fixed integer size, or a dynamic temporal size indicated by a timedelta or the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 day)

  • 1w (1 week)

  • 1mo (1 calendar month)

  • 1y (1 calendar year)

  • 1i (1 index count)

If a timedelta or the dynamic string language is used, the by and closed arguments must also be set.

weights

An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.

min_periods

The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.

center

Set the labels at the center of the window

by

If the window_size is temporal, for instance “5h” or “3s”, you must set the column that will be used to determine the windows. This column must be of dtype {Date, Datetime}.

closed : {‘left’, ‘right’, ‘both’, ‘none’}

Define whether the temporal window interval is closed or not.

Warning

This functionality is experimental and may change without it being considered a breaking change.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using groupby_rolling: this method can cache the window size computation.

Examples

>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
>>> (
...     df.select(
...         [
...             pl.col("A").rolling_max(window_size=2),
...         ]
...     )
... )
shape: (6, 1)
┌──────┐
│ A    │
│ ---  │
│ f64  │
╞══════╡
│ null │
├╌╌╌╌╌╌┤
│ 2.0  │
├╌╌╌╌╌╌┤
│ 3.0  │
├╌╌╌╌╌╌┤
│ 4.0  │
├╌╌╌╌╌╌┤
│ 5.0  │
├╌╌╌╌╌╌┤
│ 6.0  │
└──────┘
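
A sketch of the temporal form described above (the dates are illustrative; with a “2d” window closed on the right, each row’s window is assumed to span the two days up to and including that row):

>>> from datetime import datetime
>>> df = pl.DataFrame(
...     {
...         "dt": [
...             datetime(2022, 1, 1),
...             datetime(2022, 1, 2),
...             datetime(2022, 1, 3),
...         ],
...         "A": [1.0, 3.0, 2.0],
...     }
... )
>>> df.select(pl.col("A").rolling_max("2d", by="dt", closed="right"))
shape: (3, 1)
┌─────┐
│ A   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
├╌╌╌╌╌┤
│ 3.0 │
├╌╌╌╌╌┤
│ 3.0 │
└─────┘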
rolling_mean(window_size: int | timedelta | str, weights: list[float] | None = None, min_periods: int | None = None, center: bool = False, by: str | None = None, closed: ClosedWindow = 'left') Expr[source]

Apply a rolling mean (moving mean) over the values in this array.

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weight vector. The resulting values will be aggregated to their mean.

Parameters:
window_size

The length of the window. Can be a fixed integer size, or a dynamic temporal size indicated by a timedelta or the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 day)

  • 1w (1 week)

  • 1mo (1 calendar month)

  • 1y (1 calendar year)

  • 1i (1 index count)

If a timedelta or the dynamic string language is used, the by and closed arguments must also be set.

weights

An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.

min_periods

The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.

center

Set the labels at the center of the window

by

If the window_size is temporal, for instance “5h” or “3s”, you must set the column that will be used to determine the windows. This column must be of dtype {Date, Datetime}.

closed : {‘left’, ‘right’, ‘both’, ‘none’}

Define whether the temporal window interval is closed or not.

Warning

This functionality is experimental and may change without it being considered a breaking change.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using groupby_rolling: this method can cache the window size computation.

Examples

>>> df = pl.DataFrame({"A": [1.0, 8.0, 6.0, 2.0, 16.0, 10.0]})
>>> df.select(
...     [
...         pl.col("A").rolling_mean(window_size=2),
...     ]
... )
shape: (6, 1)
┌──────┐
│ A    │
│ ---  │
│ f64  │
╞══════╡
│ null │
├╌╌╌╌╌╌┤
│ 4.5  │
├╌╌╌╌╌╌┤
│ 7.0  │
├╌╌╌╌╌╌┤
│ 4.0  │
├╌╌╌╌╌╌┤
│ 9.0  │
├╌╌╌╌╌╌┤
│ 13.0 │
└──────┘
rolling_median(window_size: int | timedelta | str, weights: list[float] | None = None, min_periods: int | None = None, center: bool = False, by: str | None = None, closed: ClosedWindow = 'left') Expr[source]

Compute a rolling median.

Parameters:
window_size

The length of the window. Can be a fixed integer size, or a dynamic temporal size indicated by a timedelta or the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 day)

  • 1w (1 week)

  • 1mo (1 calendar month)

  • 1y (1 calendar year)

  • 1i (1 index count)

If a timedelta or the dynamic string language is used, the by and closed arguments must also be set.

weights

An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.

min_periods

The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.

center

Set the labels at the center of the window

by

If the window_size is temporal, for instance “5h” or “3s”, you must set the column that will be used to determine the windows. This column must be of dtype {Date, Datetime}.

closed : {‘left’, ‘right’, ‘both’, ‘none’}

Define whether the temporal window interval is closed or not.

Warning

This functionality is experimental and may change without it being considered a breaking change.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using groupby_rolling: this method can cache the window size computation.

Examples

>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]})
>>> (
...     df.select(
...         [
...             pl.col("A").rolling_median(window_size=3),
...         ]
...     )
... )
shape: (6, 1)
┌──────┐
│ A    │
│ ---  │
│ f64  │
╞══════╡
│ null │
├╌╌╌╌╌╌┤
│ null │
├╌╌╌╌╌╌┤
│ 2.0  │
├╌╌╌╌╌╌┤
│ 3.0  │
├╌╌╌╌╌╌┤
│ 4.0  │
├╌╌╌╌╌╌┤
│ 6.0  │
└──────┘
rolling_min(window_size: int | timedelta | str, weights: list[float] | None = None, min_periods: int | None = None, center: bool = False, by: str | None = None, closed: ClosedWindow = 'left') Expr[source]

Apply a rolling min (moving min) over the values in this array.

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weight vector. The resulting values will be aggregated to their minimum.

Parameters:
window_size

The length of the window. Can be a fixed integer size, or a dynamic temporal size indicated by a timedelta or the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 day)

  • 1w (1 week)

  • 1mo (1 calendar month)

  • 1y (1 calendar year)

  • 1i (1 index count)

If a timedelta or the dynamic string language is used, the by and closed arguments must also be set.

weights

An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.

min_periods

The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.

center

Set the labels at the center of the window

by

If the window_size is temporal, for instance “5h” or “3s”, you must set the column that will be used to determine the windows. This column must be of dtype {Date, Datetime}.

closed : {‘left’, ‘right’, ‘both’, ‘none’}

Define whether the temporal window interval is closed or not.

Warning

This functionality is experimental and may change without it being considered a breaking change.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using groupby_rolling: this method can cache the window size computation.

Examples

>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
>>> (
...     df.select(
...         [
...             pl.col("A").rolling_min(window_size=2),
...         ]
...     )
... )
shape: (6, 1)
┌──────┐
│ A    │
│ ---  │
│ f64  │
╞══════╡
│ null │
├╌╌╌╌╌╌┤
│ 1.0  │
├╌╌╌╌╌╌┤
│ 2.0  │
├╌╌╌╌╌╌┤
│ 3.0  │
├╌╌╌╌╌╌┤
│ 4.0  │
├╌╌╌╌╌╌┤
│ 5.0  │
└──────┘
rolling_quantile(quantile: float, interpolation: RollingInterpolationMethod = 'nearest', window_size: int | timedelta | str = 2, weights: list[float] | None = None, min_periods: int | None = None, center: bool = False, by: str | None = None, closed: ClosedWindow = 'left') Expr[source]

Compute a rolling quantile.

Parameters:
quantile

Quantile between 0.0 and 1.0.

interpolation : {‘nearest’, ‘higher’, ‘lower’, ‘midpoint’, ‘linear’}

Interpolation method.

window_size

The length of the window. Can be a fixed integer size, or a dynamic temporal size indicated by a timedelta or the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 day)

  • 1w (1 week)

  • 1mo (1 calendar month)

  • 1y (1 calendar year)

  • 1i (1 index count)

If a timedelta or the dynamic string language is used, the by and closed arguments must also be set.

weights

An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.

min_periods

The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.

center

Set the labels at the center of the window

by

If the window_size is temporal, for instance “5h” or “3s”, you must set the column that will be used to determine the windows. This column must be of dtype {Date, Datetime}.

closed : {‘left’, ‘right’, ‘both’, ‘none’}

Define whether the temporal window interval is closed or not.

Warning

This functionality is experimental and may change without it being considered a breaking change.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using groupby_rolling: this method can cache the window size computation.

Examples

>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]})
>>> (
...     df.select(
...         [
...             pl.col("A").rolling_quantile(quantile=0.33, window_size=3),
...         ]
...     )
... )
shape: (6, 1)
┌──────┐
│ A    │
│ ---  │
│ f64  │
╞══════╡
│ null │
├╌╌╌╌╌╌┤
│ null │
├╌╌╌╌╌╌┤
│ 1.0  │
├╌╌╌╌╌╌┤
│ 2.0  │
├╌╌╌╌╌╌┤
│ 3.0  │
├╌╌╌╌╌╌┤
│ 4.0  │
└──────┘
rolling_skew(window_size: int, bias: bool = True) Expr[source]

Compute a rolling skew.

Parameters:
window_size

Integer size of the rolling window.

bias

If False, the calculations are corrected for statistical bias.
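
Examples

A minimal sketch; every full window below is symmetric, so its skew is zero, and the first window_size - 1 rows are assumed to be null, as with the other rolling functions.

>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0]})
>>> df.select(pl.col("A").rolling_skew(window_size=3))
shape: (5, 1)
┌──────┐
│ A    │
│ ---  │
│ f64  │
╞══════╡
│ null │
├╌╌╌╌╌╌┤
│ null │
├╌╌╌╌╌╌┤
│ 0.0  │
├╌╌╌╌╌╌┤
│ 0.0  │
├╌╌╌╌╌╌┤
│ 0.0  │
└──────┘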

rolling_std(window_size: int | timedelta | str, weights: list[float] | None = None, min_periods: int | None = None, center: bool = False, by: str | None = None, closed: ClosedWindow = 'left') Expr[source]

Compute a rolling standard deviation.

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weight vector. The resulting values will be aggregated to their standard deviation.

Parameters:
window_size

The length of the window. Can be a fixed integer size, or a dynamic temporal size indicated by a timedelta or the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 day)

  • 1w (1 week)

  • 1mo (1 calendar month)

  • 1y (1 calendar year)

  • 1i (1 index count)

If a timedelta or the dynamic string language is used, the by and closed arguments must also be set.

weights

An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.

min_periods

The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.

center

Set the labels at the center of the window

by

If the window_size is temporal, for instance “5h” or “3s”, you must set the column that will be used to determine the windows. This column must be of dtype {Date, Datetime}.

closed : {‘left’, ‘right’, ‘both’, ‘none’}

Define whether the temporal window interval is closed or not.

Warning

This functionality is experimental and may change without it being considered a breaking change.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using groupby_rolling: this method can cache the window size computation.

Examples

>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]})
>>> (
...     df.select(
...         [
...             pl.col("A").rolling_std(window_size=3),
...         ]
...     )
... )
shape: (6, 1)
┌──────────┐
│ A        │
│ ---      │
│ f64      │
╞══════════╡
│ null     │
├╌╌╌╌╌╌╌╌╌╌┤
│ null     │
├╌╌╌╌╌╌╌╌╌╌┤
│ 1.0      │
├╌╌╌╌╌╌╌╌╌╌┤
│ 1.0      │
├╌╌╌╌╌╌╌╌╌╌┤
│ 1.527525 │
├╌╌╌╌╌╌╌╌╌╌┤
│ 2.0      │
└──────────┘
rolling_sum(window_size: int | timedelta | str, weights: list[float] | None = None, min_periods: int | None = None, center: bool = False, by: str | None = None, closed: ClosedWindow = 'left') Expr[source]

Apply a rolling sum (moving sum) over the values in this array.

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weight vector. The resulting values will be aggregated to their sum.

Parameters:
window_size

The length of the window. Can be a fixed integer size, or a dynamic temporal size indicated by a timedelta or the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 day)

  • 1w (1 week)

  • 1mo (1 calendar month)

  • 1y (1 calendar year)

  • 1i (1 index count)

If a timedelta or the dynamic string language is used, the by and closed arguments must also be set.

weights

An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.

min_periods

The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.

center

Set the labels at the center of the window

by

If the window_size is temporal, for instance “5h” or “3s”, you must set the column that will be used to determine the windows. This column must be of dtype {Date, Datetime}.

closed : {‘left’, ‘right’, ‘both’, ‘none’}

Define whether the temporal window interval is closed or not.

Warning

This functionality is experimental and may change without it being considered a breaking change.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using groupby_rolling: this method can cache the window size computation.

Examples

>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
>>> (
...     df.select(
...         [
...             pl.col("A").rolling_sum(window_size=2),
...         ]
...     )
... )
shape: (6, 1)
┌──────┐
│ A    │
│ ---  │
│ f64  │
╞══════╡
│ null │
├╌╌╌╌╌╌┤
│ 3.0  │
├╌╌╌╌╌╌┤
│ 5.0  │
├╌╌╌╌╌╌┤
│ 7.0  │
├╌╌╌╌╌╌┤
│ 9.0  │
├╌╌╌╌╌╌┤
│ 11.0 │
└──────┘
rolling_var(window_size: int | timedelta | str, weights: list[float] | None = None, min_periods: int | None = None, center: bool = False, by: str | None = None, closed: ClosedWindow = 'left') Expr[source]

Compute a rolling variance.

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weight vector. The resulting values will be aggregated to their variance.

Parameters:
window_size

The length of the window. Can be a fixed integer size, or a dynamic temporal size indicated by a timedelta or the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 day)

  • 1w (1 week)

  • 1mo (1 calendar month)

  • 1y (1 calendar year)

  • 1i (1 index count)

If a timedelta or the dynamic string language is used, the by and closed arguments must also be set.

weights

An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.

min_periods

The number of values in the window that should be non-null before computing a result. If None, it will be set equal to window size.

center

Set the labels at the center of the window

by

If the window_size is temporal, for instance “5h” or “3s”, you must set the column that will be used to determine the windows. This column must be of dtype {Date, Datetime}.

closed : {‘left’, ‘right’, ‘both’, ‘none’}

Define whether the temporal window interval is closed or not.

Warning

This functionality is experimental and may change without it being considered a breaking change.

Notes

If you want to compute multiple aggregation statistics over the same dynamic window, consider using groupby_rolling: this method can cache the window size computation.

Examples

>>> df = pl.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]})
>>> (
...     df.select(
...         [
...             pl.col("A").rolling_var(window_size=3),
...         ]
...     )
... )
shape: (6, 1)
┌──────────┐
│ A        │
│ ---      │
│ f64      │
╞══════════╡
│ null     │
├╌╌╌╌╌╌╌╌╌╌┤
│ null     │
├╌╌╌╌╌╌╌╌╌╌┤
│ 1.0      │
├╌╌╌╌╌╌╌╌╌╌┤
│ 1.0      │
├╌╌╌╌╌╌╌╌╌╌┤
│ 2.333333 │
├╌╌╌╌╌╌╌╌╌╌┤
│ 4.0      │
└──────────┘
round(decimals: int) Expr[source]

Round the underlying floating-point data to the given number of decimals.

Parameters:
decimals

Number of decimals to round by.

Examples

>>> df = pl.DataFrame({"a": [0.33, 0.52, 1.02, 1.17]})
>>> df.select(pl.col("a").round(1))
shape: (4, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 0.3 │
├╌╌╌╌╌┤
│ 0.5 │
├╌╌╌╌╌┤
│ 1.0 │
├╌╌╌╌╌┤
│ 1.2 │
└─────┘
sample(n: int | None = None, frac: float | None = None, with_replacement: bool = False, shuffle: bool = False, seed: int | None = None) Expr[source]

Sample from this expression.

Parameters:
n

Number of items to return. Cannot be used with frac. Defaults to 1 if frac is None.

frac

Fraction of items to return. Cannot be used with n.

with_replacement

Allow values to be sampled more than once.

shuffle

Shuffle the order of sampled data points.

seed

Seed for the random number generator. If set to None (default), a random seed is generated using the random module.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.select(pl.col("a").sample(frac=1.0, with_replacement=True, seed=1))
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 3   │
├╌╌╌╌╌┤
│ 1   │
├╌╌╌╌╌┤
│ 1   │
└─────┘
search_sorted(element: Expr | int | float) Expr[source]

Find indices where elements should be inserted to maintain order.

\[a[i-1] < v \leq a[i]\]
Parameters:
element

Expression or scalar value.

Examples

>>> df = pl.DataFrame(
...     {
...         "values": [1, 2, 3, 5],
...     }
... )
>>> df.select(
...     [
...         pl.col("values").search_sorted(0).alias("zero"),
...         pl.col("values").search_sorted(3).alias("three"),
...         pl.col("values").search_sorted(6).alias("six"),
...     ]
... )
shape: (1, 3)
┌──────┬───────┬─────┐
│ zero ┆ three ┆ six │
│ ---  ┆ ---   ┆ --- │
│ u32  ┆ u32   ┆ u32 │
╞══════╪═══════╪═════╡
│ 0    ┆ 2     ┆ 4   │
└──────┴───────┴─────┘
set_sorted(reverse: bool = False) Expr[source]

Flags the expression as ‘sorted’.

Enables downstream code to use fast paths for sorted arrays.

Parameters:
reverse

If the Series order is reversed, e.g. descending.

Warning

This can lead to incorrect results if the Series is not sorted! Use with care!

Examples

>>> df = pl.DataFrame({"values": [1, 2, 3]})
>>> df.select(pl.col("values").set_sorted().max())
shape: (1, 1)
┌────────┐
│ values │
│ ---    │
│ i64    │
╞════════╡
│ 3      │
└────────┘
shift(periods: int = 1) Expr[source]

Shift the values by a given period.

Parameters:
periods

Number of places to shift (may be negative).

Examples

>>> df = pl.DataFrame({"foo": [1, 2, 3, 4]})
>>> df.select(pl.col("foo").shift(1))
shape: (4, 1)
┌──────┐
│ foo  │
│ ---  │
│ i64  │
╞══════╡
│ null │
├╌╌╌╌╌╌┤
│ 1    │
├╌╌╌╌╌╌┤
│ 2    │
├╌╌╌╌╌╌┤
│ 3    │
└──────┘
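
A negative period shifts in the opposite direction (a sketch; the trailing position becomes null):

>>> df.select(pl.col("foo").shift(-1))
shape: (4, 1)
┌──────┐
│ foo  │
│ ---  │
│ i64  │
╞══════╡
│ 2    │
├╌╌╌╌╌╌┤
│ 3    │
├╌╌╌╌╌╌┤
│ 4    │
├╌╌╌╌╌╌┤
│ null │
└──────┘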
shift_and_fill(periods: int, fill_value: int | float | bool | str | Expr | list[Any]) Expr[source]

Shift the values by a given period and fill the resulting null values.

Parameters:
periods

Number of places to shift (may be negative).

fill_value

Fill None values with the result of this expression.

Examples

>>> df = pl.DataFrame({"foo": [1, 2, 3, 4]})
>>> df.select(pl.col("foo").shift_and_fill(1, "a"))
shape: (4, 1)
┌─────┐
│ foo │
│ --- │
│ str │
╞═════╡
│ a   │
├╌╌╌╌╌┤
│ 1   │
├╌╌╌╌╌┤
│ 2   │
├╌╌╌╌╌┤
│ 3   │
└─────┘
shrink_dtype() Expr[source]

Shrink numeric columns to the minimal required datatype.

Shrink to the dtype needed to fit the extrema of this Series. This can be used to reduce memory pressure.

Examples

>>> pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": [1, 2, 2 << 32],
...         "c": [-1, 2, 1 << 30],
...         "d": [-112, 2, 112],
...         "e": [-112, 2, 129],
...         "f": ["a", "b", "c"],
...         "g": [0.1, 1.32, 0.12],
...         "h": [True, None, False],
...     }
... ).select(pl.all().shrink_dtype())
shape: (3, 8)
┌─────┬────────────┬────────────┬──────┬──────┬─────┬──────┬───────┐
│ a   ┆ b          ┆ c          ┆ d    ┆ e    ┆ f   ┆ g    ┆ h     │
│ --- ┆ ---        ┆ ---        ┆ ---  ┆ ---  ┆ --- ┆ ---  ┆ ---   │
│ i8  ┆ i64        ┆ i32        ┆ i8   ┆ i16  ┆ str ┆ f32  ┆ bool  │
╞═════╪════════════╪════════════╪══════╪══════╪═════╪══════╪═══════╡
│ 1   ┆ 1          ┆ -1         ┆ -112 ┆ -112 ┆ a   ┆ 0.1  ┆ true  │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2   ┆ 2          ┆ 2          ┆ 2    ┆ 2    ┆ b   ┆ 1.32 ┆ null  │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 3   ┆ 8589934592 ┆ 1073741824 ┆ 112  ┆ 129  ┆ c   ┆ 0.12 ┆ false │
└─────┴────────────┴────────────┴──────┴──────┴─────┴──────┴───────┘
shuffle(seed: int | None = None) Expr[source]

Shuffle the contents of this expr.

Parameters:
seed

Seed for the random number generator. If set to None (default), a random seed is generated using the random module.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.select(pl.col("a").shuffle(seed=1))
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 2   │
├╌╌╌╌╌┤
│ 1   │
├╌╌╌╌╌┤
│ 3   │
└─────┘
sign() Expr[source]

Compute the element-wise indication of the sign.

Examples

>>> df = pl.DataFrame({"a": [-9.0, -0.0, 0.0, 4.0, None]})
>>> df.select(pl.col("a").sign())
shape: (5, 1)
┌──────┐
│ a    │
│ ---  │
│ i64  │
╞══════╡
│ -1   │
├╌╌╌╌╌╌┤
│ 0    │
├╌╌╌╌╌╌┤
│ 0    │
├╌╌╌╌╌╌┤
│ 1    │
├╌╌╌╌╌╌┤
│ null │
└──────┘
sin() Expr[source]

Compute the element-wise value for the sine.

Returns:
Series of dtype Float64

Examples

>>> df = pl.DataFrame({"a": [0.0]})
>>> df.select(pl.col("a").sin())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 0.0 │
└─────┘
sinh() Expr[source]

Compute the element-wise value for the hyperbolic sine.

Returns:
Series of dtype Float64

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").sinh())
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 1.175201 │
└──────────┘
skew(bias: bool = True) Expr[source]

Compute the sample skewness of a data set.

For normally distributed data, the skewness should be about zero. For unimodal continuous distributions, a skewness value greater than zero means that there is more weight in the right tail of the distribution. The function skewtest can be used to determine if the skewness value is close enough to zero, statistically speaking.

See scipy.stats for more information.

Parameters:
bias : bool, optional

If False, the calculations are corrected for statistical bias.

Notes

The sample skewness is computed as the Fisher-Pearson coefficient of skewness, i.e.

\[g_1=\frac{m_3}{m_2^{3/2}}\]

where

\[m_i=\frac{1}{N}\sum_{n=1}^N(x[n]-\bar{x})^i\]

is the biased sample \(i\texttt{th}\) central moment, and \(\bar{x}\) is the sample mean. If bias is False, the calculations are corrected for bias and the value computed is the adjusted Fisher-Pearson standardized moment coefficient, i.e.

\[G_1 = \frac{k_3}{k_2^{3/2}} = \frac{\sqrt{N(N-1)}}{N-2}\frac{m_3}{m_2^{3/2}}\]

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3, 2, 1]})
>>> df.select(pl.col("a").skew())
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 0.343622 │
└──────────┘
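
Working the correction above by hand for this data gives \(G_1 = \frac{\sqrt{N(N-1)}}{N-2} g_1 = \frac{\sqrt{20}}{3} \cdot 0.343622 \approx 0.512241\). A sketch of the unbiased call (value computed from the formula above, assumed to match the implementation):

>>> df.select(pl.col("a").skew(bias=False))
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 0.512241 │
└──────────┘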
slice(offset: int | Expr, length: int | Expr | None = None) Expr[source]

Get a slice of this expression.

Parameters:
offset

Start index. Negative indexing is supported.

length

Length of the slice. If set to None, all rows starting at the offset will be selected.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [8, 9, 10, 11],
...         "b": [None, 4, 4, 4],
...     }
... )
>>> df.select(pl.all().slice(1, 2))
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 9   ┆ 4   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 10  ┆ 4   │
└─────┴─────┘
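
Negative offsets count from the end (a sketch of the negative-indexing rule above):

>>> df.select(pl.all().slice(-2))
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 10  ┆ 4   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 11  ┆ 4   │
└─────┴─────┘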
sort(reverse: bool = False, nulls_last: bool = False) Expr[source]

Sort this column. In a projection/selection context, the whole column is sorted.

If used in a groupby context, the groups are sorted.

Parameters:
reverse

False -> order from small to large. True -> order from large to small.

nulls_last

If True, nulls are considered to be larger than any valid value.

Examples

>>> df = pl.DataFrame(
...     {
...         "group": [
...             "one",
...             "one",
...             "one",
...             "two",
...             "two",
...             "two",
...         ],
...         "value": [1, 98, 2, 3, 99, 4],
...     }
... )
>>> df.select(pl.col("value").sort())
shape: (6, 1)
┌───────┐
│ value │
│ ---   │
│ i64   │
╞═══════╡
│ 1     │
├╌╌╌╌╌╌╌┤
│ 2     │
├╌╌╌╌╌╌╌┤
│ 3     │
├╌╌╌╌╌╌╌┤
│ 4     │
├╌╌╌╌╌╌╌┤
│ 98    │
├╌╌╌╌╌╌╌┤
│ 99    │
└───────┘
>>> df.select(pl.col("value").sort(reverse=True))
shape: (6, 1)
┌───────┐
│ value │
│ ---   │
│ i64   │
╞═══════╡
│ 99    │
├╌╌╌╌╌╌╌┤
│ 98    │
├╌╌╌╌╌╌╌┤
│ 4     │
├╌╌╌╌╌╌╌┤
│ 3     │
├╌╌╌╌╌╌╌┤
│ 2     │
├╌╌╌╌╌╌╌┤
│ 1     │
└───────┘
>>> df.groupby("group").agg(pl.col("value").sort())  
shape: (2, 2)
┌───────┬────────────┐
│ group ┆ value      │
│ ---   ┆ ---        │
│ str   ┆ list[i64]  │
╞═══════╪════════════╡
│ two   ┆ [3, 4, 99] │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ one   ┆ [1, 2, 98] │
└───────┴────────────┘
sort_by(by: Expr | str | list[Expr | str], reverse: bool | list[bool] = False) Expr[source]

Sort this column by the ordering of another column, or multiple other columns.

In a projection/selection context, the whole column is sorted. If used in a groupby context, the groups are sorted.

Parameters:
by

The column(s) used for sorting.

reverse

False -> order from small to large. True -> order from large to small.

Examples

>>> df = pl.DataFrame(
...     {
...         "group": [
...             "one",
...             "one",
...             "one",
...             "two",
...             "two",
...             "two",
...         ],
...         "value": [1, 98, 2, 3, 99, 4],
...     }
... )
>>> df.select(pl.col("group").sort_by("value"))
shape: (6, 1)
┌───────┐
│ group │
│ ---   │
│ str   │
╞═══════╡
│ one   │
├╌╌╌╌╌╌╌┤
│ one   │
├╌╌╌╌╌╌╌┤
│ two   │
├╌╌╌╌╌╌╌┤
│ two   │
├╌╌╌╌╌╌╌┤
│ one   │
├╌╌╌╌╌╌╌┤
│ two   │
└───────┘
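
Sorting by multiple columns (a sketch; rows are ordered by group first, then by value within each group):

>>> df.select(pl.col("value").sort_by(["group", "value"]))
shape: (6, 1)
┌───────┐
│ value │
│ ---   │
│ i64   │
╞═══════╡
│ 1     │
├╌╌╌╌╌╌╌┤
│ 2     │
├╌╌╌╌╌╌╌┤
│ 98    │
├╌╌╌╌╌╌╌┤
│ 3     │
├╌╌╌╌╌╌╌┤
│ 4     │
├╌╌╌╌╌╌╌┤
│ 99    │
└───────┘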
sqrt() Expr[source]

Compute the square root of the elements.

Examples

>>> df = pl.DataFrame({"values": [1.0, 2.0, 4.0]})
>>> df.select(pl.col("values").sqrt())
shape: (3, 1)
┌──────────┐
│ values   │
│ ---      │
│ f64      │
╞══════════╡
│ 1.0      │
├╌╌╌╌╌╌╌╌╌╌┤
│ 1.414214 │
├╌╌╌╌╌╌╌╌╌╌┤
│ 2.0      │
└──────────┘
std(ddof: int = 1) Expr[source]

Get standard deviation.

Parameters:
ddof

Degrees of freedom.

Examples

>>> df = pl.DataFrame({"a": [-1, 0, 1]})
>>> df.select(pl.col("a").std())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
└─────┘
suffix(suffix: str) Expr[source]

Add a suffix to the root column name of the expression.

Examples

>>> df = pl.DataFrame(
...     {
...         "A": [1, 2, 3, 4, 5],
...         "fruits": ["banana", "banana", "apple", "apple", "banana"],
...         "B": [5, 4, 3, 2, 1],
...         "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
...     }
... )
>>> df
shape: (5, 4)
┌─────┬────────┬─────┬────────┐
│ A   ┆ fruits ┆ B   ┆ cars   │
│ --- ┆ ---    ┆ --- ┆ ---    │
│ i64 ┆ str    ┆ i64 ┆ str    │
╞═════╪════════╪═════╪════════╡
│ 1   ┆ banana ┆ 5   ┆ beetle │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2   ┆ banana ┆ 4   ┆ audi   │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3   ┆ apple  ┆ 3   ┆ beetle │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 4   ┆ apple  ┆ 2   ┆ beetle │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 5   ┆ banana ┆ 1   ┆ beetle │
└─────┴────────┴─────┴────────┘
>>> df.select(
...     [
...         pl.all(),
...         pl.all().reverse().suffix("_reverse"),
...     ]
... )
shape: (5, 8)
┌─────┬────────┬─────┬────────┬───────────┬────────────────┬───────────┬──────────────┐
│ A   ┆ fruits ┆ B   ┆ cars   ┆ A_reverse ┆ fruits_reverse ┆ B_reverse ┆ cars_reverse │
│ --- ┆ ---    ┆ --- ┆ ---    ┆ ---       ┆ ---            ┆ ---       ┆ ---          │
│ i64 ┆ str    ┆ i64 ┆ str    ┆ i64       ┆ str            ┆ i64       ┆ str          │
╞═════╪════════╪═════╪════════╪═══════════╪════════════════╪═══════════╪══════════════╡
│ 1   ┆ banana ┆ 5   ┆ beetle ┆ 5         ┆ banana         ┆ 1         ┆ beetle       │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2   ┆ banana ┆ 4   ┆ audi   ┆ 4         ┆ apple          ┆ 2         ┆ beetle       │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3   ┆ apple  ┆ 3   ┆ beetle ┆ 3         ┆ apple          ┆ 3         ┆ beetle       │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4   ┆ apple  ┆ 2   ┆ beetle ┆ 2         ┆ banana         ┆ 4         ┆ audi         │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 5   ┆ banana ┆ 1   ┆ beetle ┆ 1         ┆ banana         ┆ 5         ┆ beetle       │
└─────┴────────┴─────┴────────┴───────────┴────────────────┴───────────┴──────────────┘
sum() Expr[source]

Get sum value.

Notes

Dtypes in {Int8, UInt8, Int16, UInt16} are cast to Int64 before summing to prevent overflow issues.

Examples

>>> df = pl.DataFrame({"a": [-1, 0, 1]})
>>> df.select(pl.col("a").sum())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 0   │
└─────┘
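
A sketch of the note above: an Int8 column whose total exceeds the Int8 maximum (127) still sums correctly, because the values are cast to Int64 first.

>>> df = pl.Series("a", [100, 100], dtype=pl.Int8).to_frame()
>>> df.select(pl.col("a").sum())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 200 │
└─────┘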
tail(n: int = 10) Expr[source]

Get the last n rows.

Parameters:
n

Number of rows to return.

Examples

>>> df = pl.DataFrame({"foo": [1, 2, 3, 4, 5, 6, 7]})
>>> df.tail(3)
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 5   │
├╌╌╌╌╌┤
│ 6   │
├╌╌╌╌╌┤
│ 7   │
└─────┘
take(indices: int | list[int] | Expr | Series | numpy.ndarray[Any, Any]) Expr[source]

Take values by index.

Parameters:
indices

An expression that leads to a UInt32 dtyped Series.

Returns:
Values taken by index

Examples

>>> df = pl.DataFrame(
...     {
...         "group": [
...             "one",
...             "one",
...             "one",
...             "two",
...             "two",
...             "two",
...         ],
...         "value": [1, 98, 2, 3, 99, 4],
...     }
... )
>>> df.groupby("group", maintain_order=True).agg(pl.col("value").take(1))
shape: (2, 2)
┌───────┬───────┐
│ group ┆ value │
│ ---   ┆ ---   │
│ str   ┆ i64   │
╞═══════╪═══════╡
│ one   ┆ 98    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ two   ┆ 99    │
└───────┴───────┘
take_every(n: int) Expr[source]

Take every nth value in the Series and return as a new Series.

Examples

>>> df = pl.DataFrame({"foo": [1, 2, 3, 4, 5, 6, 7, 8, 9]})
>>> df.select(pl.col("foo").take_every(3))
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 1   │
├╌╌╌╌╌┤
│ 4   │
├╌╌╌╌╌┤
│ 7   │
└─────┘
tan() Expr[source]

Compute the element-wise value for the tangent.

Returns:
Series of dtype Float64

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").tan().round(2))
shape: (1, 1)
┌──────┐
│ a    │
│ ---  │
│ f64  │
╞══════╡
│ 1.56 │
└──────┘
tanh() Expr[source]

Compute the element-wise value for the hyperbolic tangent.

Returns:
Series of dtype Float64

Examples

>>> df = pl.DataFrame({"a": [1.0]})
>>> df.select(pl.col("a").tanh())
shape: (1, 1)
┌──────────┐
│ a        │
│ ---      │
│ f64      │
╞══════════╡
│ 0.761594 │
└──────────┘
to_physical() Expr[source]

Cast to physical representation of the logical dtype.

Examples

Replicating the pandas pd.factorize function.

>>> pl.DataFrame({"vals": ["a", "x", None, "a"]}).with_columns(
...     [
...         pl.col("vals").cast(pl.Categorical),
...         pl.col("vals")
...         .cast(pl.Categorical)
...         .to_physical()
...         .alias("vals_physical"),
...     ]
... )
shape: (4, 2)
┌──────┬───────────────┐
│ vals ┆ vals_physical │
│ ---  ┆ ---           │
│ cat  ┆ u32           │
╞══════╪═══════════════╡
│ a    ┆ 0             │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ x    ┆ 1             │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ null ┆ null          │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a    ┆ 0             │
└──────┴───────────────┘
top_k(k: int = 5, reverse: bool = False) Expr[source]

Return the k largest elements.

If reverse=True, the smallest elements will be given.

This has time complexity:

\[O(n + k \log{n} - \frac{k}{2})\]
Parameters:
k

Number of elements to return.

reverse

Return the smallest elements.

Examples

>>> df = pl.DataFrame(
...     {
...         "value": [1, 98, 2, 3, 99, 4],
...     }
... )
>>> df.select(
...     [
...         pl.col("value").top_k().alias("top_k"),
...         pl.col("value").top_k(reverse=True).alias("bottom_k"),
...     ]
... )
shape: (5, 2)
┌───────┬──────────┐
│ top_k ┆ bottom_k │
│ ---   ┆ ---      │
│ i64   ┆ i64      │
╞═══════╪══════════╡
│ 99    ┆ 1        │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 98    ┆ 2        │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4     ┆ 3        │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 3     ┆ 4        │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 2     ┆ 98       │
└───────┴──────────┘
unique(maintain_order: bool = False) Expr[source]

Get unique values of this expression.

Parameters:
maintain_order

Maintain order of data. This requires more work.

Examples

>>> df = pl.DataFrame({"a": [1, 1, 2]})
>>> df.select(pl.col("a").unique())  
shape: (2, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 2   │
├╌╌╌╌╌┤
│ 1   │
└─────┘
>>> df.select(pl.col("a").unique(maintain_order=True))
shape: (2, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
├╌╌╌╌╌┤
│ 2   │
└─────┘
unique_counts() Expr[source]

Return a count of the unique values in the order of appearance.

This method differs from value_counts in that it does not return the values, only the counts; it may therefore be faster.

Examples

>>> df = pl.DataFrame(
...     {
...         "id": ["a", "b", "b", "c", "c", "c"],
...     }
... )
>>> df.select(
...     [
...         pl.col("id").unique_counts(),
...     ]
... )
shape: (3, 1)
┌─────┐
│ id  │
│ --- │
│ u32 │
╞═════╡
│ 1   │
├╌╌╌╌╌┤
│ 2   │
├╌╌╌╌╌┤
│ 3   │
└─────┘
upper_bound() Expr[source]

Calculate the upper bound.

Returns a unit Series with the highest value possible for the dtype of this expression.

Examples

>>> df = pl.DataFrame({"a": [1, 2, 3, 2, 1]})
>>> df.select(pl.col("a").upper_bound())
shape: (1, 1)
┌─────────────────────┐
│ a                   │
│ ---                 │
│ i64                 │
╞═════════════════════╡
│ 9223372036854775807 │
└─────────────────────┘
value_counts(multithreaded: bool = False, sort: bool = False) Expr[source]

Count all unique values and create a struct mapping value to count.

Parameters:
multithreaded

Better to turn this off in the aggregation context, as it can lead to contention.

sort

Sort the output from the most frequent value to the least frequent.

Returns:
Dtype Struct

Examples

>>> df = pl.DataFrame(
...     {
...         "id": ["a", "b", "b", "c", "c", "c"],
...     }
... )
>>> df.select(
...     [
...         pl.col("id").value_counts(sort=True),
...     ]
... )
shape: (3, 1)
┌───────────┐
│ id        │
│ ---       │
│ struct[2] │
╞═══════════╡
│ {"c",3}   │
├╌╌╌╌╌╌╌╌╌╌╌┤
│ {"b",2}   │
├╌╌╌╌╌╌╌╌╌╌╌┤
│ {"a",1}   │
└───────────┘
var(ddof: int = 1) Expr[source]

Get variance.

Parameters:
ddof

Degrees of freedom.

Examples

>>> df = pl.DataFrame({"a": [-1, 0, 1]})
>>> df.select(pl.col("a").var())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
└─────┘
where(predicate: Expr) Expr[source]

Filter a single column.

Alias for filter().

Parameters:
predicate

Boolean expression.

Examples

>>> df = pl.DataFrame(
...     {
...         "group_col": ["g1", "g1", "g2"],
...         "b": [1, 2, 3],
...     }
... )
>>> (
...     df.groupby("group_col").agg(
...         [
...             pl.col("b").where(pl.col("b") < 2).sum().alias("lt"),
...             pl.col("b").where(pl.col("b") >= 2).sum().alias("gte"),
...         ]
...     )
... ).sort("group_col")
shape: (2, 3)
┌───────────┬──────┬─────┐
│ group_col ┆ lt   ┆ gte │
│ ---       ┆ ---  ┆ --- │
│ str       ┆ i64  ┆ i64 │
╞═══════════╪══════╪═════╡
│ g1        ┆ 1    ┆ 2   │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┤
│ g2        ┆ null ┆ 3   │
└───────────┴──────┴─────┘