Datetime namespace
List namespace
String namespace
Struct namespace
Clip (limit) the values in an array to the given minimum and maximum. The bounds may be any value that fits in the 64-bit floating point range. Only works for the following dtypes: {Int32, Int64, Float32, Float64, UInt32}. To clip other dtypes, consider writing a when -> then -> otherwise expression.
Minimum value
Maximum value
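As a rough illustration of what clipping means, the plain TypeScript sketch below clamps each value between a minimum and a maximum. It does not call the library: `clipValues` is a hypothetical helper, and passing nulls through unchanged is an assumption made for the sketch.

```typescript
// Hypothetical helper, not part of the library: clamps every value into [min, max].
// Assumption: null entries are passed through unchanged.
function clipValues(
  values: (number | null)[],
  min: number,
  max: number,
): (number | null)[] {
  return values.map((v) => (v === null ? null : Math.min(Math.max(v, min), max)));
}

const clipped = clipValues([-5, 0, 3, 12, null], 0, 10);
console.log(clipped); // [0, 0, 3, 10, null]
```

Values already inside the range are returned as they are; only out-of-range values are replaced by the nearest bound.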
Sample from this DataFrame by setting either n or frac.
Optional opts: { seed?: number | bigint; withReplacement?: boolean }
Optional n: number
Optional frac: number
Optional withReplacement: boolean
Optional seed: number | bigint
Get the group indexes of the group by operation. Should be used in aggregation context only.
>>> const df = pl.DataFrame(
... {
... "group": [
... "one",
... "one",
... "one",
... "two",
... "two",
... "two",
... ],
... "value": [94, 95, 96, 97, 97, 99],
... }
... )
>>> df.groupBy("group").agg(pl.col("value").aggGroups()).sort({by: "group"});
shape: (2, 2)
┌───────┬───────────┐
│ group ┆ value │
│ --- ┆ --- │
│ str ┆ list[u32] │
╞═══════╪═══════════╡
│ one ┆ [0, 1, 2] │
│ two ┆ [3, 4, 5] │
└───────┴───────────┘
Rename the output of an expression.
new name
> const df = pl.DataFrame({
... "a": [1, 2, 3],
... "b": ["a", "b", null],
... });
> df
shape: (3, 2)
╭─────┬──────╮
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪══════╡
│ 1 ┆ "a" │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ "b" │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3 ┆ null │
╰─────┴──────╯
> df.select([
... pl.col("a").alias("bar"),
... pl.col("b").alias("foo"),
... ])
shape: (3, 2)
╭─────┬──────╮
│ bar ┆ foo │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪══════╡
│ 1 ┆ "a" │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ "b" │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3 ┆ null │
╰─────┴──────╯
Get the index values that would sort this column.
Optional descending: boolean
false -> order from small to large. true -> order from large to small.
Optional maintainOrder: boolean
Returns a UInt32 Series.
Options object form: { descending?: boolean; maintainOrder?: boolean; reverse?: boolean }
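To make "the index values that would sort this column" concrete, here is a minimal plain TypeScript sketch. `argSortSketch` is a hypothetical name and does not use the library; it only illustrates the idea on a number array without nulls.

```typescript
// Hypothetical helper, not part of the library: returns the positions that
// would put `values` in sorted order.
function argSortSketch(values: number[], descending = false): number[] {
  const idx = values.map((_, i) => i);
  // Array.prototype.sort is stable, so ties keep their original relative order.
  idx.sort((i, j) => (descending ? values[j] - values[i] : values[i] - values[j]));
  return idx;
}

const order = argSortSketch([30, 10, 20]);
console.log(order); // [1, 2, 0]
```

Reading the input at the returned positions (`[1, 2, 0]`) yields `10, 20, 30`, which is the sorted column.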
Calculate the n-th discrete difference.
Number of slots to shift.
How to handle nulls: "ignore" or "drop".
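The sketch below illustrates the shift-and-subtract reading of this operation in plain TypeScript: each output is the value minus the value `n` slots earlier. `diffSketch` is a hypothetical helper, and treating "drop" as removing the leading nulls is an assumption made for the sketch.

```typescript
// Hypothetical helper, not part of the library.
// Each element becomes values[i] - values[i - n]; the first n slots have no
// predecessor and become null ("ignore") or are removed ("drop").
function diffSketch(
  values: number[],
  n = 1,
  nullBehavior: "ignore" | "drop" = "ignore",
): (number | null)[] {
  const out = values.map((v, i) => (i < n ? null : v - values[i - n]));
  return nullBehavior === "drop" ? out.slice(n) : out;
}

const diffs = diffSketch([20, 10, 30, 25]);
console.log(diffs); // [null, -10, 20, -5]
```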
Exponentially-weighted moving average.
Expr that evaluates to a Float64 Series.
Optional alpha: number
Optional adjust: boolean
Optional minPeriods: number
Optional bias: boolean
Optional ignoreNulls: boolean
Options object form: { adjust?: boolean; alpha?: number; bias?: boolean; ignoreNulls?: boolean; minPeriods?: number }
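For intuition about `alpha` and `adjust`, the following plain TypeScript sketch implements the two textbook forms of an exponentially-weighted mean. It is an assumption that these standard formulas match the library's behavior; `ewmMeanSketch` is a hypothetical helper, and it ignores `minPeriods`, `bias` and `ignoreNulls`.

```typescript
// Hypothetical helper, not part of the library.
// adjust = true:  y_t = sum_i (1 - alpha)^i * x_(t-i) / sum_i (1 - alpha)^i
// adjust = false: y_0 = x_0,  y_t = (1 - alpha) * y_(t-1) + alpha * x_t
function ewmMeanSketch(values: number[], alpha: number, adjust = true): number[] {
  const out: number[] = [];
  let num = 0; // running weighted sum of observations (adjust = true)
  let den = 0; // running sum of weights (adjust = true)
  let prev = 0; // previous output (adjust = false)
  values.forEach((x, t) => {
    if (adjust) {
      num = x + (1 - alpha) * num;
      den = 1 + (1 - alpha) * den;
      out.push(num / den);
    } else {
      prev = t === 0 ? x : (1 - alpha) * prev + alpha * x;
      out.push(prev);
    }
  });
  return out;
}

const adjusted = ewmMeanSketch([1, 2, 3], 0.5);
const recursive = ewmMeanSketch([1, 2, 3], 0.5, false);
console.log(adjusted); // approximately [1, 1.667, 2.429]
console.log(recursive); // [1, 1.5, 2.25]
```

A larger `alpha` puts more weight on recent observations, so the output tracks the input more closely.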
Exponentially-weighted standard deviation.
Expr that evaluates to a Float64 Series.
Optional alpha: number
Optional adjust: boolean
Optional minPeriods: number
Optional bias: boolean
Optional ignoreNulls: boolean
Options object form: { adjust?: boolean; alpha?: number; bias?: boolean; ignoreNulls?: boolean; minPeriods?: number }
Exponentially-weighted variance.
Expr that evaluates to a Float64 Series.
Optional alpha: number
Optional adjust: boolean
Optional minPeriods: number
Optional bias: boolean
Optional ignoreNulls: boolean
Options object form: { adjust?: boolean; alpha?: number; bias?: boolean; ignoreNulls?: boolean; minPeriods?: number }
Exclude certain columns from a wildcard/regex selection.
You may also use regexes in the exclude list. They must start with ^ and end with $.
Rest ...columns: string[]
Column(s) to exclude from selection.
> const df = pl.DataFrame({
... "a": [1, 2, 3],
... "b": ["a", "b", null],
... "c": [null, 2, 1],
...});
> df
shape: (3, 3)
╭─────┬──────┬──────╮
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 │
╞═════╪══════╪══════╡
│ 1 ┆ "a" ┆ null │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ "b" ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3 ┆ null ┆ 1 │
╰─────┴──────┴──────╯
> df.select(
... pl.col("*").exclude("b"),
... );
shape: (3, 2)
╭─────┬──────╮
│ a ┆ c │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪══════╡
│ 1 ┆ null │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3 ┆ 1 │
╰─────┴──────╯
Extend the Series with a given number of values.
The value to extend the Series with. This value may be null to fill with nulls.
The number of values to extend.
Hash the Series.
Optional k0: number
Optional k1: number
Optional k2: number
Optional k3: number
Options object form: { k0?: number; k1?: number; k2?: number; k3?: number }
Check if elements of this Series are in the right Series, or List values of the right Series.
Series of primitive type or List type.
Expr that evaluates to a Boolean Series.
> const df = pl.DataFrame({
... "sets": [[1, 2, 3], [1, 2], [9, 10]],
... "optional_members": [1, 2, 3]
... });
> df.select(
... pl.col("optional_members").isIn("sets").alias("contains")
... );
shape: (3, 1)
┌──────────┐
│ contains │
│ --- │
│ bool │
╞══════════╡
│ true │
├╌╌╌╌╌╌╌╌╌╌┤
│ true │
├╌╌╌╌╌╌╌╌╌╌┤
│ false │
└──────────┘
Keep the original root name of the expression.
A groupby aggregation often changes the name of a column.
With keepName we can keep the original name of the column.
> const df = pl.DataFrame({
... "a": [1, 2, 3],
... "b": ["a", "b", null],
... });
> df
... .groupBy("a")
... .agg(pl.col("b").list())
... .sort({by:"a"});
shape: (3, 2)
╭─────┬────────────╮
│ a ┆ b_agg_list │
│ --- ┆ --- │
│ i64 ┆ list [str] │
╞═════╪════════════╡
│ 1 ┆ [a] │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ [b] │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ [null] │
╰─────┴────────────╯
Keep the original column name:
> df
... .groupBy("a")
... .agg(pl.col("b").list().keepName())
... .sort({by:"a"});
shape: (3, 2)
╭─────┬────────────╮
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ list [str] │
╞═════╪════════════╡
│ 1 ┆ [a] │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ [b] │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ [null] │
╰─────┴────────────╯
Apply a window function over a subgroup.
This is similar to a groupby + aggregation + self join, or to window functions in Postgres.
Rest ...partitionBy: ExprOrString[]
Column(s) to partition by.
> const df = pl.DataFrame({
... "groups": [1, 1, 2, 2, 1, 2, 3, 3, 1],
... "values": [1, 2, 3, 4, 5, 6, 7, 8, 8],
... });
> df.select(
... pl.col("groups"),
... pl.col("values").sum().over("groups"),
... );
╭────────┬────────╮
│ groups ┆ values │
│ --- ┆ --- │
│ i32 ┆ i32 │
╞════════╪════════╡
│ 1 ┆ 16 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1 ┆ 16 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2 ┆ 13 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2 ┆ 13 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ ... ┆ ... │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1 ┆ 16 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2 ┆ 13 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3 ┆ 15 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3 ┆ 15 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1 ┆ 16 │
╰────────┴────────╯
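The "groupby + aggregation + self join" description can be sketched in plain TypeScript without the library. `sumOver` is a hypothetical helper: it first aggregates a total per group, then joins each total back onto every row of that group, so the output has as many rows as the input.

```typescript
// Hypothetical helper, not part of the library.
function sumOver(groups: number[], values: number[]): number[] {
  // Step 1 (groupby + aggregation): one running total per group key.
  const totals = new Map<number, number>();
  groups.forEach((g, i) => totals.set(g, (totals.get(g) ?? 0) + values[i]));
  // Step 2 (self join): look up each row's group total.
  return groups.map((g) => totals.get(g)!);
}

const windowed = sumOver(
  [1, 1, 2, 2, 1, 2, 3, 3, 1],
  [1, 2, 3, 4, 5, 6, 7, 8, 8],
);
console.log(windowed); // [16, 16, 13, 13, 16, 13, 15, 15, 16]
```

The input here is the same data as the example above, and the result matches the `values` column in that table.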
Add a prefix to the root column name of the expression.
> const df = pl.DataFrame({
... "A": [1, 2, 3, 4, 5],
... "fruits": ["banana", "banana", "apple", "apple", "banana"],
... "B": [5, 4, 3, 2, 1],
... "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
... });
shape: (5, 4)
╭─────┬──────────┬─────┬──────────╮
│ A ┆ fruits ┆ B ┆ cars │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞═════╪══════════╪═════╪══════════╡
│ 1 ┆ "banana" ┆ 5 ┆ "beetle" │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ "banana" ┆ 4 ┆ "audi" │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ "apple" ┆ 3 ┆ "beetle" │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4 ┆ "apple" ┆ 2 ┆ "beetle" │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 5 ┆ "banana" ┆ 1 ┆ "beetle" │
╰─────┴──────────┴─────┴──────────╯
> df.select(
... pl.col("*").reverse().prefix("reverse_"),
... )
shape: (5, 4)
╭───────────┬────────────────┬───────────┬──────────────╮
│ reverse_A ┆ reverse_fruits ┆ reverse_B ┆ reverse_cars │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ i64 ┆ str │
╞═══════════╪════════════════╪═══════════╪══════════════╡
│ 5 ┆ "banana" ┆ 1 ┆ "beetle" │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4 ┆ "apple" ┆ 2 ┆ "beetle" │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ "apple" ┆ 3 ┆ "beetle" │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ "banana" ┆ 4 ┆ "audi" │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ "banana" ┆ 5 ┆ "beetle" │
╰───────────┴────────────────┴───────────┴──────────────╯
Assign ranks to data, dealing with ties appropriately.
Optional method: RankMethod
Optional descending: boolean
Replace the given values by different values of the same data type.
Value or sequence of values to replace. Accepts expression input. Sequences are parsed as Series, other non-expression inputs are parsed as literals.
Value or sequence of values to replace by. Accepts expression input. Sequences are parsed as Series, other non-expression inputs are parsed as literals. Length must match the length of old or have length 1.
Replace a single value by another value. Values that were not replaced remain unchanged.
>>> const df = pl.DataFrame({"a": [1, 2, 2, 3]});
>>> df.withColumns(pl.col("a").replace(2, 100).alias("replaced"));
shape: (4, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪══════════╡
│ 1 ┆ 1 │
│ 2 ┆ 100 │
│ 2 ┆ 100 │
│ 3 ┆ 3 │
└─────┴──────────┘
Replace multiple values by passing sequences to the old and new_ parameters.
>>> df.withColumns(pl.col("a").replace([2, 3], [100, 200]).alias("replaced"));
shape: (4, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪══════════╡
│ 1 ┆ 1 │
│ 2 ┆ 100 │
│ 2 ┆ 100 │
│ 3 ┆ 200 │
└─────┴──────────┘
Passing a mapping with replacements is also supported as syntactic sugar. Values that are not in the mapping remain unchanged.
>>> const mapping = {2: 100, 3: 200};
>>> df.withColumns(pl.col("a").replace({ old: mapping }).alias("replaced"));
shape: (4, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪══════════╡
│ 1 ┆ 1 │
│ 2 ┆ 100 │
│ 2 ┆ 100 │
│ 3 ┆ 200 │
└─────┴──────────┘
Replace values by different values.
Value or sequence of values to replace. Accepts expression input. Sequences are parsed as Series, other non-expression inputs are parsed as literals.
Value or sequence of values to replace by. Accepts expression input. Sequences are parsed as Series, other non-expression inputs are parsed as literals. Length must match the length of old or have length 1.
Optional default_: Set values that were not replaced to this value. Defaults to keeping the original value. Accepts expression input. Non-expression inputs are parsed as literals.
Optional returnDtype: DataType
The data type of the resulting expression. If not set (default), the data type is determined automatically based on the other inputs.
Replace a single value by another value. Values that were not replaced remain unchanged.
>>> const df = pl.DataFrame({"a": [1, 2, 2, 3]});
>>> df.withColumns(pl.col("a").replace(2, 100).alias("replaced"));
shape: (4, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪══════════╡
│ 1 ┆ 1 │
│ 2 ┆ 100 │
│ 2 ┆ 100 │
│ 3 ┆ 3 │
└─────┴──────────┘
Replace multiple values by passing sequences to the old and new_ parameters.
>>> df.withColumns(pl.col("a").replace([2, 3], [100, 200]).alias("replaced"));
shape: (4, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪══════════╡
│ 1 ┆ 1 │
│ 2 ┆ 100 │
│ 2 ┆ 100 │
│ 3 ┆ 200 │
└─────┴──────────┘
Passing a mapping with replacements is also supported as syntactic sugar. Specify a default to set all values that were not matched.
>>> const mapping = {2: 100, 3: 200};
>>> df.withColumns(pl.col("a").replaceStrict({ old: mapping, default_: -1, returnDtype: pl.Int64 }).alias("replaced"));
shape: (4, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪══════════╡
│ 1 ┆ -1 │
│ 2 ┆ 100 │
│ 2 ┆ 100 │
│ 3 ┆ 200 │
└─────┴──────────┘
Replacing by values of a different data type sets the return type based on a combination of the new_ data type and either the original data type or the default data type if it was set.
>>> const df = pl.DataFrame({"a": ["x", "y", "z"]});
>>> const mapping = {"x": 1, "y": 2, "z": 3};
>>> df.withColumns(pl.col("a").replaceStrict({ old: mapping }).alias("replaced"));
shape: (3, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪══════════╡
│ x ┆ 1 │
│ y ┆ 2 │
│ z ┆ 3 │
└─────┴──────────┘
>>> df.withColumns(pl.col("a").replaceStrict({ old: mapping, default_: null }).alias("replaced"));
shape: (3, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪══════════╡
│ x ┆ 1 │
│ y ┆ 2 │
│ z ┆ 3 │
└─────┴──────────┘
Set the returnDtype parameter to control the resulting data type directly.
>>> df.withColumns(pl.col("a").replaceStrict({ old: mapping, returnDtype: pl.UInt8 }).alias("replaced"));
shape: (3, 2)
┌─────┬──────────┐
│ a ┆ replaced │
│ --- ┆ --- │
│ str ┆ u8 │
╞═════╪══════════╡
│ x ┆ 1 │
│ y ┆ 2 │
│ z ┆ 3 │
└─────┴──────────┘
Expression input is supported for all parameters.
>>> const df = pl.DataFrame({"a": [1, 2, 2, 3], "b": [1.5, 2.5, 5.0, 1.0]});
>>> df.withColumns(
... pl.col("a").replaceStrict({
... old: pl.col("a").max(),
... new_: pl.col("b").sum(),
... default_: pl.col("b"),
... }).alias("replaced")
... );
shape: (4, 3)
┌─────┬─────┬──────────┐
│ a ┆ b ┆ replaced │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 │
╞═════╪═════╪══════════╡
│ 1 ┆ 1.5 ┆ 1.5 │
│ 2 ┆ 2.5 ┆ 2.5 │
│ 2 ┆ 5.0 ┆ 5.0 │
│ 3 ┆ 1.0 ┆ 10.0 │
└─────┴─────┴──────────┘
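The mapping-with-default behavior described in this section can be pictured with a small plain TypeScript sketch. `replaceWithDefault` is a hypothetical helper that does not use the library: values found in the mapping are replaced, and every other value is set to the fallback.

```typescript
// Hypothetical helper, not part of the library.
function replaceWithDefault<K, V>(values: K[], mapping: Map<K, V>, fallback: V): V[] {
  return values.map((v) => (mapping.has(v) ? mapping.get(v)! : fallback));
}

const replaced = replaceWithDefault(
  [1, 2, 2, 3],
  new Map([
    [2, 100],
    [3, 200],
  ]),
  -1,
);
console.log(replaced); // [-1, 100, 100, 200]
```

This reproduces the `replaced` column from the earlier example that combines a mapping with a default of -1.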
Serializes object to desired format via serde
Shift the values by a given period and fill the parts that will be empty due to this operation.
Optional periods: number
Number of places to shift (may be negative).
Shift the values by a given period and fill the parts that will be empty due to this operation.
Number of places to shift (may be negative).
Fill null values with the result of this expression.
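A minimal plain TypeScript sketch of shifting with a fill value follows. `shiftAndFillSketch` is a hypothetical helper, not the library call; it shows how a positive period moves values toward the end, a negative period moves them toward the start, and the vacated slots receive the fill value.

```typescript
// Hypothetical helper, not part of the library.
function shiftAndFillSketch<T>(values: T[], periods: number, fillValue: T): T[] {
  const n = values.length;
  return values.map((_, i) => {
    const src = i - periods; // position the output slot reads from
    return src >= 0 && src < n ? values[src] : fillValue;
  });
}

const shiftedForward = shiftAndFillSketch([1, 2, 3, 4], 1, 0);
const shiftedBackward = shiftAndFillSketch([1, 2, 3, 4], -2, 0);
console.log(shiftedForward); // [0, 1, 2, 3]
console.log(shiftedBackward); // [3, 4, 0, 0]
```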
Compute the sample skewness of a data set. For normally distributed data, the skewness should be about zero. For unimodal continuous distributions, a skewness value greater than zero means that there is more weight in the right tail of the distribution.
Optional bias: boolean
If false, the calculations are corrected for statistical bias.
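To show what the `bias` flag changes, the plain TypeScript sketch below uses the standard moment-based definition of sample skewness (third central moment divided by the second central moment to the power 1.5) and the usual sample-size correction. It is an assumption that the library follows these same formulas; `skewSketch` is a hypothetical helper.

```typescript
// Hypothetical helper, not part of the library.
// Biased skewness:   g1 = m3 / m2^1.5, with m_k the k-th central moment.
// Corrected version: G1 = sqrt(n * (n - 1)) / (n - 2) * g1  (needs n > 2).
function skewSketch(values: number[], bias = true): number {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const m2 = values.reduce((a, x) => a + (x - mean) ** 2, 0) / n;
  const m3 = values.reduce((a, x) => a + (x - mean) ** 3, 0) / n;
  const g1 = m3 / Math.pow(m2, 1.5);
  return bias ? g1 : (Math.sqrt(n * (n - 1)) / (n - 2)) * g1;
}

const symmetric = skewSketch([1, 2, 3]); // symmetric data: skewness 0
const rightTailed = skewSketch([1, 2, 3, 10]); // positive: heavier right tail
console.log(symmetric, rightTailed);
```

The second data set has one large value on the right, so its skewness is greater than zero, in line with the description above.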
Sort this column. In projection/selection context the whole column is sorted.
Optional descending: boolean
Optional nullsLast: boolean
If true, nulls are considered to be larger than any valid value.
Options object form: { descending?: boolean; nullsLast?: boolean; reverse?: boolean }
Sort this column by the ordering of another column, or multiple other columns. In projection/selection context the whole column is sorted. If used in a groupby context, the groups are sorted.
The column(s) used for sorting.
Optional descending: boolean | boolean[]
false -> order from small to large. true -> order from large to small.
Options object form: { descending?: boolean | boolean[]; reverse?: boolean | boolean[] }
Apply a rolling max (moving max) over the values in this Series.
A window of length windowSize will traverse the series. The values that fill this window will (optionally) be multiplied with the weights given by the weight vector.
The resulting values will be aggregated into their maximum.
Optional weights: number[]
Optional minPeriods: number[]
Optional center: boolean
Apply a rolling mean (moving mean) over the values in this Series.
A window of length windowSize will traverse the series. The values that fill this window will (optionally) be multiplied with the weights given by the weight vector.
The resulting values will be aggregated into their mean.
Optional weights: number[]
Optional minPeriods: number[]
Optional center: boolean
Apply a rolling min (moving min) over the values in this Series.
A window of length windowSize will traverse the series. The values that fill this window will (optionally) be multiplied with the weights given by the weight vector.
The resulting values will be aggregated into their minimum.
Optional weights: number[]
Optional minPeriods: number[]
Optional center: boolean
Compute a rolling quantile.
Optional interpolation: InterpolationMethod
Optional windowSize: number
Optional weights: number[]
Optional minPeriods: number[]
Optional center: boolean
Optional by: string
Optional closed: ClosedWindow
Compute a rolling skew.
Size of the rolling window.
Optional bias: boolean
If false, the calculations are corrected for statistical bias.
Compute a rolling standard deviation.
A window of length windowSize will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weight vector. The resulting values will be aggregated into their standard deviation.
Optional weights: number[]
Optional minPeriods: number[]
Optional center: boolean
Optional ddof: number
Apply a rolling sum (moving sum) over the values in this Series.
A window of length windowSize will traverse the series. The values that fill this window will (optionally) be multiplied with the weights given by the weight vector.
The resulting values will be aggregated into their sum.
Optional weights: number[]
Optional minPeriods: number[]
Optional center: boolean
Compute a rolling variance.
A window of length windowSize will traverse the series. The values that fill this window will (optionally) be multiplied with the weights given by the weight vector.
The resulting values will be aggregated into their variance.
Optional weights: number[]
Optional minPeriods: number[]
Optional center: boolean
Optional ddof: number
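The rolling functions above share the same window mechanics, which the plain TypeScript sketch below illustrates for the sum. `rollingSumSketch` is a hypothetical helper that does not use the library. As simplifying assumptions, it emits null until the window is full and ignores `minPeriods` and `center`.

```typescript
// Hypothetical helper, not part of the library.
// A window of `windowSize` values ends at each position; each value in the
// window is optionally multiplied by the matching weight, then summed.
function rollingSumSketch(
  values: number[],
  windowSize: number,
  weights?: number[],
): (number | null)[] {
  return values.map((_, end) => {
    const start = end - windowSize + 1;
    if (start < 0) return null; // window not yet full
    let total = 0;
    for (let k = 0; k < windowSize; k++) {
      total += values[start + k] * (weights ? weights[k] : 1);
    }
    return total;
  });
}

const plainSum = rollingSumSketch([1, 2, 3, 4, 5], 3);
const weightedSum = rollingSumSketch([1, 2, 3, 4, 5], 3, [0.5, 1, 2]);
console.log(plainSum); // [null, null, 6, 9, 12]
console.log(weightedSum); // [null, null, 8.5, 12, 15.5]
```

Swapping the summation for a maximum, minimum or mean over the same window gives the other rolling aggregations in the same way.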
Expressions that can be used in various contexts.