polars.align_frames

polars.align_frames(*frames: DataFrame, on: Union[str, Expr, Sequence[str], Sequence[Expr], Sequence[str | polars.internals.expr.expr.Expr]], select: Optional[Union[str, Expr, Sequence[str | polars.internals.expr.expr.Expr]]] = None, reverse: Union[bool, Sequence[bool]] = False) -> list[polars.internals.dataframe.frame.DataFrame]
polars.align_frames(*frames: LazyFrame, on: Union[str, Expr, Sequence[str], Sequence[Expr], Sequence[str | polars.internals.expr.expr.Expr]], select: Optional[Union[str, Expr, Sequence[str | polars.internals.expr.expr.Expr]]] = None, reverse: Union[bool, Sequence[bool]] = False) -> list[polars.internals.lazyframe.frame.LazyFrame]

Align a sequence of frames using the unique values from one or more columns as a key.

Frames that do not contain the given key values have rows injected (with nulls filling the non-key columns), and each resulting frame is sorted by the key.

The original column order of input frames is not changed unless select is specified (in which case the final column order is determined from that).

Note that this does not result in a joined frame - you receive the same number of frames back that you passed in, but each is now aligned by key and has the same number of rows.
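The behaviour described above can be illustrated with a small pure-Python sketch. This is a conceptual model only, not how Polars implements alignment: the helper `align_rows` is hypothetical, frames are modelled as lists of row dictionaries, a single key column is assumed, and key values are assumed to be unique within each frame.

```python
def align_rows(frames: list[list[dict]], on: str) -> list[list[dict]]:
    """Conceptual sketch: align lists of row-dicts on one key column."""
    # The alignment key is the sorted union of key values across all frames.
    keys = sorted({row[on] for frame in frames for row in frame})
    aligned = []
    for frame in frames:
        columns = list(frame[0].keys()) if frame else [on]
        by_key = {row[on]: row for row in frame}  # assumes unique keys per frame
        out = []
        for k in keys:
            if k in by_key:
                out.append(dict(by_key[k]))
            else:
                # Inject a row: the key is filled in, other columns are None (null).
                out.append({c: (k if c == on else None) for c in columns})
        aligned.append(out)
    return aligned


a = [{"k": 1, "x": 3.5}, {"k": 2, "x": 4.0}, {"k": 3, "x": 1.0}]
b = [{"k": 3, "x": 2.0}, {"k": 2, "x": 5.0}]
a_aligned, b_aligned = align_rows([a, b], on="k")
```

Two lists go in and two lists come out; `b_aligned` gains a row for key `1` with `x` set to `None`, and both results are ordered by `k`.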

Parameters:
frames

sequence of DataFrames or LazyFrames.

on

one or more columns whose unique values will be used to align the frames.

select

optional post-alignment column select to constrain and/or order the columns returned from the newly aligned frames.

reverse

sort the alignment column values in descending order; can be a single boolean or a list of booleans associated with each column in on.
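As a rough illustration of how a per-column `reverse` list might be interpreted, the sketch below sorts composite keys with an independent direction for each key column. It assumes the flags pair positionally with the columns in `on` and that ordering is lexicographic; the helper `sort_keys` is hypothetical and is not part of Polars.

```python
def sort_keys(keys, reverse):
    """Sort key tuples with one descending flag per key column (sketch)."""
    n = len(keys[0]) if keys else 0
    # A single boolean applies to every key column.
    flags = [reverse] * n if isinstance(reverse, bool) else list(reverse)
    if len(flags) != n:
        raise ValueError("need one reverse flag per key column")
    out = list(keys)
    # Stable sorts applied from the last key column to the first give a
    # lexicographic ordering with an independent direction per column.
    for i in reversed(range(n)):
        out.sort(key=lambda t: t[i], reverse=flags[i])
    return out


pairs = [("a", 1), ("b", 2), ("a", 2), ("b", 1)]
mixed = sort_keys(pairs, reverse=[False, True])  # first column ascending, second descending
all_desc = sort_keys(pairs, reverse=True)        # every column descending
```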

Examples

>>> from datetime import date
>>> import polars as pl
>>> df1 = pl.DataFrame(
...     {
...         "dt": [date(2022, 9, 1), date(2022, 9, 2), date(2022, 9, 3)],
...         "x": [3.5, 4.0, 1.0],
...         "y": [10.0, 2.5, 1.5],
...     }
... )
>>> df2 = pl.DataFrame(
...     {
...         "dt": [date(2022, 9, 2), date(2022, 9, 3), date(2022, 9, 1)],
...         "x": [8.0, 1.0, 3.5],
...         "y": [1.5, 12.0, 5.0],
...     }
... )
>>> df3 = pl.DataFrame(
...     {
...         "dt": [date(2022, 9, 3), date(2022, 9, 2)],
...         "x": [2.0, 5.0],
...         "y": [2.5, 2.0],
...     }
... )  
#
# df1                              df2                              df3
# shape: (3, 3)                    shape: (3, 3)                    shape: (2, 3)
# ┌────────────┬─────┬──────┐      ┌────────────┬─────┬──────┐      ┌────────────┬─────┬─────┐
# │ dt         ┆ x   ┆ y    │      │ dt         ┆ x   ┆ y    │      │ dt         ┆ x   ┆ y   │
# │ ---        ┆ --- ┆ ---  │      │ ---        ┆ --- ┆ ---  │      │ ---        ┆ --- ┆ --- │
# │ date       ┆ f64 ┆ f64  │      │ date       ┆ f64 ┆ f64  │      │ date       ┆ f64 ┆ f64 │
# ╞════════════╪═════╪══════╡      ╞════════════╪═════╪══════╡      ╞════════════╪═════╪═════╡
# │ 2022-09-01 ┆ 3.5 ┆ 10.0 │\  ,->│ 2022-09-02 ┆ 8.0 ┆ 1.5  │\  ,->│ 2022-09-03 ┆ 2.0 ┆ 2.5 │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤ \/   ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤ \/   ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
# │ 2022-09-02 ┆ 4.0 ┆ 2.5  │_/\,->│ 2022-09-03 ┆ 1.0 ┆ 12.0 │_/`-->│ 2022-09-02 ┆ 5.0 ┆ 2.0 │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤  /\  ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤      └────────────┴─────┴─────┘
# │ 2022-09-03 ┆ 1.0 ┆ 1.5  │_/  `>│ 2022-09-01 ┆ 3.5 ┆ 5.0  │-//-
# └────────────┴─────┴──────┘      └────────────┴─────┴──────┘
...
>>> # align frames by the "dt" column:
>>> af1, af2, af3 = pl.align_frames(
...     df1, df2, df3, on="dt"
... )  
#
# af1                              af2                              af3
# shape: (3, 3)                    shape: (3, 3)                    shape: (3, 3)
# ┌────────────┬─────┬──────┐      ┌────────────┬─────┬──────┐      ┌────────────┬──────┬──────┐
# │ dt         ┆ x   ┆ y    │      │ dt         ┆ x   ┆ y    │      │ dt         ┆ x    ┆ y    │
# │ ---        ┆ --- ┆ ---  │      │ ---        ┆ --- ┆ ---  │      │ ---        ┆ ---  ┆ ---  │
# │ date       ┆ f64 ┆ f64  │      │ date       ┆ f64 ┆ f64  │      │ date       ┆ f64  ┆ f64  │
# ╞════════════╪═════╪══════╡      ╞════════════╪═════╪══════╡      ╞════════════╪══════╪══════╡
# │ 2022-09-01 ┆ 3.5 ┆ 10.0 │----->│ 2022-09-01 ┆ 3.5 ┆ 5.0  │----->│ 2022-09-01 ┆ null ┆ null │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 2022-09-02 ┆ 4.0 ┆ 2.5  │----->│ 2022-09-02 ┆ 8.0 ┆ 1.5  │----->│ 2022-09-02 ┆ 5.0  ┆ 2.0  │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 2022-09-03 ┆ 1.0 ┆ 1.5  │----->│ 2022-09-03 ┆ 1.0 ┆ 12.0 │----->│ 2022-09-03 ┆ 2.0  ┆ 2.5  │
# └────────────┴─────┴──────┘      └────────────┴─────┴──────┘      └────────────┴──────┴──────┘
...
>>> # align frames by "dt", but keep only cols "x" and "y":
>>> af1, af2, af3 = pl.align_frames(
...     df1, df2, df3, on="dt", select=["x", "y"]
... )  
#
# af1                 af2                 af3
# shape: (3, 2)       shape: (3, 2)       shape: (3, 2)
# ┌─────┬──────┐      ┌─────┬──────┐      ┌──────┬──────┐
# │ x   ┆ y    │      │ x   ┆ y    │      │ x    ┆ y    │
# │ --- ┆ ---  │      │ --- ┆ ---  │      │ ---  ┆ ---  │
# │ f64 ┆ f64  │      │ f64 ┆ f64  │      │ f64  ┆ f64  │
# ╞═════╪══════╡      ╞═════╪══════╡      ╞══════╪══════╡
# │ 3.5 ┆ 10.0 │      │ 3.5 ┆ 5.0  │      │ null ┆ null │
# ├╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 4.0 ┆ 2.5  │      │ 8.0 ┆ 1.5  │      │ 5.0  ┆ 2.0  │
# ├╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 1.0 ┆ 1.5  │      │ 1.0 ┆ 12.0 │      │ 2.0  ┆ 2.5  │
# └─────┴──────┘      └─────┴──────┘      └──────┴──────┘
...
>>> # now that the data is aligned, the row-wise dot product is easy to calculate:
>>> (af1 * af2 * af3).fill_null(0).select(pl.sum(pl.col("*")).alias("dot"))
shape: (3, 1)
┌───────┐
│ dot   │
│ ---   │
│ f64   │
╞═══════╡
│ 0.0   │
├╌╌╌╌╌╌╌┤
│ 167.5 │
├╌╌╌╌╌╌╌┤
│ 47.0  │
└───────┘
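The arithmetic behind this result can be checked by hand. The sketch below reproduces it in plain Python, without Polars, using the `x` and `y` values from the aligned tables; it assumes that multiplying by a null yields a null, which is then replaced by 0 before the row is summed. The helper `product` is hypothetical.

```python
from typing import Optional

# (x, y) values per aligned row, copied from the af1, af2 and af3 tables above.
af1 = [(3.5, 10.0), (4.0, 2.5), (1.0, 1.5)]
af2 = [(3.5, 5.0), (8.0, 1.5), (1.0, 12.0)]
af3 = [(None, None), (5.0, 2.0), (2.0, 2.5)]


def product(*values: Optional[float]) -> Optional[float]:
    """Multiply values, yielding None (null) if any input is None."""
    result = 1.0
    for v in values:
        if v is None:
            return None
        result *= v
    return result


dot = []
for r1, r2, r3 in zip(af1, af2, af3):
    # Element-wise product of the three frames for this row.
    cells = [product(a, b, c) for a, b, c in zip(r1, r2, r3)]
    # Replace nulls with 0, then sum across the columns of the row.
    dot.append(sum(0.0 if c is None else c for c in cells))
```

For the second row this gives 4.0 × 8.0 × 5.0 + 2.5 × 1.5 × 2.0 = 160.0 + 7.5 = 167.5, and for the third row 1.0 × 1.0 × 2.0 + 1.5 × 12.0 × 2.5 = 2.0 + 45.0 = 47.0. The first row is all nulls in af3, so it collapses to 0.0.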