polars.align_frames
- polars.align_frames(*frames: DataFrame, on: str | Expr | Sequence[str] | Sequence[Expr] | Sequence[str | Expr], select: str | Expr | Sequence[str | Expr] | None = None, descending: bool | Sequence[bool] = False) -> list[DataFrame]
- polars.align_frames(*frames: LazyFrame, on: str | Expr | Sequence[str] | Sequence[Expr] | Sequence[str | Expr], select: str | Expr | Sequence[str | Expr] | None = None, descending: bool | Sequence[bool] = False) -> list[LazyFrame]
Align a sequence of frames using the unique values from one or more columns as a key.
Frames that do not contain the given key values have rows injected (with nulls filling the non-key columns), and each resulting frame is sorted by the key.
The original column order of the input frames is preserved unless select is specified, in which case the final column order is determined by select.

Note that this does not produce a joined frame: you get back the same number of frames that you passed in, but each one is now aligned by key and has the same number of rows.
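The alignment described above can be modelled in a few lines of plain Python. This is a minimal sketch under stated assumptions, not how polars implements align_frames: each frame is a list of row dicts, there is a single key column, key values are assumed unique within each frame, and None stands in for null. align_rows is a hypothetical helper name.

```python
from datetime import date


def align_rows(frames, on, select=None):
    """Hypothetical helper: align lists of row dicts on the single key column `on`.

    Assumes key values are unique within each frame and that the first row of
    each non-empty frame lists all of that frame's columns.
    """
    # Union of key values seen in any frame, sorted ascending.
    keys = sorted({row[on] for frame in frames for row in frame})
    aligned = []
    for frame in frames:
        by_key = {row[on]: row for row in frame}
        # Keep the frame's original column order.
        columns = list(frame[0]) if frame else [on]
        out = []
        for key in keys:
            # A key missing from this frame gets an injected row of None values.
            src = by_key.get(key, {})
            row = {c: (key if c == on else src.get(c)) for c in columns}
            if select is not None:
                # Post-alignment select: constrain and reorder the columns.
                row = {c: row[c] for c in select}
            out.append(row)
        aligned.append(out)
    return aligned


# The df1/df2/df3 data from the examples below, as row dicts.
df1 = [
    {"dt": date(2022, 9, 1), "x": 3.5, "y": 10.0},
    {"dt": date(2022, 9, 2), "x": 4.0, "y": 2.5},
    {"dt": date(2022, 9, 3), "x": 1.0, "y": 1.5},
]
df2 = [
    {"dt": date(2022, 9, 2), "x": 8.0, "y": 1.5},
    {"dt": date(2022, 9, 3), "x": 1.0, "y": 12.0},
    {"dt": date(2022, 9, 1), "x": 3.5, "y": 5.0},
]
df3 = [
    {"dt": date(2022, 9, 3), "x": 2.0, "y": 2.5},
    {"dt": date(2022, 9, 2), "x": 5.0, "y": 2.0},
]

af1, af2, af3 = align_rows([df1, df2, df3], on="dt")
print(af3[0])  # {'dt': datetime.date(2022, 9, 1), 'x': None, 'y': None}
```

Every returned list has one row per key in the union, which is why the frames can afterwards be combined row by row.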
- Parameters:
- frames
sequence of DataFrames or LazyFrames.
- on
one or more columns whose unique values will be used to align the frames.
- select
optional post-alignment column select to constrain and/or order the columns returned from the newly aligned frames.
- descending
sort the alignment column values in descending order; can be a single boolean or a list of booleans associated with each column in on.
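When on names several columns, descending is either one boolean for all of them or one boolean per column. One standard way to get such a mixed-direction ordering in plain Python is repeated stable sorting, from the least significant key to the most significant. The sketch below illustrates those ordering semantics only; it is not polars' sorting code, sort_rows is a hypothetical helper, and rows are assumed to be dicts with non-null, comparable key values.

```python
from operator import itemgetter


def sort_rows(rows, on, descending=False):
    """Hypothetical helper: sort row dicts by the key columns in `on`.

    `descending` is a single bool applied to every key column, or one bool per
    column in `on`. Assumes key values are non-null and mutually comparable.
    """
    if isinstance(descending, bool):
        descending = [descending] * len(on)
    if len(descending) != len(on):
        raise ValueError("expected one descending flag per key column")
    result = list(rows)
    # list.sort is stable (also with reverse=True), so sorting by the least
    # significant key first and the most significant key last yields a correct
    # multi-key order even when the directions differ between keys.
    for col, desc in reversed(list(zip(on, descending))):
        result.sort(key=itemgetter(col), reverse=desc)
    return result


rows = [
    {"g": "a", "n": 1},
    {"g": "b", "n": 2},
    {"g": "a", "n": 3},
    {"g": "b", "n": 1},
]
# "g" ascending, then "n" descending within each "g".
ordered = sort_rows(rows, on=["g", "n"], descending=[False, True])
print([(r["g"], r["n"]) for r in ordered])  # [('a', 3), ('a', 1), ('b', 2), ('b', 1)]
```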
Examples
>>> import polars as pl
>>> from datetime import date
>>> df1 = pl.DataFrame(
...     {
...         "dt": [date(2022, 9, 1), date(2022, 9, 2), date(2022, 9, 3)],
...         "x": [3.5, 4.0, 1.0],
...         "y": [10.0, 2.5, 1.5],
...     }
... )
>>> df2 = pl.DataFrame(
...     {
...         "dt": [date(2022, 9, 2), date(2022, 9, 3), date(2022, 9, 1)],
...         "x": [8.0, 1.0, 3.5],
...         "y": [1.5, 12.0, 5.0],
...     }
... )
>>> df3 = pl.DataFrame(
...     {
...         "dt": [date(2022, 9, 3), date(2022, 9, 2)],
...         "x": [2.0, 5.0],
...         "y": [2.5, 2.0],
...     }
... )
>>> pl.Config.set_tbl_formatting("UTF8_FULL")
#
# df1                              df2                              df3
# shape: (3, 3)                    shape: (3, 3)                    shape: (2, 3)
# ┌────────────┬─────┬──────┐      ┌────────────┬─────┬──────┐      ┌────────────┬─────┬─────┐
# │ dt         ┆ x   ┆ y    │      │ dt         ┆ x   ┆ y    │      │ dt         ┆ x   ┆ y   │
# │ ---        ┆ --- ┆ ---  │      │ ---        ┆ --- ┆ ---  │      │ ---        ┆ --- ┆ --- │
# │ date       ┆ f64 ┆ f64  │      │ date       ┆ f64 ┆ f64  │      │ date       ┆ f64 ┆ f64 │
# ╞════════════╪═════╪══════╡      ╞════════════╪═════╪══════╡      ╞════════════╪═════╪═════╡
# │ 2022-09-01 ┆ 3.5 ┆ 10.0 │\  ,->│ 2022-09-02 ┆ 8.0 ┆ 1.5  │\  ,->│ 2022-09-03 ┆ 2.0 ┆ 2.5 │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤ \/   ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤ \/   ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
# │ 2022-09-02 ┆ 4.0 ┆ 2.5  │_/\,->│ 2022-09-03 ┆ 1.0 ┆ 12.0 │_/`-->│ 2022-09-02 ┆ 5.0 ┆ 2.0 │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤  /\  ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤      └────────────┴─────┴─────┘
# │ 2022-09-03 ┆ 1.0 ┆ 1.5  │_/  `>│ 2022-09-01 ┆ 3.5 ┆ 5.0  │-//-
# └────────────┴─────┴──────┘      └────────────┴─────┴──────┘
...
Align frames by the “dt” column:
>>> af1, af2, af3 = pl.align_frames(
...     df1, df2, df3, on="dt"
... )
#
# af1                              af2                              af3
# shape: (3, 3)                    shape: (3, 3)                    shape: (3, 3)
# ┌────────────┬─────┬──────┐      ┌────────────┬─────┬──────┐      ┌────────────┬──────┬──────┐
# │ dt         ┆ x   ┆ y    │      │ dt         ┆ x   ┆ y    │      │ dt         ┆ x    ┆ y    │
# │ ---        ┆ --- ┆ ---  │      │ ---        ┆ --- ┆ ---  │      │ ---        ┆ ---  ┆ ---  │
# │ date       ┆ f64 ┆ f64  │      │ date       ┆ f64 ┆ f64  │      │ date       ┆ f64  ┆ f64  │
# ╞════════════╪═════╪══════╡      ╞════════════╪═════╪══════╡      ╞════════════╪══════╪══════╡
# │ 2022-09-01 ┆ 3.5 ┆ 10.0 │----->│ 2022-09-01 ┆ 3.5 ┆ 5.0  │----->│ 2022-09-01 ┆ null ┆ null │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 2022-09-02 ┆ 4.0 ┆ 2.5  │----->│ 2022-09-02 ┆ 8.0 ┆ 1.5  │----->│ 2022-09-02 ┆ 5.0  ┆ 2.0  │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 2022-09-03 ┆ 1.0 ┆ 1.5  │----->│ 2022-09-03 ┆ 1.0 ┆ 12.0 │----->│ 2022-09-03 ┆ 2.0  ┆ 2.5  │
# └────────────┴─────┴──────┘      └────────────┴─────┴──────┘      └────────────┴──────┴──────┘
...
Align frames by “dt”, but keep only columns “x” and “y”:
>>> af1, af2, af3 = pl.align_frames(
...     df1, df2, df3, on="dt", select=["x", "y"]
... )
#
# af1                 af2                 af3
# shape: (3, 2)       shape: (3, 2)       shape: (3, 2)
# ┌─────┬──────┐      ┌─────┬──────┐      ┌──────┬──────┐
# │ x   ┆ y    │      │ x   ┆ y    │      │ x    ┆ y    │
# │ --- ┆ ---  │      │ --- ┆ ---  │      │ ---  ┆ ---  │
# │ f64 ┆ f64  │      │ f64 ┆ f64  │      │ f64  ┆ f64  │
# ╞═════╪══════╡      ╞═════╪══════╡      ╞══════╪══════╡
# │ 3.5 ┆ 10.0 │      │ 3.5 ┆ 5.0  │      │ null ┆ null │
# ├╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 4.0 ┆ 2.5  │      │ 8.0 ┆ 1.5  │      │ 5.0  ┆ 2.0  │
# ├╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 1.0 ┆ 1.5  │      │ 1.0 ┆ 12.0 │      │ 2.0  ┆ 2.5  │
# └─────┴──────┘      └─────┴──────┘      └──────┴──────┘
...
Now that the data is aligned, you can calculate the row-wise dot product:
>>> (af1 * af2 * af3).fill_null(0).select(pl.sum(pl.col("*")).alias("dot"))
shape: (3, 1)
┌───────┐
│ dot   │
│ ---   │
│ f64   │
╞═══════╡
│ 0.0   │
├╌╌╌╌╌╌╌┤
│ 167.5 │
├╌╌╌╌╌╌╌┤
│ 47.0  │
└───────┘
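To see where 0.0, 167.5 and 47.0 come from, the arithmetic can be replayed in plain Python. This sketch models each aligned frame from the select=["x", "y"] example as a dict of column lists, uses None for null, assumes null-propagating multiplication (any null operand gives null), and mirrors fill_null(0) before summing across the columns of each row. row_dot and mul are hypothetical helper names, not polars API.

```python
# x and y values of the three aligned frames (select=["x", "y"]),
# with None standing in for the injected nulls in af3.
af1 = {"x": [3.5, 4.0, 1.0], "y": [10.0, 2.5, 1.5]}
af2 = {"x": [3.5, 8.0, 1.0], "y": [5.0, 1.5, 12.0]}
af3 = {"x": [None, 5.0, 2.0], "y": [None, 2.0, 2.5]}


def mul(a, b):
    """Null-propagating multiply: any None operand yields None."""
    return None if a is None or b is None else a * b


def row_dot(*frames, cols=("x", "y")):
    """Hypothetical helper: elementwise product of the frames, then a row-wise sum."""
    n_rows = len(frames[0][cols[0]])
    out = []
    for i in range(n_rows):
        total = 0.0
        for c in cols:
            prod = frames[0][c][i]
            for frame in frames[1:]:
                prod = mul(prod, frame[c][i])
            # Equivalent of fill_null(0) before the row-wise sum.
            total += 0.0 if prod is None else prod
        out.append(total)
    return out


dot = row_dot(af1, af2, af3)
print(dot)  # [0.0, 167.5, 47.0]
```

For 2022-09-02, for example, the row is 4.0 * 8.0 * 5.0 + 2.5 * 1.5 * 2.0 = 160.0 + 7.5 = 167.5; the 2022-09-01 row is 0.0 because af3 contributes only nulls there.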