
Combining DataFrames

There are two ways DataFrames can be combined depending on the use case: join and concat.

Join

Polars supports the common join types (e.g. left, right, inner, outer). Let's take a closer look at how to join two DataFrames into a single DataFrame. Our two DataFrames each have an 'id'-like column: a in the first and x in the second. We can use those columns as join keys in this example.

join

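import numpy as np
import polars as pl

df = pl.DataFrame(
    {
        "a": np.arange(0, 8),
        "b": np.random.rand(8),
        # np.nan is a floating-point NaN; None becomes a null (missing) value
        "d": [1.0, 2.0, np.nan, np.nan, 0.0, -5.0, -42.0, None],
    }
)

df2 = pl.DataFrame(
    {
        "x": np.arange(0, 8),
        "y": ["A", "A", "A", "B", "B", "C", "X", "X"],
    }
)
# Join on differently named key columns; the default join type is inner
joined = df.join(df2, left_on="a", right_on="x")
print(joined)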

join

use polars::prelude::*;
use rand::Rng;

let mut rng = rand::thread_rng();

let df: DataFrame = df!(
    "a" => 0..8,
    "b" => (0..8).map(|_| rng.gen::<f64>()).collect::<Vec<f64>>(),
    // f64::NAN mirrors the NaN values of the Python example; None is a null
    "d" => [Some(1.0), Some(2.0), Some(f64::NAN), Some(f64::NAN), Some(0.0), Some(-5.0), Some(-42.), None]
)
.expect("should not fail");
let df2: DataFrame = df!(
    "x" => 0..8,
    "y" => &["A", "A", "A", "B", "B", "C", "X", "X"],
)
.expect("should not fail");
// Left join on differently named key columns
let joined = df.join(&df2, ["a"], ["x"], JoinType::Left, None)?;
println!("{}", joined);

shape: (8, 4)
┌─────┬──────────┬───────┬─────┐
│ a   ┆ b        ┆ d     ┆ y   │
│ --- ┆ ---      ┆ ---   ┆ --- │
│ i64 ┆ f64      ┆ f64   ┆ str │
╞═════╪══════════╪═══════╪═════╡
│ 0   ┆ 0.419112 ┆ 1.0   ┆ A   │
│ 1   ┆ 0.248841 ┆ 2.0   ┆ A   │
│ 2   ┆ 0.468882 ┆ NaN   ┆ A   │
│ 3   ┆ 0.507387 ┆ NaN   ┆ B   │
│ 4   ┆ 0.909377 ┆ 0.0   ┆ B   │
│ 5   ┆ 0.40115  ┆ -5.0  ┆ C   │
│ 6   ┆ 0.912623 ┆ -42.0 ┆ X   │
│ 7   ┆ 0.71882  ┆ null  ┆ X   │
└─────┴──────────┴───────┴─────┘

To see more examples with other types of joins, go to the User Guide.

Concat

We can also concatenate two DataFrames. Vertical concatenation makes the DataFrame longer by adding rows. Horizontal concatenation makes the DataFrame wider by adding columns. Below you can see the result of a horizontal concatenation of our two DataFrames.

hstack

stacked = df.hstack(df2)
print(stacked)

hstack

let stacked = df.hstack(df2.get_columns())?;
println!("{}",stacked);

shape: (8, 5)
┌─────┬──────────┬───────┬─────┬─────┐
│ a   ┆ b        ┆ d     ┆ x   ┆ y   │
│ --- ┆ ---      ┆ ---   ┆ --- ┆ --- │
│ i64 ┆ f64      ┆ f64   ┆ i64 ┆ str │
╞═════╪══════════╪═══════╪═════╪═════╡
│ 0   ┆ 0.419112 ┆ 1.0   ┆ 0   ┆ A   │
│ 1   ┆ 0.248841 ┆ 2.0   ┆ 1   ┆ A   │
│ 2   ┆ 0.468882 ┆ NaN   ┆ 2   ┆ A   │
│ 3   ┆ 0.507387 ┆ NaN   ┆ 3   ┆ B   │
│ 4   ┆ 0.909377 ┆ 0.0   ┆ 4   ┆ B   │
│ 5   ┆ 0.40115  ┆ -5.0  ┆ 5   ┆ C   │
│ 6   ┆ 0.912623 ┆ -42.0 ┆ 6   ┆ X   │
│ 7   ┆ 0.71882  ┆ null  ┆ 7   ┆ X   │
└─────┴──────────┴───────┴─────┴─────┘