Pivot a frame from long to wide format

Description

Reshape data from long to wide format, known as "pivot wider".

Unlike for DataFrame, the values contained in the columns must be specified beforehand using on_columns.

Usage

<LazyFrame>$pivot(
  on,
  on_columns,
  ...,
  index = NULL,
  values = NULL,
  aggregate_function = NULL,
  maintain_order = FALSE,
  separator = "_"
)

Arguments

on The column(s) whose values will be used as the new columns of the output.

on_columns What value combinations will be considered for the output table. Something can be converted to a DataFrame by as_polars_series(on_columns) |\> as_polars_df(). If on contains multiple columns, the DataFrame passed to on_columns must have exactly the same columns in the same order as on. See examples for details.

… These dots are for future extensions and must be empty.

index The column(s) that remain from the input to the output. The output will have one row for each unique combination of the index’s values. If NULL, all remaining columns not specified in on and values will be used. At least one of index and values must be specified.

values The existing column(s) of values which will be moved under the new columns from index. If an aggregation is specified, these are the values on which the aggregation will be computed. If NULL, all remaining columns not specified in on and index will be used. At least one of index and values must be specified.

aggregate_function

Choose from:

NULL (default): no aggregation takes place, will raise error if multiple values are in group. Same as pl$element()$item(allow_empty = TRUE).
A predefined aggregate function string, one of “min”, “max”, “first”, “last”, “sum”, “mean”, “median”, “len”, “item”. Same as pl$element()$\().
An expression to do the aggregation.

maintain_order Ensure the values of index are sorted by discovery order.

separator Used as separator/delimiter in generated column names in case of multiple values columns.

Value

A polars LazyFrame

Examples

library("polars")


df <- pl$DataFrame(
  name = c("Cady", "Cady", "Karen", "Karen"),
  subject = c("maths", "physics", "maths", "physics"),
  test_1 = c(98, 99, 61, 58),
  test_2 = c(100, 100, 60, 60),
)
df

#> shape: (4, 4)
#> ┌───────┬─────────┬────────┬────────┐
#> │ name  ┆ subject ┆ test_1 ┆ test_2 │
#> │ ---   ┆ ---     ┆ ---    ┆ ---    │
#> │ str   ┆ str     ┆ f64    ┆ f64    │
#> ╞═══════╪═════════╪════════╪════════╡
#> │ Cady  ┆ maths   ┆ 98.0   ┆ 100.0  │
#> │ Cady  ┆ physics ┆ 99.0   ┆ 100.0  │
#> │ Karen ┆ maths   ┆ 61.0   ┆ 60.0   │
#> │ Karen ┆ physics ┆ 58.0   ┆ 60.0   │
#> └───────┴─────────┴────────┴────────┘

# Using `pivot`, we can reshape so we have one row per student, with different
# subjects as columns, and their `test_1` scores as values:
df$lazy()$pivot(
  "subject",
  on_columns = c("maths", "physics"),
  index = "name",
  values = "test_1",
)$collect()

#> shape: (2, 3)
#> ┌───────┬───────┬─────────┐
#> │ name  ┆ maths ┆ physics │
#> │ ---   ┆ ---   ┆ ---     │
#> │ str   ┆ f64   ┆ f64     │
#> ╞═══════╪═══════╪═════════╡
#> │ Karen ┆ 61.0  ┆ 58.0    │
#> │ Cady  ┆ 98.0  ┆ 99.0    │
#> └───────┴───────┴─────────┘

# You can use selectors too - here we include all test scores in the pivoted table:
df$lazy()$pivot(
  "subject",
  on_columns = c("maths", "physics"),
  values = cs$starts_with("test"),
)$collect()

#> shape: (2, 5)
#> ┌───────┬──────────────┬────────────────┬──────────────┬────────────────┐
#> │ name  ┆ test_1_maths ┆ test_1_physics ┆ test_2_maths ┆ test_2_physics │
#> │ ---   ┆ ---          ┆ ---            ┆ ---          ┆ ---            │
#> │ str   ┆ f64          ┆ f64            ┆ f64          ┆ f64            │
#> ╞═══════╪══════════════╪════════════════╪══════════════╪════════════════╡
#> │ Cady  ┆ 98.0         ┆ 99.0           ┆ 100.0        ┆ 100.0          │
#> │ Karen ┆ 61.0         ┆ 58.0           ┆ 60.0         ┆ 60.0           │
#> └───────┴──────────────┴────────────────┴──────────────┴────────────────┘

# If you end up with multiple values per cell, you can specify how to aggregate
# them with `aggregate_function`:
lf <- pl$LazyFrame(
  ix = c(1, 1, 2, 2, 1, 2),
  col = c("a", "a", "a", "a", "b", "b"),
  foo = c(0, 1, 2, 2, 7, 1),
  bar = c(0, 2, 0, 0, 9, 4),
)
lf$pivot(
  "col", on_columns = c("a", "b"), index = "ix", aggregate_function = "sum"
)$collect()

#> shape: (2, 5)
#> ┌─────┬───────┬───────┬───────┬───────┐
#> │ ix  ┆ foo_a ┆ foo_b ┆ bar_a ┆ bar_b │
#> │ --- ┆ ---   ┆ ---   ┆ ---   ┆ ---   │
#> │ f64 ┆ f64   ┆ f64   ┆ f64   ┆ f64   │
#> ╞═════╪═══════╪═══════╪═══════╪═══════╡
#> │ 2.0 ┆ 4.0   ┆ 1.0   ┆ 0.0   ┆ 4.0   │
#> │ 1.0 ┆ 1.0   ┆ 7.0   ┆ 2.0   ┆ 9.0   │
#> └─────┴───────┴───────┴───────┴───────┘

# You can also pass a custom aggregation function using `pl$element()` expressions:
lf <- pl$LazyFrame(
  col1 = c("a", "a", "a", "b", "b", "b"),
  col2 = c("x", "x", "x", "x", "y", "y"),
  col3 = c(6, 7, 3, 2, 5, 7),
)
lf$pivot(
  "col2",
  on_columns = c("x", "y"),
  index = "col1",
  values = "col3",
  aggregate_function = pl$element()$tanh()$mean(),
)$collect()

#> shape: (2, 3)
#> ┌──────┬──────────┬──────────┐
#> │ col1 ┆ x        ┆ y        │
#> │ ---  ┆ ---      ┆ ---      │
#> │ str  ┆ f64      ┆ f64      │
#> ╞══════╪══════════╪══════════╡
#> │ b    ┆ 0.964028 ┆ 0.999954 │
#> │ a    ┆ 0.998347 ┆ null     │
#> └──────┴──────────┴──────────┘

# Note that `on_columns` must contain all combinations of the values in `on`.
# For example, you can use the `expand.grid()` function to create all combinations
# of multiple columns as follows:
as_polars_lf(datasets::penguins)$pivot(
  on = c("species", "sex"),
  on_columns = expand.grid(
    species = c("Adelie", "Gentoo", "Chinstrap"),
    sex = c("male", "female")
  ),
  index = "island",
  values = "body_mass",
  aggregate_function = "mean",
)$collect()

#> shape: (3, 7)
#> ┌───────────┬──────────────┬──────────────┬──────────────┬─────────────┬─────────────┬─────────────┐
#> │ island    ┆ {"Adelie","m ┆ {"Gentoo","m ┆ {"Chinstrap" ┆ {"Adelie"," ┆ {"Gentoo"," ┆ {"Chinstrap │
#> │ ---       ┆ ale"}        ┆ ale"}        ┆ ,"male"}     ┆ female"}    ┆ female"}    ┆ ","female"} │
#> │ cat       ┆ ---          ┆ ---          ┆ ---          ┆ ---         ┆ ---         ┆ ---         │
#> │           ┆ f64          ┆ f64          ┆ f64          ┆ f64         ┆ f64         ┆ f64         │
#> ╞═══════════╪══════════════╪══════════════╪══════════════╪═════════════╪═════════════╪═════════════╡
#> │ Torgersen ┆ 4034.782609  ┆ null         ┆ null         ┆ 3395.833333 ┆ null        ┆ null        │
#> │ Biscoe    ┆ 4050.0       ┆ 5484.836066  ┆ null         ┆ 3369.318182 ┆ 4679.741379 ┆ null        │
#> │ Dream     ┆ 4045.535714  ┆ null         ┆ 3938.970588  ┆ 3344.444444 ┆ null        ┆ 3527.205882 │
#> └───────────┴──────────────┴──────────────┴──────────────┴─────────────┴─────────────┴─────────────┘