Pivot a frame from long to wide format
Description
Reshape data from long to wide format, known as "pivot wider".
Unlike for DataFrame, the values contained in the columns must be
specified beforehand using on_columns.
Usage
<LazyFrame>$pivot(
on,
on_columns,
...,
index = NULL,
values = NULL,
aggregate_function = NULL,
maintain_order = FALSE,
separator = "_"
)
Arguments
on
|
The column(s) whose values will be used as the new columns of the output. |
on_columns
|
What value combinations will be considered for the output table.
Something can be converted to a DataFrame by
as_polars_series(on_columns) |\> as_polars_df(). If
on contains multiple columns, the DataFrame passed to
on_columns must have exactly the same columns in the same
order as on. See examples for details.
|
…
|
These dots are for future extensions and must be empty. |
index
|
The column(s) that remain from the input to the output. The output will
have one row for each unique combination of the index’s
values. If NULL, all remaining columns not specified in
on and values will be used. At least one of
index and values must be specified.
|
values
|
The existing column(s) of values which will be moved under the new
columns from index. If an aggregation is specified, these
are the values on which the aggregation will be computed. If
NULL, all remaining columns not specified in
on and index will be used. At least one of
index and values must be specified.
|
aggregate_function
|
Choose from:
|
maintain_order
|
Ensure the values of index are sorted by discovery order.
|
separator
|
Used as separator/delimiter in generated column names in case of multiple values columns. |
Value
A polars LazyFrame
Examples
library("polars")
df <- pl$DataFrame(
name = c("Cady", "Cady", "Karen", "Karen"),
subject = c("maths", "physics", "maths", "physics"),
test_1 = c(98, 99, 61, 58),
test_2 = c(100, 100, 60, 60),
)
df
#> shape: (4, 4)
#> ┌───────┬─────────┬────────┬────────┐
#> │ name ┆ subject ┆ test_1 ┆ test_2 │
#> │ --- ┆ --- ┆ --- ┆ --- │
#> │ str ┆ str ┆ f64 ┆ f64 │
#> ╞═══════╪═════════╪════════╪════════╡
#> │ Cady ┆ maths ┆ 98.0 ┆ 100.0 │
#> │ Cady ┆ physics ┆ 99.0 ┆ 100.0 │
#> │ Karen ┆ maths ┆ 61.0 ┆ 60.0 │
#> │ Karen ┆ physics ┆ 58.0 ┆ 60.0 │
#> └───────┴─────────┴────────┴────────┘
# Using `pivot`, we can reshape so we have one row per student, with different
# subjects as columns, and their `test_1` scores as values:
df$lazy()$pivot(
"subject",
on_columns = c("maths", "physics"),
index = "name",
values = "test_1",
)$collect()
#> shape: (2, 3)
#> ┌───────┬───────┬─────────┐
#> │ name ┆ maths ┆ physics │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ f64 ┆ f64 │
#> ╞═══════╪═══════╪═════════╡
#> │ Karen ┆ 61.0 ┆ 58.0 │
#> │ Cady ┆ 98.0 ┆ 99.0 │
#> └───────┴───────┴─────────┘
# You can use selectors too - here we include all test scores in the pivoted table:
df$lazy()$pivot(
"subject",
on_columns = c("maths", "physics"),
values = cs$starts_with("test"),
)$collect()
#> shape: (2, 5)
#> ┌───────┬──────────────┬────────────────┬──────────────┬────────────────┐
#> │ name ┆ test_1_maths ┆ test_1_physics ┆ test_2_maths ┆ test_2_physics │
#> │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
#> │ str ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
#> ╞═══════╪══════════════╪════════════════╪══════════════╪════════════════╡
#> │ Cady ┆ 98.0 ┆ 99.0 ┆ 100.0 ┆ 100.0 │
#> │ Karen ┆ 61.0 ┆ 58.0 ┆ 60.0 ┆ 60.0 │
#> └───────┴──────────────┴────────────────┴──────────────┴────────────────┘
# If you end up with multiple values per cell, you can specify how to aggregate
# them with `aggregate_function`:
lf <- pl$LazyFrame(
ix = c(1, 1, 2, 2, 1, 2),
col = c("a", "a", "a", "a", "b", "b"),
foo = c(0, 1, 2, 2, 7, 1),
bar = c(0, 2, 0, 0, 9, 4),
)
lf$pivot(
"col", on_columns = c("a", "b"), index = "ix", aggregate_function = "sum"
)$collect()
#> shape: (2, 5)
#> ┌─────┬───────┬───────┬───────┬───────┐
#> │ ix ┆ foo_a ┆ foo_b ┆ bar_a ┆ bar_b │
#> │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
#> │ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
#> ╞═════╪═══════╪═══════╪═══════╪═══════╡
#> │ 2.0 ┆ 4.0 ┆ 1.0 ┆ 0.0 ┆ 4.0 │
#> │ 1.0 ┆ 1.0 ┆ 7.0 ┆ 2.0 ┆ 9.0 │
#> └─────┴───────┴───────┴───────┴───────┘
# You can also pass a custom aggregation function using `pl$element()` expressions:
lf <- pl$LazyFrame(
col1 = c("a", "a", "a", "b", "b", "b"),
col2 = c("x", "x", "x", "x", "y", "y"),
col3 = c(6, 7, 3, 2, 5, 7),
)
lf$pivot(
"col2",
on_columns = c("x", "y"),
index = "col1",
values = "col3",
aggregate_function = pl$element()$tanh()$mean(),
)$collect()
#> shape: (2, 3)
#> ┌──────┬──────────┬──────────┐
#> │ col1 ┆ x ┆ y │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ f64 ┆ f64 │
#> ╞══════╪══════════╪══════════╡
#> │ a ┆ 0.998347 ┆ null │
#> │ b ┆ 0.964028 ┆ 0.999954 │
#> └──────┴──────────┴──────────┘
# Note that `on_columns` must contain all combinations of the values in `on`.
# For example, you can use the `expand.grid()` function to create all combinations
# of multiple columns as follows:
as_polars_lf(datasets::penguins)$pivot(
on = c("species", "sex"),
on_columns = expand.grid(
species = c("Adelie", "Gentoo", "Chinstrap"),
sex = c("male", "female")
),
index = "island",
values = "body_mass",
aggregate_function = "mean",
)$collect()
#> shape: (3, 7)
#> ┌───────────┬──────────────┬──────────────┬──────────────┬─────────────┬─────────────┬─────────────┐
#> │ island ┆ {"Adelie","m ┆ {"Gentoo","m ┆ {"Chinstrap" ┆ {"Adelie"," ┆ {"Gentoo"," ┆ {"Chinstrap │
#> │ --- ┆ ale"} ┆ ale"} ┆ ,"male"} ┆ female"} ┆ female"} ┆ ","female"} │
#> │ cat ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
#> │ ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
#> ╞═══════════╪══════════════╪══════════════╪══════════════╪═════════════╪═════════════╪═════════════╡
#> │ Biscoe ┆ 4050.0 ┆ 5484.836066 ┆ null ┆ 3369.318182 ┆ 4679.741379 ┆ null │
#> │ Torgersen ┆ 4034.782609 ┆ null ┆ null ┆ 3395.833333 ┆ null ┆ null │
#> │ Dream ┆ 4045.535714 ┆ null ┆ 3938.970588 ┆ 3344.444444 ┆ null ┆ 3527.205882 │
#> └───────────┴──────────────┴──────────────┴──────────────┴─────────────┴─────────────┴─────────────┘