Join LazyFrames
Description
This function can do both mutating joins (adding columns based on
matching observations, for example with how = “left”
) and
filtering joins (keeping observations based on matching observations,
for example with how = “inner”
).
Usage
<LazyFrame>$join(
other,
on = NULL,
how = "inner",
...,
left_on = NULL,
right_on = NULL,
suffix = "_right",
validate = "m:m",
join_nulls = FALSE,
allow_parallel = TRUE,
force_parallel = FALSE,
coalesce = NULL
)
Arguments
other
|
LazyFrame to join with. |
on
|
Either a vector of column names or a list of expressions and/or strings.
Use left_on and right_on if the column names
to match on are different between the two DataFrames.
|
how
|
One of the following methods: "inner", "left", "right", "full", "semi", "anti", "cross". |
…
|
Ignored. |
left_on , right_on
|
Same as on but only for the left or the right DataFrame.
They must have the same length.
|
suffix
|
Suffix to add to duplicated column names. |
validate
|
Checks if join is of specified type:
|
join_nulls
|
Join on null values. By default null values will never produce matches. |
allow_parallel
|
Allow the physical plan to optionally evaluate the computation of both DataFrames up to the join in parallel. |
force_parallel
|
Force the physical plan to evaluate the computation of both DataFrames up to the join in parallel. |
coalesce
|
Coalescing behavior (merging of join columns).
|
Value
LazyFrame
Examples
library("polars")
# inner join by default
df1 = pl$LazyFrame(list(key = 1:3, payload = c("f", "i", NA)))
df2 = pl$LazyFrame(list(key = c(3L, 4L, 5L, NA_integer_)))
df1$join(other = df2, on = "key")
#> polars LazyFrame
#> $explain(): Show the optimized query plan.
#>
#> Naive plan:
#> INNER JOIN:
#> LEFT PLAN ON: [col("key")]
#> DF ["key", "payload"]; PROJECT */2 COLUMNS; SELECTION: None
#> RIGHT PLAN ON: [col("key")]
#> DF ["key"]; PROJECT */1 COLUMNS; SELECTION: None
#> END INNER JOIN
# cross join
df1 = pl$LazyFrame(x = letters[1:3])
df2 = pl$LazyFrame(y = 1:4)
df1$join(other = df2, how = "cross")
#> polars LazyFrame
#> $explain(): Show the optimized query plan.
#>
#> Naive plan:
#> CROSS JOIN:
#> LEFT PLAN ON: []
#> DF ["x"]; PROJECT */1 COLUMNS; SELECTION: None
#> RIGHT PLAN ON: []
#> DF ["y"]; PROJECT */1 COLUMNS; SELECTION: None
#> END CROSS JOIN
# use "validate" to ensure join keys are not duplicated
df1 = pl$LazyFrame(x = letters[1:5], y = 1:5)
df2 = pl$LazyFrame(x = c("a", letters[1:4]), y2 = 6:10)
# this throws an error because there are two keys in df2 that match the key
# in df1
tryCatch(
df1$join(df2, on = "x", validate = "1:1")$collect(),
error = function(e) print(e)
)
#> <RPolarsErr_error: Execution halted with the following contexts
#> 0: In R: in $collect():
#> 0: During function call [.main()]
#> 1: Encountered the following error in Rust-Polars:
#> join keys did not fulfill 1:1 validation
#> >