Use the Aho-Corasick algorithm to extract matches
Description
This method supports matching on string literals only, and does not
support regular expression matching.
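To illustrate what "string literals only" means in practice, here is a minimal sketch in which a pattern contains a character that would be special in a regular expression. The data frame, the column name values, and the pattern "a.c" are illustrative assumptions, not part of the original documentation.

```r
library("polars")

# Illustrative data: one value contains a literal dot, the other does not.
df <- pl$DataFrame(values = c("a.c", "abc"))

# "a.c" is searched for as the three literal characters a, ., c;
# the dot is not treated as a regex wildcard.
out <- df$with_columns(
  matches = pl$col("values")$str$extract_many(list(c("a.c")))
)
out
```

Because the dot is matched literally, only the first row is expected to produce a match.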
Usage
<Expr>$str$extract_many(
  patterns,
  ...,
  ascii_case_insensitive = FALSE,
  overlapping = FALSE,
  leftmost = FALSE
)
Arguments
patterns
    String patterns to search. Accepts expression input. Strings are parsed as column names, and other non-expression inputs are parsed as literals. To use the same character vector for all rows, use list(c(…)) instead of c(…) (see Examples).
...
    These dots are for future extensions and must be empty.
ascii_case_insensitive
    Enable ASCII-aware case-insensitive matching. When this option is enabled, matching ignores case for ASCII letters (a-z and A-Z) only.
overlapping
    Whether matches can overlap.
leftmost
    Whether to guarantee that, when matches overlap, the leftmost match is used. If there are multiple candidates for the leftmost match, the pattern that comes first in patterns is used.
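As a minimal sketch of how ascii_case_insensitive and leftmost might be used, the following compares three calls on the same input. The data frame, the column name values, the pattern list, and the output column names are illustrative assumptions; no output is shown because this sketch has not been run here.

```r
library("polars")

# Illustrative data: mixed-case input so that case folding matters.
df <- pl$DataFrame(values = c("Discontent", "DISCO"))

# "discontent" is listed before "disco" so that pattern order is relevant
# when leftmost = TRUE and both patterns start at the same position.
patterns <- list(c("discontent", "disco"))

out <- df$with_columns(
  # Default settings: case-sensitive matching.
  default = pl$col("values")$str$extract_many(patterns),
  # Ignore case for ASCII letters only.
  ignore_case = pl$col("values")$str$extract_many(
    patterns,
    ascii_case_insensitive = TRUE
  ),
  # Ignore case and prefer the leftmost match, breaking ties by pattern order.
  ignore_case_leftmost = pl$col("values")$str$extract_many(
    patterns,
    ascii_case_insensitive = TRUE,
    leftmost = TRUE
  )
)
out
```

Comparing the three list columns side by side shows the effect of each option. leftmost is kept separate from overlapping here, on the assumption that the two are not meant to be combined.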
Value
A polars expression
Examples
library("polars")
df <- pl$DataFrame(values = "discontent")
patterns <- list(c("winter", "disco", "onte", "discontent"))
df$with_columns(
  matches = pl$col("values")$str$extract_many(patterns),
  matches_overlap = pl$col("values")$str$extract_many(patterns, overlapping = TRUE)
)
#> shape: (1, 3)
#> ┌────────────┬───────────┬─────────────────────────────────┐
#> │ values     ┆ matches   ┆ matches_overlap                 │
#> │ ---        ┆ ---       ┆ ---                             │
#> │ str        ┆ list[str] ┆ list[str]                       │
#> ╞════════════╪═══════════╪═════════════════════════════════╡
#> │ discontent ┆ ["disco"] ┆ ["disco", "onte", "discontent"… │
#> └────────────┴───────────┴─────────────────────────────────┘
df <- pl$DataFrame(
  values = c("discontent", "rhapsody"),
  patterns = list(c("winter", "disco", "onte", "discontent"), c("rhap", "ody", "coalesce"))
)
df$select(pl$col("values")$str$extract_many("patterns"))
#> shape: (2, 1)
#> ┌─────────────────┐
#> │ values          │
#> │ ---             │
#> │ list[str]       │
#> ╞═════════════════╡
#> │ ["disco"]       │
#> │ ["rhap", "ody"] │
#> └─────────────────┘