Use the Aho-Corasick algorithm to find many matches
Description
The function returns the byte offset of the start of each match.
The return type is List(UInt32). This method matches string literals
only; regular expressions are not supported.
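As a rough illustration of why a byte offset can differ from a character offset, the sketch below uses base R only (not polars) on a string with one multi-byte UTF-8 character. The string and the variable names are made up for this illustration.

```r
# "h\u00e9llo" is "héllo"; the "é" occupies two bytes in UTF-8
x <- enc2utf8("h\u00e9llo")

n_chars <- nchar(x, type = "chars")
n_bytes <- nchar(x, type = "bytes")

# 1-based position of the literal "llo", counted in characters and in bytes
char_pos <- regexpr("llo", x, fixed = TRUE)[1]
byte_pos <- regexpr("llo", x, fixed = TRUE, useBytes = TRUE)[1]

# Convert to 0-based offsets, the convention visible in the Examples output
char_offset <- char_pos - 1L
byte_offset <- byte_pos - 1L

cat("characters:", n_chars, " bytes:", n_bytes, "\n")
cat("character offset:", char_offset, " byte offset:", byte_offset, "\n")
```

For pure ASCII input the two offsets coincide, which is why the difference does not show up in the Examples section.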
Usage
<Expr>$str$find_many(
  patterns,
  ...,
  ascii_case_insensitive = FALSE,
  overlapping = FALSE
)
Arguments
patterns
  String patterns to search. Accepts expression input. Strings are parsed
  as column names, and other non-expression inputs are parsed as literals.
  To use the same character vector for all rows, use
  list(c(…)) instead of c(…) (see Examples).
...
  These dots are for future extensions and must be empty.
ascii_case_insensitive
  Enable ASCII-aware case-insensitive matching. When this option is enabled,
  searching ignores case for ASCII letters (a-z and A-Z) only.
overlapping
  Whether matches can overlap.
Value
A polars expression
Examples
library("polars")
df <- pl$DataFrame(values = "discontent")
patterns <- list(c("winter", "disco", "onte", "discontent"))
df$with_columns(
  matches = pl$col("values")$str$find_many(patterns, overlapping = FALSE),
  matches_overlapping = pl$col("values")$str$find_many(
    patterns, overlapping = TRUE
  )
)
#> shape: (1, 3)
#> ┌────────────┬───────────┬─────────────────────┐
#> │ values     ┆ matches   ┆ matches_overlapping │
#> │ ---        ┆ ---       ┆ ---                 │
#> │ str        ┆ list[u32] ┆ list[u32]           │
#> ╞════════════╪═══════════╪═════════════════════╡
#> │ discontent ┆ [0]       ┆ [0, 4, 0]           │
#> └────────────┴───────────┴─────────────────────┘
df <- pl$DataFrame(
  values = c("discontent", "rhapsody"),
  patterns = list(
    c("winter", "disco", "onte", "discontent"),
    c("rhap", "ody", "coalesce")
  )
)
df$select(pl$col("values")$str$find_many("patterns"))
#> shape: (2, 1)
#> ┌───────────┐
#> │ values    │
#> │ ---       │
#> │ list[u32] │
#> ╞═══════════╡
#> │ [0]       │
#> │ [0, 5]    │
#> └───────────┘
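
The examples above do not exercise ascii_case_insensitive. The following is a minimal sketch of how it might be used, reusing the calls shown above. The input string, the pattern list, and the output column names are chosen only for illustration, and no output is shown because it has not been run here.

```r
library("polars")

# Illustrative input: mixed case in the data, upper case in one pattern
df <- pl$DataFrame(values = "Discontent")

# Wrap the character vector in list() so the same patterns apply to every row
patterns <- list(c("DISCO", "onte"))

out <- df$with_columns(
  # Default behaviour: matching is case-sensitive
  case_sensitive = pl$col("values")$str$find_many(patterns),
  # ASCII letters are compared without regard to case
  case_insensitive = pl$col("values")$str$find_many(
    patterns,
    ascii_case_insensitive = TRUE
  )
)
out
```

With the default setting, "DISCO" is not expected to match "Disco", whereas with ascii_case_insensitive = TRUE it should. Non-ASCII letters are compared exactly in both cases, as described under Arguments.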