Use the Aho-Corasick algorithm to extract matches
Description
Use the Aho-Corasick algorithm to extract matches of multiple patterns from each string.
Usage
<Expr>$str$extract_many(
patterns,
...,
ascii_case_insensitive = FALSE,
overlapping = FALSE
)
Arguments
patterns
String patterns to search. This can be an Expr or something coercible to an Expr. Strings are parsed as column names.
...
Ignored.
ascii_case_insensitive
Enable ASCII-aware case-insensitive matching. When this option is enabled, searching ignores case for ASCII letters (a-z and A-Z) only.
overlapping
Whether matches can overlap.
Value
Expr: Series of dtype List(String).
Examples
library("polars")
df = pl$DataFrame(values = "discontent")
patterns = pl$lit(c("winter", "disco", "onte", "discontent"))
df$with_columns(
  matches = pl$col("values")$str$extract_many(patterns),
  matches_overlap = pl$col("values")$str$extract_many(patterns, overlapping = TRUE)
)
#> shape: (1, 3)
#> ┌────────────┬───────────┬─────────────────────────────────┐
#> │ values     ┆ matches   ┆ matches_overlap                 │
#> │ ---        ┆ ---       ┆ ---                             │
#> │ str        ┆ list[str] ┆ list[str]                       │
#> ╞════════════╪═══════════╪═════════════════════════════════╡
#> │ discontent ┆ ["disco"] ┆ ["disco", "onte", "discontent"… │
#> └────────────┴───────────┴─────────────────────────────────┘
df = pl$DataFrame(
  values = c("discontent", "rhapsody"),
  patterns = list(
    c("winter", "disco", "onte", "discontent"),
    c("rhap", "ody", "coalesce")
  )
)
df$select(pl$col("values")$str$extract_many("patterns"))
#> shape: (2, 1)
#> ┌─────────────────┐
#> │ values          │
#> │ ---             │
#> │ list[str]       │
#> ╞═════════════════╡
#> │ ["disco"]       │
#> │ ["rhap", "ody"] │
#> └─────────────────┘
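The examples above do not exercise ascii_case_insensitive. The following is a minimal sketch of how that argument might be used, reusing the column-name form of patterns from the previous example. The data and the output column names (case_sensitive, case_insensitive) are made up for illustration, and the printed result is omitted because it has not been verified here.

```r
library("polars")

# Hypothetical data: mixed-case values with lower-case patterns per row.
df = pl$DataFrame(
  values = c("Discontent", "RHAPSODY"),
  patterns = list(c("disco", "onte"), c("rhap", "ody"))
)

# Run the same search twice: once with the default (case-sensitive)
# behaviour and once ignoring case for ASCII letters.
out = df$select(
  case_sensitive = pl$col("values")$str$extract_many("patterns"),
  case_insensitive = pl$col("values")$str$extract_many(
    "patterns",
    ascii_case_insensitive = TRUE
  )
)
out
```

Comparing the two columns side by side should make it easy to see which matches depend on letter case alone.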