Use the Aho-Corasick algorithm to extract matches

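Description

For each string, find all occurrences of the given patterns and return them as a list. Patterns are matched as literal strings, not as regular expressions.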

Usage

<Expr>$str$extract_many(
  patterns,
  ...,
  ascii_case_insensitive = FALSE,
  overlapping = FALSE
)

Arguments

`patterns`	String patterns to search for. This can be an Expr or something coercible to an Expr. Strings are parsed as column names.
`...`	Ignored.
`ascii_case_insensitive`	Enable ASCII-aware case-insensitive matching. When enabled, matching ignores case for ASCII letters (a-z and A-Z) only.
`overlapping`	Whether matches can overlap.

Value

Expr: Series of dtype List(String).

Examples

library("polars")

df = pl$DataFrame(values = "discontent")
# Patterns supplied as a literal; wrap them in pl$lit() so they are not parsed as column names
patterns = pl$lit(c("winter", "disco", "onte", "discontent"))

df$with_columns(
  matches = pl$col("values")$str$extract_many(patterns),
  matches_overlap = pl$col("values")$str$extract_many(patterns, overlapping = TRUE)
)

#> shape: (1, 3)
#> ┌────────────┬───────────┬─────────────────────────────────┐
#> │ values     ┆ matches   ┆ matches_overlap                 │
#> │ ---        ┆ ---       ┆ ---                             │
#> │ str        ┆ list[str] ┆ list[str]                       │
#> ╞════════════╪═══════════╪═════════════════════════════════╡
#> │ discontent ┆ ["disco"] ┆ ["disco", "onte", "discontent"… │
#> └────────────┴───────────┴─────────────────────────────────┘

df = pl$DataFrame(
  values = c("discontent", "rhapsody"),
  patterns = list(c("winter", "disco", "onte", "discontent"), c("rhap", "ody", "coalesce"))
)

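# The string "patterns" is parsed as a column name, so each row is matched against its own list of patterns
df$select(pl$col("values")$str$extract_many("patterns"))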

#> shape: (2, 1)
#> ┌─────────────────┐
#> │ values          │
#> │ ---             │
#> │ list[str]       │
#> ╞═════════════════╡
#> │ ["disco"]       │
#> │ ["rhap", "ody"] │
#> └─────────────────┘
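
The examples above do not exercise `ascii_case_insensitive`. As a minimal sketch, assuming the same API as in the preceding examples, the following compares the default matching with case-insensitive matching on a capitalized input. The input value "Discontent" and the column names `matches_default` and `matches_ci` are illustrative choices, not part of the original examples.

```r
library("polars")

# Illustrative input: same word as above, but with a capital "D"
df = pl$DataFrame(values = "Discontent")
patterns = pl$lit(c("winter", "disco", "onte"))

out = df$with_columns(
  # Default: matching is case-sensitive
  matches_default = pl$col("values")$str$extract_many(patterns, overlapping = TRUE),
  # Case is ignored for ASCII letters only
  matches_ci = pl$col("values")$str$extract_many(
    patterns,
    ascii_case_insensitive = TRUE,
    overlapping = TRUE
  )
)

out
```

With case-sensitive matching, the lowercase pattern "disco" should not match the capitalized start of "Discontent", whereas it should once `ascii_case_insensitive = TRUE` is set. Because the option covers ASCII letters only, non-ASCII characters such as "É" and "é" are still compared exactly.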