Use the Aho-Corasick algorithm to extract matches

Description

Use the Aho-Corasick algorithm to extract every match of a set of patterns from each string.

Usage

<Expr>$str$extract_many(
  patterns,
  ...,
  ascii_case_insensitive = FALSE,
  overlapping = FALSE
)

Arguments

patterns String patterns to search for. This can be an Expr or something coercible to an Expr. Strings are parsed as column names, so wrap literal patterns in pl$lit().
... Ignored.
ascii_case_insensitive Enable ASCII-aware case-insensitive matching. When this option is enabled, case is ignored for ASCII letters (a-z and A-Z) only.
overlapping Whether matches can overlap.

Value

Expr: Series of dtype List(String).

Examples

library("polars")

df = pl$DataFrame(values = "discontent")
patterns = pl$lit(c("winter", "disco", "onte", "discontent"))

df$with_columns(
  matches = pl$col("values")$str$extract_many(patterns),
  matches_overlap = pl$col("values")$str$extract_many(patterns, overlapping = TRUE)
)
#> shape: (1, 3)
#> ┌────────────┬───────────┬─────────────────────────────────┐
#> │ values     ┆ matches   ┆ matches_overlap                 │
#> │ ---        ┆ ---       ┆ ---                             │
#> │ str        ┆ list[str] ┆ list[str]                       │
#> ╞════════════╪═══════════╪═════════════════════════════════╡
#> │ discontent ┆ ["disco"] ┆ ["disco", "onte", "discontent"… │
#> └────────────┴───────────┴─────────────────────────────────┘

# Patterns can also be supplied per row from a list column,
# referenced by its column name
df = pl$DataFrame(
  values = c("discontent", "rhapsody"),
  patterns = list(c("winter", "disco", "onte", "discontent"), c("rhap", "ody", "coalesce"))
)

df$select(pl$col("values")$str$extract_many("patterns"))
#> shape: (2, 1)
#> ┌─────────────────┐
#> │ values          │
#> │ ---             │
#> │ list[str]       │
#> ╞═════════════════╡
#> │ ["disco"]       │
#> │ ["rhap", "ody"] │
#> └─────────────────┘
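
The examples above do not exercise ascii_case_insensitive. The following is a minimal sketch of how that argument might be used, reusing the same patterns. The input value "Discontent" and the column names matches_cs and matches_ci are illustrative assumptions, not part of the original examples. With the flag enabled, the capitalized "Disco" prefix is expected to match the "disco" pattern; with the default it should not.

```r
library("polars")

# Illustrative input: same word as above, but with a capital first letter
df = pl$DataFrame(values = "Discontent")
patterns = pl$lit(c("winter", "disco", "onte", "discontent"))

result = df$with_columns(
  # Default behaviour: matching is case sensitive
  matches_cs = pl$col("values")$str$extract_many(patterns),
  # ASCII letters (a-z, A-Z) are compared without regard to case
  matches_ci = pl$col("values")$str$extract_many(
    patterns,
    ascii_case_insensitive = TRUE
  )
)
result
```

Only ASCII letters are affected by this option, so non-ASCII characters such as "É" and "é" are still treated as distinct.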