Use the Aho-Corasick algorithm to extract matches

Description

[Experimental] This method supports matching on string literals only, and does not support regular expression matching.

Usage

<Expr>$str$extract_many(
  patterns,
  ...,
  ascii_case_insensitive = FALSE,
  overlapping = FALSE
)

Arguments

patterns String patterns to search. Accepts expression input. Strings are parsed as column names, and other non-expression inputs are parsed as literals. To use the same character vector for all rows, use list(c(…)) instead of c(…) (see Examples).
... These dots are for future extensions and must be empty.
ascii_case_insensitive Enable ASCII-aware case-insensitive matching. When this option is enabled, matching ignores case for ASCII letters (a-z and A-Z) only.
overlapping Whether matches can overlap.
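
The effect of ascii_case_insensitive can be sketched as follows. This is a minimal illustration, not part of the original examples: the data frame contents and the column names case_sensitive and case_insensitive are made up for the demonstration.

```r
library("polars")

# Hypothetical data: the same word in three different casings.
df <- pl$DataFrame(values = c("Discontent", "DISCONTENT", "discontent"))

# list(c(...)) applies the same pattern vector to every row.
patterns <- list(c("disco", "tent"))

out <- df$with_columns(
  # Default: patterns must match the input's case exactly.
  case_sensitive = pl$col("values")$str$extract_many(patterns),
  # ASCII letters are compared without regard to case.
  case_insensitive = pl$col("values")$str$extract_many(
    patterns,
    ascii_case_insensitive = TRUE
  )
)
out
```

With the default setting, only rows whose casing agrees with the patterns should produce matches, while the case-insensitive column should find matches in every row.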

Value

A polars expression

Examples

library("polars")

df <- pl$DataFrame(values = "discontent")
patterns <- list(c("winter", "disco", "onte", "discontent"))

df$with_columns(
  matches = pl$col("values")$str$extract_many(patterns),
  matches_overlap = pl$col("values")$str$extract_many(patterns, overlapping = TRUE)
)
#> shape: (1, 3)
#> ┌────────────┬───────────┬─────────────────────────────────┐
#> │ values     ┆ matches   ┆ matches_overlap                 │
#> │ ---        ┆ ---       ┆ ---                             │
#> │ str        ┆ list[str] ┆ list[str]                       │
#> ╞════════════╪═══════════╪═════════════════════════════════╡
#> │ discontent ┆ ["disco"] ┆ ["disco", "onte", "discontent"… │
#> └────────────┴───────────┴─────────────────────────────────┘

df <- pl$DataFrame(
  values = c("discontent", "rhapsody"),
  patterns = list(c("winter", "disco", "onte", "discontent"), c("rhap", "ody", "coalesce"))
)

df$select(pl$col("values")$str$extract_many("patterns"))
#> shape: (2, 1)
#> ┌─────────────────┐
#> │ values          │
#> │ ---             │
#> │ list[str]       │
#> ╞═════════════════╡
#> │ ["disco"]       │
#> │ ["rhap", "ody"] │
#> └─────────────────┘
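
As the Description notes, patterns are treated as string literals rather than regular expressions. The sketch below illustrates this under assumed inputs: the values and patterns are hypothetical and chosen so that characters such as "." and "+" would behave differently if they were interpreted as regex metacharacters.

```r
library("polars")

# Hypothetical data: only the first and third strings contain a
# pattern verbatim; "abc" would match "a.c" only under regex semantics.
df <- pl$DataFrame(values = c("a.c", "abc", "a+b = c"))

# "." and "+" are ordinary characters here, not regex operators.
patterns <- list(c("a.c", "a+b"))

out <- df$with_columns(
  matches = pl$col("values")$str$extract_many(patterns)
)
out
```

Because matching is literal, the row containing "abc" is expected to yield an empty list, with no escaping of the patterns required.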