
Extract all capture groups for the given regex pattern


Description

Extract all capture groups for the given regex pattern

Usage

<Expr>$str$extract_groups(pattern)

Arguments

pattern A character string containing a valid regular expression with at least one capture group. The syntax must be compatible with the Rust regex crate.

Details

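The names of the resulting struct fields are always strings. Named groups keep their names, and unnamed groups are named after their numerical position, converted to a string (e.g. "1", "2"). Rows where the pattern does not match get null in every field. See examples.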
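To make the naming rule concrete, here is a minimal base-R sketch that mimics it on a plain character vector. The helper extract_groups_sketch() is hypothetical and is not part of polars. It uses base R's regexpr(perl = TRUE), which runs PCRE rather than the Rust regex crate, so it is only an illustration of the naming and null behaviour, not of how polars implements extract_groups(). Non-matches become NA, the base-R counterpart of null.

```r
# Hypothetical helper (not part of polars): mimics the field-naming rule of
# $str$extract_groups() with base R's PCRE engine.
extract_groups_sketch = function(x, pattern) {
  m = regexpr(pattern, x, perl = TRUE)
  starts = attr(m, "capture.start")   # one column per capture group
  lens = attr(m, "capture.length")
  nms = attr(m, "capture.names")      # "" for unnamed groups
  # Unnamed groups are named after their 1-based position, as a string
  unnamed = nms == ""
  nms[unnamed] = as.character(seq_along(nms))[unnamed]
  out = lapply(seq_along(nms), function(i) {
    vals = substring(x, starts[, i], starts[, i] + lens[, i] - 1L)
    vals[m == -1L] = NA_character_    # no match: every field is missing
    vals
  })
  names(out) = nms
  as.data.frame(out, check.names = FALSE, stringsAsFactors = FALSE)
}

urls = c(
  "http://vote.com/ballon_dor?candidate=messi&ref=python",
  "http://vote.com/ballon_dor?error=404&ref=rust"
)
# One named group and one unnamed group (position 2)
res = extract_groups_sketch(urls, r"(candidate=(?<candidate>\w+)&ref=(\w+))")
```

In this sketch the named group becomes the column "candidate" and the unnamed second group becomes the column "2", while the URL without a candidate yields NA in both columns.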

Value

An Expr of data type Struct, with one field of data type String per capture group.

Examples

library(polars)

df = pl$DataFrame(
  url = c(
    "http://vote.com/ballon_dor?candidate=messi&ref=python",
    "http://vote.com/ballon_dor?candidate=weghorst&ref=polars",
    "http://vote.com/ballon_dor?error=404&ref=rust"
  )
)

pattern = r"(candidate=(?<candidate>\w+)&ref=(?<ref>\w+))"

df$with_columns(
  captures = pl$col("url")$str$extract_groups(pattern)
)$unnest("captures")
#> shape: (3, 3)
#> ┌─────────────────────────────────┬───────────┬────────┐
#> │ url                             ┆ candidate ┆ ref    │
#> │ ---                             ┆ ---       ┆ ---    │
#> │ str                             ┆ str       ┆ str    │
#> ╞═════════════════════════════════╪═══════════╪════════╡
#> │ http://vote.com/ballon_dor?can… ┆ messi     ┆ python │
#> │ http://vote.com/ballon_dor?can… ┆ weghorst  ┆ polars │
#> │ http://vote.com/ballon_dor?err… ┆ null      ┆ null   │
#> └─────────────────────────────────┴───────────┴────────┘
# If the groups are unnamed, their numerical position (as a string) is used:

pattern = r"(candidate=(\w+)&ref=(\w+))"

df$with_columns(
  captures = pl$col("url")$str$extract_groups(pattern)
)$unnest("captures")
#> shape: (3, 3)
#> ┌─────────────────────────────────┬──────────┬────────┐
#> │ url                             ┆ 1        ┆ 2      │
#> │ ---                             ┆ ---      ┆ ---    │
#> │ str                             ┆ str      ┆ str    │
#> ╞═════════════════════════════════╪══════════╪════════╡
#> │ http://vote.com/ballon_dor?can… ┆ messi    ┆ python │
#> │ http://vote.com/ballon_dor?can… ┆ weghorst ┆ polars │
#> │ http://vote.com/ballon_dor?err… ┆ null     ┆ null   │
#> └─────────────────────────────────┴──────────┴────────┘