Extract all capture groups for the given regex pattern
Description
Extract all capture groups for the given regex pattern
Usage
<Expr>$str$extract_groups(pattern)
Arguments
pattern: A character string containing a valid regular expression pattern with at least one capture group, compatible with the Rust regex crate.
Details
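Each capture group becomes one field of the resulting Struct, and the field names are always strings. A named group uses its name; an unnamed group uses its numerical position converted to a string ("1", "2", and so on). See examples.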
Value
Expr of data type Struct with fields of data type String.
Examples
library("polars")
df = pl$DataFrame(
  url = c(
    "http://vote.com/ballon_dor?candidate=messi&ref=python",
    "http://vote.com/ballon_dor?candidate=weghorst&ref=polars",
    "http://vote.com/ballon_dor?error=404&ref=rust"
  )
)
# Named groups: the group names become the field (and column) names.
pattern = r"(candidate=(?<candidate>\w+)&ref=(?<ref>\w+))"
df$with_columns(
  captures = pl$col("url")$str$extract_groups(pattern)
)$unnest("captures")
#> shape: (3, 3)
#> ┌─────────────────────────────────┬───────────┬────────┐
#> │ url                             ┆ candidate ┆ ref    │
#> │ ---                             ┆ ---       ┆ ---    │
#> │ str                             ┆ str       ┆ str    │
#> ╞═════════════════════════════════╪═══════════╪════════╡
#> │ http://vote.com/ballon_dor?can… ┆ messi     ┆ python │
#> │ http://vote.com/ballon_dor?can… ┆ weghorst  ┆ polars │
#> │ http://vote.com/ballon_dor?err… ┆ null      ┆ null   │
#> └─────────────────────────────────┴───────────┴────────┘
# If the groups are unnamed, their numerical position (as a string) is used:
pattern = r"(candidate=(\w+)&ref=(\w+))"
df$with_columns(
  captures = pl$col("url")$str$extract_groups(pattern)
)$unnest("captures")
#> shape: (3, 3)
#> ┌─────────────────────────────────┬──────────┬────────┐
#> │ url                             ┆ 1        ┆ 2      │
#> │ ---                             ┆ ---      ┆ ---    │
#> │ str                             ┆ str      ┆ str    │
#> ╞═════════════════════════════════╪══════════╪════════╡
#> │ http://vote.com/ballon_dor?can… ┆ messi    ┆ python │
#> │ http://vote.com/ballon_dor?can… ┆ weghorst ┆ polars │
#> │ http://vote.com/ballon_dor?err… ┆ null     ┆ null   │
#> └─────────────────────────────────┴──────────┴────────┘
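Because the return value is a Struct, a single group can also be pulled out by name instead of unnesting every field. The sketch below is a minimal illustration, not part of the original examples: it assumes the installed polars version provides `$struct$field()`, and the shortened two-row `df` and the name `out` are introduced here only for the demonstration.

```r
library("polars")

# Illustrative data: a shortened version of the URLs used above.
df = pl$DataFrame(
  url = c(
    "http://vote.com/ballon_dor?candidate=messi&ref=python",
    "http://vote.com/ballon_dor?error=404&ref=rust"
  )
)

pattern = r"(candidate=(?<candidate>\w+)&ref=(?<ref>\w+))"

# Keep only the "candidate" field of the Struct rather than unnesting all groups.
# Assumption: `$struct$field()` is available in the installed polars version.
out = df$with_columns(
  candidate = pl$col("url")$str$extract_groups(pattern)$struct$field("candidate")
)
out
```

Rows where the pattern does not match should yield a null in the selected field, in line with the unnested output shown earlier.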