Extract the target capture group from provided patterns

Description

For each string, extract the target capture group from the first match of the provided pattern. Strings that do not match the pattern yield null.

Usage

<Expr>$str$extract(pattern, group_index = 1L)

Arguments

pattern A valid regular expression pattern containing at least one capture group, compatible with the Rust regex crate.
group_index Index of the targeted capture group. Group 0 is the whole match; the first capture group has index 1. Defaults to the first capture group.
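
As an illustration of how group_index selects between groups, the following minimal sketch runs one pattern with two capture groups and extracts group 0, 1, and 2 in turn. The data frame, its column name pair, and the output column names are made up for this example.

```r
library("polars")

# Hypothetical data: "key=value" strings, plus one row that does not match.
df <- pl$DataFrame(pair = c("lang=rust", "tool=polars", "no separator"))

out <- df$select(
  # Group 0: the whole match, e.g. "lang=rust"
  whole = pl$col("pair")$str$extract(r"((\w+)=(\w+))", 0),
  # Group 1: the first capture group (text before "=")
  key = pl$col("pair")$str$extract(r"((\w+)=(\w+))", 1),
  # Group 2: the second capture group (text after "=")
  value = pl$col("pair")$str$extract(r"((\w+)=(\w+))", 2)
)
out
```

The third row contains no "=" and so does not match; all three columns are null for it.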

Details

To change regular expression behaviour with flags, such as multi-line matching, use the inline (?iLmsuxU) syntax. The multi-line example under Examples shows the (?m) flag.

See the regex crate’s section on grouping and flags for additional information about the use of inline expression modifiers.
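
Inline flags other than (?m) work the same way. As a small sketch, the case-insensitive flag (?i) lets a single pattern match differently cased input; the header column and its values are invented for illustration.

```r
library("polars")

# Hypothetical data: the same header name written with different casing.
df <- pl$DataFrame(
  header = c("Content-Type: text/html", "content-type: application/json")
)

out <- df$select(
  # Without a flag, matching is case-sensitive: only the first row matches.
  case_sensitive = pl$col("header")$str$extract(r"(Content-Type: (\S+))", 1),
  # With (?i), letter case is ignored and both rows match.
  case_insensitive = pl$col("header")$str$extract(r"((?i)content-type: (\S+))", 1)
)
out
```

Because the flag is part of the pattern itself, it applies only to that expression and needs no extra argument.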

Value

A polars expression

Examples

library("polars")

df <- pl$DataFrame(
  url = c(
    "http://vote.com/ballon_dor?error=404&ref=unknown",
    "http://vote.com/ballon_dor?ref=polars&candidate=messi",
    "http://vote.com/ballon_dor?candidate=ronaldo&ref=polars"
  )
)
df$select(
  extracted = pl$col("url")$str$extract(r"(candidate=(\w+))", 1),
  referer = pl$col("url")$str$extract(r"(ref=(\w+))", 1),
  error = pl$col("url")$str$extract(r"(error=(\w+))", 1)
)
#> shape: (3, 3)
#> ┌───────────┬─────────┬───────┐
#> │ extracted ┆ referer ┆ error │
#> │ ---       ┆ ---     ┆ ---   │
#> │ str       ┆ str     ┆ str   │
#> ╞═══════════╪═════════╪═══════╡
#> │ null      ┆ unknown ┆ 404   │
#> │ messi     ┆ polars  ┆ null  │
#> │ ronaldo   ┆ polars  ┆ null  │
#> └───────────┴─────────┴───────┘
# Using the multi-line mode flag `(?m)`
df <- pl$DataFrame(
  lines = c("I Like\nThose\nOdds", "This is\nThe Way")
)
df$with_columns(
  with_m_flag = pl$col("lines")$str$extract(r"((?m)^(T\w+))", 1),
  without_flag = pl$col("lines")$str$extract(r"(^(T\w+))", 1)
)
#> shape: (2, 3)
#> ┌─────────┬─────────────┬──────────────┐
#> │ lines   ┆ with_m_flag ┆ without_flag │
#> │ ---     ┆ ---         ┆ ---          │
#> │ str     ┆ str         ┆ str          │
#> ╞═════════╪═════════════╪══════════════╡
#> │ I Like  ┆ Those       ┆ null         │
#> │ Those   ┆             ┆              │
#> │ Odds    ┆             ┆              │
#> │ This is ┆ This        ┆ This         │
#> │ The Way ┆             ┆              │
#> └─────────┴─────────────┴──────────────┘