Interface StringSeries

String functions for Series

interface StringSeries {
    concat(delimiter: string, ignoreNulls?: boolean): pl.Series;
    contains(
        pat: string | RegExp,
        literal?: boolean,
        strict?: boolean,
    ): pl.Series;
    decode(encoding: "base64" | "hex", strict?: boolean): pl.Series;
    decode(
        options: { encoding: "base64" | "hex"; strict?: boolean },
    ): pl.Series;
    encode(encoding: "base64" | "hex"): pl.Series;
    extract(pattern: string | RegExp, groupIndex: number): pl.Series;
    jsonDecode(dtype?: DataType, inferSchemaLength?: number): pl.Series;
    jsonPathMatch(jsonPath: string): pl.Series;
    lengths(): pl.Series;
    lstrip(): pl.Series;
    padEnd(length: number, fillChar: string): pl.Series;
    padStart(length: number, fillChar: string): pl.Series;
    replace(pattern: string | RegExp, value: string): pl.Series;
    replaceAll(pattern: string | RegExp, value: string): pl.Series;
    rstrip(): pl.Series;
    slice(start: number | pl.Expr, length?: number | pl.Expr): pl.Series;
    split(
        separator: string,
        options?: boolean | { inclusive?: boolean },
    ): pl.Series;
    strip(): pl.Series;
    strptime(datatype: Date, fmt?: string): pl.Series;
    strptime(datatype: Datetime, fmt?: string): pl.Series;
    strptime(
        datatype: (
            timeUnit?: TimeUnit | "ms" | "ns" | "us",
            timeZone?: undefined | null | string,
        ) => Datetime,
        fmt?: string,
    ): pl.Series;
    toLowerCase(): pl.Series;
    toUpperCase(): pl.Series;
    zFill(length: number | pl.Expr): pl.Series;
}

Methods

  • Vertically concat the values in the Series to a single string value.

    Parameters

    • delimiter: string
    • Optional ignoreNulls: boolean

    Returns pl.Series

    > pl.Series([1, null, 2]).str.concat("-")[0]
    '1-null-2'
  • Check if strings in Series contain a substring that matches a pattern.

    Parameters

    • pat: string | RegExp

      A valid regular expression pattern, compatible with the regex crate.

    • Optional literal: boolean

      Treat `pat` as a literal string, not as a regular expression.

    • Optional strict: boolean

      Raise an error if the underlying pattern is not a valid regex, otherwise mask out with a null value.

    Returns pl.Series

    Boolean mask

    > pl.Series(["Crab", "cat and dog", "rab$bit", null]).str.contains("cat|bit")
    shape: (4,)
    Series: '' [bool]
    [
    false
    true
    true
    null
    ]
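
    The interplay of `literal` and `strict` can be modeled on a plain array. This is a minimal sketch, not the library's implementation: `containsSketch` is a hypothetical helper, it uses JavaScript's `RegExp` rather than the Rust regex crate, and it assumes that a non-strict call with an invalid pattern yields null for every row.

```typescript
// Hypothetical helper; models the documented semantics on a plain array.
function containsSketch(
  values: (string | null)[],
  pat: string,
  literal = false,
  strict = true,
): (boolean | null)[] {
  let re: RegExp | null = null;
  if (!literal) {
    try {
      re = new RegExp(pat);
    } catch (err) {
      if (strict) throw err; // strict: surface the invalid pattern
      return values.map(() => null); // non-strict: mask out with nulls (assumption)
    }
  }
  return values.map((v) => {
    if (v === null) return null; // nulls propagate
    return re !== null ? re.test(v) : v.indexOf(pat) !== -1;
  });
}

const animals = ["Crab", "cat and dog", "rab$bit", null];
console.log(containsSketch(animals, "cat|bit")); // pattern read as a regex alternation
console.log(containsSketch(animals, "b$b", true)); // "$" matched as a literal character
```

    With `literal` set, `"b$b"` matches only `"rab$bit"`; read as a regex, the same pattern matches nothing because `$` anchors the end of the input.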
  • Decodes a value in Series using the provided encoding

    Parameters

    • encoding: "base64" | "hex"

      hex | base64

    • Optional strict: boolean

      how to handle invalid inputs

      - true: method will throw error if unable to decode a value
      - false: unhandled values will be replaced with `null`
      

    Returns pl.Series

    s = pl.Series("strings", ["666f6f", "626172", null])
    s.str.decode("hex")
    shape: (3,)
    Series: 'strings' [str]
    [
    "foo",
    "bar",
    null
    ]
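
    The effect of `strict` can be illustrated without the library. The following is a minimal sketch under stated assumptions: `decodeHexSketch` is a hypothetical helper that handles only hex input carrying ASCII text, and it operates on a plain array rather than a Series.

```typescript
// Hypothetical helper; decodes hex text to an ASCII string, two hex digits per byte.
function decodeHexSketch(
  values: (string | null)[],
  strict: boolean,
): (string | null)[] {
  return values.map((v) => {
    if (v === null) return null; // nulls propagate
    const valid = v.length % 2 === 0 && /^[0-9a-fA-F]*$/.test(v);
    if (!valid) {
      if (strict) throw new Error("invalid hex input: " + v);
      return null; // non-strict: undecodable values become null
    }
    let out = "";
    for (let i = 0; i < v.length; i += 2) {
      out += String.fromCharCode(parseInt(v.slice(i, i + 2), 16));
    }
    return out;
  });
}

console.log(decodeHexSketch(["666f6f", "626172", null], true)); // foo, bar, null
console.log(decodeHexSketch(["zz"], false)); // null instead of an error
```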
  • Decodes a value using the provided encoding

    Parameters

    • options: { encoding: "base64" | "hex"; strict?: boolean }

    Returns pl.Series

    > df = pl.DataFrame({"strings": ["666f6f", "626172", null]})
    > df.select(col("strings").str.decode("hex"))
    shape: (3, 1)
    ┌─────────┐
    │ strings │
    │ ---     │
    │ str     │
    ╞═════════╡
    │ foo     │
    ├╌╌╌╌╌╌╌╌╌┤
    │ bar     │
    ├╌╌╌╌╌╌╌╌╌┤
    │ null    │
    └─────────┘
  • Encodes a value in Series using the provided encoding

    Parameters

    • encoding: "base64" | "hex"

      hex | base64

    Returns pl.Series

    s = pl.Series("strings", ["foo", "bar", null])
    s.str.encode("hex")
    shape: (3,)
    Series: 'strings' [str]
    [
    "666f6f",
    "626172",
    null
    ]
  • Extract the target capture group from provided patterns.

    Parameters

    • pattern: string | RegExp

      A valid regex pattern

    • groupIndex: number

      Index of the targeted capture group. Group 0 means the whole pattern; the first capture group begins at index 1. Defaults to the first capture group.

    Returns pl.Series

    Utf8 array. Contains null if the original value is null or the regex captures nothing.

    > df = pl.DataFrame({
    ... 'a': [
    ... 'http://vote.com/ballon_dor?candidate=messi&ref=polars',
    ... 'http://vote.com/ballon_dor?candidat=jorginho&ref=polars',
    ... 'http://vote.com/ballon_dor?candidate=ronaldo&ref=polars'
    ... ]})
    > df.getColumn("a").str.extract(/candidate=(\w+)/, 1)
    shape: (3, 1)
    ┌─────────┐
    │ a       │
    │ ---     │
    │ str     │
    ╞═════════╡
    │ messi   │
    ├╌╌╌╌╌╌╌╌╌┤
    │ null    │
    ├╌╌╌╌╌╌╌╌╌┤
    │ ronaldo │
    └─────────┘
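
    The role of `groupIndex` can be sketched with JavaScript's own `RegExp`. This is an illustration only: `extractSketch` is a hypothetical helper working on a plain array, and it assumes the pattern carries no global (`g`) flag.

```typescript
// Hypothetical helper; assumes `pattern` has no global ("g") flag.
function extractSketch(
  values: (string | null)[],
  pattern: RegExp,
  groupIndex = 1,
): (string | null)[] {
  return values.map((v) => {
    if (v === null) return null; // nulls propagate
    const m = pattern.exec(v);
    if (m === null) return null; // the pattern did not match
    const group = m[groupIndex];
    return group === undefined ? null : group; // the group captured nothing
  });
}

const urls = [
  "http://vote.com/ballon_dor?candidate=messi&ref=polars",
  "http://vote.com/ballon_dor?candidat=jorginho&ref=polars",
  "http://vote.com/ballon_dor?candidate=ronaldo&ref=polars",
];
console.log(extractSketch(urls, /candidate=(\w+)/, 1)); // messi, null, ronaldo
console.log(extractSketch(urls, /candidate=(\w+)/, 0)); // whole match, e.g. "candidate=messi"
```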
  • Parse string values in Series as JSON.

    Parameters

    • Optional dtype: DataType
    • Optional inferSchemaLength: number

    Returns pl.Series

    Utf8 array. Contains null if the original value is null or the jsonPath returns nothing.

    s = pl.Series("json", ['{"a":1, "b": true}', null, '{"a":2, "b": false}']);
    s.str.jsonDecode().as("json");
    shape: (3,)
    Series: 'json' [struct[2]]
    [
    {1,true}
    {null,null}
    {2,false}
    ]
  • Extract the first match of the JSON string in Series with the provided JSONPath expression. Throws an error if it encounters an invalid JSON string. All return values are cast to Utf8 regardless of the original value.

    Parameters

    • jsonPath: string

      A valid JSON path query string

    Returns pl.Series

    Utf8 array. Contains null if the original value is null or the jsonPath returns nothing.

    > s = pl.Series('json_val', [
    ... '{"a":"1"}',
    ... null,
    ... '{"a":2}',
    ... '{"a":2.1}',
    ... '{"a":true}'
    ... ])
    > s.str.jsonPathMatch('$.a')
    shape: (5,)
    Series: 'json_val' [str]
    [
    "1"
    null
    "2"
    "2.1"
    "true"
    ]
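
    The cast-to-string behaviour described above can be modeled as follows. This is a minimal sketch, not the library's JSONPath engine: `jsonPathMatchSketch` is a hypothetical helper that supports only simple dotted paths such as `$.a`, and mapping a JSON `null` to a null result is an assumption.

```typescript
// Hypothetical helper; supports only dotted paths such as "$.a" or "$.a.b".
function jsonPathMatchSketch(
  values: (string | null)[],
  jsonPath: string,
): (string | null)[] {
  if (jsonPath.indexOf("$.") !== 0) {
    throw new Error("unsupported path in this sketch: " + jsonPath);
  }
  const keys = jsonPath.slice(2).split(".");
  return values.map((v) => {
    if (v === null) return null; // nulls propagate
    let current: unknown = JSON.parse(v); // throws on invalid JSON
    for (const key of keys) {
      if (typeof current !== "object" || current === null) return null;
      current = (current as Record<string, unknown>)[key];
    }
    // No match yields null; treating JSON null the same way is an assumption.
    if (current === undefined || current === null) return null;
    // Strings are returned as-is; every other value is rendered as text.
    return typeof current === "string" ? current : JSON.stringify(current);
  });
}

const docs = ['{"a":"1"}', null, '{"a":2}', '{"a":2.1}', '{"a":true}'];
console.log(jsonPathMatchSketch(docs, "$.a")); // "1", null, "2", "2.1", "true"
```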
  • Get number of chars of the string values in Series.

    Returns pl.Series

    df = pl.Series(["Café", "345", "東京", null])
    .str.lengths().alias("n_chars")
    shape: (4,)
    Series: 'n_chars' [u32]
    [
    4
    3
    2
    null
    ]
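
    Counting chars differs from counting bytes or UTF-16 code units. The sketch below reads "chars" as Unicode code points, an assumption that is consistent with the counts documented for `"Café"` and `"東京"`; `lengthsSketch` is a hypothetical helper on a plain array.

```typescript
// Hypothetical helper; counts Unicode code points, not UTF-16 units or UTF-8 bytes.
function lengthsSketch(values: (string | null)[]): (number | null)[] {
  return values.map((v) => (v === null ? null : Array.from(v).length));
}

// "Café", "345", "東京", written with escapes to keep the source ASCII-only.
const words = ["Caf\u00e9", "345", "\u6771\u4eac", null];
console.log(lengthsSketch(words)); // 4, 3, 2, null

// A character outside the Basic Multilingual Plane is one code point
// but two UTF-16 code units.
const emoji = "\ud83d\ude00";
console.log(lengthsSketch([emoji]), emoji.length); // 1 versus 2
```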

  • Add a leading fillChar to a string in Series until the given string length is reached. If the string is longer than or equal to the given length, no modifications are made.

    Parameters

    • length: number

      Length of the final string.

    • fillChar: string

      Character that will fill the string. If a string longer than 1 character is provided, only the first character is used.

    Returns pl.Series

    > df = pl.DataFrame({
    ... 'foo': [
    ... "a",
    ... "b",
    ... "LONG_WORD",
    ... "cow"
    ... ]})
    > df.select(pl.col('foo').str.padStart(3, "_"))
    shape: (4, 1)
    ┌───────────┐
    │ foo       │
    │ ---       │
    │ str       │
    ╞═══════════╡
    │ __a       │
    ├╌╌╌╌╌╌╌╌╌╌╌┤
    │ __b       │
    ├╌╌╌╌╌╌╌╌╌╌╌┤
    │ LONG_WORD │
    ├╌╌╌╌╌╌╌╌╌╌╌┤
    │ cow       │
    └───────────┘
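
    The padding rule, including the first-character behaviour of `fillChar`, can be sketched in plain TypeScript. `padStartSketch` is a hypothetical helper, and measuring the length in characters rather than bytes is an assumption of this sketch.

```typescript
// Hypothetical helper; pads on the left with the first character of `fillChar`.
function padStartSketch(
  values: (string | null)[],
  length: number,
  fillChar: string,
): (string | null)[] {
  const fill = Array.from(fillChar)[0]; // only the first character is used
  return values.map((v) => {
    if (v === null) return null; // nulls propagate
    const missing = length - Array.from(v).length;
    if (missing <= 0 || fill === undefined) return v; // already long enough
    return new Array(missing + 1).join(fill) + v; // `fill` repeated `missing` times
  });
}

console.log(padStartSketch(["a", "b", "LONG_WORD", "cow"], 3, "_")); // __a, __b, LONG_WORD, cow
console.log(padStartSketch(["a"], 3, "_*")); // __a, the "*" is ignored
```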
  • Replace first regex match with a string value in Series.

    Parameters

    • pattern: string | RegExp

      A valid regex pattern or string

    • value: string

      Replacement string.

    Returns pl.Series

    df = pl.Series(["#12.34", "#56.78"]).str.replace(/#(\d+)/, "$$$1")
    shape: (2,)
    Series: '' [str]
    [
    "$12.34"
    "$56.78"
    ]
  • Replace all regex matches with a string value in Series.

    Parameters

    • pattern: string | RegExp

      A valid regex pattern or string

    • value: string

      Replacement string.

    Returns pl.Series

    df = pl.Series(["abcabc", "123a123"]).str.replaceAll("a", "-");
    shape: (2,)
    Series: '' [str]
    [
    "-bc-bc"
    "123-123"
    ]
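
    The difference between replacing the first match and replacing all matches mirrors the global flag in JavaScript's `RegExp`. The helpers below are hypothetical and use JavaScript's regex engine, not the library's; they are a sketch of the semantics only.

```typescript
// Hypothetical helpers; a RegExp without the "g" flag replaces only the first match.
function replaceFirstSketch(v: string, pattern: string, value: string): string {
  return v.replace(new RegExp(pattern), value);
}

function replaceAllSketch(v: string, pattern: string, value: string): string {
  return v.replace(new RegExp(pattern, "g"), value);
}

console.log(replaceFirstSketch("abcabc", "a", "-")); // -bcabc
console.log(replaceAllSketch("abcabc", "a", "-")); // -bc-bc
// "$$" is an escaped dollar sign and "$1" refers to the first capture group.
console.log(replaceFirstSketch("#12.34", "#(\\d+)", "$$$1")); // $12.34
```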
  • Create subslices of the string values of a Utf8 Series.

    Parameters

    • start: number | pl.Expr

      Start of the slice (negative indexing may be used).

    • Optional length: number | pl.Expr

      Optional length of the slice.

    Returns pl.Series
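
    Negative indexing and the optional length can be illustrated on a plain array. This is a minimal sketch: `sliceSketch` is a hypothetical helper, it takes plain numbers rather than `pl.Expr`, it counts characters as code points, and its handling of out-of-range offsets is an assumption.

```typescript
// Hypothetical helper; a negative `start` counts back from the end of the string.
function sliceSketch(
  values: (string | null)[],
  start: number,
  length?: number,
): (string | null)[] {
  return values.map((v) => {
    if (v === null) return null; // nulls propagate
    const chars = Array.from(v);
    const begin = start < 0 ? Math.max(chars.length + start, 0) : start;
    const end = length === undefined ? chars.length : begin + length;
    return chars.slice(begin, end).join("");
  });
}

console.log(sliceSketch(["pear", "papaya", null], -3)); // ear, aya, null
console.log(sliceSketch(["pear", "papaya", null], 1, 2)); // ea, ap, null
```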

  • Split a string into substrings using the specified separator. The return type will be of type List.

    Parameters

    • separator: string

      A string that identifies the character or characters to use in separating the string.

    • Optional options: boolean | { inclusive?: boolean }

    Returns pl.Series
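
    The `inclusive` option is not described above. The sketch below assumes it keeps the separator at the end of each piece, in the manner of Rust's `split_inclusive`; `splitSketch` is a hypothetical helper on a single string, not the library's implementation.

```typescript
// Hypothetical helper; `inclusive` keeps the separator at the end of each piece (assumption).
function splitSketch(v: string, separator: string, inclusive = false): string[] {
  if (separator === "") return [v]; // simplification: empty separators are not modeled
  if (!inclusive) return v.split(separator);
  const parts: string[] = [];
  let start = 0;
  let idx = v.indexOf(separator, start);
  while (idx !== -1) {
    parts.push(v.slice(start, idx + separator.length));
    start = idx + separator.length;
    idx = v.indexOf(separator, start);
  }
  if (start < v.length) parts.push(v.slice(start)); // trailing piece without separator
  return parts;
}

console.log(splitSketch("a_b_c", "_")); // a, b, c
console.log(splitSketch("a_b_c", "_", true)); // a_, b_, c
```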

  • Parse a Series of dtype Utf8 to a Date/Datetime Series.

    Parameters

    • datatype: Date

      Date or Datetime.

    • Optional fmt: string

      Formatting syntax.

    Returns pl.Series

  • Parse a Series of dtype Utf8 to a Date/Datetime Series.

    Parameters

    • datatype: Datetime

      Date or Datetime.

    • Optional fmt: string

      Formatting syntax.

    Returns pl.Series

  • Parse a Series of dtype Utf8 to a Date/Datetime Series.

    Parameters

    • datatype: (
          timeUnit?: TimeUnit | "ms" | "ns" | "us",
          timeZone?: undefined | null | string,
      ) => Datetime

      Date or Datetime.

        • (
              timeUnit?: TimeUnit | "ms" | "ns" | "us",
              timeZone?: undefined | null | string,
          ): Datetime
        • Calendar date and time type

          Parameters

          • Optional timeUnit: TimeUnit | "ms" | "ns" | "us"

            Any of 'ms' | 'ns' | 'us'.

          • timeZone: undefined | null | string = null

            Timezone string as defined by Intl.DateTimeFormat, for example America/New_York.

          Returns Datetime

    • Optional fmt: string

      Formatting syntax.

    Returns pl.Series
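
    How a format string drives parsing can be sketched for a tiny subset of directives. Everything here is an assumption made for illustration: `parseDateSketch` is a hypothetical helper, it understands only the strftime-style `%Y`, `%m` and `%d` directives, and it returns null on a mismatch, which may differ from the library's behaviour.

```typescript
// Hypothetical helper; understands only the %Y, %m and %d directives.
function parseDateSketch(value: string | null, fmt: string): Date | null {
  if (value === null) return null;
  const fields: string[] = [];
  const source = fmt
    .replace(/[.*+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters in literal text
    .replace(/%([Ymd])/g, (_match: string, directive: string) => {
      fields.push(directive);
      return directive === "Y" ? "(\\d{4})" : "(\\d{2})";
    });
  const m = new RegExp("^" + source + "$").exec(value);
  if (m === null) return null; // mismatch yields null (assumption)
  let year = 1970;
  let month = 1;
  let day = 1;
  fields.forEach((directive, i) => {
    const n = Number(m[i + 1]);
    if (directive === "Y") year = n;
    else if (directive === "m") month = n;
    else day = n;
  });
  const date = new Date(0); // midnight UTC on 1970-01-01
  date.setUTCFullYear(year, month - 1, day);
  // Reject values that rolled over, such as February 30.
  if (date.getUTCMonth() !== month - 1 || date.getUTCDate() !== day) return null;
  return date;
}

console.log(parseDateSketch("2021-04-22", "%Y-%m-%d")); // 22 April 2021, midnight UTC
console.log(parseDateSketch("22/04/2021", "%d/%m/%Y")); // same date, different layout
console.log(parseDateSketch("2021-02-30", "%Y-%m-%d")); // null, not a real date
```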

  • Add a leading '0' to a string until the given string length is reached. If the string is longer than or equal to the given length, no modifications are made.

    Parameters

    • length: number | pl.Expr

      Length of the final string.

    Returns pl.Series

    > df = pl.DataFrame({
    ... 'foo': [
    ... "a",
    ... "b",
    ... "LONG_WORD",
    ... "cow"
    ... ]})
    > df.select(pl.col('foo').str.zFill(3))
    shape: (4, 1)
    ┌───────────┐
    │ foo       │
    │ ---       │
    │ str       │
    ╞═══════════╡
    │ 00a       │
    ├╌╌╌╌╌╌╌╌╌╌╌┤
    │ 00b       │
    ├╌╌╌╌╌╌╌╌╌╌╌┤
    │ LONG_WORD │
    ├╌╌╌╌╌╌╌╌╌╌╌┤
    │ cow       │
    └───────────┘