
Create Categorical DataType


Description

Create a Categorical DataType.

Usage

DataType_Categorical(ordering = "physical")

Arguments

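ordering  Either "physical" (default) or "lexical". Controls whether sorting uses the integer encoding of the values ("physical") or the string values themselves ("lexical").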

Details

When a categorical variable is created, its string values (the "lexical" values) are stored and encoded as integers (the "physical" values) in order of first appearance. A categorical column can therefore be sorted either by its lexical values or by its physical values. See Examples.
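To make the two orderings concrete, here is a small base-R analogy. It does not use polars: `unique()` and `match()` stand in for the categorical encoding, assuming codes are assigned in order of first appearance as described above. Note that R's `match()` returns 1-based codes, whereas the physical values polars stores may start at 0; only the relative order matters for sorting.

```r
# Base-R analogy (not polars): encode strings as integer codes
# in order of first appearance, like a categorical's physical values.
x <- c("z", "z", "k", "a", "z")
levels_in_order <- unique(x)          # "z" "k" "a"
codes <- match(x, levels_in_order)    # 1-based codes: 1 1 2 3 1

# Physical ordering: sort by the integer codes (ties keep original order)
physical_sorted <- x[order(codes)]    # "z" "z" "z" "k" "a"

# Lexical ordering: sort by the strings themselves
lexical_sorted <- sort(x)             # "a" "k" "z" "z" "z"

print(codes)
print(physical_sorted)
print(lexical_sorted)
```

The two results mirror the two sorted outputs in the Examples below: physical ordering puts "z" first because it appeared first, while lexical ordering follows the strings.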

Value

A Categorical DataType

Examples

library("polars")

# by default, sorting uses the physical values (order of first appearance)
df = pl$DataFrame(x = c("z", "z", "k", "a", "z"), schema = list(x = pl$Categorical()))
df$sort("x")
#> shape: (5, 1)
#> ┌─────┐
#> │ x   │
#> │ --- │
#> │ cat │
#> ╞═════╡
#> │ z   │
#> │ z   │
#> │ z   │
#> │ k   │
#> │ a   │
#> └─────┘
# when setting ordering = "lexical", sorting will be based on the strings
df_lex = pl$DataFrame(
  x = c("z", "z", "k", "a", "z"),
  schema = list(x = pl$Categorical("lexical"))
)
df_lex$sort("x")
#> shape: (5, 1)
#> ┌─────┐
#> │ x   │
#> │ --- │
#> │ cat │
#> ╞═════╡
#> │ a   │
#> │ k   │
#> │ z   │
#> │ z   │
#> │ z   │
#> └─────┘