Data types
Polars is entirely based on Arrow data types and backed by Arrow memory arrays. This makes data processing cache-efficient and well-supported for inter-process communication. Most data types follow the exact implementation from Arrow, with the exception of Utf8 (this is actually LargeUtf8), Categorical, and Object (support is limited).
The data types are:
- Int8: 8-bit signed integer.
- Int16: 16-bit signed integer.
- Int32: 32-bit signed integer.
- Int64: 64-bit signed integer.
- UInt8: 8-bit unsigned integer.
- UInt16: 16-bit unsigned integer.
- UInt32: 32-bit unsigned integer.
- UInt64: 64-bit unsigned integer.
- Float32: 32-bit floating point.
- Float64: 64-bit floating point.
- Boolean: Boolean type, effectively bit-packed.
- Utf8: String data (this is actually Arrow LargeUtf8 internally).
- Binary: Stores data as bytes.
- List: A list array contains a child array holding the list values and an offset array (this is actually Arrow LargeList internally).
- Struct: A struct array is represented as Vec<Series> and is useful for packing multiple/heterogeneous values in a single column.
- Object: A data type with limited support that can hold any value.
- Date: Date representation, internally represented as days since the UNIX epoch, encoded as a 32-bit signed integer.
- Datetime: Datetime representation, internally represented as microseconds since the UNIX epoch, encoded as a 64-bit signed integer.
- Duration: A timedelta type, internally represented as microseconds. Created when subtracting Date/Datetime.
- Time: Time representation, internally represented as nanoseconds since midnight.
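The integer encodings of the temporal types can be illustrated with plain Python. The following is a minimal sketch using only the standard library, not Polars' own code: the helper names (`date_to_days`, `datetime_to_micros`, `time_to_nanos`) are hypothetical, and datetimes are assumed to be naive (no time zone).

```python
from datetime import date, datetime, time

# Reference points for the encodings described above.
EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1)  # naive datetime: an assumption of this sketch


def date_to_days(d: date) -> int:
    """Days since the UNIX epoch, the value a Date stores (fits in a 32-bit signed integer)."""
    return (d - EPOCH_DATE).days


def datetime_to_micros(dt: datetime) -> int:
    """Microseconds since the UNIX epoch, the value a Datetime stores (64-bit signed integer)."""
    delta = dt - EPOCH_DATETIME
    # Use integer arithmetic only, so no precision is lost to floats.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def time_to_nanos(t: time) -> int:
    """Nanoseconds since midnight, the value a Time stores."""
    seconds = (t.hour * 60 + t.minute) * 60 + t.second
    return seconds * 1_000_000_000 + t.microsecond * 1_000


# Subtracting two encoded datetimes gives a Duration-like value in microseconds.
start = datetime_to_micros(datetime(2000, 1, 1))
end = datetime_to_micros(datetime(2000, 1, 2))
one_day_in_micros = end - start
```

Because every value is a fixed-width integer, comparing or subtracting temporal values reduces to integer arithmetic on the underlying array.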
To learn more about the internal representation of these data types, check the Arrow columnar format.
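As a rough illustration of two ideas from that format, the sketch below models a list array as a flat values buffer plus an offsets array, and a Boolean array as a bit-packed byte buffer. It is a simplified pure-Python model, not the Arrow or Polars implementation: the function names are hypothetical, and real arrays also carry validity (null) information that is omitted here. In Arrow, the "Large" variants (LargeList, LargeUtf8) store these offsets as 64-bit integers.

```python
def build_list_array(lists):
    """Flatten nested lists into (values, offsets).

    List i occupies values[offsets[i]:offsets[i + 1]].
    """
    values = []
    offsets = [0]
    for item in lists:
        values.extend(item)
        offsets.append(len(values))  # end of this list, start of the next one
    return values, offsets


def get_list(values, offsets, i):
    """Recover list i from the flat layout."""
    return values[offsets[i]:offsets[i + 1]]


def pack_bools(flags):
    """Pack booleans eight per byte, least-significant bit first."""
    packed = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def get_bool(packed, i):
    """Read the boolean at position i from the packed buffer."""
    return bool((packed[i // 8] >> (i % 8)) & 1)


values, offsets = build_list_array([[1, 2], [], [3, 4, 5]])
third = get_list(values, offsets, 2)
packed = pack_bools([True, False, True])
```

Keeping all values in one contiguous buffer is what makes scans cache-friendly, and bit-packing means eight Boolean values occupy a single byte.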