polars.LazyFrame.describe

LazyFrame.describe(
percentiles: Sequence[float] | float | None = (0.25, 0.5, 0.75),
*,
interpolation: RollingInterpolationMethod = 'nearest',
) → DataFrame

Creates a summary of statistics for a LazyFrame, returning a DataFrame.

Parameters:
percentiles

One or more percentiles to include in the summary statistics. All values must be in the range [0, 1].

interpolation : {‘nearest’, ‘higher’, ‘lower’, ‘midpoint’, ‘linear’}

Interpolation method used when calculating percentiles.

Returns:
DataFrame

Warning

  • This method does not maintain the laziness of the frame: it collects the final result, which can be an expensive operation.

  • We do not guarantee the output of describe to be stable. It will show statistics that we deem informative, and may be updated in the future. Using describe programmatically (versus interactive exploration) is not recommended for this reason.

Notes

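The median is included by default as the 50% percentile. Its value depends on interpolation: in the examples below, the 50% row of the int column is 50.0 with the default 'nearest' and 45.0 with 'linear'.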

Examples

>>> from datetime import date, time
>>> import polars as pl
>>> lf = pl.LazyFrame(
...     {
...         "float": [1.0, 2.8, 3.0],
...         "int": [40, 50, None],
...         "bool": [True, False, True],
...         "str": ["zz", "xx", "yy"],
...         "date": [date(2020, 1, 1), date(2021, 7, 5), date(2022, 12, 31)],
...         "time": [time(10, 20, 30), time(14, 45, 50), time(23, 15, 10)],
...     }
... )

Show default frame statistics:

>>> lf.describe()
shape: (9, 7)
┌────────────┬──────────┬──────────┬──────────┬──────┬────────────┬──────────┐
│ statistic  ┆ float    ┆ int      ┆ bool     ┆ str  ┆ date       ┆ time     │
│ ---        ┆ ---      ┆ ---      ┆ ---      ┆ ---  ┆ ---        ┆ ---      │
│ str        ┆ f64      ┆ f64      ┆ f64      ┆ str  ┆ str        ┆ str      │
╞════════════╪══════════╪══════════╪══════════╪══════╪════════════╪══════════╡
│ count      ┆ 3.0      ┆ 2.0      ┆ 3.0      ┆ 3    ┆ 3          ┆ 3        │
│ null_count ┆ 0.0      ┆ 1.0      ┆ 0.0      ┆ 0    ┆ 0          ┆ 0        │
│ mean       ┆ 2.266667 ┆ 45.0     ┆ 0.666667 ┆ null ┆ 2021-07-02 ┆ 16:07:10 │
│ std        ┆ 1.101514 ┆ 7.071068 ┆ null     ┆ null ┆ null       ┆ null     │
│ min        ┆ 1.0      ┆ 40.0     ┆ 0.0      ┆ xx   ┆ 2020-01-01 ┆ 10:20:30 │
│ 25%        ┆ 2.8      ┆ 40.0     ┆ null     ┆ null ┆ 2021-07-05 ┆ 14:45:50 │
│ 50%        ┆ 2.8      ┆ 50.0     ┆ null     ┆ null ┆ 2021-07-05 ┆ 14:45:50 │
│ 75%        ┆ 3.0      ┆ 50.0     ┆ null     ┆ null ┆ 2022-12-31 ┆ 23:15:10 │
│ max        ┆ 3.0      ┆ 50.0     ┆ 1.0      ┆ zz   ┆ 2022-12-31 ┆ 23:15:10 │
└────────────┴──────────┴──────────┴──────────┴──────┴────────────┴──────────┘

Customize which percentiles are displayed, applying linear interpolation:

>>> with pl.Config(tbl_rows=12):
...     lf.describe(
...         percentiles=[0.1, 0.3, 0.5, 0.7, 0.9],
...         interpolation="linear",
...     )
shape: (11, 7)
┌────────────┬──────────┬──────────┬──────────┬──────┬────────────┬──────────┐
│ statistic  ┆ float    ┆ int      ┆ bool     ┆ str  ┆ date       ┆ time     │
│ ---        ┆ ---      ┆ ---      ┆ ---      ┆ ---  ┆ ---        ┆ ---      │
│ str        ┆ f64      ┆ f64      ┆ f64      ┆ str  ┆ str        ┆ str      │
╞════════════╪══════════╪══════════╪══════════╪══════╪════════════╪══════════╡
│ count      ┆ 3.0      ┆ 2.0      ┆ 3.0      ┆ 3    ┆ 3          ┆ 3        │
│ null_count ┆ 0.0      ┆ 1.0      ┆ 0.0      ┆ 0    ┆ 0          ┆ 0        │
│ mean       ┆ 2.266667 ┆ 45.0     ┆ 0.666667 ┆ null ┆ 2021-07-02 ┆ 16:07:10 │
│ std        ┆ 1.101514 ┆ 7.071068 ┆ null     ┆ null ┆ null       ┆ null     │
│ min        ┆ 1.0      ┆ 40.0     ┆ 0.0      ┆ xx   ┆ 2020-01-01 ┆ 10:20:30 │
│ 10%        ┆ 1.36     ┆ 41.0     ┆ null     ┆ null ┆ 2020-04-20 ┆ 11:13:34 │
│ 30%        ┆ 2.08     ┆ 43.0     ┆ null     ┆ null ┆ 2020-11-26 ┆ 12:59:42 │
│ 50%        ┆ 2.8      ┆ 45.0     ┆ null     ┆ null ┆ 2021-07-05 ┆ 14:45:50 │
│ 70%        ┆ 2.88     ┆ 47.0     ┆ null     ┆ null ┆ 2022-02-07 ┆ 18:09:34 │
│ 90%        ┆ 2.96     ┆ 49.0     ┆ null     ┆ null ┆ 2022-09-13 ┆ 21:33:18 │
│ max        ┆ 3.0      ┆ 50.0     ┆ 1.0      ┆ zz   ┆ 2022-12-31 ┆ 23:15:10 │
└────────────┴──────────┴──────────┴──────────┴──────┴────────────┴──────────┘