polars.testing.parametric.dataframes

polars.testing.parametric.dataframes(
cols: int | column | Sequence[column] | None = None,
*,
lazy: bool = False,
min_cols: int | None = 0,
max_cols: int | None = 8,
size: int | None = None,
min_size: int | None = 0,
max_size: int | None = 10,
chunked: bool | None = None,
include_cols: Sequence[column] | column | None = None,
null_probability: float | dict[str, float] = 0.0,
allow_infinities: bool = True,
allowed_dtypes: Collection[PolarsDataType] | PolarsDataType | None = None,
excluded_dtypes: Collection[PolarsDataType] | PolarsDataType | None = None,
) → SearchStrategy[DataFrame | LazyFrame]

Hypothesis strategy for producing polars DataFrames or LazyFrames.

Parameters:
cols : {int, columns}, optional

integer number of columns to create, or a sequence of column objects that describe the desired DataFrame column data.

lazy : bool, optional

produce a LazyFrame instead of a DataFrame.

min_cols : int, optional

if not passing an exact number of columns, set the minimum number of columns here (defaults to 0).

max_cols : int, optional

if not passing an exact number of columns, set the maximum number of columns here (defaults to MAX_COLS, shown as 8 in the signature above).

size : int, optional

if set, create a DataFrame with exactly this number of rows (min_size and max_size are then ignored).

min_size : int, optional

if not passing an exact size, set the minimum number of rows in the DataFrame.

max_size : int, optional

if not passing an exact size, set the maximum number of rows in the DataFrame.

chunked : bool, optional

ensure that DataFrames with more than one row have n_chunks > 1. if omitted, chunking will be randomised at the level of individual Series.

include_cols : [column], optional

a list of column objects to include in the generated DataFrame. note that explicitly provided columns are appended onto the list of existing columns (if any present).

null_probability : {float, dict[str,float]}, optional

probability (between 0.0 and 1.0) that a generated value is None. this is applied independently of any None values generated by the underlying strategy, and can be set either per column (if given as a {col:pct} dict) or globally. if null_probability is defined on a column, it takes precedence over the global value.

allow_infinities : bool, optional

optionally disallow generation of +/-inf values for floating-point dtypes.

allowed_dtypes : {list,set}, optional

when automatically generating data, allow only these dtypes.

excluded_dtypes : {list,set}, optional

when automatically generating data, exclude these dtypes.

Notes

In actual usage this is deployed as a unit test decorator, providing a strategy that generates DataFrames or LazyFrames with the given characteristics for the unit test. While developing a strategy/test, it can also be useful to call .example() directly on a given strategy to see concrete instances of the generated data.

Examples

Use column or columns to specify the schema of the DataFrames to generate. Note that in actual use the strategy is applied as a test decorator rather than called standalone.

>>> import polars as pl
>>> from polars.testing.parametric import column, columns, dataframes
>>> from hypothesis import given

Generate arbitrary DataFrames (as part of a unit test):

>>> @given(df=dataframes())
... def test_repr(df: pl.DataFrame) -> None:
...     assert isinstance(repr(df), str)

Generate LazyFrames with at least 1 column, random dtypes, and at most 5 rows:

>>> dfs = dataframes(min_cols=1, max_size=5, lazy=True)
>>> dfs.example()  
<polars.LazyFrame object at 0x11F561580>

Generate DataFrames with known column names and random dtypes (chosen per test, not per frame):

>>> dfs = dataframes(columns(["x", "y", "z"]))
>>> dfs.example()  
shape: (3, 3)
┌────────────┬───────┬────────────────────────────┐
│ x          ┆ y     ┆ z                          │
│ ---        ┆ ---   ┆ ---                        │
│ date       ┆ u16   ┆ datetime[μs]               │
╞════════════╪═══════╪════════════════════════════╡
│ 0565-08-12 ┆ 34715 ┆ 5844-09-20 00:33:31.076854 │
│ 3382-10-17 ┆ 48662 ┆ 7540-01-29 11:20:14.836271 │
│ 4063-06-17 ┆ 39092 ┆ 1889-05-05 13:25:41.874455 │
└────────────┴───────┴────────────────────────────┘

Generate frames with explicitly named/typed columns and a fixed size:

>>> dfs = dataframes(
...     [
...         column("x", dtype=pl.Int32),
...         column("y", dtype=pl.Float64),
...     ],
...     size=2,
... )
>>> dfs.example()  
shape: (2, 2)
┌───────────┬────────────┐
│ x         ┆ y          │
│ ---       ┆ ---        │
│ i32       ┆ f64        │
╞═══════════╪════════════╡
│ -15836    ┆ 1.1755e-38 │
│ 575050513 ┆ NaN        │
└───────────┴────────────┘