Test suite

Info

Additional information on the Rust test suite will be added to this page later.

The py-polars/tests folder contains the main Polars test suite. This page contains some information on the various components of the test suite, as well as guidelines for writing new tests.

The test suite contains four main components, each confined to their own folder: unit tests, parametric tests, benchmark tests, and doctests.

Note that this test suite is indirectly responsible for testing Rust Polars as well. The Rust test suite is kept small to reduce compilation times. A lot of the Rust functionality is tested here instead.

Unit tests

The unit folder contains all regular unit tests. These tests are intended to make sure all Polars functionality works as intended.

Running unit tests

Run unit tests by running make test from the py-polars folder. This will compile the Rust bindings and then run the unit tests.

If you're working in the Python code only, you can avoid recompiling every time by simply running pytest instead from your virtual environment.

By default, slow tests are skipped. Slow tests are marked as such using a custom pytest marker. If you wish to run slow tests, run pytest -m slow. Or run pytest -m "" to run all tests, regardless of marker.

Tests can be run in parallel by running pytest -n auto. The parallelization is handled by pytest-xdist.

Writing unit tests

Whenever you add new functionality, you should also add matching unit tests. Add your tests to appropriate test module in the unit folder. Some guidelines to keep in mind:

Try to fully cover all possible inputs and edge cases you can think of.
Utilize pytest tools like fixture and parametrize where appropriate.
Since many tests will require some data to be defined first, it can be efficient to run multiple checks in a single test. This can also be addressed using pytest fixtures.
Unit tests should not depend on external factors, otherwise test parallelization will break.

Parametric tests

The parametric folder contains parametric tests written using the Hypothesis framework. These tests are intended to find and test edge cases by generating many random datapoints.

Running parametric tests

Run parametric tests by running pytest -m hypothesis.

Note that parametric tests are excluded by default when running pytest. You must explicitly specify -m hypothesis to run them.

These tests will be included when calculating test coverage, and will also be run as part of the make test-all make command.

Doctests

The docs folder contains a script for running doctest. This folder does not contain any actual tests - rather, the script checks all docstrings in the Polars package for Examples sections, runs the code examples, and verifies the output.

The aim of running doctest is to make sure the Examples sections in our docstrings are valid and remain up-to-date with code changes.

Running `doctest`

To run the doctest module, run make doctest from the py-polars folder. You can also run the script directly from your virtual environment.

Note that doctests are not run using pytest. While pytest does have the capability to run doc examples, configuration options are too limited for our purposes.

Doctests will not count towards test coverage. They are not a substitute for unit tests, but rather intended to convey the intended use of the Polars API to the user.

Writing doc examples

Almost all classes/methods/functions that are part of Polars' public API should include code examples in their docstring. These examples help users understand basic usage and allow us to illustrate more advanced concepts as well. Some guidelines for writing a good docstring Examples section:

Start with a minimal example that showcases the default functionality.
Showcase the effect of its parameters.
Showcase any special interactions when combined with other code.
Keep it succinct and avoid multiple examples showcasing the same thing.

There are many great docstring examples already, just check other code if you need inspiration!

In addition to the regular options available when writing doctests, the script configuration allows for a new IGNORE_RESULT directive. Use this directive if you want to ensure the code runs, but the output may be random by design or not interesting to check.

>>> df.sample(n=2)  # doctest: +IGNORE_RESULT

Benchmark tests

The benchmark folder contains code for running various benchmark tests. The aim of this part of the test suite is to spot performance regressions in the code, and to verify that Polars functionality works as expected when run on a release build or at a larger scale.

Polars uses CodSpeed for tracking the performance of the benchmark tests.

Generating data

For many tests, a relatively large dataset must be generated first. We use an R script to generate this data. The script was taken from the H2O AI database benchmark, which is the foundation for many of the benchmark tests.

For the exact steps to generate the data, please refer to the benchmark workflow. It involves installing R, installing the data.table dependency, and executing a data generation script.

Running the benchmark tests

The benchmark tests can be run using pytest. Run pytest -m benchmark --durations 0 -v to run these tests and report run duration.

Note that benchmark tests are excluded by default when running pytest. You must explicitly specify -m benchmark to run them. They will also be excluded when calculating test coverage.

These tests will be run as part of the make test-all make command.