
Refactor test.libregrtest #109162

Closed
@vstinner


I propose to refactor test.libregrtest to make it easier to maintain and to prepare for adding type annotations.

The regrtest project has a long history. It was added in 1996 by commit 152494a. When it was created, it was 170 lines long and had 4 command line options: -v (verbose), -q (quiet), -g (generate) and -x (exclude). Slowly, it got more and more features:

  • Better command line interface with argparse (it used getopt at the beginning)
  • Run tests in parallel with multiple processes (this code caused me a lot of headaches!)
  • Detect when the "environment" is altered: warnings filters, loggers, etc.
  • Re-run failed tests in verbose mode (now they are run in fresh processes)
  • Detect memory, reference and file descriptor leaks
  • Detect leaked files by creating a temporary directory for each test worker process
  • Best effort to restore the machine to its previous state: wait until threads and processes complete, remove temporary files, etc.
  • etc.

Some of these features were implemented partly in test.support and partly in test.libregrtest.

A few years ago, I decided to split the giant monolithic regrtest.py file (for example, it was 2,200 lines of Python code in Python 2.7) into sub-files. To make that possible, I passed around an ns argument, which is a bag of "global variables" (technically, it's a Namespace class, see cmdline.py).

The problem is that, for type annotations, it's very unclear what a Namespace contains. It may or may not have a given attribute (see my commit message of this PR: Add missing attributes to Namespace: coverage, threshold, wait.), attribute types are weakly defined, etc. Moreover, ns is not only used to "get" variables, but also to set variables! For example, find_tests() overrides ns.args. How is it possible to know which ns attributes are used? Are they "read-only"? We can't tell just by reading a function prototype.
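To illustrate the problem, here is a minimal sketch using argparse.Namespace. The find_tests() below is a hypothetical stand-in written for this example, not the real libregrtest function, and the exclude attribute is likewise an assumption. The point is that the signature says nothing about which attributes are read or written.

```python
import argparse


def find_tests(ns):
    # Hypothetical stand-in, NOT the real libregrtest function.
    # It reads ns.args and ns.exclude, then overwrites ns.args.
    selected = [name for name in ns.args if name not in ns.exclude]
    ns.args = selected  # hidden write: the caller's namespace changes
    return selected


ns = argparse.Namespace(
    args=["test_os", "test_sys", "test_io"],
    exclude=["test_sys"],
)
result = find_tests(ns)
print(result)
print(ns.args)  # no longer the list that was passed in

# An attribute that was never set is only detected at run time.
try:
    ns.threshold
except AttributeError as exc:
    print("missing attribute:", exc)
```

A type checker cannot help here: ns accepts any attribute name, so both the hidden write and the missing attribute go unnoticed until the code runs.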

This large refactoring cleans up everything in a series of small changes so that functions receive simple types like bool, str or tuple[str]. It's easier to guess the purpose of a function and its behavior just from its prototype.
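As a rough sketch of the target style, the function below takes explicit, typed parameters instead of a namespace. The name select_tests() and its parameters are hypothetical, chosen only for this example. It uses tuple[str, ...] to annotate a variable-length tuple of strings.

```python
from __future__ import annotations

import random


def select_tests(
    tests: tuple[str, ...],
    exclude: tuple[str, ...] = (),
    randomize: bool = False,
    random_seed: int | None = None,
) -> tuple[str, ...]:
    # Hypothetical helper: every input is visible in the prototype,
    # and the result is returned instead of being stored on a shared object.
    selected = [name for name in tests if name not in exclude]
    if randomize:
        # A private Random instance avoids touching the global random state.
        random.Random(random_seed).shuffle(selected)
    return tuple(selected)


tests = ("test_os", "test_sys", "test_io")
selected = select_tests(tests, exclude=("test_sys",))
print(selected)
print(tests)  # the input tuple is unchanged
```

Because the inputs are immutable tuples and the output is a return value, the "is it read-only?" question is answered by the signature itself.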

I tried to create only short files. Sadly, the longest is still main.py, with 891 lines.

$ wc -l *.py|sort -n
     2 __init__.py
    56 pgo.py
   124 win_utils.py
   159 setup.py
   202 refleak.py
   307 utils.py
   329 save_env.py
   451 cmdline.py
   575 runtest.py
   631 runtest_mp.py
   891 main.py
  3727 total

To understand where the ns magic bag of global variables comes from, look at the regrtest.py monster in Python 2.7. Its main() function defines no fewer than 34 nested functions! The variables are defined in the main() prototype!

def main(tests=None, testdir=None, verbose=0, quiet=False,
         exclude=False, single=False, randomize=False, fromfile=None,
         findleaks=False, use_resources=None, trace=False, coverdir='coverage',
         runleaks=False, huntrleaks=False, verbose2=False, print_slow=False,
         random_seed=None, use_mp=None, verbose3=False, forever=False,
         header=False, pgo=False, failfast=False, match_tests=None):

I suppose that it was designed so that regrtest could be used as an API: pass parameters to the main() function, without having to use the command line interface. In the main branch, this feature is still supported through the magic **kwargs bag:

def main(tests=None, **kwargs):
    Regrtest().main(tests=tests, **kwargs)
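One way such a **kwargs bag can end up overriding parsed options is sketched below. This is a simplified illustration under my own assumptions, not the actual libregrtest code: parse_args(), the argv parameter and the three options are made up for the example.

```python
import argparse


def parse_args(argv, **kwargs):
    # Hypothetical parser with only a few options, for illustration.
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="count", default=0)
    parser.add_argument("-q", "--quiet", action="store_true")
    parser.add_argument("args", nargs="*")
    ns = parser.parse_args(argv)
    # Keyword arguments override whatever the command line produced.
    for key, value in kwargs.items():
        if not hasattr(ns, key):
            raise TypeError(f"{key!r} is an invalid keyword argument")
        setattr(ns, key, value)
    return ns


def main(tests=None, argv=(), **kwargs):
    # Assumption: argv is passed explicitly here instead of reading sys.argv.
    ns = parse_args(list(argv), **kwargs)
    if tests:
        ns.args = list(tests)
    return ns


ns = main(tests=["test_os"], verbose=2)
print(ns.verbose, ns.quiet, ns.args)
```

With this pattern, the set of accepted keyword arguments is defined only by what the parser happens to put on the namespace, which is exactly what makes the interface hard to annotate.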
