Help wanted: ideas for improving the test run time #37

@jmcnamara

Help wanted

One of the goals of rust_xlsxwriter is fidelity with the xlsx file format generated by Excel. This is achieved using integration tests that take files created in Excel 2007 and compare them, file by file and element by element, with files created using rust_xlsxwriter.

Here is a typical test file, the associated xlsx file and the test runner code.
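In outline, each of these tests looks something like the sketch below. This is a minimal illustration of the pattern rather than the repository's actual code: the file names and the common::assert_xlsx_eq() helper are hypothetical stand-ins for the real shared comparison code, which unzips both workbooks and compares them element by element, and it assumes the current rust_xlsxwriter API.

// tests/bootstrap01.rs (hypothetical file name)
use rust_xlsxwriter::{Workbook, XlsxError};

// Shared helper module; the name and API are illustrative only.
mod common;

#[test]
fn test_bootstrap01() -> Result<(), XlsxError> {
    let generated = "tests/output/bootstrap01_rust.xlsx";

    // Create the output file with rust_xlsxwriter.
    let mut workbook = Workbook::new();
    let worksheet = workbook.add_worksheet();
    worksheet.write_string(0, 0, "Hello")?;
    workbook.save(generated)?;

    // Compare it against the reference file created in Excel 2007.
    common::assert_xlsx_eq("tests/input/bootstrap01.xlsx", generated);

    Ok(())
}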

This approach has a number of advantages from a maintenance point of view:

  • It allows incremental, test-driven development of Excel features.
  • It allows bug reports to be replicated quickly in Excel and compared with rust_xlsxwriter.
  • It avoids subjective arguments about whether rust_xlsxwriter or some other third-party Excel-reading software is correct in its implementation/interpretation of the XLSX file specification, since it uses Excel itself as the standard.

For the end user, having output files that are effectively identical to files produced by Excel means the maximum possible interoperability with Excel and with other applications that read XLSX files.

The test suite contains an individual test for each xlsx file (although there is sometimes more than one test against the same input file). Each of these tests is compiled into, and run as, its own crate, which means the test suite is slow. For usability reasons I don't want to test more than one xlsx file per test file/crate (apart from maybe the grouping scheme outlined below).

There are currently ~540 test files, and the suite takes 8+ minutes to run on a 3.2 GHz 6-core Intel Core i7 with 32GB of fast RAM:

$ time cargo test

real	8m36.340s
user	30m34.062s
sys	9m0.802s

In the GitHub Actions CI this is currently taking around 18 minutes.

There will eventually be around 800 test files so the runtime will be ~50% longer.

nextest is a bit faster but not significantly so. This timing also doesn't include the doc tests:

$ time cargo nextest run

real	7m45.029s
user	26m44.624s
sys	6m59.271s

A few months ago, when the test suite took around 4 minutes, I tried to consolidate the tests into one crate using the advice in the article Delete Cargo Integration Tests. This was significantly faster, by around 5-10x, but (I'm 99% sure) it didn't allow me to run individual tests. I tried to replicate that setup again to redo the performance testing and verify the running of individual tests, but failed for reasons related to test refactoring since then.
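For reference, the consolidated layout from that article amounts to a single integration-test crate whose main.rs pulls every former test file in as a module, so Cargo builds and links one test binary instead of one per file. A rough sketch, with illustrative path and module names:

// tests/integration/main.rs (hypothetical path) -- the single consolidated
// test crate; Cargo auto-discovers tests/*/main.rs as one test target.
mod common;

// One `mod` line per former test file; the #[test] functions inside each
// module stay unchanged.
mod bootstrap01;
mod bootstrap02;
// ...

In principle Cargo's name filter should still select individual test functions inside the single binary, e.g. cargo test --test integration bootstrap01, but the whole consolidated crate has to compile first, and whether that filtering actually fits the per-file workflow described above is exactly what needs re-verifying.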

For comparison, the Python-based test suite runs 1600 integration and unit tests in 18 seconds. The Perl test suite takes around 3 minutes and the C test suite takes 5 minutes.

Anyway, to the help wanted: if anyone has ideas on how the test runtime might be improved, or if you can get the above "Delete Cargo Integration Tests" approach working again for comparison, let me know. I might be able to come up with a hybrid approach where tests under development or debugging are in their own crates and are moved back into an overall test crate/folder afterwards, as sketched below.
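One way that hybrid might look, under the same assumptions as the sketches above: a test being developed or debugged keeps its own file, and therefore its own fast-to-iterate crate, and is folded into the consolidated crate once it matches the Excel reference file.

// tests/wip_chart_format.rs (hypothetical name) -- lives alone while under
// development, so it can be compiled and run in isolation with
// `cargo test --test wip_chart_format`.
//
// Once it passes, move the file into tests/integration/ and add
// `mod wip_chart_format;` to tests/integration/main.rs so it joins the
// single consolidated test binary.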
