Commit ad43e09

Add helper function for setting --nthreads
This will be useful in Snakemake workflows to prevent over/under-allocating threads.
1 parent a974866 commit ad43e09

2 files changed, +38 -0 lines changed

CHANGES.md

Lines changed: 5 additions & 0 deletions
@@ -2,13 +2,18 @@
 
 ## __NEXT__
 
+### Features
+
+* A helper function – `augur.subsample.get_parallelism` – has been added to optimize usage of `augur subsample` in Snakemake workflows. This is experimental and not yet part of the public API. [#1963][] (@victorlin)
+
 ### Bug fixes
 
 * filter, merge: Fixed formatting of the error message shown when there are duplicate sequence ids. [#1954][] @victorlin
 * filter: Adjusted the error message shown when there are missing weights to mention the option of updating values in metadata. [#1956][] @victorlin
 
 [#1954]: https://github.com/nextstrain/augur/pull/1954
 [#1956]: https://github.com/nextstrain/augur/pull/1956
+[#1963]: https://github.com/nextstrain/augur/pull/1963
 
 ## 33.0.0 (26 January 2026)
 

augur/subsample.py

Lines changed: 33 additions & 0 deletions
@@ -233,6 +233,39 @@ def run(args: argparse.Namespace) -> None:
         sample.remove_output_strains()
 
 
+def get_parallelism(
+    config_file: str,
+    config_section: Optional[List[str]] = None,
+    limit: Optional[int] = None
+) -> int:
+    """Compute the degree of parallelism (i.e., optimal value for ``--nthreads``).
+
+    Inspects the subsample config file to return the degree of parallelism that
+    should be used for ``--nthreads``. Higher values will underutilize
+    resources, while lower values will underallocate resources and not fully use
+    available parallelism.
+
+    Parameters
+    ----------
+    config_file
+        Path to the subsample config file.
+
+    config_section
+        Optional list of keys to navigate to a specific section of the config file.
+
+    limit
+        Upper bound for return value. No upper bound is applied if omitted.
+
+    Returns
+    -------
+    int
+        Degree of parallelism.
+    """
+    schema_validator = load_json_schema("schema-subsample-config.json")
+    config = _parse_config(config_file, config_section, schema_validator)
+    return max(1, len(config["samples"]) if limit is None else min(limit, len(config["samples"])))
+
+
 def get_referenced_files(
     config_file: str,
     config_section: Optional[List[str]] = None,
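
The clamping rule in `get_parallelism` can be tried in isolation with a minimal sketch like the one below. It is not the augur implementation: `degree_of_parallelism` is a hypothetical stand-in that takes an already-parsed config dict instead of a file path, skips schema validation, and assumes one unit of parallel work per entry under `samples`. It also treats a missing `limit` as "no upper bound".

```python
from typing import Any, Dict, Optional


def degree_of_parallelism(config: Dict[str, Any], limit: Optional[int] = None) -> int:
    """Hypothetical stand-in for get_parallelism, operating on a parsed config."""
    # Assumption: each entry under "samples" can run as one parallel task.
    n_samples = len(config.get("samples", {}))
    # Apply the caller's upper bound only when one was given.
    capped = n_samples if limit is None else min(limit, n_samples)
    # Always request at least one thread, even for an empty config.
    return max(1, capped)


# Illustrative config with made-up sample names.
example_config = {"samples": {"early": {}, "recent": {}, "background": {}}}

unlimited = degree_of_parallelism(example_config)        # no cap supplied
limited = degree_of_parallelism(example_config, limit=2)  # capped by available cores
empty = degree_of_parallelism({"samples": {}}, limit=8)   # floor of one thread
```

In a Snakemake workflow, a value computed this way would typically feed a rule's `threads:` directive, with `limit` set to the number of cores the workflow is allowed to use.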
