Commit 100ecec

Workflows-as-programs: nextstrain run and co.
Adds a new command, `nextstrain run`, to run (compatible) pathogen workflows in a more managed way with easier update paths, without the need for user-facing Git, with support for multiple versions, and with support for concurrent-but-separate analyses via the same workflow.

Supported by changes to

- `nextstrain setup` to obtain and set up specific versions of pathogens
- `nextstrain update` to keep pathogens up-to-date
- `nextstrain version` to report on pathogen versions available locally

At the moment, the only compatible pathogen is measles at my not-yet-finished demo/prototype branch.¹ Avian flu should not be far behind, though.

There's a lot of functionality (and polish) here and elsewhere still todo to fully realize the sweeping goals of workflows-as-programs², but this is a fully-usable first piece of the puzzle that can stand on its own for now.

¹ <nextstrain/measles#55>
² <nextstrain/public#1>
1 parent 9f14e50 commit 100ecec

29 files changed

+1970
-120
lines changed

doc/commands/index.rst

Lines changed: 7 additions & 3 deletions
@@ -14,7 +14,7 @@ nextstrain

 .. code-block:: none

-usage: nextstrain [-h] {build,view,deploy,remote,shell,update,setup,check-setup,login,logout,whoami,version,init-shell,authorization,debugger} ...
+usage: nextstrain [-h] {run,build,view,deploy,remote,shell,update,setup,check-setup,login,logout,whoami,version,init-shell,authorization,debugger} ...


 Nextstrain command-line interface (CLI)
@@ -41,6 +41,10 @@ commands



+.. option:: run
+
+Run pathogen workflow. See :doc:`/commands/run`.
+
 .. option:: build

 Run pathogen build. See :doc:`/commands/build`.
@@ -63,11 +67,11 @@ commands

 .. option:: update

-Update a runtime. See :doc:`/commands/update`.
+Update a pathogen or runtime. See :doc:`/commands/update`.

 .. option:: setup

-Set up a runtime. See :doc:`/commands/setup`.
+Set up a pathogen or runtime. See :doc:`/commands/setup`.

 .. option:: check-setup

doc/commands/run.rst

Lines changed: 259 additions & 0 deletions
@@ -0,0 +1,259 @@
1+
.. default-role:: literal
2+
3+
.. role:: command-reference(ref)
4+
5+
.. program:: nextstrain run
6+
7+
.. _nextstrain run:
8+
9+
==============
10+
nextstrain run
11+
==============
12+
13+
.. code-block:: none
14+
15+
usage: nextstrain run [options] <pathogen-name>[@<version>] <workflow-name> <analysis-directory> [<target> [<target> [...]]]
16+
nextstrain run --help
17+
18+
19+
Runs a pathogen workflow in a Nextstrain runtime with config and input from an
20+
analysis directory and outputs written to that same directory.
21+
22+
This command focuses on the routine running of existing pathogen workflows
23+
(mainly provided by Nextstrain) using your own configuration, data, and other
24+
supported customizations. Pathogens are initially set up using `nextstrain
25+
setup` and can be updated over time as desired using `nextstrain update`.
26+
Multiple versions of a pathogen may be set up and run independently without
27+
conflict, allowing for comparisons of output across versions. The same
28+
pathogen workflow may also be concurrently run multiple times with separate
29+
analysis directories (i.e. different configs, input data, etc.) without
30+
conflict, allowing for independent outputs and analyses.
31+
32+
Compared to `nextstrain build`, this command is a higher-level interface to
33+
running pathogen workflows that does not require knowledge of Git or management
34+
of pathogen repositories and source code. For now, the `nextstrain build`
35+
command remains more suitable for active authorship and development of
36+
workflows.
37+
38+
All Nextstrain runtimes are supported. For AWS Batch, all runs will detach
39+
after submission and `nextstrain build` must be used to further monitor or
40+
manage the run and download results after completion.
41+
42+
positional arguments
43+
====================
44+
45+
46+
47+
.. option:: <pathogen-name>[@<version>]
48+
49+
The name (and optionally, version) of a previously set up pathogen.
50+
See :command-reference:`nextstrain setup`. If no version is
51+
specified, then the default version (if any) will be used.
52+
53+
Required.
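As an illustration of the argument syntax, the following minimal Python sketch splits a pathogen specifier into its parts. It handles the `<pathogen-name>[@<version>]` form used here as well as the fuller `<name>@<version>=<url>` form that `nextstrain setup` accepts. The `PathogenSpec` type and the `parse_pathogen_spec` function are hypothetical names invented for this example, not part of the Nextstrain CLI, and the error handling is an assumption.

```python
from typing import NamedTuple, Optional


class PathogenSpec(NamedTuple):
    """Hypothetical container for the parts of a pathogen specifier."""
    name: str
    version: Optional[str]
    url: Optional[str]


def parse_pathogen_spec(spec: str) -> PathogenSpec:
    """Split "<name>[@<version>[=<url>]]" into its components.

    Splitting happens on the first "@" and then on the first "=", so a URL
    that itself contains "@" or "=" is left intact.
    """
    name, at, rest = spec.partition("@")
    if not name:
        raise ValueError(f"missing pathogen name in {spec!r}")
    if not at:
        # Plain name: no version or URL was given.
        return PathogenSpec(name, None, None)

    version, eq, url = rest.partition("=")
    if not version:
        raise ValueError(f"missing version after '@' in {spec!r}")
    if eq and not url:
        raise ValueError(f"missing URL after '=' in {spec!r}")
    return PathogenSpec(name, version, url or None)
```

With this sketch, `parse_pathogen_spec("measles@v42")` yields the name `measles` and the version `v42`, with no URL.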
54+
55+
.. option:: <workflow-name>
56+
57+
The name of a workflow for the given pathogen, e.g. typically
58+
``ingest``, ``phylogenetic``, or ``nextclade``.
59+
60+
Available workflows may vary per pathogen (and possibly between
61+
pathogen versions). Some pathogens may provide multiple variants or
62+
base configurations of a top-level workflow, e.g. as in
63+
``phylogenetic/mpxv`` and ``phylogenetic/hmpxv1``. Refer to the
64+
pathogen's own documentation for valid workflow names.
65+
66+
Workflow names conventionally correspond directly to directory
67+
paths in the pathogen source, but this may not always be the case.
68+
69+
Required.
70+
71+
.. option:: <analysis-directory>
72+
73+
The path to your analysis directory. The workflow uses this as its
74+
working directory for all local inputs and outputs, including
75+
config files, input data files, resulting output data files, log
76+
files, etc.
77+
78+
We recommend keeping your config files and static input files (e.g.
79+
reference sequences, inclusion/exclusion lists, annotations, etc.)
80+
in a version control system, such as Git, so you can keep track of
81+
changes over time and recover previous versions. When using
82+
version control, dynamic inputs (e.g. downloaded input files) and
83+
outputs (e.g. resulting data files, log files, etc.) should
84+
generally be marked as ignored/excluded from tracking, such as via
85+
:file:`.gitignore` for Git.
86+
87+
An empty directory will be automatically created if the given path
88+
does not exist but its parent directory does.
89+
90+
Required.
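The directory-creation rule described above (create the directory only when its parent already exists) can be sketched in Python as follows. The function name `ensure_analysis_directory` is hypothetical, and the exceptions raised are assumptions made for this example rather than the CLI's actual behaviour.

```python
from pathlib import Path


def ensure_analysis_directory(path) -> Path:
    """Return `path` as a directory, creating it only if its parent exists."""
    directory = Path(path)
    if directory.is_dir():
        return directory
    if directory.exists():
        raise NotADirectoryError(f"{directory} exists but is not a directory")
    if not directory.parent.is_dir():
        raise FileNotFoundError(f"parent directory of {directory} does not exist")
    # Create only the final path component; missing parents are never created.
    directory.mkdir()
    return directory
```

Refusing to create missing parent directories means a mistyped path fails early instead of silently producing a new directory tree.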
91+
92+
.. option:: <target>
93+
94+
One or more workflow targets. A target is either a file path
95+
(relative to :option:`<analysis-directory>`) produced by the
96+
workflow or the name of a workflow rule or step.
97+
98+
Available targets will vary per pathogen (and between versions of
99+
pathogens). Refer to the pathogen's own documentation for valid
100+
targets.
101+
102+
Optional.
103+
104+
options
105+
=======
106+
107+
108+
109+
.. option:: --force
110+
111+
Force a rerun of the whole workflow even if everything seems up-to-date.
112+
113+
.. option:: --cpus <count>
114+
115+
Number of CPUs/cores/threads/jobs to utilize at once. Limits containerized (Docker, AWS Batch) workflow runs to this amount. Informs Snakemake's resource scheduler when applicable. Informs the AWS Batch instance size selection. By default, no constraints are placed on how many CPUs are used by a workflow run; workflow runs may use all that are available if they're able to.
116+
117+
.. option:: --memory <quantity>
118+
119+
Amount of memory to make available to the workflow run. Units of b, kb, mb, gb, kib, mib, gib are supported. Limits containerized (Docker, AWS Batch) workflow runs to this amount. Informs Snakemake's resource scheduler when applicable. Informs the AWS Batch instance size selection.
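To make the unit suffixes concrete, here is a minimal Python sketch that converts a memory quantity string to bytes. It is not the CLI's own parser. It assumes that `kb`, `mb`, and `gb` are decimal multiples (powers of 1000), that `kib`, `mib`, and `gib` are binary multiples (powers of 1024), that suffixes are case-insensitive, and that a bare number means bytes. The name `parse_memory` is hypothetical.

```python
import re

# Assumed multipliers for the unit suffixes listed in the option description.
UNITS = {
    "b": 1,
    "kb": 1000, "mb": 1000**2, "gb": 1000**3,
    "kib": 1024, "mib": 1024**2, "gib": 1024**3,
}


def parse_memory(quantity: str) -> int:
    """Convert a string such as "2gib" or "500 mb" to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z]+)?\s*", quantity.lower())
    if not match:
        raise ValueError(f"cannot parse memory quantity {quantity!r}")
    number, unit = match.groups()
    unit = unit or "b"  # assumption: a bare number is a byte count
    if unit not in UNITS:
        raise ValueError(f"unsupported unit {unit!r} in {quantity!r}")
    return int(float(number) * UNITS[unit])
```

Under these assumptions, `2gib` is 2 × 1024³ bytes while `2gb` is 2 × 1000³ bytes.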
120+
121+
.. option:: --exclude-from-upload <pattern>
122+
123+
Exclude files matching ``<pattern>`` from being uploaded as part of
124+
the remote build. Shell-style advanced globbing is supported, but
125+
be sure to escape wildcards or quote the whole pattern so your
126+
shell doesn't expand them. May be passed more than once.
127+
Currently only supported when also using :option:`--aws-batch`.
128+
Default is to upload the entire pathogen build directory (except
129+
for some ancillary files which are always excluded).
130+
131+
Note that files excluded from upload may still be downloaded from
132+
the remote build, e.g. if they're created by it, and if downloaded
133+
will overwrite the local files. When attaching to the build, use
134+
:option:`nextstrain build --no-download` to avoid downloading any
135+
files or :option:`nextstrain build --exclude-from-download` to
136+
avoid downloading specific files.
137+
138+
Besides basic glob features like single-part wildcards (``*``),
139+
character classes (``[…]``), and brace expansion (``{…, …}``),
140+
several advanced globbing features are also supported: multi-part
141+
wildcards (``**``), extended globbing (``@(…)``, ``+(…)``, etc.),
142+
and negation (``!…``).
143+
144+
Patterns should be relative to the build directory.
145+
146+
147+
148+
149+
.. option:: --help, -h
150+
151+
Show a brief help message of common options and exit
152+
153+
.. option:: --help-all
154+
155+
Show a full help message of all options and exit
156+
157+
runtime selection options
158+
=========================
159+
160+
Select the Nextstrain runtime to use, if the
161+
default is not suitable.
162+
163+
.. option:: --docker
164+
165+
Run commands inside a container image using Docker. (default)
166+
167+
.. option:: --conda
168+
169+
Run commands with access to a fully-managed Conda environment.
170+
171+
.. option:: --singularity
172+
173+
Run commands inside a container image using Singularity.
174+
175+
.. option:: --ambient
176+
177+
Run commands in the ambient environment, outside of any container image or managed environment.
178+
179+
.. option:: --aws-batch
180+
181+
Run commands remotely on AWS Batch inside the Nextstrain container image.
182+
183+
runtime options
184+
===============
185+
186+
Options shared by all runtimes.
187+
188+
.. option:: --env <name>[=<value>]
189+
190+
Set the environment variable ``<name>`` to the value in the current environment (i.e. pass it thru) or to the given ``<value>``. May be specified more than once. Overrides any variables of the same name set via :option:`--envdir`. When this option or :option:`--envdir` is given, the default behaviour of automatically passing thru several "well-known" variables is disabled. The "well-known" variables are ``AUGUR_RECURSION_LIMIT``, ``AUGUR_MINIFY_JSON``, ``AWS_ACCESS_KEY_ID``, ``AWS_SECRET_ACCESS_KEY``, ``AWS_SESSION_TOKEN``, ``ID3C_URL``, ``ID3C_USERNAME``, ``ID3C_PASSWORD``, ``RETHINK_HOST``, and ``RETHINK_AUTH_KEY``. Pass those variables explicitly via :option:`--env` or :option:`--envdir` if you need them in combination with other variables.
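The two forms of `--env` and their precedence over `--envdir` can be illustrated with a short Python sketch. The function names are hypothetical. Skipping a pass-through variable that is unset in the current environment is an assumption of this example, not documented behaviour.

```python
import os


def resolve_env_option(arg, environ=None):
    """Resolve one --env argument of the form NAME or NAME=VALUE.

    NAME=VALUE sets the value explicitly; a bare NAME passes through the
    value from the current environment (None if it is not set there).
    """
    environ = os.environ if environ is None else environ
    name, equals, value = arg.partition("=")
    if equals:
        return name, value
    return name, environ.get(name)


def build_env(envdir_vars, env_args, environ=None):
    """Combine envdir variables with --env arguments; --env wins on conflict."""
    env = dict(envdir_vars)
    for arg in env_args:
        name, value = resolve_env_option(arg, environ)
        if value is not None:  # assumption: unset pass-through variables are skipped
            env[name] = value
    return env
```

For example, `build_env({"A": "1", "B": "2"}, ["B=3"])` keeps `A` from the envdir and replaces `B` with `3`.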
191+
192+
.. option:: --envdir <path>
193+
194+
Set environment variables from the envdir at ``<path>``. May be specified more than once. An envdir is a directory containing files describing environment variables. Each filename is used as the variable name. The first line of the contents of each file is used as the variable value. When this option or :option:`--env` is given, the default behaviour of automatically passing thru several "well-known" variables is disabled. Envdirs may also be specified by setting ``NEXTSTRAIN_RUNTIME_ENVDIRS`` in the environment to a ``:``-separated list of paths. See the description of :option:`--env` for more details.
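The envdir layout described above (one file per variable, with the filename as the name and the first line as the value) can be read with a few lines of Python. This is a minimal sketch, not the CLI's implementation. The function names are hypothetical, and treating an empty file as an empty string and ignoring subdirectories are assumptions.

```python
from pathlib import Path


def read_envdir(path):
    """Return {name: value} for each regular file in the envdir at `path`."""
    env = {}
    for entry in sorted(Path(path).iterdir()):
        if not entry.is_file():
            continue  # assumption: subdirectories are ignored
        lines = entry.read_text(encoding="utf-8").splitlines()
        # Only the first line is used as the value; an empty file is
        # treated as an empty string (an assumption).
        env[entry.name] = lines[0] if lines else ""
    return env


def envdirs_from_environment(environ):
    """Split NEXTSTRAIN_RUNTIME_ENVDIRS into a list of paths on ":"."""
    value = environ.get("NEXTSTRAIN_RUNTIME_ENVDIRS", "")
    return [part for part in value.split(":") if part]
```

A directory containing a file named `AUGUR_MINIFY_JSON` whose first line is `1` would therefore produce `{"AUGUR_MINIFY_JSON": "1"}`.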
195+
196+
development options
197+
===================
198+
199+
These should generally be unnecessary unless you're developing Nextstrain.
200+
201+
.. option:: --image <image>
202+
203+
Container image name to use for the Nextstrain runtime (default: nextstrain/base for Docker and AWS Batch, docker://nextstrain/base for Singularity)
204+
205+
.. option:: --augur <dir>
206+
207+
Replace the image's copy of augur with a local copy
208+
209+
.. option:: --auspice <dir>
210+
211+
Replace the image's copy of auspice with a local copy
212+
213+
.. option:: --fauna <dir>
214+
215+
Replace the image's copy of fauna with a local copy
216+
217+
.. option:: --sacra <dir>
218+
219+
Replace the image's copy of sacra with a local copy
220+
221+
.. option:: --exec <prog>
222+
223+
Program to run inside the runtime
224+
225+
development options for --docker
226+
================================
227+
228+
229+
230+
.. option:: --docker-arg ...
231+
232+
Additional arguments to pass to `docker run`
233+
234+
development options for --aws-batch
235+
===================================
236+
237+
See <https://docs.nextstrain.org/projects/cli/page/aws-batch>
238+
for more information.
239+
240+
.. option:: --aws-batch-job <name>
241+
242+
Name of the AWS Batch job definition to use
243+
244+
.. option:: --aws-batch-queue <name>
245+
246+
Name of the AWS Batch job queue to use
247+
248+
.. option:: --aws-batch-s3-bucket <name>
249+
250+
Name of the AWS S3 bucket to use as shared storage
251+
252+
.. option:: --aws-batch-cpus <count>
253+
254+
Number of vCPUs to request for job
255+
256+
.. option:: --aws-batch-memory <mebibytes>
257+
258+
Amount of memory in MiB to request for job
259+

doc/commands/setup.rst

Lines changed: 32 additions & 9 deletions
@@ -12,15 +12,24 @@ nextstrain setup

 .. code-block:: none

-usage: nextstrain setup [-h] [--dry-run] [--force] [--set-default] <runtime>
+usage: nextstrain setup [--dry-run] [--force] [--set-default] <pathogen-name>[@<version>[=<url>]]
+nextstrain setup [--dry-run] [--force] [--set-default] <runtime-name>
+nextstrain setup --help


-Sets up a Nextstrain runtime for use with `nextstrain build`, `nextstrain
-view`, etc.
+Sets up a Nextstrain pathogen for use with `nextstrain run` or a Nextstrain
+runtime for use with `nextstrain run`, `nextstrain build`, `nextstrain view`,
+etc.

-Only the Conda runtime currently supports automated set up, but this command
-may still be used with other runtimes to check an existing (manual) setup and
-set the runtime as the default on success.
+For pathogens, set up involves downloading a specific version of the pathogen's
+Nextstrain workflows. By convention, this download is from Nextstrain's
+repositories. More than one version of the same pathogen may be set up and
+used independently. This can be useful for comparing analyses across workflow
+versions. A default version can be set.
+
+For runtimes, only the Conda runtime currently supports fully-automated set up,
+but this command may still be used with other runtimes to check an existing
+(manual) setup and set the runtime as the default on success.

 Exits with an error code if automated set up fails or if setup checks fail.

@@ -29,9 +38,23 @@ positional arguments



-.. option:: <runtime>
+.. option:: <pathogen>|<runtime>
+
+The Nextstrain pathogen or runtime to set up.
+
+A pathogen is usually the plain name of a Nextstrain-maintained
+pathogen (e.g. ``measles``), optionally with an ``@<version>``
+specifier (e.g. ``measles@v42``). If ``<version>`` is specified in
+this case, it must be a tag name (i.e. a release name), development
+branch name, or a development commit id.
+
+A pathogen may also be fully-specified as ``<name>@<version>=<url>``
+where ``<name>`` and ``<version>`` in this case are (mostly)
+arbitrary and ``<url>`` points to a ZIP file containing the
+pathogen workflow contents.
+
+A runtime is one of {docker, conda, singularity, ambient, aws-batch}.

-The Nextstrain runtime to set up. One of {docker, conda, singularity, ambient, aws-batch}.

 options
 =======
@@ -52,5 +75,5 @@ options

 .. option:: --set-default

-Use the runtime as the default if set up is successful.
+Use this pathogen version or runtime as the default if set up is successful.
