Skip to content

Disable probabilistic sampling when --sequences-per-group is specified#737

Merged
huddlej merged 1 commit intomasterfrom
subsampling-with-sequences-per-group
Jun 15, 2021
Merged

Disable probabilistic sampling when --sequences-per-group is specified#737
huddlej merged 1 commit intomasterfrom
subsampling-with-sequences-per-group

Conversation

@benjaminotter
Copy link
Copy Markdown
Contributor

Description of proposed changes

Sets the default value for probabilistic sampling to the value of the argument parser.
This boolean value is set to False when --sequences-per-group is specified by the user.

Related issue(s)

Fixes #736

Testing

Tested as mentioned in #736 with:

cd tests/functional
augur filter \
    --metadata filter/metadata.tsv \
    --group-by region \
    --sequences-per-group 1 \
    --output-strains test.txt

Ouput:

8 strains were dropped during filtering
        8 of these were dropped because of subsampling criteria
3 strains passed all filters

@benjaminotter benjaminotter requested a review from huddlej June 15, 2021 14:25
@codecov
Copy link
Copy Markdown

codecov bot commented Jun 15, 2021

Codecov Report

Merging #737 (ffb5233) into master (52b0ccb) will increase coverage by 0.00%.
The diff coverage is 16.66%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #737   +/-   ##
=======================================
  Coverage   31.46%   31.47%           
=======================================
  Files          41       41           
  Lines        5704     5707    +3     
  Branches     1385     1386    +1     
=======================================
+ Hits         1795     1796    +1     
- Misses       3835     3836    +1     
- Partials       74       75    +1     
Impacted Files Coverage Δ
augur/filter.py 48.52% <16.66%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 52b0ccb...ffb5233. Read the comment docs.

…fic value

Avoids probabilistic behavior when users specify a fixed number of
sequences per group during subsampling. Adds a functional test for the
simplest use of sequences per group.

Fixes #736
@huddlej huddlej force-pushed the subsampling-with-sequences-per-group branch from 363c9f8 to ffb5233 Compare June 15, 2021 22:33
@huddlej huddlej marked this pull request as ready for review June 15, 2021 22:34
Copy link
Copy Markdown
Contributor

@huddlej huddlej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, @benjaminotter! Thank you! I added the functional test to make sure we're covering this use case.

@huddlej huddlej added this to the Feature release 13.0.0 milestone Jun 15, 2021
@huddlej huddlej merged commit eb1e850 into master Jun 15, 2021
@huddlej huddlej deleted the subsampling-with-sequences-per-group branch June 15, 2021 22:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

filter: Do not subsample probabilistically when users specify --sequences-per-group

2 participants