Skip to content

New module: bascule #8751

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open

Conversation

elena-buscaroli
Copy link
Contributor

PR checklist

Closes #8750

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

Copy link
Contributor

@akaviaLab akaviaLab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, the code seems well written. You will have to make sure it passes linting and nf-test, but you probably know that. All the tests are failing due to lack of space on the device, so I think you need to update the resources required in the process section https://www.nextflow.io/docs/latest/reference/process.html#disk

I have one big problem, which I don't fully understand. It looks like R is used to 1) untar the matrices, 2) apply some transformations, and pass it on to pybascule. Later, R is used for 3) visualization.

Running python via reticualte is relatively slow and has overhead. Why not write a simple python script that will untar (via https://docs.python.org/3/library/tarfile.html), transform (via pandas), and then if necessary, do the visualization via ggplot?
You can do the visualization via seaborn (for example), directly in python.

If ggplot is better for visualization, I would write a python script that processes the data, outputs some kind of text version, and then runs (via sys calls) RScript visualize-bascule.R --output $prefix.

Another issue is the fact that you seem to be working on a newer better version of bascule, which Google already points to. I hope you get your paper published soon! Wouldn't it make sense to write the bascule module to use the newer version? Are you planning to modify the Nextflow code to the R version in the future?

- conda-forge::pip=24.2
- conda-forge::r-ggplot2=3.5.2
- conda-forge::r-reticulate=1.42.0
- conda-forge::r-tidyverse=2.0.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You include the tidyverse, which is pretty big. Do you actually need the entire tidyverse, or can you pick only the pckages you need and get a smaller image?

@@ -0,0 +1,125 @@
#!/usr/bin/env Rscript

parse_args = function(x) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very different from other parse_args R functions that I've seen in nf-core, see for example

.
I'm assuming that the standard one didn't work. Are the arguments to bascule very different from other R using modules?

parsed_args[! is.na(parsed_args)]
}

opt = list(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is more than one argument used in the script, so please set reasonable defaults in opt. Also, are the parameters set by args? Is it possible to move some of them into the task definition, allowing users to change it without messing around in configs?
I'm not familiar with bascule - to me the question is how likely are users going to want to change any parameter.
If very likely - I think it should be in the nf file.
If very unlikely, it can stay only in args.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a situation where (some) parameters should be changed together, I think there are best practices for doing so.


counts = lapply(counts_all, function(count_i) count_i[samples_list,])

reference_cat = eval(parse(text=opt[["reference_cat"]]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the options that wasn't given a default in opt.

pl_exposures = plot_exposures(x)
pl_centroids = plot_centroids(x)

if (clustering) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So line 71 to 94 are the python code? Have I understood this correctly?

}

# save objects
saveRDS(x, file=paste0(opt[["prefix"]], "_fit_bascule.rds"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest putting prefix into the main.nf, with a reasonable default. You can incorporate it by using $, see an example in

background_file = '$background_file',

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# save objects
saveRDS(x, file=paste0(opt[["prefix"]], "_fit_bascule.rds"))
saveRDS(figure, paste0(opt[["prefix"]], "_plots_bascule.rds"))
ggplot2::ggsave(plot=figure, filename=paste0(opt[["prefix"]], "_plots_bascule.pdf"), height=210, width=210, units="mm")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I very much like the style of using PACKAGE::FUNCTION. However, I'm a bit nitpicky, and I think that if you use this style, you should use it consistently.

py = import("pybascule")

x = fit(counts=counts,
k_list=eval(parse(text=opt[["k_list"]])),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using eval is problematic. Can you tuink of an alternative way to do so?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

new module: bascule
2 participants