-
Notifications
You must be signed in to change notification settings - Fork 889
New module: bascule #8751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
New module: bascule #8751
Conversation
not working version, needs to update the script and to fix the containers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, the code seems well written. You will have to make sure it passes linting and nf-test, but you probably know that. All the tests are failing due to lack of space on the device, so I think you need to update the resources required in the process section https://www.nextflow.io/docs/latest/reference/process.html#disk
I have one big problem, which I don't fully understand. It looks like R is used to 1) untar the matrices, 2) apply some transformations, and pass it on to pybascule. Later, R is used for 3) visualization.
Running python via reticualte is relatively slow and has overhead. Why not write a simple python script that will untar (via https://docs.python.org/3/library/tarfile.html), transform (via pandas), and then if necessary, do the visualization via ggplot?
You can do the visualization via seaborn (for example), directly in python.
If ggplot is better for visualization, I would write a python script that processes the data, outputs some kind of text version, and then runs (via sys calls) RScript visualize-bascule.R --output $prefix.
Another issue is the fact that you seem to be working on a newer better version of bascule, which Google already points to. I hope you get your paper published soon! Wouldn't it make sense to write the bascule module to use the newer version? Are you planning to modify the Nextflow code to the R version in the future?
- conda-forge::pip=24.2 | ||
- conda-forge::r-ggplot2=3.5.2 | ||
- conda-forge::r-reticulate=1.42.0 | ||
- conda-forge::r-tidyverse=2.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You include the tidyverse, which is pretty big. Do you actually need the entire tidyverse, or can you pick only the pckages you need and get a smaller image?
@@ -0,0 +1,125 @@ | |||
#!/usr/bin/env Rscript | |||
|
|||
parse_args = function(x) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very different from other parse_args R functions that I've seen in nf-core, see for example
parse_args <- function(x) { |
I'm assuming that the standard one didn't work. Are the arguments to bascule very different from other R using modules?
parsed_args[! is.na(parsed_args)] | ||
} | ||
|
||
opt = list( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is more than one argument used in the script, so please set reasonable defaults in opt. Also, are the parameters set by args? Is it possible to move some of them into the task definition, allowing users to change it without messing around in configs?
I'm not familiar with bascule - to me the question is how likely are users going to want to change any parameter.
If very likely - I think it should be in the nf file.
If very unlikely, it can stay only in args.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a situation where (some) parameters should be changed together, I think there are best practices for doing so.
|
||
counts = lapply(counts_all, function(count_i) count_i[samples_list,]) | ||
|
||
reference_cat = eval(parse(text=opt[["reference_cat"]])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of the options that wasn't given a default in opt.
pl_exposures = plot_exposures(x) | ||
pl_centroids = plot_centroids(x) | ||
|
||
if (clustering) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So line 71 to 94 are the python code? Have I understood this correctly?
} | ||
|
||
# save objects | ||
saveRDS(x, file=paste0(opt[["prefix"]], "_fit_bascule.rds")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest putting prefix into the main.nf, with a reasonable default. You can incorporate it by using $, see an example in
background_file = '$background_file', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant this link https://nf-co.re/docs/guidelines/components/modules#optional-command-arguments
# save objects | ||
saveRDS(x, file=paste0(opt[["prefix"]], "_fit_bascule.rds")) | ||
saveRDS(figure, paste0(opt[["prefix"]], "_plots_bascule.rds")) | ||
ggplot2::ggsave(plot=figure, filename=paste0(opt[["prefix"]], "_plots_bascule.pdf"), height=210, width=210, units="mm") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I very much like the style of using PACKAGE::FUNCTION. However, I'm a bit nitpicky, and I think that if you use this style, you should use it consistently.
py = import("pybascule") | ||
|
||
x = fit(counts=counts, | ||
k_list=eval(parse(text=opt[["k_list"]])), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using eval is problematic. Can you tuink of an alternative way to do so?
PR checklist
Closes #8750
versions.yml
file.label
nf-core modules test <MODULE> --profile docker
nf-core modules test <MODULE> --profile singularity
nf-core modules test <MODULE> --profile conda
nf-core subworkflows test <SUBWORKFLOW> --profile docker
nf-core subworkflows test <SUBWORKFLOW> --profile singularity
nf-core subworkflows test <SUBWORKFLOW> --profile conda