Standardization of names in meta.yml for all tools & Big refactoring for all modules to subfolders #551

Merged
ypriverol merged 82 commits into bigbio:dev from ypriverol:dev
May 27, 2025

@ypriverol
Member

@ypriverol ypriverol commented May 21, 2025

A little bit of discussion about this PR from @ypriverol. Currently, quantms has evolved with no standardization of the naming system for modules, tools, subworkflows, workflows, etc. These are the main issues I have found:

  • Different folder naming conventions: all processes related to openms are in a folder called openms (modules/local/openms/{}), but the ones from diann are in the parent local folder (modules/local/{}) 🤔.
  • Some processes have names like DIANNCONVERT and others like PREPROCESS_EXPDESIGN. This is difficult to follow because it is unclear why the styles differ or how the decision was made.
  • Similarly, process names differ between main.nf and meta.yml. For example, a process can be called PREPROCESS_EXPDESIGN in main.nf and preprocessexpdesign in meta.yml.
  • Some of the modules in the subworkflows and workflows are included with aliases and others are not. For example:
include { EPIFANY                                } from '../../modules/local/openms/epifany/main'
include { PROTEININFERENCE as PROTEININFERENCER  } from '../../modules/local/openms/proteininference/main'
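
One way to audit the import style is to scan include statements for the `as` keyword. The following is a minimal Python sketch, not part of this PR: the regular expression and the function name `find_aliased_includes` are assumptions made for illustration, and the pattern only handles single-module include lines like the two above.

```python
import re

# Matches single-module Nextflow include lines such as:
#   include { A } from 'path'
#   include { A as B } from 'path'
# (hypothetical pattern, written for this illustration only)
INCLUDE_RE = re.compile(
    r"include\s*\{\s*(?P<name>\w+)(?:\s+as\s+(?P<alias>\w+))?\s*\}"
    r"\s*from\s*['\"](?P<path>[^'\"]+)['\"]"
)


def find_aliased_includes(text):
    """Return (name, alias, path) for every include that uses an alias."""
    aliased = []
    for match in INCLUDE_RE.finditer(text):
        if match.group("alias"):
            aliased.append(
                (match.group("name"), match.group("alias"), match.group("path"))
            )
    return aliased


example = """
include { EPIFANY                                } from '../../modules/local/openms/epifany/main'
include { PROTEININFERENCE as PROTEININFERENCER  } from '../../modules/local/openms/proteininference/main'
"""

for name, alias, path in find_aliased_includes(example):
    print(f"{name} is imported as {alias} from {path}")
```

Run on the two lines quoted above, the sketch reports only the PROTEININFERENCE include, since EPIFANY is imported without an alias.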

This PR is the first iteration of standardising the names, modules, folders and import style. These are some of the decisions I have made:

  • In main.nf, processes are UPPERCASE with _ to split words; this follows nf-core and Python style.
  • In meta.yml, processes have the same name as in main.nf but in lowercase.
  • Modules are split roughly by framework: openms, diann, msstats, and utils.
  • Avoid including aliases in subworkflows as much as possible.
  • If a module can be associated with a single tool or unique step, such as Percolator, Sage or Comet, use the single name (e.g. COMET) rather than a decorated name like SEARCH_ENGINE_COMET. In some cases this is not possible, as with INFERENCE and EPIFANY: because epifany is itself an inference tool, a decoration is needed, so I use protein_inference_epifany and protein_inference_generic.
  • All subworkflows now have their own folder (for example psm_rescoring), which contains a main.nf.
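
The first two decisions can be expressed as a small check. This is a hedged Python sketch rather than anything shipped in the PR: the function names are hypothetical, and allowing digits in process names is an assumption, since the description above only mentions uppercase words separated by underscores.

```python
import re

# UPPERCASE words separated by single underscores, e.g. MSSTATS_TMT or COMET.
# Allowing digits is an assumption; the PR text does not mention them.
PROCESS_NAME_RE = re.compile(r"^[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*$")


def is_valid_process_name(name):
    """True if the name follows the UPPERCASE_WITH_UNDERSCORES style."""
    return bool(PROCESS_NAME_RE.match(name))


def expected_meta_name(process_name):
    """meta.yml name: the process name from main.nf, in lowercase."""
    return process_name.lower()


def names_consistent(process_name, meta_name):
    """True if main.nf and meta.yml agree under the two rules above."""
    return (
        is_valid_process_name(process_name)
        and meta_name == expected_meta_name(process_name)
    )


print(names_consistent("PREPROCESS_EXPDESIGN", "preprocess_expdesign"))
print(names_consistent("PREPROCESS_EXPDESIGN", "preprocessexpdesign"))
```

A check like this catches case and underscore mismatches between the two files, but it cannot tell whether a single-word name such as DIANNCONVERT should have been split into words; that remains a human decision.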

Please feel free to give me feedback here and also on the way we are naming the folders #551 (comment). This will be really impactful for users, so all comments are welcome.

@coderabbitai
Contributor

coderabbitai bot commented May 21, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@ypriverol ypriverol requested review from Copilot and fabianegli May 21, 2025 13:28
Contributor

Copilot AI left a comment

Pull Request Overview

This PR refactors and standardizes module names and process identifiers across MSstats and DIA-NN modules to the {TOOL}_{PROCESS} naming convention.

  • Renamed MSstats TMT and LFQ modules in metadata and process definitions.
  • Updated various DIA-NN process blocks to include DIANN_ prefix and underscore.
  • Adjusted module metadata for DIANN convert to match naming style.

Reviewed Changes

Copilot reviewed 39 out of 39 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| modules/local/msstats/msstats_tmt/meta.yml | Updated module name to MSSTATS_TMT |
| modules/local/msstats/msstats_tmt/main.nf | Renamed process to MSSTATS_TMT |
| modules/local/msstats/msstats_lfq/meta.yml | Updated module name to MSSTATS_LFQ |
| modules/local/msstats/msstats_lfq/main.nf | Renamed process to MSSTATS_LFQ |
| modules/local/diann/summary/main.nf | Renamed process to DIANN_SUMMARY |
| modules/local/diann/insilico_library_generation/main.nf | Renamed process to DIANN_INSILICO_LIBRARY_GENERATION |
| modules/local/diann/generate_cfg/main.nf | Renamed process to DIANN_GENERATE_CFG |
| modules/local/diann/convert/meta.yml | Changed module name to diannconvert |
| modules/local/diann/convert/main.nf | Renamed process to DIANN_CONVERT |
| modules/local/diann/assemble_empirical_library/main.nf | Renamed process to DIANN_ASSEMBLE_EMPIRICAL_LIBRARY |
Comments suppressed due to low confidence (1)

modules/local/diann/convert/meta.yml:1

  • Module name should follow the {TOOL}_{PROCESS} convention and match its process identifier. Consider renaming this to DIANN_CONVERT for consistency.
name: diannconvert

@ypriverol ypriverol requested a review from Copilot May 21, 2025 13:41
Contributor

Copilot AI left a comment

Pull Request Overview

This PR refactors and standardizes module names and process identifiers across the project.

  • Updated msstats module names and process identifiers to include descriptive suffixes (TMT, LFQ).
  • Revised DIANN module process names to use a consistent DIANN_ prefix and underscore-separated naming.
  • Adjusted meta.yml files to align with the standardized process names.

Reviewed Changes

Copilot reviewed 42 out of 42 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| modules/local/msstats/msstats_tmt/meta.yml | Updated module name from MSSTATSTMT to MSSTATS_TMT |
| modules/local/msstats/msstats_tmt/main.nf | Updated process name from MSSTATSTMT to MSSTATS_TMT |
| modules/local/msstats/msstats_lfq/meta.yml | Updated module name from MSSTATS to MSSTATS_LFQ |
| modules/local/msstats/msstats_lfq/main.nf | Updated process name from MSSTATS to MSSTATS_LFQ |
| modules/local/diann/summary/main.nf | Updated process name from DIANNSUMMARY to DIANN_SUMMARY |
| modules/local/diann/insilico_library_generation/main.nf | Updated process name from SILICOLIBRARYGENERATION to DIANN_INSILICO_LIBRARY_GENERATION |
| modules/local/diann/generate_cfg/main.nf | Updated process name from GENERATE_DIANN_CFG to DIANN_GENERATE_CFG |
| modules/local/diann/convert/meta.yml | Updated module name from DIANNCONVERT to diannconvert (note casing inconsistency) |
| modules/local/diann/convert/main.nf | Updated process name from DIANNCONVERT to DIANN_CONVERT |
| modules/local/diann/assemble_empirical_library/main.nf | Updated process name from ASSEMBLE_EMPIRICAL_LIBRARY to DIANN_ASSEMBLE_EMPIRICAL_LIBRARY |
Comments suppressed due to low confidence (1)

modules/local/diann/convert/meta.yml:1

  • The module name 'diannconvert' is in lowercase while other similar module names use uppercase with underscores (e.g., DIANN_CONVERT). Consider updating it for consistency.
name: diannconvert

@ypriverol
Member Author

ypriverol commented May 26, 2025

Thanks to all for the feedback, here is the current default output for each workflow:

results_dia
 |__pipeline_info
 |__sdrf
 |__thermorawfileparser
 |__quant_tables
 |__msstats
 |__pmultiqc
 |__ |__multiqc_plots
 |__ |__ |__png
 |__ |__ |__svg
 |__ |__ |__pdf
 |__ |__multiqc_data
results_iso
 |__pipeline_info
 |__sdrf
 |__quant_tables
 |__msstats
 |__pmultiqc
 |__ |__multiqc_data
 |__ |__multiqc_plots
 |__ |__ |__pdf
 |__ |__ |__png
 |__ |__ |__svg
results_lfq
 |__pipeline_info
 |__sdrf
 |__spectra
 |__ |__mzml_statistics
 |__quant_tables
 |__msstats
 |__pmultiqc
 |__ |__multiqc_data
 |__ |__multiqc_plots
 |__ |__ |__pdf
 |__ |__ |__svg
 |__ |__ |__png
results_lfq_dda_id
 |__pipeline_info
 |__sdrf
 |__spectra
 |__ |__mzml_statistics
 |__psm_tables
 |__pmultiqc
 |__ |__multiqc_data
results_localize
 |__pipeline_info
 |__sdrf
 |__quant_tables
 |__pmultiqc
 |__ |__multiqc_plots
 |__ |__ |__svg
 |__ |__ |__pdf
 |__ |__ |__png
 |__ |__multiqc_data

As mentioned and suggested by @jpfeuffer the structure is the following:

  • sdrf: all SDRF-related files (the SDRF itself, openms configs, etc.)
  • spectra: all the spectra-related folders, including mzml_statistics, and thermorawfileparser if conversion is needed.
  • psm_tables: for the outputs of the ID pipeline in parquet files
  • quant_tables: for all the outputs of quant, including mztab, msstats_in, diann outputs, etc.
  • pmultiqc: for pmultiqc reports
  • pipeline_info: for all the pipeline info (the DAG, execution reports, etc.)
  • msstats: for msstats reports
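
The folder roles above can be captured in a small lookup and used to spot top-level folders that fall outside the documented layout. This is a Python sketch under assumptions: the dictionary wording paraphrases the list above, and `unexpected_folders` is a hypothetical helper, not something provided by quantms.

```python
import tempfile
from pathlib import Path

# Top-level output folders and their roles, paraphrased from the list above.
OUTPUT_FOLDERS = {
    "sdrf": "SDRF files and openms configs",
    "spectra": "spectra-related outputs such as mzml_statistics",
    "psm_tables": "outputs of the ID pipeline in parquet files",
    "quant_tables": "quant outputs such as mztab, msstats_in and diann outputs",
    "pmultiqc": "pmultiqc reports",
    "pipeline_info": "pipeline info such as the DAG and execution reports",
    "msstats": "msstats reports",
}


def unexpected_folders(results_dir):
    """Return sorted names of top-level folders not in the documented layout."""
    present = {p.name for p in Path(results_dir).iterdir() if p.is_dir()}
    return sorted(present - set(OUTPUT_FOLDERS))


# Demo: recreate the top level of results_dia as posted in the tree above.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["pipeline_info", "sdrf", "thermorawfileparser",
                 "quant_tables", "msstats", "pmultiqc"]:
        (Path(tmp) / name).mkdir()
    print(unexpected_folders(tmp))
```

For the results_dia tree as posted, the only folder reported is thermorawfileparser, which the list above places under spectra.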

As suggested by @jpfeuffer, a separate config, verbose_modules, was created in case the user wants to output all the intermediate steps.

Even in that mode, I added some structure with folders such as spectra, peptide_identification, and peptide_postprocessing to organise the steps further. Here is an example of an LFQ analysis.

.
 |__pipeline_info
 |__sdrf
 |__preprocess_expdesign
 |__spectra
 |__ |__mzml_indexing
 |__ |__ |__out
 |__ |__mzml_statistics
 |__peptide_identification
 |__ |__comet
 |__ |__sage
 |__peptide_postprocessing
 |__ |__psm_features
 |__ |__psm_clean
 |__ |__percolator
 |__ |__consensusid
 |__ |__fdr_consensusid
 |__ |__id_filter
 |__quant_tables
 |__msstats
 |__pmultiqc
 |__ |__multiqc_plots
 |__ |__ |__svg
 |__ |__ |__png
 |__ |__ |__pdf
 |__ |__multiqc_data
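
To compare a posted layout like this with an actual results directory, the `|__` tree can be turned into relative paths. The following is a minimal Python sketch; `tree_to_paths` is a hypothetical helper that assumes the exact `|__` indentation style used in this thread.

```python
def tree_to_paths(tree_text):
    """Convert a '|__' indented tree into slash-separated relative paths."""
    paths = []
    stack = []  # folder names of the current branch, one per depth level
    for line in tree_text.splitlines():
        depth = line.count("|__")
        if depth == 0:
            continue  # skip the root marker and blank lines
        name = line.replace("|__", "").strip()
        del stack[depth - 1:]  # drop the previous sibling and deeper levels
        stack.append(name)
        paths.append("/".join(stack))
    return paths


tree = """.
 |__spectra
 |__ |__mzml_indexing
 |__ |__ |__out
 |__ |__mzml_statistics
 |__peptide_identification
 |__ |__comet
 |__ |__sage
"""

for path in tree_to_paths(tree):
    print(path)
```

Each line's depth is taken from the number of `|__` markers, so the sketch relies on folder names not containing that sequence.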

Please provide feedback before merging this PR. Additionally, I organised the modules config similarly to other nf-core pipelines.

@ypriverol ypriverol requested a review from Copilot May 26, 2025 14:23
Contributor

Copilot AI left a comment

Pull Request Overview

This PR standardizes naming conventions and refactors module organization for consistency across the project. Key changes include:

  • Renaming process names (e.g. from DIANNCONVERT to CONVERT_RESULTS in main.nf and updating meta.yml names accordingly).
  • Restructuring modules into subfolders based on their framework (openms, diann, msstats, utils) and aligning naming across meta and main files.
  • Updating configuration files (modules.config, verbose_modules.config, .nf-core.yml) to reflect the new naming standards and publishing paths.

Reviewed Changes

Copilot reviewed 123 out of 123 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| modules/local/diann/convert_results/meta.yml | Updated the module name from DIANNCONVERT to convert_results to align with naming standards. |
| modules/local/diann/convert_results/main.nf | Renamed the process from DIANNCONVERT to CONVERT_RESULTS for consistency. |
| conf/modules/verbose_modules.config | Adjusted publishDir settings to support standardized names; changes appear consistent. |
| conf/modules/modules.config | Updated withName regexes and process names to match the new naming scheme for improved clarity. |
| conf/igenomes_ignored.config | Removed as part of the standardization process. |
| CHANGELOG.md | Version updated from 1.4.1 to 1.5.0dev with corresponding changelog updates. |
| .nf-core.yml | Updated lint configuration and version, though duplicate config entries were introduced. |
Comments suppressed due to low confidence (1)

.nf-core.yml:20

  • Duplicate entries for configuration files 'conf/modules.config' and 'conf/igenomes_ignored.config' are present in the lint configuration. Please remove the duplicates to avoid redundancy and potential configuration conflicts.
-    - conf/modules.config

@ypriverol ypriverol requested a review from Copilot May 26, 2025 14:28
Contributor

Copilot AI left a comment

Pull Request Overview

This PR standardizes the naming conventions for modules and processes across the codebase and refactors modules into dedicated subfolders. It ensures that the names in meta.yml are in lowercase and the process names in main.nf follow the UPPERCASE style, while also updating configuration files for publishing directories and removing legacy configuration files.

  • Standardized module and process names (e.g., DIANNCONVERT → convert_results / CONVERT_RESULTS)
  • Updated configuration files (conf/modules/verbose_modules.config and conf/modules/modules.config) to support new naming conventions and publishing paths
  • Removed legacy iGenomes configuration and updated version numbers in CHANGELOG.md and .nf-core.yml

Reviewed Changes

Copilot reviewed 123 out of 123 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| modules/local/diann/convert_results/meta.yml | Renamed module identifier to lower-case to match naming standards |
| modules/local/diann/convert_results/main.nf | Updated process name to reflect standardized naming conventions |
| conf/modules/verbose_modules.config | Updated publish directory paths for various processes |
| conf/modules/modules.config | Revised process name patterns and publish directory settings for consistency |
| conf/igenomes_ignored.config | Removed legacy iGenomes configuration |
| CHANGELOG.md | Bumped version to account for standardized naming changes |
| .nf-core.yml | Updated versions and configuration file references to align with recent changes |
Comments suppressed due to low confidence (1)

modules/local/diann/convert_results/meta.yml:1

  • The module name has been updated to 'convert_results' in meta.yml. Please ensure that all references to this module in related documentation and other configuration files are updated accordingly.
-name: DIANNCONVERT

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@jpfeuffer
Collaborator

Looks good!! What is an example for DIA?

@ypriverol
Member Author

ypriverol commented May 26, 2025

> Looks good!! What is an example for DIA?

@jpfeuffer, the extended DIA output:

.
 |__pipeline_info
 |__sdrf
 |__spectra
 |__ |__thermorawfileparser
 |__ |__mzml_statistics
 |__database_generation
 |__ |__insilico_library_generation
 |__ |__assemble_empirical_library
 |__diann_preprocessing
 |__ |__preliminary_analysis
 |__ |__individual_analysis
 |__quant_tables
 |__msstats
 |__pmultiqc
 |__ |__multiqc_plots
 |__ |__ |__png
 |__ |__ |__pdf
 |__ |__ |__svg
 |__ |__multiqc_data

@daichengxin
Collaborator

LGTM

@timosachsenberg

results_iso doesn’t have spectra folder? Did not see it in the tree above

@ypriverol
Member Author

ypriverol commented May 27, 2025

> results_iso doesn’t have spectra folder? Did not see it in the tree above

@timosachsenberg it will only be available if mzml_features is enabled, which I have in the LFQ tests. I don't know if it would be productive to export all the mzml conversion output when conversion is applied, but I guess the majority of people don't need it.

Collaborator

@enryH enryH left a comment

I like the easier to read module names and the new module structure (plus test and configs).

Renaming and moving files at the same time makes it a bit hard to verify that the moved Nextflow processes in modules are all correct (a deleted file plus an added file with changes often means differences are not highlighted). So separating renaming and moving would help if we ever do this again.

See my few comments.

ypriverol and others added 4 commits May 27, 2025 13:05
Co-authored-by: Henry Webel <heweb@dtu.dk>
Co-authored-by: Henry Webel <heweb@dtu.dk>
Co-authored-by: Henry Webel <heweb@dtu.dk>
@ypriverol ypriverol merged commit 1438d5d into bigbio:dev May 27, 2025
34 checks passed
Copilot AI mentioned this pull request May 31, 2025
11 tasks

Development

Successfully merging this pull request may close these issues.

Refactoring some of the global steps.

7 participants