Skip to content

Conversation

@dylan-copeland
Copy link
Collaborator

This PR enables the reading of serial basis files in parallel, by using the MPI_COMM_NULL communicator in BasisReader.

@dylan-copeland dylan-copeland added the RFR Ready for review label Aug 31, 2025
@chldkdtn
Copy link
Collaborator

chldkdtn commented Sep 2, 2025

@dreamer2368 and @ckendrick Let's review and approve this PR.

Copy link
Collaborator

@dreamer2368 dreamer2368 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you describe the use case of this PR a little more?

  • How does a serial basis file look like? Is it a monolithic basis matrix, or some rows of the basis matrix corresponding to a MPI process at the time of POD training?
  • What kind of parallel reading is performed here? All MPI processes attempt to read one monolithic basis matrix?

d_format(db_format)
{
CAROM_ASSERT(!base_file_name.empty());
d_distributed = comm != MPI_COMM_NULL;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This concerns me a little bit, since libROM currently does not support custom MPI communicator other than MPI_COMM_WORLD.

Should we add a line of CAROM_VERIFY for checking if the communicator is MPI_COMM_WORLD?

@dylan-copeland
Copy link
Collaborator Author

dylan-copeland commented Sep 3, 2025

@dreamer2368 An example using this PR is in file src/rom_handler.cpp of llnl/scaleupROM#60
CAROM::BasisReader(basis_name + basis_tags[k].print() + ".000000", CAROM::Database::formats::HDF5, local_dim, MPI_COMM_NULL);

The workflow being supported is

  1. A serial process generates basis files. These are small enough that they should not be distributed in parallel, which is the case for small components in scaleupROM. The entire basis matrix is in a single file.
  2. In parallel, any MPI rank needing the file will load it, using MPI_COMM_NULL to indicate that there is no MPI communication.

In this PR, the BasisReader constructor now takes a new argument for the MPI communicator. Only two possibilities are supported so far, which are MPI_COMM_NULL (setting d_distributed to false) and MPI_COMM_WORLD (setting d_distributed to true). It would be simpler just to have a bool distributed argument, but I added the MPI_Comm with the idea that it could be used in the future, without changing the interface to remove the bool argument later. The downside of this is that users may try to use a split communicator that is not supported, which would cause problems without an informative error message.

As you suggested, we could add a verification that the communicator is either MPI_COMM_NULL or MPI_COMM_WORLD. An alternative is to change the argument to a bool, which may require a breaking change later for generalization.

Copy link
Collaborator

@ckendrick ckendrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using the communicator over a bool for this is a good idea and will hopefully save work later on when we support different MPI communicators.
Could you also add a small test for this feature? It can be something like a simple call to BasisReader with the new communicator option and just check if the Matrix returned has the correct sizes and distributed flag set for each rank.

Copy link
Collaborator

@ckendrick ckendrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dylan-copeland thanks for adding the test, this looks good to me now!

@dreamer2368
Copy link
Collaborator

@dreamer2368 An example using this PR is in file src/rom_handler.cpp of LLNL/scaleupROM#60 CAROM::BasisReader(basis_name + basis_tags[k].print() + ".000000", CAROM::Database::formats::HDF5, local_dim, MPI_COMM_NULL);

The workflow being supported is

  1. A serial process generates basis files. These are small enough that they should not be distributed in parallel, which is the case for small components in scaleupROM. The entire basis matrix is in a single file.
  2. In parallel, any MPI rank needing the file will load it, using MPI_COMM_NULL to indicate that there is no MPI communication.

In this PR, the BasisReader constructor now takes a new argument for the MPI communicator. Only two possibilities are supported so far, which are MPI_COMM_NULL (setting d_distributed to false) and MPI_COMM_WORLD (setting d_distributed to true). It would be simpler just to have a bool distributed argument, but I added the MPI_Comm with the idea that it could be used in the future, without changing the interface to remove the bool argument later. The downside of this is that users may try to use a split communicator that is not supported, which would cause problems without an informative error message.

As you suggested, we could add a verification that the communicator is either MPI_COMM_NULL or MPI_COMM_WORLD. An alternative is to change the argument to a bool, which may require a breaking change later for generalization.

I'm okay with this PR as long as a verification line is added.

@dylan-copeland dylan-copeland merged commit 80aa762 into master Sep 16, 2025
4 checks passed
@dylan-copeland dylan-copeland deleted the serial-reader branch September 16, 2025 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

RFR Ready for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants