Spike: Investigate client aggregation methods #225

@Spartee

Description

In some cases, analysis or training needs the global domain of a simulation or workload. When the SmartRedis clients are used in an MPI application, each rank sends its own data to the Orchestrator (Redis/KeyDB) database. A method such as Client.aggregate could be very useful for combining that data back into a global field.

Justification

Online analysis is a primary use case here. For example, suppose we are running a weather simulation and would like to view a global snapshot of the domain every 50 timesteps. Currently, one must write a helper function that retrieves the data from every rank that sent data to the database and recombines it into a global field before it can be analyzed.
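
For reference, the kind of helper described above might look like the following minimal sketch. It assumes each rank stored its piece under a key of the form `<prefix><rank>` and that the pieces can simply be joined along the first axis. The names `gather_global_field` and `fetch`, and the key layout, are illustrative only and not part of SmartRedis; with a live database, `fetch` would typically be `client.get_tensor`.

```python
import numpy as np

def gather_global_field(fetch, prefix, n_ranks, axis=0):
    """Fetch one tensor per rank and join them into a single array.

    fetch   -- callable mapping a key to a NumPy array (assumption: with
               SmartRedis this would be client.get_tensor)
    prefix  -- hypothetical key prefix, e.g. "hello_world_rank_"
    n_ranks -- number of MPI ranks that wrote data
    """
    pieces = [fetch(f"{prefix}{rank}") for rank in range(n_ranks)]
    return np.concatenate(pieces, axis=axis)

# A plain dict stands in for the database so the sketch runs on its own.
fake_db = {f"hello_world_rank_{r}": np.full((2, 3), r) for r in range(4)}
global_field = gather_global_field(fake_db.__getitem__, "hello_world_rank_", 4)
print(global_field.shape)  # (8, 3)
```

Every analysis script ends up carrying some variant of this loop, including knowledge of the rank count and key naming, which is the boilerplate an aggregate method would absorb.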

Also, future methods like to_xarray become much more useful if a domain can be aggregated first.

Implementation Strategy

This ticket is meant to provide a place for feedback on this issue. The result of this ticket is a design document that will be shared with the community to collect feedback.

I've been thinking about this, however, and have something like the following method in mind:

from smartredis import Client

client = Client(cluster=True)
dataset = client.aggregate("hello_world_rank_*")

This example would collect all the tensors in the database whose names start with hello_world_rank_. Essentially, this is glob syntax for specifying which tensors to retrieve.
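
To make the glob semantics concrete, here is a small sketch of the key-selection step only, using Python's standard `fnmatch` module. How the list of candidate keys would be obtained from the database is left open here; `match_keys` and the trailing-integer rank convention are assumptions for illustration.

```python
import fnmatch
import re

def match_keys(keys, pattern):
    """Return the keys matching a glob pattern, ordered by rank number."""
    matched = [k for k in keys if fnmatch.fnmatchcase(k, pattern)]

    def rank_of(key):
        # Assumption: the rank is the trailing integer in the key name.
        m = re.search(r"(\d+)$", key)
        return int(m.group(1)) if m else -1

    return sorted(matched, key=rank_of)

keys = ["hello_world_rank_10", "hello_world_rank_2",
        "other_rank_0", "hello_world_rank_0"]
selected = match_keys(keys, "hello_world_rank_*")
print(selected)
# ['hello_world_rank_0', 'hello_world_rank_2', 'hello_world_rank_10']
```

Sorting numerically rather than lexicographically matters once there are ten or more ranks, since a plain string sort would place `hello_world_rank_10` before `hello_world_rank_2` and scramble the recombined field.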

Open questions:

  1. Does a simple concat of all tensors work for most use cases?
  2. How does the user specify how the domain should be recombined in more complex cases? (unstructured mesh?)
  3. Multiple fields in one dataset/aggregate call?
  4. How can we use the metadata fields of the DataSet object to make this easier for users?
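
As one possible angle on questions 1 and 4, the sketch below shows a case where a simple concat is not enough (a 2x2 block decomposition) and how a per-rank offset could drive the recombination instead. The `offset` value is hypothetical metadata that a rank might store alongside its tensor; `assemble` is an illustrative name, and nothing here reflects an existing DataSet field.

```python
import numpy as np

def assemble(pieces, global_shape):
    """Place each rank's block into a global array using its offset.

    pieces -- iterable of (offset, block) pairs, where offset is the index
              of the block's first element in the global domain
              (hypothetical per-rank metadata)
    """
    pieces = list(pieces)
    out = np.zeros(global_shape, dtype=pieces[0][1].dtype)
    for offset, block in pieces:
        # Build one slice per dimension covering this block's extent.
        region = tuple(slice(o, o + n) for o, n in zip(offset, block.shape))
        out[region] = block
    return out

# 2x2 decomposition of a 4x4 domain; each block is filled with its rank id.
pieces = [((0, 0), np.full((2, 2), 0)), ((0, 2), np.full((2, 2), 1)),
          ((2, 0), np.full((2, 2), 2)), ((2, 2), np.full((2, 2), 3))]
field = assemble(pieces, (4, 4))
print(field)
```

This covers regular block decompositions only; unstructured meshes (question 2) would need a different mapping, such as per-element global indices.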

It's important that when we design this method, we keep in mind that the idea is to be able to quickly convert to something like an xarray dataset.

For example:

from smartredis import Client

client = Client(cluster=True)
dataset = client.aggregate("hello_world_rank_*").to_xarray()
dataset.plot() # use xarray methods now.

Acceptance Criteria

  • Create a design document with how this aggregation method will work (discussion should take place here)
  • Post link to the design document here
  • Incorporate feedback and open issue to implement Client.aggregate

Tagging some people who I think would provide good feedback on this issue.
@ashao @rabernat @mellis13 @nbren12

Labels

type: design (Issues related to architecture and code design)
