Improve Dask serializers, registration #1948

@thewtex

As suggested by @jakirkham in #1942, we can improve the registration of Dask serialization and deserialization functions, including making them available in distributed. This could be done with NDArrayITKBase, but we can also have optimized serializers for common itk.DataObject types.

As @jakirkham noted:

If one communicates objects that Dask already knows how to work with (like NumPy arrays), then one will get optimized serialization for free. This may involve a little (or maybe no) code in ITK to make sure NumPy arrays are handled reasonably well and handed to Dask when functions return. The benefit here seems useful even outside the Dask context.

If one adds pickle protocol 5 support to objects (possibly extending to older Python versions with pickle5, or leveraging NumPy to do this indirectly (numpy/numpy#12091)), then one can leverage Dask's own support for pickle protocol 5 (dask/distributed#3784, dask/distributed#3849), which should yield a similar performance boost to Dask's own custom serialization (as the underlying principles are the same). This would require merely updating how pickling works in ITK. It will also work with anything else that supports pickle protocol 5.
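As a rough illustration of the pickle protocol 5 route, here is a minimal sketch using only the standard library (Python 3.8+). `PixelContainer` is a hypothetical stand-in for an ITK data object, not an actual ITK class, and the `_rebuild` helper is likewise an assumption made for the example. The idea is that `__reduce_ex__` wraps the pixel buffer in a `pickle.PickleBuffer`, so a consumer that passes `buffer_callback` can move the buffer out-of-band instead of copying it into the pickle stream.

```python
import pickle


class PixelContainer:
    """Hypothetical stand-in for an ITK data object that owns a pixel buffer."""

    def __init__(self, data, shape):
        self.data = data          # any object exposing the buffer protocol
        self.shape = tuple(shape)

    @classmethod
    def _rebuild(cls, data, shape):
        # Wrap whatever pickle hands back in a memoryview, without copying.
        return cls(memoryview(data), shape)

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Protocol 5: expose the buffer so it can be sent out-of-band.
            return (self._rebuild, (pickle.PickleBuffer(self.data), self.shape))
        # Older protocols: fall back to an in-band copy of the bytes.
        return (self._rebuild, (bytes(self.data), self.shape))


image = PixelContainer(bytearray(range(12)), shape=(3, 4))

# The consumer collects out-of-band buffers via buffer_callback.
buffers = []
payload = pickle.dumps(image, protocol=5, buffer_callback=buffers.append)

# The same buffers are supplied again when loading.
restored = pickle.loads(payload, buffers=buffers)
```

With `buffer_callback` supplied, the pixel data is kept out of `payload` and handed over separately in `buffers`; without it, the same object still pickles, only in-band.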

Alternatively, one can use Dask's custom serialization by following this example on how to extend to custom types. The usual strategy here is to put these registration bits in their own module. This will soften the Distributed dependency and provide a single point for registering these functions. The latter is useful as one needs to import this module in distributed for it to work. This is doable with a bit of work in ITK and Distributed, but not too difficult.
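A registration module along these lines might look like the sketch below. It assumes Distributed's `dask_serialize` and `dask_deserialize` dispatchers from `distributed.protocol`; `PixelContainer`, `serialize_pixels`, `deserialize_pixels`, and `register` are hypothetical names for illustration, not existing ITK or Dask functions. The import is deferred into `register()` so that the module can be imported even when Distributed is not installed.

```python
class PixelContainer:
    """Hypothetical stand-in for an ITK data object that owns a pixel buffer."""

    def __init__(self, data, shape):
        self.data = data          # any object exposing the buffer protocol
        self.shape = tuple(shape)


def serialize_pixels(obj):
    # Header: small, msgpack-friendly metadata. Frames: the raw buffers.
    header = {"shape": list(obj.shape)}
    frames = [memoryview(obj.data)]
    return header, frames


def deserialize_pixels(header, frames):
    # Rebuild the object around the received frame without copying it.
    return PixelContainer(frames[0], header["shape"])


def register():
    """Register the functions with Distributed if it is available.

    Returns True when registration happened, False when Distributed
    is not installed (the soft dependency).
    """
    try:
        from distributed.protocol import dask_deserialize, dask_serialize
    except ImportError:
        return False
    dask_serialize.register(PixelContainer)(serialize_pixels)
    dask_deserialize.register(PixelContainer)(deserialize_pixels)
    return True
```

Calling `register()` once, from whichever module Distributed ends up importing, would give the single registration point described above.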

Any of these options seems fine. Possibly it's worth doing multiple or even all of them (this is what we did in RAPIDS rapidsai/cudf#5139 ). Just trying to show what's possible and what it would entail to do 🙂

Originally posted by @jakirkham in #1942 (comment)


Labels

status:Use_Milestone_Backlog (Use "Backlog" milestone instead of label for issues without a fixed deadline)
type:Enhancement (Improvement of existing methods or implementation)
