As suggested by @jakirkham in #1942, we can improve the registration of Dask serialization and deserialization functions, including making them available in `distributed`. This could be done with `NDArrayITKBase`, but we can also have optimized serializers for common `itk.DataObject`'s.
As @jakirkham noted:
If one communicates objects that Dask already knows how to work with (like NumPy arrays), then one will get optimized serialization for free. This may involve a little (or maybe no) code in ITK to make sure NumPy arrays are handled reasonably well and handed to Dask when functions return. The benefit here seems useful even outside the Dask context.
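As a minimal, hypothetical sketch of this first option: an object that exposes its pixel buffer through `__array__` is seen by `np.asarray()` (and hence by anything built on NumPy, including Dask) as a plain ndarray. The `ImageLike` class below is illustrative only — ITK's actual `NDArrayITKBase` takes a different route (it subclasses `np.ndarray`), but the effect on callers is similar.

```python
import numpy as np

# Hypothetical ITK-style wrapper: the pixel buffer is exposed via
# __array__, so NumPy-based tooling treats it as an ndarray without
# copying when the dtype already matches.
class ImageLike:
    def __init__(self, pixels):
        self._pixels = pixels

    def __array__(self, dtype=None, copy=None):
        # Hand back the underlying buffer; convert only if a dtype is requested.
        if dtype is not None:
            return self._pixels.astype(dtype)
        return self._pixels

img = ImageLike(np.arange(6.0).reshape(2, 3))
view = np.asarray(img)  # plain ndarray view of the wrapped buffer
```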
If one adds pickle protocol 5 support to objects (possibly extending to older Pythons with `pickle5`, or leveraging NumPy to do this indirectly ( numpy/numpy#12091 )), then one can leverage Dask's own support for pickle protocol 5 ( dask/distributed#3784 , dask/distributed#3849 ), which should yield a performance boost similar to Dask's own custom serialization (as the underlying principles are the same). This would merely require updating how pickling works in ITK. It will also work with anything else that supports pickle protocol 5.
Alternatively, one can use Dask's custom serialization by following this example on how to extend to custom types. The usual strategy here is to put these registration bits in their own module. This softens the Distributed dependency and provides a single point for registering these functions. The latter is useful because one needs to import this module in `distributed` for it to work. This is doable with a bit of work in ITK and Distributed, but not too difficult.
Any of these options seems fine. Possibly it's worth doing multiple or even all of them (this is what we did in RAPIDS: rapidsai/cudf#5139 ). Just trying to show what's possible and what it would entail 🙂
Originally posted by @jakirkham in #1942 (comment)