[WIP] - Mixed thread/processes evaluation #457


Closed
wants to merge 3 commits

Conversation

mrocklin
Member

Fixes #329

This starts an attempt to run tasks in processes even though we use the multiprocessing scheduler. We wrap tasks in a run_in_process function which calls out to an external pool.

Some issues:

  1. This doesn't support nested tasks; we may want some form of quoting for this.
  2. I suspect that creating the pool may cause issues on Windows.
  3. This doesn't actually use this functionality anywhere. Good places to experiment are Bag.to_dataframe and dask.array.from_hdf5.
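
The wrapper can be sketched with a lazily created pool from the standard library. This is only an illustration of the idea, not the PR's actual implementation; the names here are made up:

```python
from concurrent.futures import ProcessPoolExecutor

_pool = None


def _get_pool():
    # Create the pool lazily rather than at import time; eager creation
    # is the likely source of the Windows issue noted above.
    global _pool
    if _pool is None:
        _pool = ProcessPoolExecutor()
    return _pool


def run_in_process(func, *args, **kwargs):
    # Ship the call to a worker process and block on its result.
    return _get_pool().submit(func, *args, **kwargs).result()
```

Note that func and its arguments must be picklable for this to work, which is exactly why unserializable objects (like open HDF5 datasets) are a concern below.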

@mrocklin
Member Author

@shoyer your thoughts on this would be appreciated.

@shoyer
Member

shoyer commented Jul 21, 2015

What about a user-facing API (e.g., the context manager)? So far I have successfully avoided writing dask graphs directly in xray :).

For us, I suppose all we need is a parameter we could pass to da.from_array and da.store (e.g., multiprocess=True) that signals that each call to getitem/setitem should be wrapped in run_in_process.

@mrocklin
Member Author

I'm concerned about using these functions in processes with unserializable hdf5 dataset objects. Thoughts on how best to integrate run_in_process into these top-level functions?

@mrocklin
Member Author

Or rather, for this to work with processes and xray I expect to have to build a task that has a function that opens up a file using the netCDF4 library. I was hoping to avoid explicitly having from_h5py and from_netcdf4 functions.

@shoyer
Member

shoyer commented Jul 21, 2015

I agree that dask should not have from_h5py and from_netcdf4 functions.

My thought was that I would write an object like the following in xray, to be passed off to da.from_array:

from contextlib import contextmanager

import netCDF4

class NetCDFArray(object):
    def __init__(self, filename, variable_name):
        self.filename = filename
        self.variable_name = variable_name
        with self._open() as var:
            self._dtype = var.dtype
            self._shape = var.shape

    @contextmanager
    def _open(self):
        with netCDF4.Dataset(self.filename) as nc:
            var = nc.variables[self.variable_name]
            yield var

    @property
    def dtype(self):
        return self._dtype

    @property
    def shape(self):
        return self._shape

    def __getitem__(self, key):
        with self._open() as var:
            return var[key]

    def __setitem__(self, key, value):
        with self._open() as var:
            var[key] = value
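
The reason this pattern plays well with processes can be shown with a self-contained analog: because the object stores only a name and reopens its backing resource on each access, it pickles cleanly and can be shipped to a worker. All names below are made up for the demonstration; a dict stands in for files on disk:

```python
import pickle

_STORE = {"temperature": [10, 20, 30]}  # stands in for files on disk


class LazyArray(object):
    def __init__(self, name):
        # Only cheap, picklable state; no open file handles.
        self.name = name

    def __getitem__(self, key):
        # "Reopen" the resource on each access, as NetCDFArray._open does.
        return _STORE[self.name][key]


arr = LazyArray("temperature")
clone = pickle.loads(pickle.dumps(arr))  # survives the trip to a worker
```

An object holding an open netCDF4.Dataset handle would fail the pickle round trip, which is what forces the open-on-each-access design.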

@mrocklin
Member Author

OK, well, see 820784f. It's totally untested but may do what we need.

@mrocklin
Member Author

mrocklin commented Aug 4, 2015

@shoyer can I ask you to give this a spin? Possibly with something like the NetCDFArray object that you have above.

@shoyer
Member

shoyer commented Aug 5, 2015

Sorry, I've been busy this week and have not been able to get to this. Now I'm headed into the woods for the next 6 days....


@mrocklin
Member Author

mrocklin commented Aug 6, 2015

Have fun!

@mrocklin
Member Author

mrocklin commented Sep 2, 2015

Closing this as stale. Happy to reopen on renewed interest.
