[WIP] - Mixed thread/processes evaluation #457
Conversation
@shoyer your thoughts on this would be appreciated.
What about a user-facing API (e.g., the context manager)? So far I have successfully avoided writing dask graphs directly in xray :). For us, I suppose all we need is a parameter we could pass to …
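Purely as an illustration of what such a user-facing switch might look like (the name use_processes and its semantics are invented here, not an existing dask or xray API):

```python
from contextlib import contextmanager
from multiprocessing import Pool

# Hypothetical module-level state a scheduler could consult to decide
# whether a task runs inline in a thread or gets shipped to _pool.
_pool = None


@contextmanager
def use_processes(num_workers=4):
    """Tasks built inside this block would be tagged to run in a
    separate process pool instead of the shared thread pool."""
    global _pool
    _pool = Pool(num_workers)
    try:
        yield
    finally:
        _pool.close()
        _pool.join()
        _pool = None
```

Usage would then look like `with use_processes(4): result = x.sum().compute()`, leaving everything outside the block on the default threaded scheduler.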
I'm concerned about using these functions in processes with unserializable hdf5 dataset objects. Thoughts on how best to integrate …
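To make the concern concrete, a small illustrative check (file and dataset names are made up; the exact exception type and message depend on the h5py version):

```python
import pickle
import h5py

f = h5py.File('weather.hdf5', 'r')   # hypothetical file
dset = f['/temperature']             # hypothetical dataset path

try:
    # Shipping this object to another process would require pickling it.
    pickle.dumps(dset)
except Exception as exc:
    print('cannot serialize an open hdf5 dataset:', exc)
```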
Or rather, for this to work with processes and xray I expect to have to build a task that has a function that opens up a file using the …
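A rough sketch of that pattern (file, dataset, and function names are made up): the graph carries only strings and slices, and the worker function opens the file itself, so nothing unpicklable has to cross a process boundary.

```python
import h5py


def load_slice(filename, datapath, index):
    # Open the file inside the task and close it before returning,
    # so only the filename travels through the graph.
    with h5py.File(filename, 'r') as f:
        return f[datapath][index]


# A hand-written fragment of a dask graph (dict of tuples):
dsk = {
    ('x', 0): (load_slice, 'weather.hdf5', '/temperature', slice(0, 1000)),
    ('x', 1): (load_slice, 'weather.hdf5', '/temperature', slice(1000, 2000)),
}
```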
I agree that dask should not have … My thought was that I would write an object like the following in xray, to be passed off to …:

```python
import netCDF4
from contextlib import contextmanager


class NetCDFArray(object):
    """Picklable lazy wrapper around one variable in a netCDF file.

    Only the filename and variable name are stored, so instances can be
    sent to other processes; the file is reopened on every access.
    """

    def __init__(self, filename, variable_name):
        self.filename = filename
        self.variable_name = variable_name
        with self._open() as var:
            self._dtype = var.dtype
            self._shape = var.shape

    @contextmanager
    def _open(self, mode='r'):
        with netCDF4.Dataset(self.filename, mode=mode) as nc:
            var = nc.variables[self.variable_name]
            yield var

    @property
    def dtype(self):
        return self._dtype

    @property
    def shape(self):
        return self._shape

    def __getitem__(self, key):
        with self._open() as var:
            return var[key]

    def __setitem__(self, key, value):
        # Reopen in append mode so the variable is writable.
        with self._open(mode='a') as var:
            var[key] = value
```
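A wrapper like that could then be handed to dask.array in the usual way; a minimal sketch, with made-up file, variable, and chunk sizes:

```python
import dask.array as da

nc_var = NetCDFArray('surface_temps.nc', 'temperature')  # hypothetical file/variable
x = da.from_array(nc_var, chunks=(100, 100))

# Every chunk access goes through NetCDFArray.__getitem__, which reopens
# the file, so the graph itself never holds an open netCDF handle.
result = x.mean(axis=0).compute()
```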
OK, well, see 820784f. It's totally untested but may do what we need.
Commits 820784f to 9d9d643
@shoyer can I ask you to give this a spin? Possibly with something like the …
Sorry, I've been busy this week and have not been able to get to this. Now I'm headed into the woods for the next 6 days....
Have fun!
Closing this as stale. Happy to reopen on renewed interest.
Fixes #329

This starts an attempt to run tasks in separate processes even though we use the threaded scheduler. We wrap tasks in a run_in_process function which calls out to an external pool.

Some issues:

Bag.to_dataframe and dask.array.from_hdf5 …
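The commit itself is the authoritative version; purely to illustrate the wrapping idea described above (this sketch is a guess at the mechanism, not the code in 820784f):

```python
from multiprocessing import Pool

_pool = None  # external process pool shared by the threaded workers


def run_in_process(func, *args):
    """Run one task in the external process pool and block on its result,
    so the threaded scheduler can treat the wrapped task like any other.
    func and args must be picklable."""
    global _pool
    if _pool is None:
        _pool = Pool()
    return _pool.apply(func, args)


# A graph entry (f, a, b) tagged for process execution becomes
# (run_in_process, f, a, b); untagged entries still run in threads.
```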