Support WDL 1.2 "Extended" File/Directory format #834
adamnovak wants to merge 2 commits into chanzuckerberg:main
Conversation
@mlin Does this seem like the right general approach to you?
@adamnovak I have to say I'm initially a little bearish on changing the internal representation of Wondering, is your main interest in the cache coherence aspect, or in the place to stick arbitrary metadata? Could we get pretty far by preprocessing the extended input JSON and keeping some crappy global dict of path to metadata?
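A rough sketch of what that preprocessing could look like, for discussion only. It assumes extended-format objects are JSON dicts carrying a `"type"` of `"File"` or `"Directory"` plus a `"location"`; the `"type"` test, the `FILE_METADATA` table, and `flatten_extended` are hypothetical names for illustration, not anything in miniwdl.

```python
from typing import Any, Dict

# Hypothetical side table: plain path -> the remaining extended-format fields.
FILE_METADATA: Dict[str, Dict[str, Any]] = {}


def flatten_extended(value: Any) -> Any:
    """Recursively replace extended File/Directory objects with plain path strings.

    Assumption: an extended object is a dict with "type" in {"File", "Directory"}
    and a "location" key. Everything except "location" is stashed in FILE_METADATA.
    """
    if isinstance(value, dict):
        if value.get("type") in ("File", "Directory") and "location" in value:
            path = value["location"]
            FILE_METADATA[path] = {k: v for k, v in value.items() if k != "location"}
            return path
        # Ordinary JSON object (e.g. a struct or map input): recurse into its values.
        return {k: flatten_extended(v) for k, v in value.items()}
    if isinstance(value, list):
        return [flatten_extended(v) for v in value]
    return value
```

One caveat with this shape: a struct or map input that happens to have `type` and `location` keys would be misread as a file, so a real implementation would need to consult the declared WDL types rather than sniff the JSON.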
I think my main interest is in allowing a In Toil, we have to deal with building these trees on the fly all the time because Toil's backing storage abstraction stores only files, not directories. So we encode whole trees of what files go where into strings and use those as WDL Directory string values. If MiniWDL had a Directory abstraction that knew it was responsible for information about what files go where, then Toil could work with that and throw away a lot of hacks. Maybe the right approach is to leave
@adamnovak If our goal were only to support the "extended" file/directory input format then I feel we could do it easily by preprocessing the input JSON, materializing the desired posix structure (using symlinks, hardlinks, or last-resort copies where needed), and starting the workflow on that. But I think I'm hearing from you that Toil needs finer control of each task's filesystem, is that right? miniwdl's posix filesystem assumptions are pretty intentional in keeping it "mini" so I think that's where we're diverging a bit. I do think it should have a pluggable architecture to accommodate more exotic needs. The CallCache is pluggable (but the semantics need more documentation). Not so much the filesystem interactions -- perhaps that's the direction to head?
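To make the "materialize the desired posix structure" idea concrete, here is a minimal sketch of the symlink, then hardlink, then copy fallback. It assumes an extended Directory is a dict with `"basename"` and a `"listing"` of child entries, and that each File entry has a local `"location"`; those field names other than `"location"`, and the helpers `place_file` and `materialize`, are assumptions made for illustration and not miniwdl code. Directories given only by `"location"` (no listing) and remote URIs are not handled.

```python
import os
import shutil
from typing import Any, Dict


def place_file(src: str, dest: str) -> str:
    """Make src available at dest; return the method that succeeded."""
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    try:
        os.symlink(os.path.abspath(src), dest)
        return "symlink"
    except OSError:
        pass
    try:
        os.link(src, dest)
        return "hardlink"
    except OSError:
        # Last resort: a real copy (e.g. across filesystems without symlink support).
        shutil.copy2(src, dest)
        return "copy"


def materialize(entry: Dict[str, Any], parent: str) -> str:
    """Recursively build one extended-format entry under parent; return its path."""
    name = entry.get("basename") or os.path.basename(entry["location"].rstrip("/"))
    dest = os.path.join(parent, name)
    if entry.get("type") == "Directory":
        os.makedirs(dest, exist_ok=True)
        for child in entry.get("listing", []):
            materialize(child, dest)
    else:
        place_file(entry["location"], dest)
    return dest
```

The workflow would then be started with the returned path as an ordinary Directory input, so nothing downstream has to know about the extended format.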
Motivation
This should fix #833 by implementing the "extended" File and Directory format from WDL 1.2.
Approach
This changes `File` and `Directory` to have the extended-format dicts as their `value`s, with the actual path being fetched out of or replaced in `["location"]` when needed.

This is still a draft; it still needs:
Checklist
- `make pretty` to reformat the code with `ruff format`
- `make check` to statically check the code using `ruff check` and `mypy`
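For reviewers, the idea in the Approach section (a dict-valued `File` whose path lives under `["location"]`) can be pictured roughly as follows. This is a standalone illustration, not the code in this PR: `ExtendedFileValue`, `path`, and `with_path` are hypothetical names, and the real `File` and `Directory` classes have more responsibilities than shown here.

```python
from typing import Any, Dict, Union


class ExtendedFileValue:
    """Hypothetical stand-in for a File value whose payload is an extended-format dict."""

    def __init__(self, value: Union[str, Dict[str, Any]]) -> None:
        # A bare string is treated as shorthand for {"location": <string>}.
        if isinstance(value, str):
            value = {"location": value}
        self.value: Dict[str, Any] = dict(value)

    @property
    def path(self) -> str:
        """Fetch the actual path out of ["location"]."""
        return self.value["location"]

    def with_path(self, new_path: str) -> "ExtendedFileValue":
        """Return a copy with ["location"] replaced and all other fields preserved."""
        updated = dict(self.value)
        updated["location"] = new_path
        return ExtendedFileValue(updated)


f = ExtendedFileValue({"location": "/in/a.txt", "basename": "a.txt"})
g = f.with_path("/work/a.txt")  # e.g. after localizing the input for a task
```

Returning a new object from `with_path` rather than mutating in place is a choice made for this sketch; it keeps any extra fields attached to the value while the path is rewritten.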