add lazily filled dict for prototype datasets #5219
@@ -31,6 +31,7 @@
     getitem,
     path_comparator,
     path_accessor,
+    LazyDict,
 )
 from torchvision.prototype.features import Label, BoundingBox, Feature

@@ -94,6 +95,9 @@ def _2011_classify_archive(self, data: Tuple[str, Any]) -> Optional[int]:
         else:
             return None

+    def _2011_image_key(self, rel_posix_path: str) -> str:
+        return rel_posix_path.rsplit("/", 1)[1]
+
     def _2011_filter_split(self, row: List[str], *, split: str) -> bool:
         _, split_id = row
         return {

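As a quick illustration of the new `_2011_image_key` helper: it keeps only the file name of a relative POSIX path, and in the diff it is applied to the path column via `input_col=1`. The sketch below is a standalone version, assuming a made-up example path; the module-level name `image_key` is hypothetical and not part of the PR.

```python
def image_key(rel_posix_path: str) -> str:
    # Split once from the right and keep the part after the last "/".
    return rel_posix_path.rsplit("/", 1)[1]


# Hypothetical (image_id, relative path) row; the path is made up.
image_id, rel_posix_path = ("1", "some_class_dir/some_image.jpg")
print(image_id, image_key(rel_posix_path))
```

Note that a path without any `/` would raise an `IndexError` here, since `rsplit` then returns a single-element list.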
@@ -173,9 +177,8 @@ def _make_datapipe(
             )

             image_files_dp = CSVParser(image_files_dp, dialect="cub200")
-            image_files_map = dict(
-                (image_id, rel_posix_path.rsplit("/", maxsplit=1)[1]) for image_id, rel_posix_path in image_files_dp
-            )
+            image_files_dp = Mapper(image_files_dp, self._2011_image_key, input_col=1)
What we can do here is a

Besides, I think the DataLoader would complain about this datapipe graph in the second epoch because
So, a fix from

This seems like a good thing to test in general. What should a test look like? Is something like

    for _ in dataset.cycle(2):
        pass

enough? If yes, my proposal passes this test.

I mean if we put the dataset (datapipes) into DataLoader, the second epoch of DataLoader would break.

So something like

    data_loader = DataLoader2(dataset)
    for epoch in range(2):
        for sample in data_loader:
            pass

?

Yeah. I have asked Kevin to fix this issue in demux.

My proposal still works. I've pushed the test I'm running against. There are multiple failures for other datasets, but
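The general shape of such a multi-epoch check can be sketched with plain Python iterables. This is only a stand-in: it does not use `DataLoader2` or the prototype datasets, and it does not reproduce the specific failure discussed in this thread. The names `samples_per_epoch` and `ReIterable` are hypothetical.

```python
from typing import Iterable, List


def samples_per_epoch(loader: Iterable, epochs: int = 2) -> List[int]:
    # Iterate the same object `epochs` times and count what each pass yields.
    counts = []
    for _ in range(epochs):
        counts.append(sum(1 for _ in loader))
    return counts


class ReIterable:
    # Stand-in for a pipeline that restarts on every `iter()` call.
    def __init__(self, n: int) -> None:
        self.n = n

    def __iter__(self):
        return iter(range(self.n))


one_shot = (i for i in range(3))  # a generator is exhausted after one pass

print(samples_per_epoch(ReIterable(3)))  # [3, 3]
print(samples_per_epoch(one_shot))       # [3, 0]
```

A check of this kind only catches pipelines that silently yield fewer samples on a later pass; a pipeline that raises in the second epoch would surface as an exception instead.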
+            image_files_map = LazyDict(image_files_dp)

             split_dp = CSVParser(split_dp, dialect="cub200")
             split_dp = Filter(split_dp, functools.partial(self._2011_filter_split, split=config.split))

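The implementation of `LazyDict` is not shown in this part of the diff. As a rough idea of what a lazily filled dict could look like, here is a minimal sketch, assuming it wraps an iterable of key-value pairs and only consumes it on first access. The class name `LazyDictSketch` and all of its internals are hypothetical and may differ from the PR.

```python
from collections.abc import Mapping
from typing import Any, Dict, Iterable, Optional, Tuple


class LazyDictSketch(Mapping):
    """Read-only mapping that consumes its source on first access."""

    def __init__(self, source: Iterable[Tuple[Any, Any]]) -> None:
        self._source = source
        self._data: Optional[Dict[Any, Any]] = None

    def _load(self) -> Dict[Any, Any]:
        # Materialize the source exactly once, the first time it is needed.
        if self._data is None:
            self._data = dict(self._source)
        return self._data

    def __getitem__(self, key):
        return self._load()[key]

    def __iter__(self):
        return iter(self._load())

    def __len__(self):
        return len(self._load())


consumed = []


def pairs():
    # Made-up (image_id, file name) pairs; records when each one is read.
    for key, value in [("1", "a.jpg"), ("2", "b.jpg")]:
        consumed.append(key)
        yield key, value


lazy = LazyDictSketch(pairs())
print(consumed)   # [] because nothing has been read yet
print(lazy["2"])  # b.jpg
print(consumed)   # ['1', '2']
```

The point of the laziness is that constructing the mapping does not iterate the source while the pipeline is being built; the source is only read when a lookup actually happens.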
A second thought. Could we simply filter `image_files_dp` from `archive_dp` here and create the `image_files_map` dictionary? Then, we can do demux over `archive_dp` again and drop the data in `image_files_dp`.

So basically splitting off `image_files_dp` from the graph?

Yeah. Then, we can materialize the data from it like a meta-datapipe.
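The alternative discussed here (filter the image-files records out first, materialize them into a dictionary, then process the rest in a second pass) can be sketched with plain Python. This is a stand-in under stated assumptions: it does not use datapipes or demux, the `(kind, payload)` record layout is made up, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (kind, payload); a made-up record layout


def build_image_files_map(archive: Iterable[Record]) -> Dict[str, str]:
    # First pass: keep only the "image_files" records and materialize them.
    mapping = {}
    for kind, payload in archive:
        if kind == "image_files":
            image_id, rel_posix_path = payload.split(" ", 1)
            mapping[image_id] = rel_posix_path.rsplit("/", 1)[1]
    return mapping


def remaining_records(archive: Iterable[Record]) -> List[Record]:
    # Second pass: everything except the already consumed "image_files" records.
    return [record for record in archive if record[0] != "image_files"]


archive = [
    ("image_files", "1 dir_a/x.jpg"),
    ("image", "dir_a/x.jpg"),
    ("image_files", "2 dir_b/y.jpg"),
    ("split", "1 0"),
]
image_files_map = build_image_files_map(archive)
print(image_files_map)
print(remaining_records(archive))
```

This only works if the archive can be iterated twice, which is why the sketch uses a list; the trade-off compared to a lazily filled mapping is an extra pass over the archive up front.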