-
Notifications
You must be signed in to change notification settings - Fork 7.1k
Expand tests for prototype datasets #5187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
💊 CI failures summary and remediationsAs of commit d16b9f6 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pmeier , full disclosure I didn't give this an in-depth look. I'll have more feedback on this once I implement a few of those prototype datasets on my own (WIP :) )
@@ -129,10 +133,31 @@ def load(self, config, *, decoder=DEFAULT_DECODER): | |||
return datapipe, mock_info | |||
|
|||
|
|||
def config_id(name, config): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the default pytest id generation be enough here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately not. You would get something like
test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[dataset_mock0-config0]
test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[dataset_mock1-config1]
test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[dataset_mock2-config2]
...
The problem is that by default, pytest
uses the name of the parameter as well as a running index as id. So if we want to have somewhat expressive test names, we need to set the id ourselves. The same example as above but with the custom id from this PR looks like
test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[mnist-train]
test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[mnist-test]
test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[fashionmnist-train]
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ejguan The test failure most likely stems from this:
vision/torchvision/prototype/datasets/_builtin/cub200.py
Lines 171 to 178 in f923aeb
images_dp, split_dp, image_files_dp, bounding_boxes_dp = Demultiplexer( | |
archive_dp, 4, self._2011_classify_archive, drop_none=True, buffer_size=INFINITE_BUFFER_SIZE | |
) | |
image_files_dp = CSVParser(image_files_dp, dialect="cub200") | |
image_files_map = dict( | |
(image_id, rel_posix_path.rsplit("/", maxsplit=1)[1]) for image_id, rel_posix_path in image_files_dp | |
) |
I need this mapping twice and so I thought I can simply create it once. IIUC, the problem comes from the fact that this starts iteration of the Demultiplexer
while creating the datapipe and thus it has some file handles in buffer when traverse
tries to pickle it.
What is recommended pattern here?
Yeah. The problem is
Thanks for adding such use case. The problem is iterator object and buffer attached to |
@NicolasHug I added the ability add marks like |
Summary: * refactor prototype datasets tests * skip tests with insufficient third party dependencies * cleanup * add tests for SBD prototype dataset * add tests for SEMEION prototype dataset * add tests for VOC prototype dataset * add tests for CelebA prototype dataset * add tests for DTD prototype dataset * add tests for FER2013 prototype dataset * add tests for CLEVR prototype dataset * add tests for oxford-iiit-pet prototype dataset * enforce tests for new datasets * add missing archive generation for oxford-iiit-pet tests * add tests for CUB200 prototype datasets * fix split generation * add capability to mark parametrization and xfail cub200 traverse tests Reviewed By: datumbox, NicolasHug Differential Revision: D33655253 fbshipit-source-id: 186591f2cb89e864c2d143d6a460449cf4991baa
This PR adds tests and fixes error of all missing prototype datasets.
Blocked by #5186.
cc @pmeier @bjuncek