
Expand tests for prototype datasets #5187


Merged: 20 commits into pytorch:main on Jan 19, 2022

Conversation

@pmeier (Collaborator) commented Jan 10, 2022

This PR adds tests for all prototype datasets that were still missing them and fixes the errors these tests uncovered.

Blocked by #5186.

cc @pmeier @bjuncek

@facebook-github-bot commented Jan 10, 2022

💊 CI failures summary and remediations

As of commit d16b9f6 (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build unittest_linux_cpu_py3.7 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

/root/project/torchvision/io/video.py:406: Runt...log: [mov,mp4,m4a,3gp,3g2,mj2] moov atom not found
test/test_image.py::test_decode_png[L-ImageReadMode.GRAY-palette_pytorch.png]
test/test_image.py::test_decode_png[RGB-ImageReadMode.RGB-palette_pytorch.png]
  /root/project/env/lib/python3.7/site-packages/PIL/Image.py:946: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
    "Palette images with Transparency expressed in bytes should be "

test/test_io.py::TestVideo::test_probe_video_from_memory
  /root/project/torchvision/io/_video_opt.py:423: UserWarning: The given buffer is not writable, and PyTorch does not support non-writable tensors. This means you can write to the underlying (supposedly non-writable) buffer using the tensor. You may want to copy the buffer to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /opt/conda/conda-bld/pytorch_1642552271656/work/torch/csrc/utils/tensor_new.cpp:998.)
    video_data = torch.frombuffer(video_data, dtype=torch.uint8)

test/test_io.py::TestVideo::test_read_video_timestamps_corrupted_file
  /root/project/torchvision/io/video.py:406: RuntimeWarning: Failed to open container for /tmp/tmpov3rjnou.mp4; Caught error: [Errno 1094995529] Invalid data found when processing input: '/tmp/tmpov3rjnou.mp4'; last error log: [mov,mp4,m4a,3gp,3g2,mj2] moov atom not found
    warnings.warn(msg, RuntimeWarning)

test/test_models.py::test_memory_efficient_densenet[densenet121]
test/test_models.py::test_memory_efficient_densenet[densenet169]
test/test_models.py::test_memory_efficient_densenet[densenet201]
test/test_models.py::test_memory_efficient_densenet[densenet161]
  /root/project/env/lib/python3.7/site-packages/torch/autocast_mode.py:162: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
    warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')

test/test_models.py::test_inception_v3_eval

🚧 1 ongoing upstream failure:

These were probably caused by upstream breakages that are not fixed yet.


This comment was automatically generated by Dr. CI.


@NicolasHug (Member) left a comment


Thanks @pmeier. Full disclosure: I didn't give this an in-depth look. I'll have more feedback once I've implemented a few of these prototype datasets on my own (WIP :) )

@@ -129,10 +133,31 @@ def load(self, config, *, decoder=DEFAULT_DECODER):
return datapipe, mock_info


def config_id(name, config):
@NicolasHug (Member)

Would the default pytest id generation be enough here?

@pmeier (Collaborator, Author)

Unfortunately not. You would get something like

test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[dataset_mock0-config0]
test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[dataset_mock1-config1]
test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[dataset_mock2-config2]
...

The problem is that, by default, pytest uses the parameter name plus a running index as the id. So if we want somewhat expressive test names, we need to set the ids ourselves. The same example as above, but with the custom ids from this PR, looks like

test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[mnist-train]
test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[mnist-test]
test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[fashionmnist-train]
...
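
For illustration, a minimal sketch of this approach (hypothetical dataset names and configs; the PR's actual config_id handles richer configurations):

import pytest

def config_id(name, config):
    # e.g. ("mnist", {"split": "train"}) -> "mnist-train"
    return "-".join([name] + [str(value) for value in config.values()])

PARAMS = [
    ("mnist", dict(split="train")),
    ("mnist", dict(split="test")),
    ("fashionmnist", dict(split="train")),
]

@pytest.mark.parametrize(
    "name, config",
    PARAMS,
    ids=[config_id(name, config) for name, config in PARAMS],
)
def test_smoke(name, config):
    assert config["split"] in {"train", "test"}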

@pmeier (Collaborator, Author) left a comment

@ejguan The test failure most likely stems from this:

images_dp, split_dp, image_files_dp, bounding_boxes_dp = Demultiplexer(
    archive_dp, 4, self._2011_classify_archive, drop_none=True, buffer_size=INFINITE_BUFFER_SIZE
)
image_files_dp = CSVParser(image_files_dp, dialect="cub200")
image_files_map = dict(
    (image_id, rel_posix_path.rsplit("/", maxsplit=1)[1]) for image_id, rel_posix_path in image_files_dp
)

I need this mapping twice, so I thought I could simply create it once. IIUC, the problem comes from the fact that this starts iterating the Demultiplexer while the datapipe is being constructed, so it still has file handles in its buffer when traverse tries to pickle it.

What is the recommended pattern here?
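
For illustration, a toy reproduction of the effect with a numeric pipe instead of the CUB200 archives (assuming torchdata's Demultiplexer; names are illustrative):

from torchdata.datapipes.iter import Demultiplexer, IterableWrapper

def route(x):
    return x % 2  # child 0 gets the evens, child 1 the odds

source = IterableWrapper(range(10))
evens_dp, odds_dp = Demultiplexer(source, 2, route, buffer_size=100)

# Eagerly materializing one child at construction time, like the
# dict(...) above, starts iteration of the shared demux: every odd
# element is parked in odds_dp's buffer. The pipeline then carries
# live iteration state (and, for file pipes, open handles) when
# traverse later tries to pickle it.
evens = list(evens_dp)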

@ejguan (Contributor) commented Jan 18, 2022

@ejguan The test failure most likely stems from this:

images_dp, split_dp, image_files_dp, bounding_boxes_dp = Demultiplexer(
    archive_dp, 4, self._2011_classify_archive, drop_none=True, buffer_size=INFINITE_BUFFER_SIZE
)
image_files_dp = CSVParser(image_files_dp, dialect="cub200")
image_files_map = dict(
    (image_id, rel_posix_path.rsplit("/", maxsplit=1)[1]) for image_id, rel_posix_path in image_files_dp
)

I need this mapping twice, so I thought I could simply create it once. IIUC, the problem comes from the fact that this starts iterating the Demultiplexer while the datapipe is being constructed, so it still has file handles in its buffer when traverse tries to pickle it.

What is the recommended pattern here?

Yeah, thanks for adding such a use case. The problem is that the iterator object and the buffer attached to the Demultiplexer object are not cleaned up as long as the demux has not been depleted (a full iteration of all children). This can be solved by adding special serialize/deserialize functions for demux that prevent these objects from being serialized.
cc: @NivekT
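
For illustration, a minimal sketch of that serialization pattern (a toy class, not torchdata's actual implementation):

class Demux:
    def __init__(self, source):
        self.source = source
        self._child_buffers = [[], []]  # filled lazily during iteration
        self._source_iter = None        # may hold open file handles

    def __getstate__(self):
        # Drop the transient iteration state so that pickling (e.g. by
        # traverse) succeeds even after iteration has started.
        state = self.__dict__.copy()
        state["_source_iter"] = None
        state["_child_buffers"] = [[], []]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)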

@pmeier (Collaborator, Author) commented Jan 19, 2022

@NicolasHug I added the ability to add marks like pytest.mark.xfail to datasets through the parametrization decorator. With this in place, we can xfail "cub200" on the failing tests until #5187 (comment) is resolved on the torchdata side.
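
For illustration, a minimal sketch of how such a mark can be attached (hypothetical test; pytest.param is the underlying pytest mechanism):

import pytest

@pytest.mark.parametrize(
    "name",
    [
        "mnist",
        pytest.param(
            "cub200",
            marks=pytest.mark.xfail(reason="demux buffers break traverse"),
        ),
    ],
)
def test_traverse(name):
    assert isinstance(name, str)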

@pmeier pmeier merged commit 3e4d062 into pytorch:main Jan 19, 2022
@pmeier pmeier deleted the datasets/expand-tests branch January 19, 2022 08:55
facebook-github-bot pushed a commit that referenced this pull request Jan 19, 2022
Summary:
* refactor prototype datasets tests

* skip tests with insufficient third party dependencies

* cleanup

* add tests for SBD prototype dataset

* add tests for SEMEION prototype dataset

* add tests for VOC prototype dataset

* add tests for CelebA prototype dataset

* add tests for DTD prototype dataset

* add tests for FER2013 prototype dataset

* add tests for CLEVR prototype dataset

* add tests for oxford-iiit-pet prototype dataset

* enforce tests for new datasets

* add missing archive generation for oxford-iiit-pet tests

* add tests for CUB200 prototype datasets

* fix split generation

* add capability to mark parametrization and xfail cub200 traverse tests

Reviewed By: datumbox, NicolasHug

Differential Revision: D33655253

fbshipit-source-id: 186591f2cb89e864c2d143d6a460449cf4991baa