
Expand tests for prototype datasets #5187


Merged: 20 commits into pytorch:main on Jan 19, 2022

Conversation

@pmeier (Collaborator) commented Jan 10, 2022

This PR adds tests for all prototype datasets that were still missing them and fixes the errors these tests uncovered.

Blocked by #5186.

cc @pmeier @bjuncek

@facebook-github-bot commented Jan 10, 2022

💊 CI failures summary and remediations

As of commit d16b9f6 (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build unittest_linux_cpu_py3.7 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

/root/project/torchvision/io/video.py:406: Runt...log: [mov,mp4,m4a,3gp,3g2,mj2] moov atom not found
test/test_image.py::test_decode_png[L-ImageReadMode.GRAY-palette_pytorch.png]
test/test_image.py::test_decode_png[RGB-ImageReadMode.RGB-palette_pytorch.png]
  /root/project/env/lib/python3.7/site-packages/PIL/Image.py:946: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
    "Palette images with Transparency expressed in bytes should be "

test/test_io.py::TestVideo::test_probe_video_from_memory
  /root/project/torchvision/io/_video_opt.py:423: UserWarning: The given buffer is not writable, and PyTorch does not support non-writable tensors. This means you can write to the underlying (supposedly non-writable) buffer using the tensor. You may want to copy the buffer to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /opt/conda/conda-bld/pytorch_1642552271656/work/torch/csrc/utils/tensor_new.cpp:998.)
    video_data = torch.frombuffer(video_data, dtype=torch.uint8)

test/test_io.py::TestVideo::test_read_video_timestamps_corrupted_file
  /root/project/torchvision/io/video.py:406: RuntimeWarning: Failed to open container for /tmp/tmpov3rjnou.mp4; Caught error: [Errno 1094995529] Invalid data found when processing input: '/tmp/tmpov3rjnou.mp4'; last error log: [mov,mp4,m4a,3gp,3g2,mj2] moov atom not found
    warnings.warn(msg, RuntimeWarning)

test/test_models.py::test_memory_efficient_densenet[densenet121]
test/test_models.py::test_memory_efficient_densenet[densenet169]
test/test_models.py::test_memory_efficient_densenet[densenet201]
test/test_models.py::test_memory_efficient_densenet[densenet161]
  /root/project/env/lib/python3.7/site-packages/torch/autocast_mode.py:162: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
    warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')

test/test_models.py::test_inception_v3_eval

🚧 1 ongoing upstream failure:

These were probably caused by upstream breakages that are not fixed yet.


This comment was automatically generated by Dr. CI.


@NicolasHug (Member) left a comment


Thanks @pmeier. Full disclosure: I didn't give this an in-depth look. I'll have more feedback once I've implemented a few of these prototype datasets on my own (WIP :) )

@@ -129,10 +133,31 @@ def load(self, config, *, decoder=DEFAULT_DECODER):
return datapipe, mock_info


def config_id(name, config):
@NicolasHug (Member)

Would the default pytest id generation be enough here?

@pmeier (Collaborator, Author)

Unfortunately not. You would get something like

test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[dataset_mock0-config0]
test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[dataset_mock1-config1]
test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[dataset_mock2-config2]
...

The problem is that, by default, pytest uses the parameter name plus a running index as the id. So if we want somewhat expressive test names, we need to set the ids ourselves. The same example as above, but with the custom ids from this PR, looks like

test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[mnist-train]
test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[mnist-test]
test/test_prototype_builtin_datasets.py::TestCommon::test_smoke[fashionmnist-train]
...
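
For illustration, a minimal sketch of this approach (hypothetical dataset names and configs; the PR's actual config_id handles richer configurations):

import pytest

def config_id(name, config):
    # e.g. ("mnist", {"split": "train"}) -> "mnist-train"
    return "-".join([name] + [str(value) for value in config.values()])

PARAMS = [
    ("mnist", dict(split="train")),
    ("mnist", dict(split="test")),
    ("fashionmnist", dict(split="train")),
]

@pytest.mark.parametrize(
    "name, config",
    PARAMS,
    ids=[config_id(name, config) for name, config in PARAMS],
)
def test_smoke(name, config):
    assert config["split"] in {"train", "test"}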

@pmeier (Collaborator, Author) left a comment

@ejguan The test failure most likely stems from this:

images_dp, split_dp, image_files_dp, bounding_boxes_dp = Demultiplexer(
    archive_dp, 4, self._2011_classify_archive, drop_none=True, buffer_size=INFINITE_BUFFER_SIZE
)
image_files_dp = CSVParser(image_files_dp, dialect="cub200")
image_files_map = dict(
    (image_id, rel_posix_path.rsplit("/", maxsplit=1)[1]) for image_id, rel_posix_path in image_files_dp
)

I need this mapping twice, so I thought I could simply create it once. IIUC, the problem comes from the fact that this starts iterating the Demultiplexer while the datapipe is being constructed, so it still has file handles in its buffer when traverse tries to pickle it.

What is the recommended pattern here?
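
For illustration, a toy reproduction of the effect with a numeric pipe instead of the CUB200 archives (assuming torchdata's Demultiplexer; names are illustrative):

from torchdata.datapipes.iter import Demultiplexer, IterableWrapper

def route(x):
    return x % 2  # child 0 gets the evens, child 1 the odds

source = IterableWrapper(range(10))
evens_dp, odds_dp = Demultiplexer(source, 2, route, buffer_size=100)

# Eagerly materializing one child at construction time, like the
# dict(...) above, starts iteration of the shared demux: every odd
# element is parked in odds_dp's buffer. The pipeline then carries
# live iteration state (and, for file pipes, open handles) when
# traverse later tries to pickle it.
evens = list(evens_dp)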

@ejguan (Contributor) commented Jan 18, 2022

@ejguan The test failure most likely stems from this:

images_dp, split_dp, image_files_dp, bounding_boxes_dp = Demultiplexer(
    archive_dp, 4, self._2011_classify_archive, drop_none=True, buffer_size=INFINITE_BUFFER_SIZE
)
image_files_dp = CSVParser(image_files_dp, dialect="cub200")
image_files_map = dict(
    (image_id, rel_posix_path.rsplit("/", maxsplit=1)[1]) for image_id, rel_posix_path in image_files_dp
)

I need this mapping twice, so I thought I could simply create it once. IIUC, the problem comes from the fact that this starts iterating the Demultiplexer while the datapipe is being constructed, so it still has file handles in its buffer when traverse tries to pickle it.

What is the recommended pattern here?

Yeah, thanks for adding such a use case. The problem is that the iterator object and the buffer attached to the Demultiplexer object are not cleaned up as long as the demux has not been depleted (a full iteration of all children). This can be solved by adding special serialize/deserialize functions for demux that prevent these objects from being serialized.
cc: @NivekT
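
For illustration, a minimal sketch of that serialization pattern (a toy class, not torchdata's actual implementation):

class Demux:
    def __init__(self, source):
        self.source = source
        self._child_buffers = [[], []]  # filled lazily during iteration
        self._source_iter = None        # may hold open file handles

    def __getstate__(self):
        # Drop the transient iteration state so that pickling (e.g. by
        # traverse) succeeds even after iteration has started.
        state = self.__dict__.copy()
        state["_source_iter"] = None
        state["_child_buffers"] = [[], []]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)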

@pmeier (Collaborator, Author) commented Jan 19, 2022

@NicolasHug I added the ability to add marks like pytest.mark.xfail to datasets through the parametrization decorator. With this in place, we can xfail "cub200" on the failing tests until #5187 (comment) is resolved on the torchdata side.
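
For illustration, a minimal sketch of how such a mark can be attached (hypothetical test; pytest.param is the underlying pytest mechanism):

import pytest

@pytest.mark.parametrize(
    "name",
    [
        "mnist",
        pytest.param(
            "cub200",
            marks=pytest.mark.xfail(reason="demux buffers break traverse"),
        ),
    ],
)
def test_traverse(name):
    assert isinstance(name, str)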

@pmeier pmeier merged commit 3e4d062 into pytorch:main Jan 19, 2022
@pmeier pmeier deleted the datasets/expand-tests branch January 19, 2022 08:55
facebook-github-bot pushed a commit that referenced this pull request Jan 19, 2022
Summary:
* refactor prototype datasets tests

* skip tests with insufficient third party dependencies

* cleanup

* add tests for SBD prototype dataset

* add tests for SEMEION prototype dataset

* add tests for VOC prototype dataset

* add tests for CelebA prototype dataset

* add tests for DTD prototype dataset

* add tests for FER2013 prototype dataset

* add tests for CLEVR prototype dataset

* add tests for oxford-iiit-pet prototype dataset

* enforce tests for new datasets

* add missing archive generation for oxford-iiit-pet tests

* add tests for CUB200 prototype datasets

* fix split generation

* add capability to mark parametrization and xfail cub200 traverse tests

Reviewed By: datumbox, NicolasHug

Differential Revision: D33655253

fbshipit-source-id: 186591f2cb89e864c2d143d6a460449cf4991baa