
Current way to use torchvision.prototype.transforms #7168

Closed
@austinmw

Description

📚 The doc issue

I tried to run the end-to-end example in this recent blog post:

import PIL
from torchvision import io, utils
from torchvision.prototype import features, transforms as T
from torchvision.prototype.transforms import functional as F
# Defining and wrapping input to appropriate Tensor Subclasses
path = "COCO_val2014_000000418825.jpg"
img = features.Image(io.read_image(path), color_space=features.ColorSpace.RGB)
# img = PIL.Image.open(path)
bboxes = features.BoundingBox(
    [[2, 0, 206, 253], [396, 92, 479, 241], [328, 253, 417, 332],
     [148, 68, 256, 182], [93, 158, 170, 260], [432, 0, 438, 26],
     [422, 0, 480, 25], [419, 39, 424, 52], [448, 37, 456, 62],
     [435, 43, 437, 50], [461, 36, 469, 63], [461, 75, 469, 94],
     [469, 36, 480, 64], [440, 37, 446, 56], [398, 233, 480, 304],
     [452, 39, 463, 63], [424, 38, 429, 50]],
    format=features.BoundingBoxFormat.XYXY,
    spatial_size=F.get_spatial_size(img),
)
labels = features.Label([59, 58, 50, 64, 76, 74, 74, 74, 74, 74, 74, 74, 74, 74, 50, 74, 74])
# Defining and applying Transforms V2
trans = T.Compose(
    [
        T.ColorJitter(contrast=0.5),
        T.RandomRotation(30),
        T.CenterCrop(480),
    ]
)
img, bboxes, labels = trans(img, bboxes, labels)
# Visualizing results
viz = utils.draw_bounding_boxes(F.to_image_tensor(img), boxes=bboxes)
F.to_pil_image(viz).show()

but found that torchvision.prototype.features is now gone. What's the current way to run this? I attempted to simply pass the images, bboxes, and labels with the following types: torchvision.prototype.datasets.utils._encoded.EncodedImage, torchvision.prototype.datapoints._bounding_box.BoundingBox, torchvision.prototype.datapoints._label.Label. However, the transforms didn't seem to be applied: everything came back with the same shape.
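
For reference, this is roughly what that attempt looked like (a minimal sketch; the prototype datasets load call and the sample keys are assumptions on my part):

from torchvision.prototype import datasets, transforms as T

# Load one sample from the prototype COCO dataset (assumed API).
dataset = datasets.load("coco", split="val")
sample = next(iter(dataset))

trans = T.Compose([T.RandomRotation(30), T.CenterCrop(480)])
out = trans(sample)

# Shapes come back unchanged, e.g. sample["image"] is still the 1D
# EncodedImage rather than a transformed 2D image.
print(sample["image"].shape, out["image"].shape)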

Edit: I've found that features seems to have been renamed to datapoints. I tried using that, but the EncodedImage in a COCO sample['image'] is 1D, while prototype.transforms requires 2D images. What's the proper way to get a 2D image here so I can apply the transforms? Is there a decode method I'm missing?
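
The closest thing to a decode step I've found is torchvision.io.decode_image, which accepts a 1D uint8 tensor of encoded bytes. Here is a minimal sketch of the workaround I'm imagining, assuming sample['image'] holds the raw encoded bytes and that re-wrapping the result in datapoints.Image is the right way to make Transforms V2 dispatch on it:

from torchvision import io
from torchvision.prototype import datapoints, transforms as T

# sample["image"] is assumed to be a 1D uint8 tensor of encoded JPEG bytes.
decoded = io.decode_image(sample["image"])  # -> (C, H, W) uint8 tensor
img = datapoints.Image(decoded)             # wrap so Transforms V2 treats it as an image

trans = T.Compose([T.RandomRotation(30), T.CenterCrop(480)])
img = trans(img)
print(img.shape)  # now a spatial (C, H, W) image that the transforms actually modify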

Suggest a potential alternative/fix

No response

cc @vfdev-5 @bjuncek @pmeier
