
[Discussion] How do we want to handle torchvision.prototype.features.Feature's? #5045

Open

@pmeier

This issue is meant to spark a discussion about how we want to handle Feature's in the future. There are a lot of open questions, which I'll try to summarize below, giving my opinion on each. You can find the current implementation under torchvision.prototype.features.

What are Feature's?

Feature's are subclasses of torch.Tensor and their purpose is threefold:

  1. Through their type, e.g. Image, they carry information about the kind of data they hold. The prototype transformations (torchvision.prototype.transforms) use this information to automatically dispatch an input to the correct kernel.
  2. They can optionally carry additional meta data that might be needed for transforming the feature. For example, most geometric transformations can only be performed on bounding boxes if the size of the corresponding image is known (a minimal sketch of this design follows the list).
  3. They provide a convenient interface for feature specific functionality, for example transforming the format of a bounding box.
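
To make this concrete, here is a minimal sketch of how such a tensor subclass could work. The meta data handling shown here is an assumption for illustration, not the actual prototype implementation:

```python
import torch

class Feature(torch.Tensor):
    # Hypothetical sketch: wrap the data in a tensor subclass and
    # stash additional meta data on the instance.
    def __new__(cls, data, **meta):
        feature = torch.as_tensor(data).as_subclass(cls)
        feature._meta = meta
        return feature

class BoundingBox(Feature):
    @property
    def image_size(self):
        # Meta data needed by geometric transformations (purpose 2).
        return self._meta["image_size"]

box = BoundingBox([10, 20, 50, 60], image_size=(256, 256))
assert isinstance(box, torch.Tensor)  # transforms can dispatch on the type (purpose 1)
assert box.image_size == (256, 256)
```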

There are currently three Feature's implemented

  • Image,
  • BoundingBox, and
  • Label,

but in the future we should add at least three more:

  • SemanticSegmentationMask,
  • InstanceSegmentationMask, and
  • Video.

What is the policy for adding new Feature's?

We could allow subclassing of Feature's. On the one hand, this would make it easier for datasets to conveniently bundle meta data. For example, the COCO dataset could return a CocoLabel, which in addition to the default Label.category could also have a super_category field. On the other hand, this would mean that the transforms need to handle feature subclasses well, for example by treating a CocoLabel the same as a Label (see the sketch below).
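
To make the trade-off concrete, a CocoLabel built on the hypothetical Feature sketch from above could look like this:

```python
class Label(Feature):
    # Hypothetical sketch, reusing the Feature base from above.
    @property
    def category(self):
        return self._meta["category"]

class CocoLabel(Label):
    # Dataset-specific subclass bundling extra COCO meta data.
    @property
    def super_category(self):
        return self._meta["super_category"]

label = CocoLabel(17, category="cat", super_category="animal")
assert isinstance(label, Label)  # Label-based dispatch keeps working
```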

I see two downsides with that:

  1. What if a transform needs the additional meta data carried by a feature subclass? Imagine I've added a special transformation that needs CocoLabel.super_category. Although on the surface it supports plain Label's, it will fail for them at runtime.
  2. Documenting custom features is more complicated than documenting a separate field in the sample dictionary of a dataset.

Thus, I'm leaning towards only having a few base classes.

From what data should a Feature be instantiable?

For some features, like Image or Video, there are established non-tensor objects that carry the same kind of data. Should these features know how to handle them? For example, should something like Image(PIL.Image.open(...)) work?

My vote is for yes. IMO this is very convenient, and the semantics are not unexpected compared to passing the data directly, e.g. Image(torch.rand(3, 256, 256)).
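
A sketch of what that could look like; pil_to_tensor is the existing helper from torchvision.transforms.functional, while the Image class itself is hypothetical:

```python
import PIL.Image
import torch
from torchvision.transforms.functional import pil_to_tensor

class Image(torch.Tensor):
    # Hypothetical sketch: accept raw tensors as well as PIL images.
    def __new__(cls, data):
        if isinstance(data, PIL.Image.Image):
            data = pil_to_tensor(data)
        return torch.as_tensor(data).as_subclass(cls)

image = Image(torch.rand(3, 256, 256))  # from raw tensor data
# Image(PIL.Image.open(...)) would work the same way
```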

Should Feature's have a fixed shape?

Consider the following table:

| Feature | .shape |
| --- | --- |
| Image | (*, C, H, W) |
| Label | (*) |
| BoundingBox | (*, 4) |
| SemanticSegmentationMask | (*, H, W) or (*, C, H, W) |
| InstanceSegmentationMask | (*, N, H, W) |
| Video | (*, T, C, H, W) |

(For SemanticSegmentationMask I'm not sure about the shape yet: having an extra channel dimension makes the tensor unnecessarily large, but it aligns well with segmentation image files, which are usually stored as RGB.)

Should we fix the shape to that of a single feature, i.e. remove the * from the table above, or should we only require the trailing dimensions to be correct?

My vote is for a flexible shape, since otherwise batching is not possible. For example, if we fix bounding boxes to the shape (4,), a transformation would need to transform N bounding boxes individually, while for the shape (N, 4) it could make use of parallelism (see the sketch below).
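
For example, a horizontal flip kernel written against the flexible (*, 4) shape handles a single box and a batch with the same code (the XYXY format and function name here are just assumptions for illustration):

```python
import torch

def hflip_bounding_boxes(boxes, image_width):
    # boxes has shape (*, 4) in XYXY format; works for (4,) and (N, 4) alike
    x1, y1, x2, y2 = boxes.unbind(-1)
    return torch.stack([image_width - x2, y1, image_width - x1, y2], dim=-1)

single = torch.tensor([10.0, 20.0, 50.0, 60.0])  # shape (4,)
batch = torch.tensor([[10.0, 20.0, 50.0, 60.0], [30.0, 40.0, 70.0, 80.0]])  # shape (2, 4)
assert hflip_bounding_boxes(single, 256).shape == (4,)
assert hflip_bounding_boxes(batch, 256).shape == (2, 4)
```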

On the same note, if we go for the flexible shape, do we keep the singular name of the feature? For example, do we still regard a batch of images with shape (B, C, H, W) as an Image, or should we go for the plural Images in general? My vote is for always keeping the singular, since I've often seen something like:

```python
for image, target in DataLoader(dataset, batch_size=4):
    ...
```

Should Feature's have a fixed dtype?

This makes sense for InstanceSegmentationMask, which should always be torch.bool. For all the other features I'm unsure. My gut says to use a default dtype, but also to allow other dtypes (see the sketch below).
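
A default-but-overridable dtype could be a per-class attribute; this is just a variation of the hypothetical Feature sketch from above, kept self-contained:

```python
import torch

class Feature(torch.Tensor):
    default_dtype = None  # hypothetical per-class default

    def __new__(cls, data, *, dtype=None):
        # An explicitly passed dtype wins over the class default.
        tensor = torch.as_tensor(data, dtype=dtype or cls.default_dtype)
        return tensor.as_subclass(cls)

class InstanceSegmentationMask(Feature):
    default_dtype = torch.bool  # masks default to torch.bool

mask = InstanceSegmentationMask(torch.zeros(2, 5, 5))
assert mask.dtype is torch.bool
```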

What meta data should Feature's carry?

IMO, this really depends on the decision above about fixed vs. flexible shapes. If we go for fixed shapes, a feature can basically carry any information. If we go for flexible shapes instead, we should only allow meta data that is shared by all elements of a batched feature. For example, BoundingBox.image_size is fine, but Label.category is not.

What methods should Feature's provide?

For now I've only included typical conversion methods, but of course this list is not exhaustive.

| Feature | method(s) |
| --- | --- |
| Image | .to_dtype(), .to_colorspace() |
| Label | .to_str() |
| BoundingBox | .to_format() |
| InstanceSegmentationMask | .to_semantic() |
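
As one concrete example, BoundingBox.to_format() could delegate to the existing torchvision.ops.box_convert; the way the format is tracked as meta data here is again hypothetical:

```python
import torch
from torchvision.ops import box_convert

class BoundingBox(torch.Tensor):
    # Hypothetical sketch: track the box format as meta data.
    def __new__(cls, data, *, format="xyxy"):
        box = torch.as_tensor(data).as_subclass(cls)
        box.format = format
        return box

    def to_format(self, format):
        converted = box_convert(self, in_fmt=self.format, out_fmt=format)
        return BoundingBox(converted, format=format)

box = BoundingBox([10.0, 20.0, 50.0, 60.0], format="xyxy")
assert box.to_format("xywh").tolist() == [10.0, 20.0, 40.0, 40.0]
```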

cc @bjuncek
