Experimental datumaro implementation #4920
base: develop
Conversation
Signed-off-by: Albert van Houten <[email protected]> Co-authored-by: Leonardo Lai <[email protected]>
Signed-off-by: Albert van Houten <[email protected]>
…4751) Signed-off-by: Albert van Houten <[email protected]> Co-authored-by: Grégoire Payen de La Garanderie <[email protected]>
…ing_extensions into feature/datumaro
# Conflicts:
#   library/src/otx/data/dataset/base_new.py
#   library/src/otx/data/dataset/classification_new.py
#   library/src/otx/data/dataset/detection_new.py
#   library/src/otx/data/dataset/instance_segmentation_new.py
#   library/src/otx/data/dataset/keypoint_detection_new.py
#   library/src/otx/data/dataset/segmentation_new.py
#   library/src/otx/data/entity/sample.py
Signed-off-by: Albert van Houten <[email protected]>
Signed-off-by: Albert van Houten <[email protected]> Co-authored-by: Albert van Houten <[email protected]>
Signed-off-by: Albert van Houten <[email protected]>
…ing_extensions into feature/datumaro
Signed-off-by: Albert van Houten <[email protected]>
# Conflicts:
#   library/pyproject.toml
#   library/src/otx/data/dataset/anomaly.py
#   library/src/otx/data/dataset/base.py
#   library/src/otx/data/dataset/classification.py
#   library/src/otx/data/dataset/detection.py
#   library/src/otx/data/dataset/instance_segmentation.py
#   library/src/otx/data/dataset/keypoint_detection.py
#   library/src/otx/data/dataset/segmentation.py
#   library/src/otx/data/dataset/tile.py
#   library/src/otx/data/factory.py
#   library/src/otx/data/module.py
#   library/src/otx/data/transform_libs/torchvision.py
#   library/tests/unit/data/samplers/test_class_incremental_sampler.py
#   library/tests/unit/data/utils/test_utils.py
Pull Request Overview
This PR implements an experimental Datumaro dataset integration for OTX, transitioning from legacy Datumaro components to the new experimental Dataset API. The changes introduce a new sample-based architecture while maintaining compatibility with existing OTX functionality.
Key changes:
- Migration from legacy Datumaro components to experimental Dataset API with schema-based conversion
- Introduction of new OTXSample-based data entities with PyTree registration for TorchVision compatibility
- Replacement of legacy polygon handling with numpy ragged arrays for better performance (see the sketch after this list)
- Comprehensive test updates and new test implementations for the updated dataset architecture
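For readers unfamiliar with the ragged-array representation: polygons with different vertex counts cannot form a regular ndarray, so an object-dtype container holds one (N, 2) array per polygon. A minimal illustrative sketch (not code from this PR):

import numpy as np

# Polygons with different vertex counts cannot share one regular shape,
# so the container uses dtype=object: each element is its own (N, 2) array.
polygons = np.empty(2, dtype=object)
polygons[0] = np.array([[10, 10], [50, 10], [50, 50], [10, 50]], dtype=np.float32)  # 4 vertices
polygons[1] = np.array([[60, 60], [100, 60], [80, 100]], dtype=np.float32)          # 3 vertices

for poly in polygons:
    print(poly.shape)  # (4, 2) then (3, 2)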
Reviewed Changes
Copilot reviewed 59 out of 59 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pyproject.toml | Updates Datumaro dependency to experimental branch version |
| library/src/otx/data/dataset/*.py | Implements new dataset classes using experimental Datumaro with sample-based architecture |
| library/src/otx/data/entity/sample.py | Adds new OTXSample classes with PyTree registration for transform compatibility |
| library/tests/unit/types/test_label.py | Updates label tests to use new hierarchical label categories |
| library/tests/unit/data/transform_libs/test_torchvision.py | Converts polygon handling from Datumaro objects to numpy arrays |
fake_polygons = np.array(
    [
        np.array([[10, 10], [50, 10], [50, 50], [10, 50]]),  # Rectangle polygon for first object
        np.array([[60, 60], [100, 60], [100, 100], [60, 100]]),  # Rectangle polygon for second object
    ]
)
Copilot AI (Oct 29, 2025)
The polygon data structure has changed from Datumaro Polygon objects to numpy ragged arrays, but the test doesn't validate that the new format maintains the same geometric properties. Consider adding assertions to verify that the polygon areas and vertex coordinates are equivalent between the old and new formats.
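A hedged sketch of what such an assertion could look like, assuming the legacy format is a flat [x0, y0, x1, y1, ...] list and the new format is an (N, 2) array; the helper and data here are illustrative, not from the PR:

import numpy as np

def polygon_area(points: np.ndarray) -> float:
    """Shoelace formula for an (N, 2) vertex array."""
    x, y = points[:, 0], points[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

legacy_points = [10, 10, 50, 10, 50, 50, 10, 50]  # old flat format
new_points = np.array([[10, 10], [50, 10], [50, 50], [10, 50]], dtype=np.float32)

legacy_as_array = np.asarray(legacy_points, dtype=np.float32).reshape(-1, 2)
assert np.allclose(legacy_as_array, new_points)                             # same vertices
assert np.isclose(polygon_area(legacy_as_array), polygon_area(new_points))  # same area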
-    p = np.asarray(polygon.points)
-    p[0::2] = width - p[0::2]
-    return p.tolist()
+def revert_hflip(polygon: np.ndarray, width: int) -> np.ndarray:
Copilot AI (Oct 29, 2025)
The function modifies the input polygon array in-place, which could cause issues if the original array is used elsewhere. The function should create a copy before modification to avoid unintended side effects.
Suggested change:
 def revert_hflip(polygon: np.ndarray, width: int) -> np.ndarray:
+    polygon = polygon.copy()
             rescaled_polygons[i] = scaled_points
         else:
             # Handle empty or invalid polygons
             rescaled_polygons[i] = np.array([[0, 0], [0, 0], [0, 0]], dtype=np.float32)
Copilot AI (Oct 29, 2025)
[nitpick] Creating a dummy polygon with three identical points may not be the best approach for handling empty/invalid polygons. Consider using an empty array or a clearly marked invalid polygon structure that can be properly filtered out later in the pipeline.
Suggested change:
-rescaled_polygons[i] = np.array([[0, 0], [0, 0], [0, 0]], dtype=np.float32)
+rescaled_polygons[i] = np.empty((0, 2), dtype=np.float32)
why exactly 3 points BTW? Why not just empty tensor or 0 tensor?
Not entirely sure why Greg applied it in this manner, but I guess there is a validation step somewhere that requires 3 points for a valid polygon.
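If the suggested empty (0, 2) placeholder were adopted instead, a downstream filter could drop invalid polygons explicitly; a hypothetical sketch (drop_empty_polygons is not an existing function in the codebase):

import numpy as np

def drop_empty_polygons(polygons: list[np.ndarray]) -> list[np.ndarray]:
    # A polygon needs at least 3 vertices to enclose any area, which would
    # explain a validation step that rejects shorter point lists.
    return [p for p in polygons if p.shape[0] >= 3]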
# Conflicts:
#   library/tests/unit/data/dataset/test_base.py
#   library/tests/unit/data/test_tiling.py
Signed-off-by: Albert van Houten <[email protected]>
Pull request overview
Copilot reviewed 98 out of 99 changed files in this pull request and generated 9 comments.
object.__setattr__(
    label_group, "labels", [id_to_name_mapping.get(label, label) for label in label_group.labels]
)
Copilot AI (Dec 16, 2025)
Using object.__setattr__ to mutate frozen dataclass attributes suggests that the upstream dataclass (HierarchicalLabelCategories or LabelGroup) is frozen. Consider whether these objects should be mutable by design or if there's a better approach (e.g., creating new instances rather than mutating in place) to avoid circumventing immutability guarantees.
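A sketch of the new-instances alternative using dataclasses.replace; the stand-in LabelGroup dataclass and sample data below are assumptions for illustration, not the upstream definitions:

import dataclasses

@dataclasses.dataclass(frozen=True)
class LabelGroup:  # stand-in for the upstream frozen dataclass
    name: str
    labels: list[str]

id_to_name_mapping = {"0": "cat", "1": "dog"}
label_group = LabelGroup(name="animals", labels=["0", "1"])

# Build a new frozen instance instead of mutating the original in place.
renamed = dataclasses.replace(
    label_group,
    labels=[id_to_name_mapping.get(label, label) for label in label_group.labels],
)
print(renamed.labels)  # ['cat', 'dog']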
if img.ndim == 3 and img.shape[0] <= 4 and img.shape[0] < min(img.shape[1:]):
    # Image is in CHW format, transpose to HWC
    return np.ascontiguousarray(img.transpose(1, 2, 0))
Copilot AI (Dec 16, 2025)
The heuristic assumes that if the first dimension is ≤4 and smaller than other dimensions, the image is in CHW format. This may fail for edge cases where an image legitimately has height or width of 4 or less. Consider adding additional validation or a more robust channel detection mechanism to prevent incorrect transpositions.
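One more robust option is to thread the layout through explicitly instead of inferring it from shapes; a sketch with an assumed layout parameter, not the PR's actual API:

import numpy as np

def to_hwc(img: np.ndarray, layout: str) -> np.ndarray:
    """Convert an image to HWC, trusting an explicit layout flag instead of shape heuristics."""
    if layout == "CHW":
        return np.ascontiguousarray(img.transpose(1, 2, 0))
    if layout == "HWC":
        return img
    msg = f"Unknown layout: {layout!r}"
    raise ValueError(msg)

chw = np.zeros((3, 4, 2), dtype=np.uint8)  # a 4x2 image the shape heuristic could misread
print(to_hwc(chw, layout="CHW").shape)     # (4, 2, 3)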
def _is_native_torchvision_transform(self, transform: tvt_v2.Transform) -> bool:
    """Check if the transform is a native torchvision transform."""
    module = type(transform).__module__
    return module.startswith("torchvision.")
Copilot AI (Dec 16, 2025)
The check relies on the module name starting with 'torchvision.'. If third-party packages or custom transforms mimic this naming convention, they may be incorrectly classified. Consider a more robust mechanism, such as checking against a known set of torchvision transform types or using an interface/base class check.
Suggested change:
-    return module.startswith("torchvision.")
+    # Only treat as native if defined in torchvision.transforms.v2 or its _transforms submodule
+    return module == "torchvision.transforms.v2" or module == "torchvision.transforms.v2._transforms"
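An allowlist of concrete transform types, as the comment suggests, might look like the following sketch; the listed transforms are only examples and would need to match whatever the pipeline actually uses:

import torchvision.transforms.v2 as tvt_v2

# Example allowlist; the real set would be whichever transforms the pipeline supports.
NATIVE_TV_TRANSFORMS: tuple[type, ...] = (
    tvt_v2.Resize,
    tvt_v2.RandomHorizontalFlip,
    tvt_v2.Normalize,
)

def is_native_torchvision_transform(transform: object) -> bool:
    return isinstance(transform, NATIVE_TV_TRANSFORMS)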
if (ph, pw) == (tw, th):
    pm = pm.transpose(-1, -2)
# Else if target seems swapped relative to prediction, transpose target
elif (th, tw) == (pw, ph):
Copilot AI (Dec 16, 2025)
The dimension alignment logic attempts to fix spatial mismatches by transposing. However, this heuristic may not handle all cases correctly, especially if dimensions are neither equal nor swapped. Consider logging a warning when dimensions mismatch and a transpose is applied, so that unexpected dimension mismatches can be detected and investigated.
Suggested change:
 if (ph, pw) == (tw, th):
+    log.warning(
+        f"Predicted mask shape {pm.shape} does not match target mask shape {tm.shape}. "
+        "Transposing predicted mask to align spatial dimensions."
+    )
     pm = pm.transpose(-1, -2)
 # Else if target seems swapped relative to prediction, transpose target
 elif (th, tw) == (pw, ph):
+    log.warning(
+        f"Target mask shape {tm.shape} does not match predicted mask shape {pm.shape}. "
+        "Transposing target mask to align spatial dimensions."
+    )
| "RandomPhotometricDistort", | ||
| "RandomGaussianBlur", | ||
| "RandomGaussianNoise", |
Copilot AI (Dec 16, 2025)
The removed 'TopdownAffine' augmentation from the configurable_augs list may indicate that this augmentation is no longer supported or has been moved elsewhere. Ensure that test coverage for this augmentation exists in another test or that its removal is intentional and documented.
Why did you remove TopdownAffine from the configurable augs list? We actually provide the possibility to turn this augmentation on/off in Geti.
IIRC, TopdownAffine can't be an optional augmentation since it is responsible for resizing the image. Otherwise, at least for this test, the images were the wrong size.
Signed-off-by: Albert van Houten <[email protected]>
…ing_extensions into feature/datumaro
Signed-off-by: Albert van Houten <[email protected]>
# Conflicts:
#   library/uv.lock
@kprokofi please review if these changes make sense. The old logic only worked when tiling with square images.
I think the padding does nothing here. We resize and pad to the same size.
Without the padding, tiling only works if the image is square (which it previously was in the integration test). I replaced the dataset with another one, since its polygons were self-intersecting, which isn't supported; the new dataset has rectangular images, which is exactly what broke this tiling recipe.
 # Unit test models separately using --forked flag so that each is run in a separate process.
 # Running these models after each other from the same process can cause segfaults
 - name: Unit testing models
   working-directory: library
-  run: uv run pytest tests/unit --cov --cov-report=xml
   env:
     OMP_NUM_THREADS: "1"
+  run: uv run pytest tests/unit/backend/native/models --cov --cov-report=xml --cov-append --forked
Do you have any idea what causes the segfaults?
Unclear to me. I tried a few different things, like clearing up PyTorch memory usage, but this was the only approach that fixed it.
# TODO(gdlg): Add support for ignore_index again
ignore_index: int = 255,  # noqa: ARG003
There is a TODO here. Do you know what's the purpose of this ignore_index parameter?
The ignore index is necessary for semantic segmentation tasks: pixels marked with this index are excluded from training.
Is there any use case for the ignore index being different from the value used for background pixels? Or is it always the same?
It is always the same if we don't count background as a training class. Sometimes (in some datasets) ignore_index = -1 is used to indicate invalid pixels instead of 255.
However, in datasets like Pascal VOC we use background during training and mark it as class 0.
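For reference, this is how an ignore index typically enters the loss for semantic segmentation with PyTorch's CrossEntropyLoss (a self-contained toy example, not code from this PR); pixels labeled with the ignore value contribute nothing to the loss or gradients:

import torch

criterion = torch.nn.CrossEntropyLoss(ignore_index=255)

logits = torch.randn(2, 4, 8, 8)          # (batch, classes, H, W)
target = torch.randint(0, 4, (2, 8, 8))   # one class index per pixel
target[:, 0, :] = 255                     # mark the top row of pixels as "ignore"

loss = criterion(logits, target)          # ignored pixels are excluded from the mean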
kprokofi left a comment
First round of comments
     self,
     batch_tile_preds: list[OTXPredBatch],
-    batch_tile_attrs: list[list[dict]],
+    batch_tile_attrs: list[list[TileInfo]],
Since you are renaming tile_attrs to tile_info, could you change it everywhere for consistency?
-    for pred_mask, target_mask in zip(preds.masks, inputs.masks)
-]
+aligned: list[dict[str, Any]] = []
+for pred_mask, target_mask in zip(preds.masks, inputs.masks):
Why not just pm, tm then? (for pm, tm in ..)
if tm.ndim == 2:
    tm = tm.unsqueeze(0)

# Align spatial dimensions if swapped (e.g., [1, W, H] vs [1, H, W])
Why can the spatial dimensions be misaligned or swapped? This code looks like a hack to me. We need to align targets and predictions at the dataset preparation / output preprocessing stage, not here.
Let me recheck this after my break. I have made many changes to get training to work, and some of these checks can probably be removed. I do still think it's good to make some of this code more robust against channel orders.
@register_pytree_node
class InstanceSegmentationSampleWithMask(OTXSample):
- Can't we combine InstanceSegmentationSample and InstanceSegmentationSampleWithMask? The only difference is the mask field.
- Why do we still include polygons in InstanceSegmentationSampleWithMask?
     cls,
     task: OTXTaskType,
-    dm_subset: DmDataset,
+    dm_subset: DmDataset | DatasetNew,
Is it for backward compatibility? I don't think we need that. cc @leoll2
Same for "convert_from_legacy".
GetiTune will depend on the new Datumaro Dataset, and Geti classic has no dependency on the develop branch. OTX uses the file system, not datumaro directly. I propose supporting only the latest version if there are no blockers.
Legacy datumaro datasets can still be passed here, since the conversion to the new datumaro dataset schema happens in this factory. We will remove this in the future, but as long as we still support legacy datasets in datumaro, it makes sense to keep it here. If you already have a new datumaro Dataset, you can initialize the task's dataset immediately.
It also reduces the scope of this PR a bit.
elif isinstance(image, torch.Tensor):
    # Handle tensor in HWC format - convert to CHW
    # If the last dimension is <= 4 and smaller than other dims, it's likely HWC format
    if image.ndim == 3 and image.shape[-1] <= 4 and image.shape[-1] < min(image.shape[:-1]):
Why do we need to convert tensors from CHW (which is what training and transforms actually need) to HWC?
if isinstance(img, np.ndarray):
    # Check if the numpy array is in CHW format (channels should be <= 4 typically)
    # If the first dimension is small (<=4) and smaller than other dimensions, it's likely CHW format
    if img.ndim == 3 and img.shape[0] <= 4 and img.shape[0] < min(img.shape[1:]):
Same comment as above. I think we should always stick to CHW format.
kprokofi left a comment
Second round of comments
 @classmethod
-def from_dm_label_groups(cls, dm_label_categories: LabelCategories) -> LabelInfo:
+def from_dm_label_groups(cls, dm_label_categories: HierarchicalLabelCategories) -> LabelInfo:
Since we deleted this in OTXDataset and we define LabelInfo directly, could you clean up this class? Specifically, we don't need "from_dm_label_groups", "from_dm_label_groups_arrow", etc.
if torch.cuda.is_available():
    torch.cuda.empty_cache()
if is_xpu_available():
    import intel_extension_for_pytorch as ipex
We don't use IPEX. It is outdated
| "RandomPhotometricDistort", | ||
| "RandomGaussianBlur", | ||
| "RandomGaussianNoise", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove TopdownAffine from configurable augs list? We actually provide possibility to turn on/off this augmentation in Geti
Signed-off-by: Albert van Houten <[email protected]>
This pull request introduces the new experimental dataset into the dataset classes of OTX. A lot of logic, such as the management of image color channels and the tiling implementation, has been migrated to Datumaro.
Summary
How to test
Checklist