Binary file modified keras_cv/models/backbones/vit_det/data/vitdet_base_out.npz
8 changes: 7 additions & 1 deletion keras_cv/models/backbones/vit_det/vit_det_backbone.py
@@ -16,6 +16,7 @@

from keras_cv.api_export import keras_cv_export
from keras_cv.backend import keras
from keras_cv.backend import ops
from keras_cv.layers.vit_det_layers import AddPositionalEmbedding
from keras_cv.layers.vit_det_layers import ViTDetPatchingAndEmbedding
from keras_cv.layers.vit_det_layers import WindowedTransformerEncoder
@@ -81,9 +82,9 @@ class ViTDetBackbone(Backbone):
def __init__(
self,
*,
include_rescaling,
input_shape=(1024, 1024, 3),
input_tensor=None,
include_rescaling=False,
patch_size=16,
embed_dim=768,
depth=12,
@@ -123,6 +124,11 @@ def __init__(
# Use common rescaling strategy across keras_cv
x = keras.layers.Rescaling(1.0 / 255.0)(x)

# VITDet scales inputs based on the standard ImageNet mean/stddev.
Contributor
So we want to apply both types of rescaling? I'm a bit confused.

Contributor Author
The include_rescaling check and the associated Rescaling layer make sure that the inputs are scaled to the 0-1 range.

The subsequent bit then rescales those values using the ImageNet mean and stddev, under the assumption that the inputs are already scaled to 0-1.

x = (x - ops.array([0.229, 0.224, 0.225], dtype=x.dtype)) / (
ops.array([0.485, 0.456, 0.406], dtype=x.dtype)
)
Comment on lines +127 to +130
Contributor
@tirthasheshpatel Sep 23, 2023
Two things:

  • I think you got the mean and std mixed up :) From the SAM repo, mean is [123.675, 116.28, 103.53] ([0.485, 0.456, 0.406] when normalized) and std is [58.395, 57.12, 57.375] ([0.229, 0.224, 0.225] when normalized).
  • We should not do this because the SAM model first normalizes, then pads. If padded inputs are passed, the preprocessing operation would not remain the same. It's giving me suboptimal outputs when I run the demos.

@ianstenbit Let's revert this and document the preprocess step, what do you think?
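To make the ordering point concrete, here is a minimal NumPy sketch (separate from the keras_cv code) showing that normalize-then-pad and pad-then-normalize do not give the same result: zero padding applied before the mean/std normalization ends up with non-zero border values, while SAM's normalize-then-pad order keeps the border at zero.

```python
import numpy as np

# Standard ImageNet statistics for inputs scaled to [0, 1].
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

image = np.random.rand(8, 8, 3)  # stand-in for an input already in [0, 1]

# SAM order: normalize first, then zero-pad to the target size.
norm_then_pad = np.pad((image - mean) / std, ((0, 4), (0, 4), (0, 0)))

# Order implied by normalizing inside the backbone on already-padded inputs:
# zero-pad first, then normalize everything, padding included.
pad_then_norm = (np.pad(image, ((0, 4), (0, 4), (0, 0))) - mean) / std

print(norm_then_pad[-1, -1])  # [0. 0. 0.]
print(pad_then_norm[-1, -1])  # roughly [-2.12, -2.04, -1.80]
```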

Contributor Author
Yes this does look like the mean/std are backwards -- I'll fix that. Can you send your demos so that I can run them?
My test demo looked fine with this change, which makes me think it's not sensitive enough.

Contributor
> Yes this does look like the mean/std are backwards -- I'll fix that. Can you send your demos so that I can run them?

Here: https://colab.research.google.com/drive/1wHTsYfmmZVuC71I4St1NshaAOQ6nUPFg?usp=sharing

I ran them with the patch in #2087 and the outputs are looking better. Output masks are now much closer to the demo on the original repo, with slight noise here and there (I think this is because the padded outlines end up with a non-zero value after the normalization step). Nothing big though.

> My test demo looked fine with this change, which makes me think it's not sensitive enough.

I agree, it's not super-sensitive, which is good! I just thought there might be cases where this could lead to a huge difference. What do you think about making the normalization step opt-in rather than always having it on?
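A rough sketch of what the opt-in version could look like inside the backbone, reusing the keras and ops imports from the diff above; the `include_normalization` flag is hypothetical and only illustrates the suggestion, with the mean/std in the corrected order:

```python
# Sketch only: `include_normalization` does not exist in keras_cv today;
# it is a hypothetical flag illustrating the opt-in suggestion.
if include_rescaling:
    # Common keras_cv rescaling: raw [0, 255] pixels -> [0, 1].
    x = keras.layers.Rescaling(1.0 / 255.0)(x)

if include_normalization:
    # Standard ImageNet statistics for [0, 1] inputs:
    # mean [0.485, 0.456, 0.406], std [0.229, 0.224, 0.225].
    x = (x - ops.array([0.485, 0.456, 0.406], dtype=x.dtype)) / ops.array(
        [0.229, 0.224, 0.225], dtype=x.dtype
    )
```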


x = ViTDetPatchingAndEmbedding(
kernel_size=(patch_size, patch_size),
strides=(patch_size, patch_size),
6 changes: 3 additions & 3 deletions keras_cv/models/backbones/vit_det/vit_det_backbone_presets.py
@@ -30,7 +30,7 @@
"config": {
"input_shape": (1024, 1024, 3),
"input_tensor": None,
"include_rescaling": False,
"include_rescaling": True,
"patch_size": 16,
"embed_dim": 768,
"depth": 12,
@@ -61,7 +61,7 @@
"config": {
"input_shape": (1024, 1024, 3),
"input_tensor": None,
"include_rescaling": False,
"include_rescaling": True,
"patch_size": 16,
"embed_dim": 1024,
"depth": 24,
@@ -92,7 +92,7 @@
"config": {
"input_shape": (1024, 1024, 3),
"input_tensor": None,
"include_rescaling": False,
"include_rescaling": True,
"patch_size": 16,
"embed_dim": 1280,
"depth": 32,
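With include_rescaling flipped to True in these presets, a preset-loaded backbone takes raw 0-255 pixel values and applies the 1/255 rescaling itself. A usage sketch; the preset name is assumed here following the keras_cv preset naming scheme and should be checked against the registered presets:

```python
import numpy as np
from keras_cv.models import ViTDetBackbone

# Preset name assumed for illustration; check keras_cv for the exact identifier.
backbone = ViTDetBackbone.from_preset("vitdet_base_sa1b")

# Raw 0-255 pixel values: with include_rescaling=True in the preset config,
# the backbone rescales to [0, 1] internally.
images = np.random.randint(0, 256, size=(1, 1024, 1024, 3)).astype("float32")
features = backbone(images)
```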