Update VITDet to conform to KerasCV scaling standards #2086
@@ -16,6 +16,7 @@

 from keras_cv.api_export import keras_cv_export
 from keras_cv.backend import keras
+from keras_cv.backend import ops
 from keras_cv.layers.vit_det_layers import AddPositionalEmbedding
 from keras_cv.layers.vit_det_layers import ViTDetPatchingAndEmbedding
 from keras_cv.layers.vit_det_layers import WindowedTransformerEncoder
@@ -81,9 +82,9 @@ class ViTDetBackbone(Backbone):
     def __init__(
         self,
         *,
-        include_rescaling,
         input_shape=(1024, 1024, 3),
         input_tensor=None,
+        include_rescaling=False,
         patch_size=16,
         embed_dim=768,
         depth=12,
@@ -123,6 +124,11 @@ def __init__(
             # Use common rescaling strategy across keras_cv
             x = keras.layers.Rescaling(1.0 / 255.0)(x)
+
+            # VITDet scales inputs based on the standard ImageNet mean/stddev.
+            x = (x - ops.array([0.229, 0.224, 0.225], dtype=x.dtype)) / (
+                ops.array([0.485, 0.456, 0.406], dtype=x.dtype)
+            )
Comment on lines +127 to +130

Two things:

@ianstenbit Let's revert this and document the preprocess step, what do you think?

Yes, this does look like the mean/std are backwards -- I'll fix that. Can you send your demos so that I can run them?

Here: https://colab.research.google.com/drive/1wHTsYfmmZVuC71I4St1NshaAOQ6nUPFg?usp=sharing

I ran them with the patch in #2087 and the outputs are looking better. The output masks are now much closer to the demo in the original repo, with slight noise here and there (I think this is because the padded outlines have a non-zero value after the normalization step). Nothing big, though.

I agree, it's not super-sensitive, which is good! I just thought there might be cases where this could lead to a huge difference. What do you think about keeping the normalization step opt-in rather than always having it on?
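For reference, a minimal numpy sketch of the swap the thread describes (numpy stands in for `keras_cv.backend.ops`; this is an illustration, not the actual patch in #2087). The diff above subtracts the ImageNet std and divides by the mean; the corrected order subtracts the mean (0.485, 0.456, 0.406) and divides by the std (0.229, 0.224, 0.225):

```python
import numpy as np

# Standard ImageNet per-channel statistics (for inputs scaled to [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_swapped(x):
    # As written in the PR diff: subtracts the std and divides by the mean.
    return (x - IMAGENET_STD) / IMAGENET_MEAN

def normalize_fixed(x):
    # Corrected order: subtract the mean, then divide by the std.
    return (x - IMAGENET_MEAN) / IMAGENET_STD
```

On an all-zero input (a black or padded region after 0-1 rescaling) the two versions produce noticeably different values, which lines up with the reviewer's observation that padded outlines pick up non-zero values after normalization.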
             x = ViTDetPatchingAndEmbedding(
                 kernel_size=(patch_size, patch_size),
                 strides=(patch_size, patch_size),
So we want to apply both types of rescaling? I'm a bit confused.

The include_rescaling check and associated rescaling layer make sure that the inputs are scaled from 0 to 1. The subsequent bit rescales that using the mean and stddev of ImageNet, with the prior assumption that the inputs are scaled 0-1.