Adding Preset Transforms in reference scripts #3317
Conversation
Force-pushed from 1c46a69 to 992d41f.
The proposed approach looks great to me, good to go for the other tasks as well!
Codecov Report

@@           Coverage Diff           @@
##           master    #3317   +/-   ##
=======================================
  Coverage   73.93%   73.93%
=======================================
  Files         104      104
  Lines        9594     9594
  Branches     1531     1531
=======================================
  Hits         7093     7093
  Misses       2024     2024
  Partials      477      477

Continue to review the full report at Codecov.
Force-pushed from 71b7091 to a2e9306.
The changes for segmentation and video classification look good to me as well!
Summary:
* Adding presets in the classification reference scripts.
* Adding presets in the object detection reference scripts.
* Adding presets in the segmentation reference scripts.
* Adding presets in the video classification reference scripts.
* Moving the flip to the end to align with the image classification signature.

Reviewed By: datumbox
Differential Revision: D26226607
fbshipit-source-id: 965f54e18d01fce6c1225eb2b6bdea1e4efd3998
class ClassificationPresetEval:
    def __init__(self, crop_size, resize_size=256, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
        # Deterministic eval pipeline (`transforms` is torchvision.transforms): resize, center-crop, to-tensor, normalize.
        self.transforms = transforms.Compose([
            transforms.Resize(resize_size),
            transforms.CenterCrop(crop_size),
            transforms.ToTensor(),
            transforms.Normalize(mean=mean, std=std),
        ])

    def __call__(self, img):
        return self.transforms(img)
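For illustration, such a preset is intended to be passed as a dataset's transform; a minimal usage sketch, assuming an ImageFolder-style validation directory (the path and crop size below are placeholders):

import torchvision

# Hypothetical usage: apply the evaluation preset when building the validation set.
val_dataset = torchvision.datasets.ImageFolder(
    "path/to/val",  # placeholder directory
    transform=ClassificationPresetEval(crop_size=224),
)
img, label = val_dataset[0]  # img is a normalized tensor ready for the model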
Usually, for the text domain, we need to download the transform, for example a sentencepiece model or a vocabulary saved in a text file.
I'm writing here what we discussed on the call.
It seems that supporting your case is possible by using PyTorch Hub's load_state_dict_from_url() method and then passing the result to your code. This is a very common pattern in TorchVision, used mainly for pre-trained models. Example:
vision/torchvision/models/mobilenetv3.py, lines 244 to 249 at 97885cb:
model = MobileNetV3(inverted_residual_setting, last_channel, **kwargs)
if pretrained:
    if model_urls.get(arch, None) is None:
        raise ValueError("No checkpoint is available for model type {}".format(arch))
    state_dict = load_state_dict_from_url(model_urls[arch], progress=progress)
    model.load_state_dict(state_dict)
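As a side note for the text case mentioned above, assets that are not state dicts (e.g. a sentencepiece model or a vocabulary file) could instead be fetched with torch.hub's plain file download helper; a minimal sketch, where the URL, filename and cache directory are hypothetical:

import os
import torch.hub

def fetch_transform_asset(url, filename, root=os.path.expanduser("~/.cache/transform_assets")):
    # Hypothetical helper: download a serialized transform asset once and
    # return the cached local path for the text transform to load.
    os.makedirs(root, exist_ok=True)
    path = os.path.join(root, filename)
    if not os.path.exists(path):
        torch.hub.download_url_to_file(url, path, progress=True)
    return path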
@netw0rkf10w The goal is to provide the training and inference transforms of each model/pipeline in an organized way so that people can reproduce the models. Here is an example of complex presets: 1 and 2.
Thanks, @datumbox, for your reply! Is there a discussion thread (or a GitHub issue) on the topic that I can read, or was it internal?
Unfortunately, many of these discussions happened outside of GitHub. This is quite problematic and we want to change it, because in circumstances like this it's hard to give information to people and it does not help with transparency. So, apologies for not being able to point you to a public thread. To clarify the situation, below is a summary of what motivated the change; let me know if you need more info.

Preset preprocessing transforms are the transformations applied to the data before feeding them to an ML model. They are typically separated into two categories: those applied during training and those applied during inference. Examples of such transforms include Data Augmentation techniques, Normalization/Scaling methods, and other ad hoc transformations applied to the data as a preliminary step (binary-to-bitmap conversion for Vision, Fast Fourier Transforms for Audio, Tokenization for Text, etc.).

The preset transforms are a crucial part of the model, and having access to them is necessary to understand how a model was created and how to use it. Disclosing which training transforms were used is an important part of reproducibility and crucial for understanding the assumptions and properties of a specific model. The latter is particularly true when using transfer learning and porting a model from one domain to another (for example, the Zoom transform can be used for augmentation in ImageNet Classification but not for Cancer Detection). Similarly, having access to the transforms used during inference is critical, because without them one can't use the model.

This is the reason we decided to bring these preset transforms as close to the training references as possible. By putting them together, users are able to reproduce the training, adjust the scripts to meet their needs, do transfer learning, etc. Hope that makes sense.
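To make the training/inference split concrete, here is a minimal sketch of a training-time preset in the spirit of the evaluation preset above; the specific augmentations, defaults, and class name are illustrative, not the exact ones shipped in the reference scripts:

import torchvision.transforms as transforms

class ClassificationPresetTrainSketch:
    # Illustrative training preset: random augmentation followed by the same
    # normalization used at evaluation time, so only the augmentation differs.
    def __init__(self, crop_size, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), hflip_prob=0.5):
        self.transforms = transforms.Compose([
            transforms.RandomResizedCrop(crop_size),
            transforms.RandomHorizontalFlip(p=hflip_prob),
            transforms.ToTensor(),
            transforms.Normalize(mean=mean, std=std),
        ])

    def __call__(self, img):
        return self.transforms(img)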
@datumbox That totally makes sense! Thank you very much for your detailed explanation!
Updating the classification, object detection, segmentation, and video classification reference scripts.
The Similarity reference script was skipped because it's not a real recipe. A reference implementation for it can be seen at 71b7091.