Adding Preset Transforms in reference scripts #3317


Merged: datumbox merged 5 commits into pytorch:master from references/preset_transforms on Jan 28, 2021

Conversation

@datumbox (Contributor) commented on Jan 28, 2021:

Updating the following reference scripts:

  • Classification
  • Object Detection
  • Segmentation
  • Video Classification

The Similarity reference script was skipped because it's not a real recipe. A reference implementation for it can be seen at 71b7091.

@datumbox force-pushed the references/preset_transforms branch from 1c46a69 to 992d41f on January 28, 2021 at 12:09
@fmassa (Member) left a comment:
The proposed approach looks great to me, good to go for the other tasks as well!

@codecov bot commented on Jan 28, 2021:

Codecov Report

Merging #3317 (ba326de) into master (7621a8e) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #3317   +/-   ##
=======================================
  Coverage   73.93%   73.93%           
=======================================
  Files         104      104           
  Lines        9594     9594           
  Branches     1531     1531           
=======================================
  Hits         7093     7093           
  Misses       2024     2024           
  Partials      477      477           


@datumbox force-pushed the references/preset_transforms branch from 71b7091 to a2e9306 on January 28, 2021 at 14:06
@fmassa (Member) left a comment:
Changes for segmentation and video classification look good to me as well!

@datumbox changed the title from "[WIP] Adding Preset Transforms in reference scripts" to "Adding Preset Transforms in reference scripts" on Jan 28, 2021
@datumbox mentioned this pull request on Jan 28, 2021
@datumbox merged commit 1703e4c into pytorch:master on Jan 28, 2021
@datumbox deleted the references/preset_transforms branch on January 28, 2021 at 15:08
facebook-github-bot pushed a commit that referenced this pull request Feb 4, 2021
Summary:
* Adding presets in the classification reference scripts.

* Adding presets in the object detection reference scripts.

* Adding presets in the segmentation reference scripts.

* Adding presets in the video classification reference scripts.

* Moving flip to the end to align with the image classification signature.

Reviewed By: datumbox

Differential Revision: D26226607

fbshipit-source-id: 965f54e18d01fce6c1225eb2b6bdea1e4efd3998
class ClassificationPresetEval:
    def __init__(self, crop_size, resize_size=256, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
        self.transforms = transforms.Compose([
            transforms.Resize(resize_size),
            transforms.CenterCrop(crop_size),
            transforms.ToTensor(),
            transforms.Normalize(mean=mean, std=std),
        ])
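For context, the class in the PR also defines a __call__ that applies the composed transforms, so the preset can be used anywhere a regular transform is expected (torchvision's transforms module is assumed imported, as in the reference scripts):

    def __call__(self, img):
        return self.transforms(img)

With that in place, ClassificationPresetEval(crop_size=224)(img) takes a PIL image and returns a normalized tensor ready for an evaluation forward pass.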
A contributor commented on the diff:
Usually, for the text domain, we will need to download an asset for the transform, for example a SentencePiece model or a vocabulary saved in a text file.

@datumbox (Contributor, Author) replied:

I'm writing here what we discussed on the call.

It seems that supporting your case is possible by using PyTorch Hub's load_state_dict_from_url() method and then passing the result to your code. This is a very common pattern in TorchVision, used mainly for pre-trained models. Example:

model = MobileNetV3(inverted_residual_setting, last_channel, **kwargs)
if pretrained:
    if model_urls.get(arch, None) is None:
        raise ValueError("No checkpoint is available for model type {}".format(arch))
    state_dict = load_state_dict_from_url(model_urls[arch], progress=progress)
    model.load_state_dict(state_dict)
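For the text case specifically, the same pattern can fetch an arbitrary file (such as a SentencePiece model or a vocabulary) rather than a state dict. Below is a minimal sketch, not from this PR; the class name, URL, and cache layout are illustrative assumptions, and torch.hub.download_url_to_file is the stock helper for downloading any file:

import os
import torch

class TextPresetEval:
    # Hypothetical preset for illustration only; the URL below is a placeholder.
    VOCAB_URL = "https://example.com/spm.model"

    def __init__(self, cache_dir="~/.cache/text_presets"):
        cache_dir = os.path.expanduser(cache_dir)
        os.makedirs(cache_dir, exist_ok=True)
        self.model_path = os.path.join(cache_dir, "spm.model")
        if not os.path.exists(self.model_path):
            # torch.hub helper that downloads any file, not just model weights
            torch.hub.download_url_to_file(self.VOCAB_URL, self.model_path, progress=True)

    def __call__(self, text):
        # Tokenization with the downloaded model would happen here; out of scope for this sketch.
        return text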

@netw0rkf10w (Contributor) commented:

@datumbox @fmassa Could you please tell me what was the original motivation of this PR? I couldn't find any information on Preset Transforms. Thanks a lot!

@datumbox (Contributor, Author) replied:

@netw0rkf10w Providing the training and inference transforms of each model/pipeline in an organized way so that people can reproduce the models. Here is an example of complex presets: 1 and 2.

@netw0rkf10w (Contributor) commented:

Thanks, @datumbox, for your reply! Is there a discussion thread (or a GitHub issue) on the topic that I can read, or was it internal?

@datumbox (Contributor, Author) replied:

Unfortunately, many of these discussions happened outside of GitHub. This is quite problematic and something we want to change, because in circumstances like this it's hard to give people information and it does not help with transparency. So, apologies for not being able to point you to a public thread. To remedy that, below is a summary of what motivated the change; let me know if you need more info:

Preset preprocessing transforms are the transformations applied to the data before feeding them to an ML model. They are typically separated into two categories: those applied during training and those applied during inference. Examples of such transforms include Data Augmentation techniques, Normalization/Scaling methods, and other ad-hoc transformations applied to the data as a preliminary step (binary-to-bitmap conversion for Vision, Fast Fourier Transforms for Audio, Tokenization for Text, etc.).

The preset transforms are a crucial part of the model, and having access to them is necessary to understand how a model was created and how to use it. Disclosing which training transforms were used is important for reproducibility and crucial for understanding the assumptions and properties of a specific model. The latter is particularly true when using transfer learning and porting a model from one domain to another (for example, the Zoom transform can be used for augmentation in ImageNet Classification but not for Cancer Detection). Similarly, having access to the transforms used during inference is critical, because without them one can't use the model.

This is the reason we decided to bring these preset transforms as close to the training references as possible. By putting them together, users are able to reproduce the training, adjust the scripts to meet their needs, do transfer learning, etc. Hope that makes sense.
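To make the training/inference split concrete, here is a minimal sketch of a training-time preset in the spirit of the classification presets this PR adds (the exact defaults and transform list in the PR may differ); it pairs with the ClassificationPresetEval shown earlier in this thread:

from torchvision import transforms

class ClassificationPresetTrain:
    def __init__(self, crop_size, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), hflip_prob=0.5):
        # Random crop/scale augmentation, used only at training time
        trans = [transforms.RandomResizedCrop(crop_size)]
        if hflip_prob > 0:
            trans.append(transforms.RandomHorizontalFlip(hflip_prob))
        trans.extend([
            transforms.ToTensor(),
            transforms.Normalize(mean=mean, std=std),
        ])
        self.transforms = transforms.Compose(trans)

    def __call__(self, img):
        return self.transforms(img)

The eval preset deterministically resizes and center-crops, while the train preset injects randomness; keeping both next to the reference scripts is exactly the reproducibility point made above.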

@netw0rkf10w (Contributor) replied:

@datumbox That totally makes sense! Thank you very much for your detailed explanation!
