Update the Backward Compatibility Service Level Agreement (SLA) #35

Merged — 3 commits merged on Jan 27, 2022
20 changes: 6 additions & 14 deletions RFC-0017-PyTorch-Operator-Versioning.md
@@ -7,10 +7,10 @@ These operators sometimes require changes to maintain the high quality user expe

BC and FC breaking changes have been challenging to coordinate across PyTorch because there are multiple consumers of PyTorch’s op set and we promise to keep models running in production working as expected.

-We are providing the same Service Level Agreement (SLA) to both internal and external use cases, which is included in the goals to be finalized.

This document proposes a new BC and FC policy based on operator versioning.

+Moving forward, there will be no difference between Meta internal and Open Source (OSS) guarantees; both internal and external use cases move under *the same Service Level Agreement (SLA)*.


## History

@@ -21,10 +21,8 @@ Backwards compatibility (BC), the ability for PyTorch to continue running progra

PyTorch current SLA on backwards compatibility:



-* **OSS** — “stable” features will be deprecated for one release before a BC-breaking change is made. [PyTorch OSS BC-breaking policy](https://pytorch.org/docs/master/)
-* **FB Internal** — we will not break a serialized torchscript program running in production at Facebook (to be replaced with a more generic SLA)
+* **OSS** — “stable” features will be deprecated for one release before a BC-breaking change is made. [PyTorch OSS BC-breaking policy](https://pytorch.org/docs/master/)
+* **Meta Internal** — we will not break a serialized torchscript program running in production at Meta (to be replaced with a more generic SLA)

BC-breaking operator changes were previously governed by the [Backward-compatibility Breaking Change Review Process](https://fb.quip.com/gydOArylrcKd), but this only covered torchscript and eager. A generic process needs to be visible from OSS.

@@ -38,7 +36,7 @@ PyTorch current SLA on forward compatibility:


* **OSS** — no promise
-* **FB Internal** — PyTorch commits can run existing PyTorch eager, package/deploy, and serialized torchscript programs for at least two weeks
+* **Meta Internal** — PyTorch commits can run existing PyTorch eager, package/deploy, and serialized torchscript programs for at least two weeks
* The addition of a new kwarg-only argument at the end of an op’s parameter list (but before out=, if present) with a default value is FC-compatible for serialized [torchscript](https://fb.workplace.com/groups/pytorch.dev/permalink/909079013003913/) and [mobile](https://fb.workplace.com/groups/pytorch.dev/permalink/912379562673858/).
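To illustrate why this pattern is FC-safe, here is a minimal Python sketch (the function names are hypothetical, not real PyTorch operators) of adding a keyword-only argument whose default preserves the old semantics:

```python
# Illustrative sketch only: my_norm is a hypothetical op, not a real
# PyTorch operator.

def my_norm_v0(x):
    """Original schema: my_norm(Tensor x) -> Tensor."""
    s = sum(v * v for v in x) ** 0.5
    return [v / s for v in x]

def my_norm_v1(x, *, eps=0.0):
    """New schema adds a keyword-only argument whose default (eps=0.0)
    reproduces v0 semantics, so serialized callers of v0 still work."""
    s = sum(v * v for v in x) ** 0.5 + eps
    return [v / s for v in x]

# An old call site, unaware of eps, behaves identically under v1:
assert my_norm_v1([3.0, 4.0]) == my_norm_v0([3.0, 4.0]) == [0.6, 0.8]
```

Because old serialized programs never pass the new argument, they always hit the default path, which is exactly the old behavior.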


@@ -100,7 +98,6 @@ We propose the operator versioning that works across eager, TorchScript, torch.p
* The operator version and upgraders are built into the runtime for BC.
* Allow for the addition of optional keyword-only arguments without a version bump or FC concern
* Since additional operators can be introduced in upgraders, tracing-based selective build should also cover upgraders: this is easier for BC because the new runtime ships together with its upgraders.
-* We should also consider a timeline for mobile to stop using upgraders, by requiring models that are too old to update themselves before deployment (SLA time window).
* **torch.package changes**
* Each torch.package package contains a table of operators and their corresponding versions, according to the PyTorch build used to package the model
* Q: How does the torch.package scenario for mapping old versions to current PyTorch operators work?
@@ -111,7 +108,7 @@ We propose the operator versioning that works across eager, TorchScript, torch.p
* e2e FC-breaking guide
* It’s OK to add new optional keyword-only arguments as long as their default semantics preserve the operator’s current behavior
* **SLA window**
-* We are targeting a Service Level Agreement window of a certain length, possibly starting with a window of two binary releases (longer than 90 days)
+* The PyTorch SLA will ensure that models developed with a given version, using non-deprecated APIs, remain runnable (with a slight performance regression allowed) for *up to one more release or 180 days* (from the release date of the version that introduced the BC-breaking change), whichever is later.
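The "one more release or 180 days, whichever is later" window can be sketched as a small date computation (the dates below are illustrative, not real PyTorch release dates):

```python
from datetime import date, timedelta

def sla_end(breaking_release: date, next_release: date) -> date:
    """A model stays runnable until the later of: the next release,
    or 180 days after the release that introduced the BC break."""
    return max(next_release, breaking_release + timedelta(days=180))

# With these illustrative dates, the 180-day clause dominates:
assert sla_end(date(2022, 3, 1), date(2022, 7, 1)) == date(2022, 8, 28)
```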

Note that the proposal does not introduce an explicit version to _all_ PyTorch operators. Instead, code changes are only required for updated operators whose BC/FC breakage cannot be handled by automatic BC/FC methods. For all other operators, the implicit version is v0.
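A minimal sketch of the upgrader mechanism described above, with illustrative names and semantics loosely modeled on integer division changing from truncating to true division (this is not the actual PyTorch implementation):

```python
# Illustrative sketch of BC upgraders; not the actual PyTorch code.

CURRENT_VERSION = 2

def div_current(a, b, *, rounding_mode=None):
    """Current operator: true division, with opt-in truncation."""
    return int(a / b) if rounding_mode == "trunc" else a / b

def upgrade_div_v0(a, b):
    """Reproduce the old (v0) truncating semantics on top of the
    current operator."""
    return div_current(a, b, rounding_mode="trunc")

# Upgraders are keyed by (operator, serialized version); operators
# without an explicit version are implicitly v0.
UPGRADERS = {("div", 0): upgrade_div_v0}

def run_op(name, serialized_version, *args):
    upgrader = UPGRADERS.get((name, serialized_version))
    if upgrader is not None:
        return upgrader(*args)
    return div_current(*args)  # single-op table, for illustration

assert run_op("div", 0, 7, 2) == 3    # old model keeps old semantics
assert run_op("div", 2, 7, 2) == 3.5  # new model gets current semantics
```

Since the upgraders ship inside the runtime, an old serialized model keeps its original semantics without the server ever rewriting the model file.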

@@ -222,11 +219,6 @@ Deploying a new model to an existing runtime.

# Open Questions

-## Use a deprecation window to handle backward compatibility
-One future option is to keep both the old and new operators, but set a deprecation window for the old operators and deprecate them when the window expires. There are some open questions on this option:
-* What would be the window length? Would it differ across situations (internal vs. external, server vs. mobile, etc.)?
-* From the user's point of view, the number of operators may bloat, but old operators outside the SLA BC window can be removed.

## Downgraders for FC
Dual to upgraders for BC on the client, downgraders can be used for FC on the server. There are several options:
* We set a 2-week (or perhaps 3-week) FC window. The FC-breaking update is split into two PRs. The first PR, containing readers for the new operator, is rolled out. After the FC window (assuming all client runtimes have been updated to read the new operator), the producers of the new operators are turned on to generate models with the new operator schema.
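The two-PR rollout above can be sketched as a simple date gate (names, window length, and dates are illustrative):

```python
from datetime import date, timedelta

FC_WINDOW = timedelta(weeks=2)  # or 3 weeks, per the option above

def can_emit_new_schema(reader_rollout: date, today: date) -> bool:
    """PR 2 (the writer) may serialize the new operator schema only
    after all client runtimes have carried the new readers (PR 1)
    for the full FC window."""
    return today >= reader_rollout + FC_WINDOW

# Illustrative dates: readers rolled out Jan 27, window ends Feb 10.
assert not can_emit_new_schema(date(2022, 1, 27), date(2022, 2, 1))
assert can_emit_new_schema(date(2022, 1, 27), date(2022, 2, 10))
```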