[PyTorch] Add RFC for lightweight dispatch #40
Conversation
Thanks!
RFC-0020-Lightweight-Dispatch.md
@@ -0,0 +1,291 @@
# Context
With recent developments in PyTorch Edge applications on embedded systems and resource limited devices, the issue of op registration/dispatching runtime overhead and build-time complexity has risen to the fore.
"PyTorch Edge" is not public yet, so I'd change to Mobile.
Or just rephrase it like:
"As PyTorch aims to support a wider set of devices and use cases, we see the need for a lighter version of our operator dispatching mechanism".
RFC-0020-Lightweight-Dispatch.md
# Motivation
* **Performance**
  * For recent use cases of Edge interpreter, we need to satisfy more and more strict initialization latency requirements, where analysis shows op registration contributes to a large portion of it.
Rename "Edge" to "Mobile".
RFC-0020-Lightweight-Dispatch.md
  * Also with static dispatch, we don’t have to register all of the ops into the JIT op registry, which saves runtime memory usage and further reduces static initialization time.
  * It is possible to avoid dispatching at runtime.
* **Modularity and binary size**
  * Currently the mobile runtime consists of both JIT op registry and c10 dispatcher. This project will make it possible to not depend on the c10 dispatcher, delivering a cleaner runtime library.
Suggested change:
* Currently the mobile runtime consists of both JIT op registry and c10 dispatcher. This project will make it possible to not depend on the c10 dispatcher (opt-in), delivering a cleaner runtime library.
RFC-0020-Lightweight-Dispatch.md
  * Currently the mobile runtime consists of both JIT op registry and c10 dispatcher. This project will make it possible to not depend on the c10 dispatcher, delivering a cleaner runtime library.
  * This project creates an opportunity to reduce binary size by getting rid of the dispatcher and enables further size optimization on unboxing wrappers.
* **Ability to incorporate custom implementation of ATen ops**
  * For some of the edge use cases, we need to support custom implementations of ATen ops. With an extra op registration path such as codegen unboxing it is easier to hookup ops with custom native functions.
Suggested change:
* For some of the mobile use cases, we need to support custom implementations of ATen ops. With an extra op registration path such as codegen unboxing, it is easier to hook up ops with custom native functions.
RFC-0020-Lightweight-Dispatch.md
 | ||
|
||
|
||
Currently the lite interpreter (or Edge runtime) registers all ATen ops into the dispatcher and some other ops into the JIT op registry. At model inference time the interpreter will look for the operator name in the JIT op registry first, if not found then it will look into the dispatcher. This proposal **adds a build flavor that moves these ATen ops from dispatcher to JIT op registry** so that it’s easier to optimize (e.g., avoid schema parsing) and can also reduce dependencies. |
Suggested change:
Currently the mobile interpreter registers all ATen ops into the dispatcher and some other ops into the JIT op registry. At model inference time, the interpreter will look for the operator name in the JIT op registry first, if not found then it will look into the dispatcher. This proposal **adds a build flavor that moves these ATen ops from dispatcher to JIT op registry** so that it’s easier to optimize (e.g. avoid schema parsing) and reduce dependencies.
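To make the two-level lookup order concrete, here is a minimal C++ sketch assuming the public JIT and dispatcher lookup APIs (`torch::jit::findOperatorFor`, `c10::Dispatcher::findSchema`); the helper functions themselves are illustrative, not actual interpreter code:

```
#include <ATen/core/dispatch/Dispatcher.h>
#include <torch/csrc/jit/runtime/operator.h>

// Illustrative helpers only: the interpreter consults the JIT op registry
// first and falls back to the c10 dispatcher, as described above.
bool found_in_jit_registry(const c10::OperatorName& name) {
  return torch::jit::findOperatorFor(name) != nullptr;
}

bool found_in_dispatcher(const c10::OperatorName& name) {
  return c10::Dispatcher::singleton().findSchema(name).has_value();
}

bool can_resolve(const c10::OperatorName& name) {
  // Under the proposed build flavor, ATen ops resolve in the first lookup,
  // so the dispatcher fallback (and its dependency) can be dropped.
  return found_in_jit_registry(name) || found_in_dispatcher(name);
}
```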
RFC-0020-Lightweight-Dispatch.md
* **JIT type -> C++ type**. This is necessary for some of the optional C++ types, e.g., we need to map `int` to `int64_t` for the last argument in the example.
  * This is already done in [types.py](https://github.com/pytorch/pytorch/blob/master/tools/codegen/api/types.py), and we need to integrate it into our new codegen.
* **JIT type -> IValue to basic type conversion C++ code.** E.g., the first argument of this operator: `Tensor(a) self` needs to be translated to: `(std::move(peek(stack, 0, 4))).toTensor()`
  * IValue provides APIs to directly convert an ivalue to these basic types. See [ivalue_inl.h](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/ivalue_inl.h#L1453-L1493)
Suggested change:
* IValue provides APIs to directly convert an IValue to these basic types. See [ivalue_inl.h](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/ivalue_inl.h#L1453-L1493)
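For a concrete picture of these conversions, here is a hedged sketch of what a generated unboxing wrapper might look like for `aten::add.Tensor`, following the `peek`/`toTensor` pattern quoted above; the function name and exact shape are illustrative, not the actual codegen output:

```
#include <ATen/ATen.h>
#include <ATen/core/stack.h>

namespace at {
namespace unboxing {

// Illustrative wrapper for:
// aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
void add_Tensor(torch::jit::Stack& stack) {
  // Convert each IValue on the stack to its C++ type, as described above.
  auto self = (std::move(torch::jit::peek(stack, 0, 3))).toTensor();
  auto other = (std::move(torch::jit::peek(stack, 1, 3))).toTensor();
  auto alpha = (std::move(torch::jit::peek(stack, 2, 3))).toScalar();
  torch::jit::drop(stack, 3);
  // Call the unboxed kernel and push the result back as an IValue.
  torch::jit::pack(stack, at::add(self, other, alpha));
}

} // namespace unboxing
} // namespace at
```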
RFC-0020-Lightweight-Dispatch.md
  * This is already done in [types.py](https://github.com/pytorch/pytorch/blob/master/tools/codegen/api/types.py), and we need to integrate it into our new codegen.
* **JIT type -> IValue to basic type conversion C++ code.** E.g., the first argument of this operator: `Tensor(a) self` needs to be translated to: `(std::move(peek(stack, 0, 4))).toTensor()`
  * IValue provides APIs to directly convert an IValue to these basic types. See [ivalue_inl.h](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/ivalue_inl.h#L1453-L1493)
  * Here’s a [list](#bookmark=id.deyvpbsb5yel) of all the JIT types appearing in native_functions.yaml, most of them can be converted using ivalue’s API.
Suggested change:
* Here’s a [list](#bookmark=id.deyvpbsb5yel) of all the JIT types appearing in native_functions.yaml, most of them can be converted using IValue’s API.
RFC-0020-Lightweight-Dispatch.md
* For the scenario of adding a new operator (not to `native_functions.yaml`), we need to provide clear guidance to add it to the JIT op registry as well, otherwise JIT execution will break.
  * We can add tests on the mobile build for the sake of coverage.

For OSS mobile integration, we will need to have a new build flavor to switch between c10 dispatcher vs JIT op registry. This new flavor will include codegen source files (`CodegenFunctions.h`, `CodegenFunctions.cpp`, `CodegenUnboxingWrappers.cpp`) instead of existing dispatcher related source files (`Operators.cpp`, `RegisterSchema.cpp`, etc.), similar to the internal build configuration.
Let's clarify here again that this new path will be opt-in.
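As a sketch of how the opt-in could surface in code, assuming the `USE_LIGHTWEIGHT_DISPATCH` flag from this PR; the two registration functions below are hypothetical placeholders, not real PyTorch entry points:

```
// Hypothetical placeholders for the two registration paths.
void register_codegen_unboxed_kernels();  // JIT op registry path (opt-in)
void register_c10_dispatcher_ops();       // c10 dispatcher path (default)

void register_ops() {
#ifdef USE_LIGHTWEIGHT_DISPATCH
  // Opt-in flavor: generated unboxing wrappers go into the JIT op registry
  // (CodegenFunctions.cpp, CodegenUnboxingWrappers.cpp).
  register_codegen_unboxed_kernels();
#else
  // Default flavor: ops are registered into the c10 dispatcher
  // (Operators.cpp, RegisterSchema.cpp, etc.).
  register_c10_dispatcher_ops();
#endif
}
```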
  * This is already done in [types.py](https://github.com/pytorch/pytorch/blob/master/tools/codegen/api/types.py), and we need to integrate it into our new codegen.
* **JIT type -> IValue to basic type conversion C++ code.** E.g., the first argument of this operator: `Tensor(a) self` needs to be translated to: `(std::move(peek(stack, 0, 4))).toTensor()`
  * IValue provides APIs to directly convert an IValue to these basic types. See [ivalue_inl.h](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/ivalue_inl.h#L1453-L1493)
  * Here’s a [list](#bookmark=id.deyvpbsb5yel) of all the JIT types appearing in native_functions.yaml, most of them can be converted using IValue’s API.
Is the [list](#bookmark=id.deyvpbsb5yel) valid?
Let me attach the list as an appendix.
#### Codegen source file details

With the logic from the previous section, we should be able to wrap the code into a function pointer and register it into [torch::jit::OperatorRegistry](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/runtime/operator.cpp#L19).
The size overhead of operator.cpp is around 10 KB. It's much smaller than the torch library, but we still need to watch the size for embedded use cases.
I'm surprised it's that small. The file looks huge to me.
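To sketch what that registration might look like, reusing the illustrative `at::unboxing::add_Tensor` wrapper from earlier (`torch::jit::RegisterOperators`, `torch::jit::Operator`, and `aliasAnalysisFromSchema` are existing JIT APIs; the schema string and wrapper are illustrative):

```
#include <torch/csrc/jit/runtime/operator.h>

// Forward declaration of the illustrative wrapper from the earlier sketch.
namespace at { namespace unboxing { void add_Tensor(torch::jit::Stack&); } }

// Register the codegen'd unboxing function into the JIT op registry,
// bypassing the c10 dispatcher entirely.
static torch::jit::RegisterOperators reg({
    torch::jit::Operator(
        "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor",
        [](torch::jit::Stack& stack) { at::unboxing::add_Tensor(stack); },
        torch::jit::aliasAnalysisFromSchema()),
});
```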
structured: True
structured_inherits: TensorIteratorBase
dispatch:
  CPU, CUDA: acosh_out
QuantizedCPU is not listed as a dispatch key here (it is listed for relu_). Would codegen consider the dispatch keys listed here? If so, we don't need to generate QuantizedCPU for this op.
Right, this may be a bad example, but at a high level we will look at these dispatch keys and generate switch cases.
DispatchKeySet _dk_set = c10::detail::multi_dispatch_key_set(tensor);
DispatchKey _dk = _dk_set.highestPriorityBackendTypeId();
switch (_dk) {
  case DispatchKey::CPU:
If it's runtime control flow, does it mean that we need to build both kernels, even if we just use one? We could use tracing-based selective build, but tracing would still be required.
We should trace the backend ahead of time (AOT) to avoid building both kernels. I think Dhruv has a proposal for doing so.
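Completing the quoted fragment as a hedged sketch: for an op whose `dispatch` entry lists `CPU, CUDA`, the generated static-dispatch function might look like the following. The `at::cpu`/`at::cuda` kernel names and headers are assumptions based on the generated backend function headers, not verified codegen output:

```
#include <ATen/CPUFunctions.h>   // at::cpu::*  (generated; assumption)
#include <ATen/CUDAFunctions.h>  // at::cuda::* (generated; assumption)
#include <ATen/core/dispatch/DispatchKeyExtractor.h>

// Illustrative generated static dispatch for an op with CPU and CUDA
// entries; unsupported backends fail loudly instead of falling through.
at::Tensor& acosh_out(const at::Tensor& self, at::Tensor& out) {
  c10::DispatchKeySet _dk_set = c10::detail::multi_dispatch_key_set(self, out);
  c10::DispatchKey _dk = _dk_set.highestPriorityBackendTypeId();
  switch (_dk) {
    case c10::DispatchKey::CPU:
      return at::cpu::acosh_out(out, self);   // CPU structured kernel
    case c10::DispatchKey::CUDA:
      return at::cuda::acosh_out(out, self);  // CUDA structured kernel
    default:
      TORCH_CHECK(false, "acosh.out: unsupported dispatch key ", _dk);
  }
}
```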
Summary: RFC: pytorch/rfcs#40

This PR (re)introduces python codegen for unboxing wrappers. Given an entry of `native_functions.yaml`, the codegen should be able to generate the corresponding C++ code to convert ivalues from the stack to their proper types. To trigger the codegen, run

```
tools/jit/gen_unboxing.py -d cg/torch/share/ATen
```

Merged changes on CI test. In #71782 I added an e2e test for static dispatch + codegen unboxing. The test exports a mobile model of mobilenetv2, then loads and runs it on a new binary for the lite interpreter: `test/mobile/custom_build/lite_predictor.cpp`.

## Lite predictor build specifics

1. Codegen: `gen.py` generates `RegisterCPU.cpp` and `RegisterSchema.cpp`. Now with this PR, once `static_dispatch` mode is enabled, `gen.py` will not generate `TORCH_LIBRARY` API calls in those cpp files, hence avoiding interaction with the dispatcher. Once `USE_LIGHTWEIGHT_DISPATCH` is turned on, `cmake/Codegen.cmake` calls `gen_unboxing.py`, which generates `UnboxingFunctions.h`, `UnboxingFunctions_[0-4].cpp` and `RegisterCodegenUnboxedKernels_[0-4].cpp`.
2. Build: `USE_LIGHTWEIGHT_DISPATCH` adds the generated sources into `all_cpu_cpp` in `aten/src/ATen/CMakeLists.txt`. All other files remain unchanged. In reality all the `Operators_[0-4].cpp` are not necessary, but we can rely on the linker to strip them off.

## Current CI job test coverage update

Created a new CI job `linux-xenial-py3-clang5-mobile-lightweight-dispatch-build` that enables the following build options:

* `USE_LIGHTWEIGHT_DISPATCH=1`
* `BUILD_LITE_INTERPRETER=1`
* `STATIC_DISPATCH_BACKEND=CPU`

This job triggers `test/mobile/lightweight_dispatch/build.sh` and builds `libtorch`. Then the script runs C++ tests written in `test_lightweight_dispatch.cpp` and `test_codegen_unboxing.cpp`. Recent commits added tests to cover as many C++ argument types as possible: in `build.sh` we install the PyTorch Python API so that we can export test models in `tests_setup.py`. Then we run the C++ test binary to run these models on the lightweight-dispatch-enabled runtime.

Pull Request resolved: #69881
Reviewed By: iseeyuan
Differential Revision: D33692299
Pulled By: larryliu0820
fbshipit-source-id: 211e59f2364100703359b4a3d2ab48ca5155a023
LGTM. I think there are a couple of things to follow up on, but maybe better to just get this version in.
Summary: Pull Request resolved: pytorch#74069
RFC: pytorch/rfcs#40

In pytorch#69881 we added the ability to generate codegen unboxing source files. Notice that the generated code to register an operator looks like this:

```
// aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
OperatorGenerator(
    TORCH_SELECTIVE_SCHEMA("aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor"),
    [](Stack & stack) {
        RECORD_FUNCTION("add", std::vector<c10::IValue>());
        at::unboxing::add_Tensor(stack);
    },
    aliasAnalysisFromSchema()
),
```

However, this means we have to parse the schema and get back arguments with default values at static init time. As written in the RFC, there's a more performant option: providing these arguments with default values using codegen, so we don't have to do expensive regex pattern matching during parsing. Here's how it looks:

```
// aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
OperatorGenerator(
    "aten::add",
    "Tensor",
    {
        c10::Argument("self", nullptr, c10::nullopt, c10::IValue(c10::nullopt)),
        c10::Argument("other", nullptr, c10::nullopt, c10::IValue(c10::nullopt)),
        c10::Argument("alpha", nullptr, c10::nullopt, c10::IValue(1))
    },
    { c10::Argument("") },
    [](Stack & stack) {
        RECORD_FUNCTION("add", std::vector<c10::IValue>());
        at::unboxing::add_Tensor(stack);
    },
    aliasAnalysisFromSchema()
),
```

We also added corresponding APIs in `operator.h` to take in the arguments.

Test Plan: Rely on CI
Reviewed By: kimishpatel
Differential Revision: D33077733
fbshipit-source-id: db6fcb3066ba352cb338a018324c02c32d67b941