[PyTorch] Add RFC for lightweight dispatch #40
Conversation
Thanks!
RFC-0020-Lightweight-Dispatch.md
@@ -0,0 +1,291 @@
# Context
With recent developments in PyTorch Edge applications on embedded systems and resource limited devices, the issue of op registration/dispatching runtime overhead and build-time complexity has risen to the fore.
"PyTorch Edge" is not public yet, so I'd change to Mobile.
Or just rephrase it like:
"As PyTorch aims to support a wider set of devices and use cases, we see the need for a lighter version of our operator dispatching mechanism".
RFC-0020-Lightweight-Dispatch.md
# Motivation
* **Performance**
  * For recent use cases of Edge interpreter, we need to satisfy more and more strict initialization latency requirements, where analysis shows op registration contributes to a large portion of it.
Rename "Edge" to "Mobile".
RFC-0020-Lightweight-Dispatch.md
  * Also with static dispatch, we don’t have to register all of the ops into the JIT op registry, which saves runtime memory usage and further reduces static initialization time.
  * It is possible to avoid dispatching at runtime.
* **Modularity and binary size**
  * Currently the mobile runtime consists of both JIT op registry and c10 dispatcher. This project will make it possible to not depend on the c10 dispatcher, delivering a cleaner runtime library.
Suggested change:
* Currently the mobile runtime consists of both JIT op registry and c10 dispatcher. This project will make it possible to not depend on the c10 dispatcher (opt-in), delivering a cleaner runtime library.
RFC-0020-Lightweight-Dispatch.md
  * Currently the mobile runtime consists of both JIT op registry and c10 dispatcher. This project will make it possible to not depend on the c10 dispatcher, delivering a cleaner runtime library.
  * This project creates an opportunity to reduce binary size by getting rid of the dispatcher and enables further size optimization on unboxing wrappers.
* **Ability to incorporate custom implementation of ATen ops**
  * For some of the edge use cases, we need to support custom implementations of ATen ops. With an extra op registration path such as codegen unboxing it is easier to hookup ops with custom native functions.
Suggested change:
* For some of the mobile use cases, we need to support custom implementations of ATen ops. With an extra op registration path such as codegen unboxing, it is easier to hook up ops with custom native functions.
RFC-0020-Lightweight-Dispatch.md
 | ||
|
||
|
||
Currently the lite interpreter (or Edge runtime) registers all ATen ops into the dispatcher and some other ops into the JIT op registry. At model inference time the interpreter will look for the operator name in the JIT op registry first, if not found then it will look into the dispatcher. This proposal **adds a build flavor that moves these ATen ops from dispatcher to JIT op registry** so that it’s easier to optimize (e.g., avoid schema parsing) and can also reduce dependencies. |
Suggested change:
Currently the mobile interpreter registers all ATen ops into the dispatcher and some other ops into the JIT op registry. At model inference time, the interpreter will look for the operator name in the JIT op registry first, if not found then it will look into the dispatcher. This proposal **adds a build flavor that moves these ATen ops from dispatcher to JIT op registry** so that it’s easier to optimize (e.g. avoid schema parsing) and reduce dependencies.
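To make the two-level lookup order concrete, here is a minimal C++ sketch assuming the public JIT and dispatcher lookup APIs (`torch::jit::findOperatorFor`, `c10::Dispatcher::findSchema`); the helper functions themselves are illustrative, not actual interpreter code:

```
#include <ATen/core/dispatch/Dispatcher.h>
#include <torch/csrc/jit/runtime/operator.h>

// Illustrative helpers only: the interpreter consults the JIT op registry
// first and falls back to the c10 dispatcher, as described above.
bool found_in_jit_registry(const c10::OperatorName& name) {
  return torch::jit::findOperatorFor(name) != nullptr;
}

bool found_in_dispatcher(const c10::OperatorName& name) {
  return c10::Dispatcher::singleton().findSchema(name).has_value();
}

bool can_resolve(const c10::OperatorName& name) {
  // Under the proposed build flavor, ATen ops resolve in the first lookup,
  // so the dispatcher fallback (and its dependency) can be dropped.
  return found_in_jit_registry(name) || found_in_dispatcher(name);
}
```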
RFC-0020-Lightweight-Dispatch.md
* **JIT type -> C++ type**. This is necessary for some of the optional C++ types, e.g., we need to map `int` to `int64_t` for the last argument in the example.
  * This is already done in [types.py](https://github.com/pytorch/pytorch/blob/master/tools/codegen/api/types.py), and we need to integrate it into our new codegen.
* **JIT type -> IValue to basic type conversion C++ code.** E.g., the first argument of this operator: `Tensor(a) self` needs to be translated to: `(std::move(peek(stack, 0, 4))).toTensor()`
  * IValue provides APIs to directly convert an ivalue to these basic types. See [ivalue_inl.h](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/ivalue_inl.h#L1453-L1493)
Suggested change:
* IValue provides APIs to directly convert an IValue to these basic types. See [ivalue_inl.h](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/ivalue_inl.h#L1453-L1493)
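For a concrete picture of these conversions, here is a hedged sketch of what a generated unboxing wrapper might look like for `aten::add.Tensor`, following the `peek`/`toTensor` pattern quoted above; the function name and exact shape are illustrative, not the actual codegen output:

```
#include <ATen/ATen.h>
#include <ATen/core/stack.h>

namespace at {
namespace unboxing {

// Illustrative wrapper for:
// aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
void add_Tensor(torch::jit::Stack& stack) {
  // Convert each IValue on the stack to its C++ type, as described above.
  auto self = (std::move(torch::jit::peek(stack, 0, 3))).toTensor();
  auto other = (std::move(torch::jit::peek(stack, 1, 3))).toTensor();
  auto alpha = (std::move(torch::jit::peek(stack, 2, 3))).toScalar();
  torch::jit::drop(stack, 3);
  // Call the unboxed kernel and push the result back as an IValue.
  torch::jit::pack(stack, at::add(self, other, alpha));
}

} // namespace unboxing
} // namespace at
```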
RFC-0020-Lightweight-Dispatch.md
  * This is already done in [types.py](https://github.com/pytorch/pytorch/blob/master/tools/codegen/api/types.py), and we need to integrate it into our new codegen.
* **JIT type -> IValue to basic type conversion C++ code.** E.g., the first argument of this operator: `Tensor(a) self` needs to be translated to: `(std::move(peek(stack, 0, 4))).toTensor()`
  * IValue provides APIs to directly convert an IValue to these basic types. See [ivalue_inl.h](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/ivalue_inl.h#L1453-L1493)
  * Here’s a [list](#bookmark=id.deyvpbsb5yel) of all the JIT types appearing in native_functions.yaml, most of them can be converted using ivalue’s API.
Suggested change:
* Here’s a [list](#bookmark=id.deyvpbsb5yel) of all the JIT types appearing in native_functions.yaml, most of them can be converted using IValue’s API.
RFC-0020-Lightweight-Dispatch.md
* For the scenario of adding a new operator (not to `native_functions.yaml`), we need to provide clear guidance to add it to the JIT op registry as well, otherwise JIT execution will break.
  * We can add tests on the mobile build for the sake of coverage.

For OSS mobile integration, we will need to have a new build flavor to switch between c10 dispatcher vs JIT op registry. This new flavor will include codegen source files (`CodegenFunctions.h`, `CodegenFunctions.cpp`, `CodegenUnboxingWrappers.cpp`) instead of existing dispatcher related source files (`Operators.cpp`, `RegisterSchema.cpp`, etc.), similar to the internal build configuration.
Let's clarify here again that this new path will be opt-in.
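As a sketch of how the opt-in could surface in code, assuming the `USE_LIGHTWEIGHT_DISPATCH` flag from this PR; the two registration functions below are hypothetical placeholders, not real PyTorch entry points:

```
// Hypothetical placeholders for the two registration paths.
void register_codegen_unboxed_kernels();  // JIT op registry path (opt-in)
void register_c10_dispatcher_ops();       // c10 dispatcher path (default)

void register_ops() {
#ifdef USE_LIGHTWEIGHT_DISPATCH
  // Opt-in flavor: generated unboxing wrappers go into the JIT op registry
  // (CodegenFunctions.cpp, CodegenUnboxingWrappers.cpp).
  register_codegen_unboxed_kernels();
#else
  // Default flavor: ops are registered into the c10 dispatcher
  // (Operators.cpp, RegisterSchema.cpp, etc.).
  register_c10_dispatcher_ops();
#endif
}
```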
  * This is already done in [types.py](https://github.com/pytorch/pytorch/blob/master/tools/codegen/api/types.py), and we need to integrate it into our new codegen.
* **JIT type -> IValue to basic type conversion C++ code.** E.g., the first argument of this operator: `Tensor(a) self` needs to be translated to: `(std::move(peek(stack, 0, 4))).toTensor()`
  * IValue provides APIs to directly convert an IValue to these basic types. See [ivalue_inl.h](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/ivalue_inl.h#L1453-L1493)
  * Here’s a [list](#bookmark=id.deyvpbsb5yel) of all the JIT types appearing in native_functions.yaml, most of them can be converted using IValue’s API.
Is the [list](#bookmark=id.deyvpbsb5yel) valid?
Let me attach the list as an appendix.
#### Codegen source file details

With the logic from the previous section, we should be able to wrap the code into a function pointer and register it into [torch::jit::OperatorRegistry](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/runtime/operator.cpp#L19).
The size overhead of operator.cpp is around 10 KB. It's much smaller than the torch library, but we still need to watch the size for embedded use cases.
I'm surprised it's that small. The file looks huge to me.
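To sketch what that registration might look like, reusing the illustrative `at::unboxing::add_Tensor` wrapper from earlier (`torch::jit::RegisterOperators`, `torch::jit::Operator`, and `aliasAnalysisFromSchema` are existing JIT APIs; the schema string and wrapper are illustrative):

```
#include <torch/csrc/jit/runtime/operator.h>

// Forward declaration of the illustrative wrapper from the earlier sketch.
namespace at { namespace unboxing { void add_Tensor(torch::jit::Stack&); } }

// Register the codegen'd unboxing function into the JIT op registry,
// bypassing the c10 dispatcher entirely.
static torch::jit::RegisterOperators reg({
    torch::jit::Operator(
        "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor",
        [](torch::jit::Stack& stack) { at::unboxing::add_Tensor(stack); },
        torch::jit::aliasAnalysisFromSchema()),
});
```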
structured: True
structured_inherits: TensorIteratorBase
dispatch:
  CPU, CUDA: acosh_out
QuantizedCPU is not listed as a dispatch key here (it is listed for relu_). Would codegen consider the dispatch keys listed here? If so, we don't need to generate QuantizedCPU for this op.
Right, this may be a bad example, but at a high level we will look at these dispatch keys and generate switch cases.
DispatchKeySet _dk_set = c10::detail::multi_dispatch_key_set(tensor);
DispatchKey _dk = _dk_set.highestPriorityBackendTypeId();
switch (_dk) {
  case DispatchKey::CPU:
If it's runtime control flow, does it mean that we need to build both kernels, even if we just use one? We could use tracing-based selective build, but tracing would still be required.
We should trace the backend ahead of time (AOT) to avoid building both kernels. I think Dhruv has a proposal for doing so.
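Completing the quoted fragment as a hedged sketch: for an op whose `dispatch` entry lists `CPU, CUDA`, the generated static-dispatch function might look like the following. The `at::cpu`/`at::cuda` kernel names and headers are assumptions based on the generated backend function headers, not verified codegen output:

```
#include <ATen/CPUFunctions.h>   // at::cpu::*  (generated; assumption)
#include <ATen/CUDAFunctions.h>  // at::cuda::* (generated; assumption)
#include <ATen/core/dispatch/DispatchKeyExtractor.h>

// Illustrative generated static dispatch for an op with CPU and CUDA
// entries; unsupported backends fail loudly instead of falling through.
at::Tensor& acosh_out(const at::Tensor& self, at::Tensor& out) {
  c10::DispatchKeySet _dk_set = c10::detail::multi_dispatch_key_set(self, out);
  c10::DispatchKey _dk = _dk_set.highestPriorityBackendTypeId();
  switch (_dk) {
    case c10::DispatchKey::CPU:
      return at::cpu::acosh_out(out, self);   // CPU structured kernel
    case c10::DispatchKey::CUDA:
      return at::cuda::acosh_out(out, self);  // CUDA structured kernel
    default:
      TORCH_CHECK(false, "acosh.out: unsupported dispatch key ", _dk);
  }
}
```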
Summary: RFC: pytorch/rfcs#40

This PR (re)introduces python codegen for unboxing wrappers. Given an entry of `native_functions.yaml`, the codegen should be able to generate the corresponding C++ code to convert ivalues from the stack to their proper types. To trigger the codegen, run

```
tools/jit/gen_unboxing.py -d cg/torch/share/ATen
```

Merged changes on CI test. In #71782 I added an e2e test for static dispatch + codegen unboxing. The test exports a mobile model of mobilenetv2, then loads and runs it on a new binary for the lite interpreter: `test/mobile/custom_build/lite_predictor.cpp`.

## Lite predictor build specifics

1. Codegen: `gen.py` generates `RegisterCPU.cpp` and `RegisterSchema.cpp`. Now with this PR, once `static_dispatch` mode is enabled, `gen.py` will not generate `TORCH_LIBRARY` API calls in those cpp files, hence avoiding interaction with the dispatcher. Once `USE_LIGHTWEIGHT_DISPATCH` is turned on, `cmake/Codegen.cmake` calls `gen_unboxing.py`, which generates `UnboxingFunctions.h`, `UnboxingFunctions_[0-4].cpp` and `RegisterCodegenUnboxedKernels_[0-4].cpp`.
2. Build: `USE_LIGHTWEIGHT_DISPATCH` adds the generated sources into `all_cpu_cpp` in `aten/src/ATen/CMakeLists.txt`. All other files remain unchanged. In reality all the `Operators_[0-4].cpp` are not necessary, but we can rely on the linker to strip them off.

## Current CI job test coverage update

Created a new CI job `linux-xenial-py3-clang5-mobile-lightweight-dispatch-build` that enables the following build options:

* `USE_LIGHTWEIGHT_DISPATCH=1`
* `BUILD_LITE_INTERPRETER=1`
* `STATIC_DISPATCH_BACKEND=CPU`

This job triggers `test/mobile/lightweight_dispatch/build.sh` and builds `libtorch`. Then the script runs C++ tests written in `test_lightweight_dispatch.cpp` and `test_codegen_unboxing.cpp`. Recent commits added tests to cover as many C++ argument types as possible: in `build.sh` we install the PyTorch Python API so that we can export test models in `tests_setup.py`. Then we run the C++ test binary to run these models on the lightweight-dispatch-enabled runtime.

Pull Request resolved: #69881
Reviewed By: iseeyuan
Differential Revision: D33692299
Pulled By: larryliu0820
fbshipit-source-id: 211e59f2364100703359b4a3d2ab48ca5155a023
LGTM. I think there are a couple of things to follow up on, but maybe better to just get this version in.
Summary: Pull Request resolved: pytorch#74069
RFC: pytorch/rfcs#40

In pytorch#69881 we added the ability to generate codegen unboxing source files. Notice that the generated code to register an operator looks like this:

```
// aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
OperatorGenerator(
    TORCH_SELECTIVE_SCHEMA("aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor"),
    [](Stack & stack) {
        RECORD_FUNCTION("add", std::vector<c10::IValue>());
        at::unboxing::add_Tensor(stack);
    },
    aliasAnalysisFromSchema()
),
```

However, this means we have to parse the schema and get back arguments with default values at static init time. As written in the RFC, there's a more performant option: providing these arguments with default values using codegen, so we don't have to do expensive regex pattern matching during parsing. Here's how it looks:

```
// aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
OperatorGenerator(
    "aten::add",
    "Tensor",
    {
        c10::Argument("self", nullptr, c10::nullopt, c10::IValue(c10::nullopt)),
        c10::Argument("other", nullptr, c10::nullopt, c10::IValue(c10::nullopt)),
        c10::Argument("alpha", nullptr, c10::nullopt, c10::IValue(1))
    },
    { c10::Argument("") },
    [](Stack & stack) {
        RECORD_FUNCTION("add", std::vector<c10::IValue>());
        at::unboxing::add_Tensor(stack);
    },
    aliasAnalysisFromSchema()
),
```

We also added corresponding APIs in `operator.h` to take in the arguments.

Test Plan: Rely on CI
Reviewed By: kimishpatel
Differential Revision: D33077733
fbshipit-source-id: db6fcb3066ba352cb338a018324c02c32d67b941