
Commit d127912

LeiWang1999 and BUG1989 authored
Release/backend opendla (#1061)
* Organize OpenDLA files; edit CMake to debug step by step
* Link compiler libs
* Fix protobuf link error
* Complete TensorMap and PreRun
* Append pooling op description; fix RTTI issue
* Update the data type of ir_tensor
* Loadable data generation done
* libruntime mapping succeeds, but data mapping did not work
* Data mapping works; single channel works correctly
* Pooling op mapping succeeds
* Add log define
* Convolution op support
* Move dump data function to odla_dump
* Conv done
* ReLU test done
* FC op test done
* Test over-channel case
* Append element-wise op
* Revert nvdla_layer_type to spec_ty
* Try new feature | clean code tree
* Element-wise op test passes; simplify dataflow
* Optimize pass flow
* Fix intxx inverse to fp32; int8 conv test done
* resnet18 model test done
* Fix avg pool quantization error
* Concat op test done
* Deconv test failed; the Rubik engine must be enabled
* Fix a bug; multiple inputs and outputs are now supported
* Speed up OpenDLA inference time
* Update debug define
* Append int8 group conv op; fix low-value fp support
* Remove include_directories for debug
* Update
* Remove lib and include for release
* Update ignore
* Format some text
* Revert ignore
* Update tutorial
* Update README
* Format
* Revert scale
* Update license
* Append OpenDLA to architecture.png

Co-authored-by: Lei Wang <[email protected]>
Co-authored-by: BUG1989 <[email protected]>
Co-authored-by: LeiWang1999 <[email protected]>
1 parent 6fdcfdd commit d127912


49 files changed, +8926 / -2 lines changed

CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -75,6 +75,7 @@ OPTION (TENGINE_STANDALONE_HCL_AUTO_LOAD "Auto load standalone hcl lib"
 OPTION (TENGINE_ENABLE_ACL "With Arm Compute Library support" OFF)
 OPTION (TENGINE_ENABLE_CUDA "With nVIDIA CUDA support" OFF)
 OPTION (TENGINE_ENABLE_OPENCL "With Khronos OpenCL support" OFF)
+OPTION (TENGINE_ENABLE_OPENDLA "With Khronos OpenDLA support" OFF)
 OPTION (TENGINE_ENABLE_TENSORRT "With nVIDIA TensorRT support" OFF)
 OPTION (TENGINE_ENABLE_TIM_VX "With VeriSilicon TIM-VX support" OFF)
 OPTION (TENGINE_ENABLE_NNIE "With HiSilicon NNIE support" OFF)

doc/architecture.png

Binary file, 1.35 KB (image not shown)

doc/dla_opendla_user_manual_zh.md

Lines changed: 183 additions & 0 deletions
@@ -0,0 +1,183 @@
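# Tengine Lite with Open Source Deep Learning Accelerator

## 1. Introduction

opendla is based on NVDLA, NVIDIA's open-source accelerator. The backend is called opendla because NVIDIA's official repository has not been maintained for two years, while NVDLA clearly still has plenty of room for improvement. The improved accelerator needs to be distinguished from the original NVDLA, so it is simply called opendla. It is currently maintained in the [ZYNQ-NVDLA](https://github.com/LeiWang1999/ZYNQ-NVDLA) repository.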
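
The current backend supports only the NVDLA small configuration, which has the following characteristics:

1. Runs at 100 MHz on ZYNQ 7045 and XCZU9EG-2
2. 8\*8 PE array
3. No global SRAM cache
4. No lookup-table circuit
5. No RUBIK data-reshape engine
6. Supported operators: Conv | ReLU | Min/Max/Avg Pooling | FullyConnected | ElementWise; all other operators are offloaded to the CPU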
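
## 2. How to Build

### 2.1 Dependencies

The dependencies fall into three parts:

> Part one is the opendla.ko kernel module for the chip. [This article](https://zhuanlan.zhihu.com/p/378202360) explains how to build it. The version currently in the [repository](https://github.com/LeiWang1999/ZYNQ-NVDLA) targets the Linux 4.13 kernel; other kernel versions require changes to some functions.

> Part two is NVDLA's own dependencies, libjpeg and libprotobuf. On aarch64, the prebuilt files can be used directly.

> Part three is the Compiler and Runtime that NVDLA originally provides. They must be built into link libraries and placed in the lib directory. On aarch64, the prebuilt files can be used directly.

### 2.2 Build Process

To make the whole flow easier to follow, here is the complete build process first.

To build Tengine's opendla backend support, you first need to build libcompiler.so and libruntime.so (produced in section 2.4.3 as libnvdla_compiler.so and libnvdla_runtime.so). libcompiler depends on libprotobuf (version 2.6.1), and libruntime depends on libjpeg (version libjpeg6b).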

### 2.3 Fetching the Code

First, note that **the entire build process shown here runs on the development board itself**; otherwise you would need to cross-compile. All examples are run as root. For how to connect the development board to the network, see [this article](https://zhuanlan.zhihu.com/p/378814739).

#### 2.3.1 Clone ZYNQ-NVDLA

```bash
$ git clone https://github.com/LeiWang1999/ZYNQ-NVDLA # if the clone fails, download it locally and upload it with sftp :D
```

#### 2.3.2 Clone Tengine-Lite

```bash
$ git clone https://github.com/OAID/Tengine.git Tengine
```
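
### 2.4 Building Tengine-Lite with opendla

Tengine-Lite currently supports only one way of integrating opendla: first build opendla's software support into .so files, then link against them when Tengine builds the opendla backend.

Other approaches, such as compiling opendla's compiler and runtime sources together with Tengine in a single build, are not supported yet, because that code will certainly need to be refactored first.

This guide does not cover how the kernel driver `opendla.ko` is built; for building it in Petalinux, see this [article](https://zhuanlan.zhihu.com/p/378202360).

On aarch64, you can use the [prebuilt](https://github.com/LeiWang1999/ZYNQ-NVDLA/tree/master/prebuilt/lib/aarch64-ubuntu) libraries directly.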
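
Whether the prebuilt libraries apply depends only on the board's CPU architecture. A minimal sketch of that decision, assuming `uname -m` reports `aarch64` on 64-bit ARM boards; the helper name `can_use_prebuilt` is hypothetical and not part of Tengine or ZYNQ-NVDLA:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Tengine or ZYNQ-NVDLA).
# Prints "prebuilt" when the aarch64 prebuilt libs apply, otherwise
# "build-from-source". Accepts an optional architecture string so the
# logic can be tested; defaults to the running machine's `uname -m`.
can_use_prebuilt() {
    local arch="${1:-$(uname -m)}"
    if [ "$arch" = "aarch64" ]; then
        echo "prebuilt"
    else
        echo "build-from-source"
    fi
}

can_use_prebuilt
```

When it prints `build-from-source`, follow sections 2.4.1 to 2.4.3 to build the libraries yourself.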

#### 2.4.0 Load the Kernel Driver

```bash
$ insmod /lib/modules/4.19.0-xilinx-v2019.1/extra/opendla.ko
```

Use dmesg to view the kernel log:

```bash
$ dmesg | tail
[ 12.817877] macb ff0e0000.ethernet eth0: link up (1000/Full)
[ 12.817900] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 20.661453] opendla: loading out-of-tree module taints kernel.
[ 20.664248] Probe NVDLA config nvidia,nv_small
[ 20.669152] 0 . 12 . 5
[ 20.669155] reset engine done
[ 20.671257] [drm] Initialized nvdla 0.0.0 20171017 for a0000000.NV_nvdla_wrapper on minor 1
```

To confirm the driver is really installed, check that the nvdla interrupt is registered and that the `renderD128` device required by the nvdla driver exists:

```bash
root@arm:~# insmod /lib/modules/4.19.0-xilinx-v2019.1/extra/opendla.ko
root@arm:~# cat /proc/interrupts | grep nvdla
45: 0 0 GIC-0 61 Level 40000000.NV_nvdla_wrapper
root@arm:~# ls /dev/dri/
card0 renderD128
```
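
The two manual checks above can be combined into one script, for example to run after each boot. This is a sketch under assumptions: the function name `check_nvdla_driver` is hypothetical, and it treats the driver as ready when any line of `/proc/interrupts` mentions `nvdla` and `/dev/dri/renderD128` exists, exactly as in the manual check. Both paths are parameters so the logic can be pointed at other files.

```bash
#!/usr/bin/env bash
# Hypothetical helper automating the two manual checks:
#   1. an interrupt line mentioning "nvdla" in /proc/interrupts
#   2. the DRM render node renderD128 under /dev/dri
check_nvdla_driver() {
    local irq_file="${1:-/proc/interrupts}"
    local dri_dir="${2:-/dev/dri}"
    if ! grep -q nvdla "$irq_file" 2>/dev/null; then
        echo "no nvdla interrupt registered in $irq_file" >&2
        return 1
    fi
    if [ ! -e "$dri_dir/renderD128" ]; then
        echo "missing $dri_dir/renderD128" >&2
        return 1
    fi
    echo "nvdla driver looks loaded"
}
```

Typical use on the board is simply `check_nvdla_driver || echo "reload opendla.ko"`.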

#### 2.4.1 Build libjpeg6b

On aarch64, skip this step and use the libjpeg.a in the repository directly.

```bash
$ wget http://www.ijg.org/files/jpegsrc.v6b.tar.gz
$ tar -xzvf jpegsrc.v6b.tar.gz
$ cd jpeg-6b/
$ ./configure
$ make -j `nproc`
$ make install
$ cp /usr/local/lib/libjpeg.a ~/ZYNQ-NVDLA/umd/external/
```

#### 2.4.2 Build libprotobuf.a

```bash
$ cd ~/ZYNQ-NVDLA/umd/external/protobuf-2.6/
$ apt-get install -y autoconf automake libtool
$ autoscan && aclocal && autoconf
$ automake --add-missing
$ ./configure
$ make -j `nproc`
$ make install
$ cp /usr/local/lib/libprotobuf.a ~/ZYNQ-NVDLA/umd/apps/compiler/
$ cp /usr/local/lib/libprotobuf.a ~/ZYNQ-NVDLA/umd/core/src/compiler/
```
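
libcompiler requires protobuf 2.6.1 specifically, so it can be worth confirming which version ended up installed. A minimal sketch, assuming the installed `protoc` is on `PATH` and that `protoc --version` prints a line of the form `libprotoc X.Y.Z`; `protobuf_version_ok` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical check that the installed protobuf is 2.6.1.
# Takes an optional version line for testing; otherwise asks protoc.
protobuf_version_ok() {
    local line="${1:-$(protoc --version 2>/dev/null)}"
    local version="${line#libprotoc }"   # strip the "libprotoc " prefix
    [ "$version" = "2.6.1" ]
}

if protobuf_version_ok; then
    echo "protobuf 2.6.1 found"
else
    echo "expected protobuf 2.6.1" >&2
fi
```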

#### 2.4.3 Build the Compiler and Runtime

```bash
$ cd ~/ZYNQ-NVDLA/umd/
$ make -j `nproc` TOP=${PWD} TOOLCHAIN_PREFIX=/usr/bin/ compiler
$ make -j `nproc` TOP=${PWD} TOOLCHAIN_PREFIX=/usr/bin/ runtime
```

This produces the required libraries under the out directory. Copy the libraries and headers into the Tengine tree (the lib directory is created first, since it is not shipped with the release):

```bash
$ mkdir -p ~/Tengine/source/device/opendla/lib
$ cp -r ~/ZYNQ-NVDLA/include ~/Tengine/source/device/opendla
$ cp ~/ZYNQ-NVDLA/umd/out/core/src/compiler/libnvdla_compiler/libnvdla_compiler.so ~/Tengine/source/device/opendla/lib/
$ cp ~/ZYNQ-NVDLA/umd/out/core/src/runtime/libnvdla_runtime/libnvdla_runtime.so ~/Tengine/source/device/opendla/lib/
$ cp /usr/local/lib/libprotobuf.a ~/Tengine/source/device/opendla/lib/
```
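
Before configuring Tengine, a quick check that the three copied libraries are actually in place can save a failed link later. A sketch, assuming the default `~/Tengine` checkout location used in this guide; `check_opendla_libs` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: report any of the three libraries the opendla
# backend links against that is missing from the Tengine lib directory.
check_opendla_libs() {
    local lib_dir="${1:-$HOME/Tengine/source/device/opendla/lib}"
    local missing=0
    local f
    for f in libnvdla_compiler.so libnvdla_runtime.so libprotobuf.a; do
        if [ ! -f "$lib_dir/$f" ]; then
            echo "missing: $lib_dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```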

#### 2.4.4 Build Tengine

```bash
$ cd ~/Tengine
$ mkdir build && cd build
$ cmake .. -DTENGINE_ENABLE_OPENDLA=ON
```
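
To confirm the option actually took effect, you can inspect the CMake cache in the build directory. CMake stores cache entries as `NAME:TYPE=VALUE` lines in `CMakeCache.txt`; the sketch below assumes that format and that the option was set to `ON` as above. `opendla_enabled` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical check, run from the build directory after `cmake ..`.
# Succeeds if CMakeCache.txt records TENGINE_ENABLE_OPENDLA as ON
# (any cache type, e.g. TENGINE_ENABLE_OPENDLA:BOOL=ON).
opendla_enabled() {
    local cache="${1:-CMakeCache.txt}"
    grep -q '^TENGINE_ENABLE_OPENDLA:[A-Z]*=ON$' "$cache" 2>/dev/null
}

if opendla_enabled; then
    echo "OpenDLA backend enabled"
else
    echo "TENGINE_ENABLE_OPENDLA is not ON in CMakeCache.txt" >&2
fi
```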

## 3. Demo

### 3.1 Classification

**Resnet18-Cifar10**

```bash
$ cd <tengine-lite-root-dir>/build
$ cmake --build . --target tm_classification_opendla
$ cd examples
$ ./tm_classification_opendla -m /root/Tengine/models/resnet18-cifar10-nosoftmax-relu_int8.tmfile -i /root/Tengine/images/cat.jpg -g 32,32 -s 1,1,1
Mean value not specified, use default 104.0, 116.7, 122.7
tengine-lite library version: 1.4-dev
NVDLA time: 0.012502 seconds

model file : /root/Tengine/models/resnet18-cifar10-nosoftmax-relu_int8.tmfile
image file : /root/Tengine/images/cat.jpg
img_h, img_w, scale[3], mean[3] : 32 32 , 1.000 1.000 1.000, 104.0 116.7 122.7
Repeat 1 times, thread 1, avg time 12.62 ms, max_time 12.62 ms, min_time 12.62 ms
--------------------------------------
10.087049, 3
3.833079, 2
3.026115, 5
2.420892, 4
-0.403482, 0
--------------------------------------
```
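
Each result line above has the form `score, class_index`, sorted by score, so the first such line is the top-1 prediction. The sketch below maps it to a label, assuming the model uses the standard CIFAR-10 class order (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck); under that assumption the top-1 index 3 in the run above corresponds to `cat`, matching the input image. `cifar10_top1` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical post-processing: read tm_classification_opendla output on
# stdin, take the first "score, index" line, print its CIFAR-10 label.
# Assumes the standard CIFAR-10 class order; fails if no result line.
cifar10_top1() {
    grep -E '^ *-?[0-9]+\.[0-9]+, [0-9]+$' | head -n 1 | awk -F', ' '
        BEGIN { split("airplane automobile bird cat deer dog frog horse ship truck", names, " ") }
        {
            idx = $2 + 0
            if (idx >= 0 && idx < 10) { print names[idx + 1]; found = 1 }
        }
        END { exit found ? 0 : 1 }'
}
```

Usage: pipe the demo into it, e.g. `./tm_classification_opendla -m ... -i ... -g 32,32 -s 1,1,1 | cifar10_top1`.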

### 3.2 Detection

**Yolox-nano**

```bash
$ cd <tengine-lite-root-dir>/build
$ cmake --build . --target tm_classification_opendla tm_yolox_opendla
$ cd examples
$ ./tm_yolox_opendla -m /root/Tengine/models/yolox_nano_relu_int8.tmfile -i /root/Tengine/images/dog.jpg -r 1
tengine-lite library version: 1.4-dev
Repeat 1 times, thread 1, avg time 1138.80 ms, max_time 1138.80 ms, min_time 1138.80 ms
--------------------------------------
detection num: 3
2: 70%, [ 463, 80, 676, 163], car
16: 52%, [ 122, 220, 315, 517], dog
1: 48%, [ 180, 181, 564, 430], bicycle
```
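
Each detection line above has the form `class_id: score%, [x0, y0, x1, y1], label`. A sketch of filtering them by a confidence threshold with awk, assuming that exact output format; `filter_detections` and its percent threshold argument are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical post-processing: keep only detection lines whose score
# (in percent) is at or above the threshold given as $1 (default 50).
filter_detections() {
    local min_score="${1:-50}"
    awk -v min="$min_score" '
        /^ *[0-9]+: *[0-9]+%/ {
            score = $2               # e.g. "70%,"
            sub(/%,$/, "", score)    # -> "70"
            if (score + 0 >= min) print
        }'
}
```

With a threshold of 50, the run above would keep the `car` (70%) and `dog` (52%) lines and drop `bicycle` (48%).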

Output:

![yolox_dla_out](yolox_dla_out.jpg)

## Appendix: Other

You are welcome to join QQ group 829565581 for discussion!

doc/yolox_dla_out.jpg

Binary file, 182 KB (image not shown)

examples/CMakeLists.txt

Lines changed: 8 additions & 0 deletions
@@ -81,6 +81,10 @@ IF(TENGINE_ENABLE_CUDA)
 TENGINE_EXAMPLE (tm_classification_cuda tm_classification_cuda.cpp)
 ENDIF()
 
+IF(TENGINE_ENABLE_OPENDLA)
+TENGINE_EXAMPLE (tm_classification_opendla tm_classification_opendla.c)
+ENDIF()
+
 IF(TENGINE_ENABLE_TENSORRT)
 TENGINE_EXAMPLE (tm_classification_trt tm_classification_trt.cpp)
 ENDIF()
@@ -127,6 +131,8 @@ IF (OpenCV_FOUND)
 TENGINE_EXAMPLE_CV (tm_yolov3 tm_yolov3.cpp)
 TENGINE_EXAMPLE_CV (tm_yolov3_uint8 tm_yolov3_uint8.cpp)
 TENGINE_EXAMPLE_CV (tm_yolov3_tiny tm_yolov3_tiny.cpp)
+TENGINE_EXAMPLE_CV (tm_yolov3_tiny_opendla tm_yolov3_tiny_opendla.cpp)
+TENGINE_EXAMPLE_CV (tm_yolov3_tiny_int8 tm_yolov3_tiny_int8.cpp)
 TENGINE_EXAMPLE_CV (tm_yolov3_tiny_uint8 tm_yolov3_tiny_uint8.cpp)
 TENGINE_EXAMPLE_CV (tm_yolov4 tm_yolov4.cpp)
 TENGINE_EXAMPLE_CV (tm_yolov4_uint8 tm_yolov4_uint8.cpp)
@@ -137,6 +143,8 @@ IF (OpenCV_FOUND)
 TENGINE_EXAMPLE_CV (tm_hrnet tm_hrnet.cpp)
 TENGINE_EXAMPLE_CV (tm_nanodet_m tm_nanodet_m.cpp)
 TENGINE_EXAMPLE_CV (tm_yolox tm_yolox.cpp)
+TENGINE_EXAMPLE_CV (tm_yolox_int8 tm_yolox_int8.cpp)
+TENGINE_EXAMPLE_CV (tm_yolox_opendla tm_yolox_opendla.cpp)
 TENGINE_EXAMPLE_CV (tm_yolox_darknet53 tm_yolox_darknet53.cpp)
 TENGINE_EXAMPLE_CV (tm_scrfd tm_scrfd.cpp)
 TENGINE_EXAMPLE_CV (tm_segformer tm_segformer.cpp)
