[NPU] Abnormal PaddleOCR inference latency on Ascend 910B

### 🔎 Search before asking

- [x] I have searched the PaddleOCR [Docs](https://paddlepaddle.github.io/PaddleOCR/) and found no similar bug report.
- [x] I have searched the PaddleOCR [Issues](https://github.com/PaddlePaddle/PaddleOCR/issues) and found no similar bug report.
- [x] I have searched the PaddleOCR [Discussions](https://github.com/PaddlePaddle/PaddleOCR/discussions) and found no similar bug report.

### 🐛 Bug Description

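On an Atlas 800I A2 (64G), I set up the environment by following the official guide at https://www.paddlepaddle.org.cn/documentation/docs/zh/hardware_support/npu/install_cn.html:

1. Pull the Ascend NPU development image
2. Create the container
3. Install paddlepaddle and paddle-custom-npu
4. Run the basic functionality checks, which all pass

Next, I ran the benchmark script from https://www.paddleocr.ai/main/version3.x/pipeline_usage/instructions/benchmark.html?h=benchmark. Two machines of the same model produced very different results: machine A took about 25 s, while machine B took as long as 140 s, as shown in the table below: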

| Level | Operation | Time on machine A (ms) | Time on machine B (ms) |
|--------|--------|--------|--------|
| 1 | _OCRPipeline.predict | 25473.5812769969 | 140310.851979814 | 
| 2 | Layer | 25473.5812769969 | 140310.851979814 | 
|  | Core | 2730.83080103388 | 4887.04342823475 | 
|  | Other | 22742.7504759631 | 135423.808551579 | 

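The gap is concentrated in the Other row. According to the benchmark script, Other is computed as `summary["other"] = summary["end_to_end"] - summary["core"]`, so it is simply the part of the end-to-end time that is not attributed to Core and has no more specific meaning.
How should I investigate this further, or how can it be resolved?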
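
As a sanity check on the table, the sketch below recomputes Other from the reported end-to-end and Core times using the formula from the benchmark script, and derives how much of each run Other accounts for. The function name `breakdown` and the `RUNS` dictionary are only illustrative; the numbers themselves are copied from the table.

```python
# Timings in milliseconds, copied from the benchmark table.
RUNS = {
    "A": {"end_to_end": 25473.5812769969, "core": 2730.83080103388},
    "B": {"end_to_end": 140310.851979814, "core": 4887.04342823475},
}


def breakdown(end_to_end_ms, core_ms):
    """Return (other_ms, other_share), where other = end_to_end - core."""
    other_ms = end_to_end_ms - core_ms
    return other_ms, other_ms / end_to_end_ms


for name, run in RUNS.items():
    other_ms, share = breakdown(run["end_to_end"], run["core"])
    print(f"machine {name}: other = {other_ms:.1f} ms ({share:.1%} of end-to-end)")

# Compare how much each bucket grows from machine A to machine B.
other_a = breakdown(RUNS["A"]["end_to_end"], RUNS["A"]["core"])[0]
other_b = breakdown(RUNS["B"]["end_to_end"], RUNS["B"]["core"])[0]
core_ratio = RUNS["B"]["core"] / RUNS["A"]["core"]
print(f"Other B/A = {other_b / other_a:.2f}x, Core B/A = {core_ratio:.2f}x")
```

On these numbers, Other grows by roughly 6x between the two machines while Core grows by less than 2x, which is why the Other bucket is the focus of this report.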

### 🏃‍♂️ Environment

```bash
OS                             openEuler 22.03 (LTS-SP4)
docker                         18.09.0
image                          ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-aarch64-gcc84
CANN                           8.0.RC2
python                         3.10.16
paddle-custom-npu              0.0.0
paddle2onnx                    1.3.1
paddleocr                      3.3.1
paddlepaddle                   3.2.1
paddlex                        3.3.9 
```
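
To confirm that both machines really run the same Python packages, the versions listed above could be collected on each machine with a small script and the two outputs compared. This is only a sketch: the helper name `installed_versions` is hypothetical, and it covers Python packages only, not CANN, the driver, or the container image.

```python
from importlib import metadata

# Package names taken from the environment listing above.
PACKAGES = ["paddlepaddle", "paddle-custom-npu", "paddlex", "paddleocr", "paddle2onnx"]


def installed_versions(names):
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:<24}{version}")
```

Running it on machine A and machine B and diffing the two outputs would rule out a package mismatch as the cause.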

### 🌰 Minimal Reproducible Example

```python
from paddleocr import PaddleOCR, benchmark

image = "01.jpg"
gpu_id = 0  # index of the NPU device to run on

if __name__ == '__main__':
    pipeline = PaddleOCR(lang="ch",
                         doc_orientation_classify_model_dir='./models/PP-LCNet_x1_0_doc_ori_infer/',
                         doc_unwarping_model_dir='./models/UVDoc_infer/',
                         textline_orientation_model_dir='./models/PP-LCNet_x1_0_textline_ori_infer/',
                         text_detection_model_dir='./models/PP-OCRv5_server_det_infer/',
                         text_recognition_model_dir='./models/PP-OCRv5_server_rec_infer/',
                         use_doc_orientation_classify=True, use_doc_unwarping=True, use_textline_orientation=True,
                         enable_mkldnn=False, precision='fp16', device='npu:' + str(gpu_id),
                         text_det_limit_type='max', text_det_limit_side_len=960
                         )

    benchmark.start_warmup()  # start of warmup
    for _ in range(10):
        pipeline.predict(image)
    benchmark.stop_warmup()  # end of warmup

    for _ in range(10):  # start of the timed runs
        pipeline.predict(image)

    benchmark.print_pipeline_data()  # print the aggregated benchmark data
    benchmark.save_pipeline_data("./benchmark")  # save the benchmark data to the ./benchmark path
```
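
Since the benchmark only reports Other as a remainder, one possible way to see what it contains would be to profile a single `predict` call with Python's built-in `cProfile` and sort by cumulative time. The sketch below is an assumption about how that could be done, not part of the PaddleOCR benchmark tooling; `profile_call` is a hypothetical helper, and a stand-in function is used here so the block runs without an NPU.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=25, **kwargs):
    """Run fn once under cProfile.

    Returns (result, report), where report is the text of the `top`
    entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def stand_in_workload(n):
    """Placeholder for pipeline.predict, used only to make the sketch runnable."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(stand_in_workload, 100_000)
    print(report)
```

In the reproduction script this would be called as `profile_call(pipeline.predict, image)` after warmup. If `predict` returned a lazy generator rather than a list, the results would have to be consumed inside the profiled call for the timings to be meaningful. Comparing the reports from machine A and machine B should show which Python-level calls account for the extra time.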
