[NPU] PaddleOCR inference time is abnormal on Ascend 910B #17423

@zaka16118

Description

🔎 Search before asking

  • I have searched the PaddleOCR Docs and found no similar bug report.
  • I have searched the PaddleOCR Issues and found no similar bug report.
  • I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug (Problem description)

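On an Atlas 800I A2 (64 GB), I set up the environment by following the official guide at https://www.paddlepaddle.org.cn/documentation/docs/zh/hardware_support/npu/install_cn.html:
1. Pulled the Ascend NPU development image
2. Created the container
3. Installed paddlepaddle and paddle-custom-npu
4. Ran the basic functionality checks, all of which passed
I then ran the benchmark script from https://www.paddleocr.ai/main/version3.x/pipeline_usage/instructions/benchmark.html?h=benchmark. Two machines of the same model produced very different results: machine A took about 25 s, while machine B unexpectedly took about 140 s, as shown in the table below: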

Level  Operation             Time (ms)-A       Time (ms)-B
1      _OCRPipeline.predict  25473.5812769969  140310.851979814
2      Layer                 25473.5812769969  140310.851979814
       Core                  2730.83080103388  4887.04342823475
       Other                 22742.7504759631  135423.808551579

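The difference is mainly in the Other row. According to the benchmark script, Other is computed as summary["other"] = summary["end_to_end"] - summary["core"].
In other words, it is just the end-to-end time that is not attributed to Core, and it has no more specific meaning.
How should I investigate this further, or how can it be resolved?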
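
As a sanity check, the Other values in the table can be reproduced from that formula. The snippet below is only a small arithmetic sketch using the numbers reported above; the dictionary layout and the helper name `other_time` are illustrative and not part of the benchmark script.

```python
# Timings in milliseconds, copied from the benchmark table above.
timings_ms = {
    "A": {"end_to_end": 25473.5812769969, "core": 2730.83080103388},
    "B": {"end_to_end": 140310.851979814, "core": 4887.04342823475},
}


def other_time(summary):
    """Other = end_to_end - core, mirroring the formula in the benchmark script."""
    return summary["end_to_end"] - summary["core"]


for name, summary in timings_ms.items():
    other = other_time(summary)
    share = other / summary["end_to_end"]
    print(f"machine {name}: other = {other:.1f} ms ({share:.1%} of end-to-end)")
```

On both machines Other makes up most of the end-to-end time (roughly 89% on A and roughly 97% on B), so the slowdown on machine B sits almost entirely outside the operations counted as Core.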

🏃‍♂️ Environment (Runtime environment)

OS                             openEuler 22.03 (LTS-SP4)
docker                         18.09.0
image                          ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-aarch64-gcc84
CANN                           8.0.RC2
python                         3.10.16
paddle-custom-npu              0.0.0
paddle2onnx                    1.3.1
paddleocr                      3.3.1
paddlepaddle                   3.2.1
paddlex                        3.3.9 

🌰 Minimal Reproducible Example (minimal demo that reproduces the problem)

from paddleocr import PaddleOCR, benchmark

image = "01.jpg"
gpu_id = 0

if __name__ == '__main__':
    pipeline = PaddleOCR(lang="ch",
                         doc_orientation_classify_model_dir='./models/PP-LCNet_x1_0_doc_ori_infer/',
                         doc_unwarping_model_dir='./models/UVDoc_infer/',
                         textline_orientation_model_dir='./models/PP-LCNet_x1_0_textline_ori_infer/',
                         text_detection_model_dir='./models/PP-OCRv5_server_det_infer/',
                         text_recognition_model_dir='./models/PP-OCRv5_server_rec_infer/',
                         use_doc_orientation_classify=True, use_doc_unwarping=True, use_textline_orientation=True,
                         enable_mkldnn=False, precision='fp16', device='npu:' + str(gpu_id),
                         text_det_limit_type='max', text_det_limit_side_len=960
                         )

    benchmark.start_warmup()  # start of warmup
    for _ in range(10):
        pipeline.predict(image)
    benchmark.stop_warmup()  # end of warmup

    for _ in range(10):  # start of the timed runs
        pipeline.predict(image)

    benchmark.print_pipeline_data()  # print the aggregated benchmark data
    benchmark.save_pipeline_data("./benchmark")  # save the benchmark data to the ./benchmark path
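
One possible way to see where the unattributed Other time goes is to run a single call under Python's standard-library `cProfile` on both machines and compare the reports. This is only a sketch: `profile_call` and `fake_predict` are hypothetical names introduced here, and `fake_predict` is a stand-in so the snippet runs without an NPU. It is not part of PaddleOCR.

```python
import cProfile
import io
import pstats
import time


def profile_call(fn, *args, top=15, **kwargs):
    """Run fn under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = fn(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def fake_predict(image):
    # Hypothetical stand-in for pipeline.predict so the sketch is runnable anywhere.
    time.sleep(0.01)
    return [image]


if __name__ == "__main__":
    _, report = profile_call(fake_predict, "01.jpg")
    print(report)
```

In the script above, the same helper could be called as `profile_call(pipeline.predict, image)` after warmup. Note that cProfile only sees Python-level calls, so time spent inside native NPU kernels appears under whichever Python function invoked them.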
