8. 模型转换示例#

本章节提供多种典型模型的 pulsar2 build 转换示例, 包括完整的配置文件、转换命令、真实 log 及模型输入输出说明. 所有示例均基于 AX650 平台, 使用 Pulsar2 5.1 版本.

备注

本章节中的模型和配置文件均来自 AXERA-TECH HuggingFace
转换前请确保已使用 onnxsim 工具对原始模型进行优化
模型的输入输出 tensor 名称需与 ONNX 模型实际定义一致, 可通过 onnx inspect --io model.onnx 查看

8.1. YOLOv5s (目标检测)#

8.1.1. 模型简介#

YOLOv5s 是 Ultralytics 发布的实时目标检测模型, 采用 CSPDarknet 骨干网络, 适用于实时检测场景.

HuggingFace: AXERA-TECH/YOLOv5
模型来源: ultralytics/yolov5
AxSamples: ax-samples / axcl-samples

8.1.2. 配置文件#

yolov5_build.json:

{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "images",
        "calibration_dataset": "calib-cocotest2017.tar",
        "calibration_size": 32,
        "calibration_mean": [0, 0, 0],
        "calibration_std": [255.0, 255.0, 255.0]
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": false
  },
  "input_processors": [
    {
      "tensor_name": "images",
      "tensor_format": "RGB",
      "src_format": "BGR",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    }
  ],
  "output_processors": [
    {
      "tensor_name": "/model.24/m.0/Conv_output_0",
      "dst_perm": [0, 2, 3, 1]
    },
    {
      "tensor_name": "/model.24/m.1/Conv_output_0",
      "dst_perm": [0, 2, 3, 1]
    },
    {
      "tensor_name": "/model.24/m.2/Conv_output_0",
      "dst_perm": [0, 2, 3, 1]
    }
  ],
  "compiler": {
    "check": 0
  }
}

注意

output_processors 中的 tensor_name 为 YOLOv5s 模型三个检测头的输出名称. 不同版本的模型可能不同, 请使用 onnx inspect --io model.onnx 查看实际的 tensor 名称. dst_perm 用于将输出从 NCHW 转换为 NHWC 布局, 便于后处理.

8.1.3. 编译执行#

pulsar2 build --target_hardware AX650 --input yolov5s-cut.onnx --output_dir output --config yolov5_build.json

8.1.3.1. log 参考信息#

+----------------------------------+----------------------------+
|            Model Name            |         OnnxModel          |
+----------------------------------+----------------------------+
|            Model Info            | Op Set: 17 / IR Version: 8 |
+----------------------------------+----------------------------+
|            IN: images            | float32: (1, 3, 640, 640)  |
| OUT: /model.24/m.0/Conv_output_0 | float32: (1, 255, 80, 80)  |
| OUT: /model.24/m.1/Conv_output_0 | float32: (1, 255, 40, 40)  |
| OUT: /model.24/m.2/Conv_output_0 | float32: (1, 255, 20, 20)  |
+----------------------------------+----------------------------+
|               Add                |             7              |
|              Concat              |             13             |
|               Conv               |             60             |
|             MaxPool              |             3              |
|               Mul                |             57             |
|              Resize              |             2              |
|             Sigmoid              |             57             |
+----------------------------------+----------------------------+
|            Model Size            |          27.56 MB          |
+----------------------------------+----------------------------+
...
Calibration Progress(Phase 1): 100%|██████████| 32/32 [00:17<00:00,  1.79it/s]
...
--------- Network Snapshot ---------
Num of Op:                    [142]
Num of Quantized Op:          [142]
Num of Variable:              [269]
Num of Quantized Var:         [269]
------- Quantization Snapshot ------
Num of Quant Config:          [432]
BAKED:                        [60]
OVERLAPPED:                   [168]
SLAVE:                        [9]
ACTIVATED:                    [129]
SOI:                          [6]
PASSIVE_BAKED:                [60]
Network Quantization Finished.
...
tiling op...   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 147/147 0:00:00
build op serially...   ━━━━━━━━━━━━━━━━━━━━━━━━━━ 649/649 0:00:02
build op...   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1139/1139 0:00:00
...
2026-03-23 19:44:00.890 | INFO     | yamain.command.build:compile_ptq_model:1365 - fuse 1 subgraph(s)

8.1.4. 模型输入输出说明#

方向	Tensor 名称	数据类型	Shape	说明
输入	images	UINT8	(1, 640, 640, 3)	BGR 格式图像, NHWC 布局, 需 Letterbox 预处理
输出	/model.24/m.0/Conv_output_0	FLOAT32	(1, 80, 80, 255)	大尺度特征图 (检测小目标)
输出	/model.24/m.1/Conv_output_0	FLOAT32	(1, 40, 40, 255)	中尺度特征图 (检测中目标)
输出	/model.24/m.2/Conv_output_0	FLOAT32	(1, 20, 20, 255)	小尺度特征图 (检测大目标)

提示

板端推理耗时约 6.32 ms (AX650). 完整的板端运行示例参考 AXERA-TECH/YOLOv5.

8.2. YOLO11s (目标检测)#

8.2.1. 模型简介#

YOLO11s 是 Ultralytics 发布的最新一代 YOLO 检测模型, 采用改进的骨干网络和检测头设计, 精度和速度优于上一代.

HuggingFace: AXERA-TECH/YOLO11
模型来源: ultralytics/ultralytics
AxSamples: ax-samples / axcl-samples

8.2.2. 配置文件#

yolo11_build.json:

{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "images",
        "calibration_dataset": "calib-cocotest2017.tar",
        "calibration_size": 32,
        "calibration_mean": [0, 0, 0],
        "calibration_std": [255.0, 255.0, 255.0]
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": false
  },
  "input_processors": [
    {
      "tensor_name": "images",
      "tensor_format": "BGR",
      "src_format": "BGR",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    }
  ],
  "output_processors": [
    {
      "tensor_name": "/model.23/Concat_output_0",
      "dst_perm": [0, 2, 3, 1]
    },
    {
      "tensor_name": "/model.23/Concat_1_output_0",
      "dst_perm": [0, 2, 3, 1]
    },
    {
      "tensor_name": "/model.23/Concat_2_output_0",
      "dst_perm": [0, 2, 3, 1]
    }
  ],
  "compiler": {
    "check": 0
  }
}

8.2.3. 编译执行#

pulsar2 build --target_hardware AX650 --input yolo11s-cut.onnx --output_dir output --config yolo11_build.json

8.2.3.1. log 参考信息#

+----------------------------------+----------------------------+
|            Model Name            |         OnnxModel          |
+----------------------------------+----------------------------+
|            Model Info            | Op Set: 17 / IR Version: 9 |
+----------------------------------+----------------------------+
|            IN: images            | float32: (1, 3, 640, 640)  |
|  OUT: /model.23/Concat_output_0  | float32: (1, 144, 80, 80)  |
| OUT: /model.23/Concat_1_output_0 | float32: (1, 144, 40, 40)  |
| OUT: /model.23/Concat_2_output_0 | float32: (1, 144, 20, 20)  |
+----------------------------------+----------------------------+
|               Add                |             14             |
|              Concat              |             20             |
|               Conv               |             87             |
|              MatMul              |             2              |
|             MaxPool              |             3              |
|               Mul                |             78             |
|             Reshape              |             3              |
|              Resize              |             2              |
|             Sigmoid              |             77             |
|             Softmax              |             1              |
|              Split               |             10             |
|            Transpose             |             2              |
+----------------------------------+----------------------------+
|            Model Size            |          36.03 MB          |
+----------------------------------+----------------------------+
...
Calibration Progress(Phase 1): 100%|██████████| 32/32 [00:25<00:00,  1.25it/s]
...
--------- Network Snapshot ---------
Num of Op:                    [222]
Num of Quantized Op:          [222]
Num of Variable:              [426]
Num of Quantized Var:         [426]
------- Quantization Snapshot ------
Num of Quant Config:          [693]
BAKED:                        [88]
OVERLAPPED:                   [295]
SLAVE:                        [16]
ACTIVATED:                    [190]
SOI:                          [17]
PASSIVE_BAKED:                [87]
Network Quantization Finished.
...
tiling op...   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 235/235 0:00:01
build op serially...   ━━━━━━━━━━━━━━━━━━━━━━━ 1033/1033 0:00:05
build op...   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1689/1689 0:00:00
...
2026-03-23 19:45:20.303 | INFO     | yamain.command.build:compile_ptq_model:1365 - fuse 1 subgraph(s)

8.2.4. 模型输入输出说明#

方向	Tensor 名称	数据类型	Shape	说明
输入	images	UINT8	(1, 640, 640, 3)	BGR 格式图像, NHWC 布局, 需 Letterbox 预处理
输出	/model.23/Concat_output_0	FLOAT32	(1, 80, 80, 144)	大尺度特征图 (检测小目标)
输出	/model.23/Concat_1_output_0	FLOAT32	(1, 40, 40, 144)	中尺度特征图 (检测中目标)
输出	/model.23/Concat_2_output_0	FLOAT32	(1, 20, 20, 144)	小尺度特征图 (检测大目标)

提示

YOLO11 相比 YOLOv5 采用了注意力机制 (含 MatMul 和 Softmax 算子), 模型更大但检测精度更高. 板端推理耗时约 25 ms (AX650). 完整的板端运行示例参考 AXERA-TECH/YOLO11.

8.3. Depth-Anything-V2 (单目深度估计)#

8.3.1. 模型简介#

Depth-Anything-V2 是基于 DINOv2 的单目深度估计模型, 输入单张 RGB 图片, 输出逐像素深度图. 本示例使用 ViT-Small 版本.

HuggingFace: AXERA-TECH/Depth-Anything-V2
模型来源: depth-anything/Depth-Anything-V2-Small
ONNX导出参考: DepthAnythingV2

8.3.2. 配置文件#

config.json (省略部分 layer_configs 条目, 完整配置请参考 HuggingFace 仓库):

{
  "model_type": "ONNX",
  "npu_mode": "NPU3",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "DEFAULT",
        "calibration_dataset": "calib-cocotest2017.tar",
        "calibration_size": 32,
        "calibration_mean": [123.675, 116.28, 103.53],
        "calibration_std": [58.395, 57.12, 57.375]
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": true,
    "precision_analysis_method": "EndToEnd",
    "conv_bias_data_type": "FP32",
    "enable_smooth_quant": true,
    "disable_auto_refine_scale": true,
    "layer_configs":  [
      {
        "layer_name": "op_173:onnx.Mul_1",
        "data_type": "U16"
      },
      {
        "layer_name": "op_173:onnx.Softmax_0",
        "data_type": "U16"
      },
      {
        "layer_name": "op_173:onnx.MatMul_qkv_0",
        "data_type": "U16"
      },
      ...
    ]
  },
  "input_processors": [
    {
      "tensor_name": "DEFAULT",
      "tensor_format": "RGB",
      "src_format": "BGR",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    }
  ],
  "compiler": {
    "check": 0
  }
}

注意

该模型使用 NPU3 模式 (三核), 充分利用 AX650 的全部 NPU 算力
开启了 enable_smooth_quant 以降低 Transformer 结构中的 outlier 影响
layer_configs 中配置了大量 Softmax、MatMul 等算子使用 U16 精度, 以保证 ViT 模型的量化精度
conv_bias_data_type 设置为 FP32 以提升精度
完整的 layer_configs (约 50 项) 请参考 HuggingFace 仓库中的 config.json

8.3.3. 编译执行#

pulsar2 build --target_hardware AX650 --input depth_anything_v2_vits.onnx --output_dir output --config config.json

8.3.3.1. log 参考信息#

+---------------+----------------------------+
|  Model Name   |         OnnxModel          |
+---------------+----------------------------+
|  Model Info   | Op Set: 12 / IR Version: 7 |
+---------------+----------------------------+
|   IN: input   | float32: (1, 3, 518, 518)  |
|  OUT: output  | float32: (1, 1, 518, 518)  |
+---------------+----------------------------+
|      Add      |            148             |
|    Concat     |             1              |
|     Conv      |             31             |
| ConvTranspose |             2              |
|      Div      |             37             |
|      Erf      |             12             |
|    Gather     |             36             |
|    MatMul     |             72             |
|      Mul      |             88             |
|      Pow      |             25             |
|  ReduceMean   |             50             |
|     Relu      |             16             |
|    Reshape    |             29             |
|    Resize     |             5              |
|     Slice     |             4              |
|    Softmax    |             12             |
|     Sqrt      |             25             |
|      Sub      |             25             |
|   Transpose   |             41             |
+---------------+----------------------------+
|  Model Size   |          94.26 MB          |
+---------------+----------------------------+
...
Enable Smooth Quant, this pass is used for outlier activation.
...
Analysing Smooth Quantization Error(Phrase 1): 100%|██████████| 32/32 [00:51<00:00,  1.62s/it]
Get Outlier Progress: 100%|██████████| 32/32 [01:15<00:00,  2.35s/it]
...
Analysing Smooth Quantization Error(Phrase 2): 100%|██████████| 32/32 [00:51<00:00,  1.62s/it]
...
--------- Network Snapshot ---------
Num of Op:                    [792]
Num of Quantized Op:          [792]
Num of Variable:              [1552]
Num of Quantized Var:         [1552]
------- Quantization Snapshot ------
Num of Quant Config:          [2581]
...
Network Quantization Finished.
...
tiling op...   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 762/762 0:00:02
build op serially...   ━━━━━━━━━━━━━━━━━━━━━━━ 1178/1178 0:00:11
build op...   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1734/1734 0:00:00
add ddr swap...   ━━━━━━━━━━━━━━━━━━━━━━━━━ 15821/15821 0:00:01
...
2026-03-23 19:42:38.553 | INFO     | yamain.command.build:compile_ptq_model:1365 - fuse 1 subgraph(s)

备注

该模型转换全流程耗时约 8 分钟, 其中 Smooth Quant 分析和逐层精度对分占用了大部分时间. 如果不需要精度分析, 可将 precision_analysis 设置为 false 以加快转换速度.

8.3.4. 模型输入输出说明#

方向	Tensor 名称	数据类型	Shape	说明
输入	input	UINT8	(1, 518, 518, 3)	BGR 格式图像 (运行时输入 BGR 自动转换为 RGB), NHWC 布局
输出	output	FLOAT32	(1, 1, 518, 518)	逐像素深度图, 值越大表示距离越远

提示

板端推理耗时约 33 ms (AX650, NPU3 三核模式). 使用 Python 推理示例参考 AXERA-TECH/Depth-Anything-V2, 需安装 pyaxengine.

8.4. CN-CLIP (中文多模态文本编码器)#

8.4.1. 模型简介#

Chinese-CLIP 是基于 CLIP 框架的中文多模态预训练模型. 本示例为其 BERT 文本编码器 部分 (ViT-L/14 配套), 将中文文本编码为嵌入向量, 用于与图像嵌入计算相似度.

HuggingFace: AXERA-TECH/cnclip
模型来源: OFA-Sys/Chinese-CLIP
ONNX导出参考: cnclip.axera
AxSamples: CLIP-ONNX-AX650-CPP

8.4.2. 配置文件#

cnclip_build.json:

{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "text",
        "calibration_dataset": "calib_text.tar",
        "calibration_format": "Numpy",
        "calibration_size": 32,
        "calibration_mean": [0],
        "calibration_std": [1]
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": false,
    "transformer_opt_level": 1
  },
  "input_processors": [
    {
      "tensor_name": "text",
      "src_dtype": "S32",
      "src_layout": "NCHW"
    }
  ],
  "compiler": {
    "check": 0
  }
}

注意

该模型为 文本编码器, 输入为分词后的 token ID 序列, 不是图像
calibration_format 设置为 Numpy, 校准数据为预分词的 numpy 数组 (shape (1, 52), dtype int64)
src_dtype 设置为 S32 (有符号 32 位整数), 对应 token ID 输入
transformer_opt_level 设置为 1, 启用 Transformer 模型专用量化优化

8.4.3. 编译执行#

pulsar2 build --target_hardware AX650 --input cnclip_vit_l14_336px_bert_encoder.onnx --output_dir output --config cnclip_build.json

8.4.3.1. log 参考信息#

+---------------------------+----------------------------+
|        Model Name         |         OnnxModel          |
+---------------------------+----------------------------+
|        Model Info         | Op Set: 14 / IR Version: 7 |
+---------------------------+----------------------------+
|         IN: text          |       int64: (1, 52)       |
| OUT: unnorm_text_features |     float32: (1, 768)      |
+---------------------------+----------------------------+
|            Add            |            172             |
|           Cast            |             3              |
|         Constant          |            154             |
|            Div            |             49             |
|            Erf            |             12             |
|          Gather           |             4              |
|          MatMul           |             97             |
|            Mul            |             50             |
|            Pow            |             25             |
|        ReduceMean         |             50             |
|          Reshape          |             48             |
|          Softmax          |             12             |
|           Sqrt            |             25             |
|            Sub            |             26             |
|         Transpose         |             48             |
+---------------------------+----------------------------+
|        Model Size         |         390.12 MB          |
+---------------------------+----------------------------+
...
Transformer optimize level: 1
...
Calibration Progress(Phase 1): 100%|██████████| 32/32 [00:11<00:00,  2.81it/s]
...
--------- Network Snapshot ---------
Num of Op:                    [312]
Num of Quantized Op:          [308]
Num of Variable:              [588]
Num of Quantized Var:         [583]
------- Quantization Snapshot ------
Num of Quant Config:          [949]
BAKED:                        [89]
OVERLAPPED:                   [452]
ACTIVATED:                    [224]
SOI:                          [61]
PASSIVE_BAKED:                [72]
FP32:                         [51]
Network Quantization Finished.
...
tiling op...   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 340/340 0:00:02
build op serially...   ━━━━━━━━━━━━━━━━━━━━━━━━━━ 300/300 0:00:04
build op...   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 386/386 0:00:00
add ddr swap...   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3859/3859 0:00:00
...
2026-03-23 19:47:45.563 | INFO     | yamain.command.build:compile_ptq_model:1365 - fuse 1 subgraph(s)

8.4.4. 模型输入输出说明#

方向	Tensor 名称	数据类型	Shape	说明
输入	text	S32	(1, 52)	分词后的 token ID 序列, 最大长度 52
输出	unnorm_text_features	FLOAT32	(1, 768)	未归一化的文本嵌入向量

提示

部署时需配合视觉编码器使用: 视觉编码器提取图像嵌入, 文本编码器提取文本嵌入, 通过计算余弦相似度完成图文匹配. 分词器使用 cn_vocab.txt (随模型提供). 板端运行示例参考 CLIP-ONNX-AX650-CPP.