2. Pulsar2 Toolchain overview
2.1. Introduction
Pulsar2 is an all-in-one, new-generation neural network compiler independently developed by Axera.
It integrates model conversion, quantization, compilation, and heterogeneous deployment in a single tool to meet the requirement of fast and efficient deployment of deep learning neural network models.
It has been deeply customized and optimized for the characteristics of the new-generation AX6, M7, and M5 series chips (AX630C, AX620Q, AX650A, AX650N, M76H, M57H), making full use of the on-chip heterogeneous computing units (CPU + NPU) to improve the deployment efficiency and performance of neural network models.
Special notes:
- Tips used in the toolchain documentation:
  - Note: explains certain technical terms in more detail.
  - Hint: reminds users to confirm relevant information.
  - Attention: reminds users of precautions when configuring the tools.
  - Warning: reminds users to use the toolchain correctly; if a Warning is ignored, incorrect results may occur.
- The commands in the toolchain documentation are compatible with the supported chips; for example, Pulsar2 supports M76H.
- The example commands and example output in the toolchain documentation are based on AX650. The actual computing power configuration of a specific chip is subject to the chip SPEC.
The core function of the Pulsar2 toolchain is to compile .onnx models into .axmodel models that the chip can parse and run.
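For orientation, a minimal conversion command might look like the sketch below. Only the pulsar2 build command and its --npu_mode parameter are referenced elsewhere in this document; the other flag names and file names shown here are assumptions to be confirmed against the pulsar2 build chapter and the configuration file description in Section 9.

    # Sketch only: compile an ONNX model into an axmodel
    # (flag names and file names are illustrative assumptions)
    pulsar2 build --input model.onnx --config config.json --output_dir output --output_name model.axmodel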
Figure: Deployment process

2.2. Introduction to Virtual NPU

The NPUs in AX650 and M76H consist mainly of 3 Conv (convolution) computing cores and 3 Vector computing cores. These Conv and Vector cores are paired in a 1:1 ratio and divided into 3 vNPU groups.
At runtime, the working mode of the NPU can be set through the AXEngine API to group the vNPUs flexibly: three symmetric vNPUs (1 + 1 + 1), one larger vNPU plus one standard vNPU (2 + 1), or a single vNPU that combines all 3 groups for maximum computing power.
When converting a model, you can specify the number of vNPUs required for inference as needed (see the --npu_mode parameter of pulsar2 build for details). When the model is loaded and deployed on the chip platform, AXEngine allocates the model to run on a vNPU with the corresponding computing power according to the currently configured NPU working mode.
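As a sketch, assuming --npu_mode on AX650 / M76H accepts the values NPU1, NPU2, and NPU3 (please confirm against the pulsar2 build chapter):

    # Sketch: choose how many vNPU groups the compiled model may occupy on AX650 / M76H
    pulsar2 build ... --npu_mode NPU1   # model runs on 1 vNPU group (fits the 1 + 1 + 1 mode)
    pulsar2 build ... --npu_mode NPU2   # model runs on 2 vNPU groups (fits the 2 + 1 mode)
    pulsar2 build ... --npu_mode NPU3   # model occupies all 3 groups (single large vNPU mode)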

AX630C and AX620Q adopt a dual-core NPU design. Depending on whether AI-ISP is enabled, the NPU operates in one of two working conditions that allocate different amounts of computing power to the user.
When converting a model, you need to explicitly configure the NPU working mode of the model according to whether AI-ISP is actually enabled in your application (for details, see the --npu_mode parameter of pulsar2 build). The NPU modules in AX630C and AX620Q both use the Neutron 4.0 NPU engine; subsequent chapters use AX620E as shorthand for this target hardware platform.
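A corresponding sketch for AX620E, assuming the two working conditions map to the --npu_mode values NPU1 and NPU2 (to be confirmed against the pulsar2 build chapter):

    # Sketch: AX630C / AX620Q dual-core NPU; the mode depends on whether AI-ISP is enabled
    pulsar2 build ... --npu_mode NPU1   # AI-ISP enabled: one NPU core is available to the user model
    pulsar2 build ... --npu_mode NPU2   # AI-ISP disabled: both NPU cores are available to the user model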
2.3. Guide to the content of subsequent chapters
Section 3: This chapter introduces the preparation and installation of the hardware and software environment for model conversion and deployment with the NPU toolchain, including how to install Docker and start the container in different system environments.
Section 4: This chapter introduces the basic usage flow of the NPU toolchain on the Axera AX650 (including AX650A, AX650N, M76H) chip platform.
Section 5: This chapter introduces the basic usage flow of the NPU toolchain on the Axera AX620E (including AX620Q, AX630C) chip platform.
Section 6: This chapter is an advanced description of model conversion, that is, a detailed description of how to use the Pulsar2 Docker toolchain to convert an onnx model into an axmodel model.
Section 7: This chapter is an advanced description of model simulation, that is, a detailed description of how to run an axmodel model in simulation on the x86 platform and measure the difference between its inference results and the onnx inference results (internally called bisection).
Section 8: This chapter is an advanced description of running models on the board, that is, a detailed description of how to run an axmodel on the board to obtain the model's inference results on Axera SoC hardware.
Section 9: This chapter describes in detail the configuration file used during model conversion and compilation.
Section 10: Models exported by the Caffe AI training platform are not in the onnx format supported by the NPU toolchain, so a conversion tool is required to convert a Caffe model into an onnx model; this chapter introduces how to use that tool.
Section 11: This chapter is an instruction for using the on-board model speed and accuracy test tools.
Section 12: This chapter is a statement of the functional safety compliance of the NPU toolchain.
Appendix: The document appendix includes the list of supported operators and accuracy tuning suggestions.
Note
The so-called bisection refers to comparing the error between the inference results of the same model in its different versions (file types) before and after compilation by the toolchain.