9. Common configuration examples
This section provides configuration examples for common pulsar2 build scenarios, which can be quickly referenced and reused. All examples are based on the AX650 platform.
Note
For the complete definition of configuration fields, please refer to Configuration File Detailed Description.
tensor_name must match the actual tensor names defined in the ONNX model. You can check them via onnx inspect --io model.onnx
9.1. RGB input
This is the most common configuration for image models. input_processors declares the runtime input data attributes of compiled.axmodel. The toolchain automatically embeds preprocessing operators into the model based on the configuration (such as dtype conversion, normalization, and layout conversion).
Warning
The combination of tensor_format and src_format does not support RGB ↔ BGR channel swapping. If you set src_format to BGR and tensor_format to RGB (or vice versa), the compiled model will not embed a channel-reorder operator. Color space conversion is only supported in the YUV input scenario.
9.1.1. Preprocessing is done inside compiled.axmodel
Embed preprocessing (normalization and layout conversion) into compiled.axmodel with the following configuration:
{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "input",
        "calibration_dataset": "./dataset/imagenet-32-images.tar",
        "calibration_size": 32,
        "calibration_mean": [103.939, 116.779, 123.68],
        "calibration_std": [58.0, 58.0, 58.0]
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": false
  },
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "tensor_layout": "NHWC",
      "src_format": "BGR",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    }
  ],
  "compiler": {
    "check": 0
  }
}
Key configuration:
Set src_dtype to U8: the input of compiled.axmodel becomes U8, and the toolchain automatically inserts an AxDequantizeLinear dequantization operator in the frontend to convert U8 to FP32 as required by the model.
Set src_layout to NHWC: the toolchain inserts an AxTranspose operator to convert NHWC to NCHW as required by the model.
calibration_mean / calibration_std: the toolchain inserts an AxNormalize operator to perform normalization.
You can confirm that preprocessing operators are embedded from the build log (check the output after Building native):
... | WARNING | yamain.command.load_model:pre_process:616 - preprocess tensor [input]
... | INFO | yamain.command.load_model:pre_process:618 - tensor: input, (1, 224, 224, 3), U8
... | INFO | yamain.command.load_model:pre_process:619 - op: op:pre_dequant_1, AxDequantizeLinear, {'const_inputs': {'x_zeropoint': array(0, dtype=int32), 'x_scale': array(1., dtype=float32)}, 'output_dtype': <class 'numpy.float32'>, 'quant_method': 0}
... | INFO | yamain.command.load_model:pre_process:618 - tensor: tensor:pre_norm_1, (1, 224, 224, 3), FP32
... | INFO | yamain.command.load_model:pre_process:619 - op: op:pre_norm_1, AxNormalize, {'dim': 3, 'mean': [103.93900299072266, 116.77899932861328, 123.68000030517578], 'std': [58.0, 58.0, 58.0], 'output_dtype': FP32}
... | INFO | yamain.command.load_model:pre_process:618 - tensor: tensor:pre_transpose_1, (1, 224, 224, 3), FP32
... | INFO | yamain.command.load_model:pre_process:619 - op: op:pre_transpose_1, AxTranspose, {'perm': [0, 3, 1, 2]}
... | WARNING | yamain.command.load_model:post_process:627 - postprocess tensor [output]
Between preprocess tensor [input] and postprocess tensor [output], the log shows three preprocessing operators:
AxDequantizeLinear: U8 → FP32 dtype conversion
AxNormalize: normalization (subtract mean, divide by std)
AxTranspose: NHWC → NCHW layout conversion
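If you want to check the build log automatically rather than by eye, the two marker lines can be scanned with a small script. The following is a minimal Python sketch and not part of the toolchain: the function name embedded_preprocess_ops and the regular expression are assumptions derived only from the log excerpt above, and the sample log is an abbreviated copy of that excerpt (the attribute dictionaries are shortened).

```python
import re

# Marker strings copied from the build log excerpt above.
PRE_MARK = "preprocess tensor ["
POST_MARK = "postprocess tensor ["
# Matches "- op: op:<name>, <OpType>" and captures the operator type.
# This pattern is an assumption based on the log excerpt, not a documented format.
OP_RE = re.compile(r"- op: op:[^,]+, (\w+)")


def embedded_preprocess_ops(log_text):
    """Return the operator types logged between the pre/post marker lines."""
    ops = []
    inside = False
    for line in log_text.splitlines():
        if PRE_MARK in line:
            inside = True
        elif POST_MARK in line:
            inside = False
        elif inside:
            match = OP_RE.search(line)
            if match:
                ops.append(match.group(1))
    return ops


# Abbreviated copy of the log above; attribute dictionaries are shortened.
SAMPLE_LOG = """\
... | WARNING | yamain.command.load_model:pre_process:616 - preprocess tensor [input]
... | INFO | yamain.command.load_model:pre_process:618 - tensor: input, (1, 224, 224, 3), U8
... | INFO | yamain.command.load_model:pre_process:619 - op: op:pre_dequant_1, AxDequantizeLinear, {'quant_method': 0}
... | INFO | yamain.command.load_model:pre_process:619 - op: op:pre_norm_1, AxNormalize, {'dim': 3}
... | INFO | yamain.command.load_model:pre_process:619 - op: op:pre_transpose_1, AxTranspose, {'perm': [0, 3, 1, 2]}
... | WARNING | yamain.command.load_model:post_process:627 - postprocess tensor [output]
"""

print(embedded_preprocess_ops(SAMPLE_LOG))
# ['AxDequantizeLinear', 'AxNormalize', 'AxTranspose']
```

An empty result corresponds to a log with no op: line between the two markers, which is the case when no preprocessing operator is embedded.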
9.1.2. Preprocessing is NOT done inside compiled.axmodel
If you want to perform preprocessing (normalization, layout conversion, etc.) on the CPU side and then feed the result to the NPU for inference, configure input_processors to match the floating-point model input exactly:
{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "input",
        "calibration_dataset": "./dataset/imagenet-32-images.tar",
        "calibration_size": 32,
        "calibration_mean": [103.939, 116.779, 123.68],
        "calibration_std": [58.0, 58.0, 58.0]
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": false
  },
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "tensor_layout": "NCHW",
      "src_format": "BGR",
      "src_dtype": "FP32",
      "src_layout": "NCHW",
      "mean": [0, 0, 0],
      "std": [1, 1, 1]
    }
  ],
  "compiler": {
    "check": 0
  }
}
Key configuration:
Set src_dtype to FP32: same as the model input type, no dtype-conversion operator is inserted.
Set src_layout to NCHW: same as the model input layout, no layout-conversion operator is inserted.
Explicitly set mean to [0, 0, 0] and std to [1, 1, 1]: override the default values from calibration_mean / calibration_std so that no normalization operator is inserted.
Attention
You must explicitly configure mean and std. If they are not configured, the toolchain will use calibration_mean / calibration_std by default and will still embed a normalization operator into the model.
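The rules described in 9.1.1 and 9.1.2 can be restated as a small check that predicts which preprocessing operators to expect for a given input_processors entry. This is only a sketch of the rules as written in this section, not the toolchain's actual logic: the function name expected_preprocess_ops is hypothetical, and the code assumes the floating-point model takes FP32 input in NCHW layout.

```python
def expected_preprocess_ops(processor, input_config):
    """List the operators that the rules in this section suggest will be embedded.

    processor:    one entry of "input_processors"
    input_config: the matching entry of "quant.input_configs"
    Assumption: the floating-point model expects FP32 input in NCHW layout.
    """
    ops = []
    # U8 runtime input needs a dequantization step to reach FP32.
    if processor.get("src_dtype") == "U8":
        ops.append("AxDequantizeLinear")
    # mean/std fall back to the calibration values when not set explicitly.
    mean = processor.get("mean", input_config["calibration_mean"])
    std = processor.get("std", input_config["calibration_std"])
    is_identity = all(m == 0 for m in mean) and all(s == 1 for s in std)
    if not is_identity:
        ops.append("AxNormalize")
    # NHWC runtime input needs a transpose to reach NCHW.
    if processor.get("src_layout") == "NHWC":
        ops.append("AxTranspose")
    return ops


calib = {
    "calibration_mean": [103.939, 116.779, 123.68],
    "calibration_std": [58.0, 58.0, 58.0],
}

# Entry from 9.1.1: all three operators are expected.
print(expected_preprocess_ops({"src_dtype": "U8", "src_layout": "NHWC"}, calib))

# Entry from 9.1.2 with mean/std left out: normalization is still expected.
print(expected_preprocess_ops({"src_dtype": "FP32", "src_layout": "NCHW"}, calib))
```

The second call illustrates the pitfall described in the Attention box: leaving out mean and std still yields AxNormalize, because the calibration values are used as defaults.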
You can confirm that no preprocessing operator is embedded from the build log (check the output after Building native):
... | WARNING | yamain.command.load_model:pre_process:616 - preprocess tensor [input]
... | WARNING | yamain.command.load_model:post_process:627 - postprocess tensor [output]
If there is no op: line between preprocess tensor [input] and postprocess tensor [output], compiled.axmodel does not include any preprocessing operators. At runtime, you need to perform the following steps yourself:
Image decode and resize
BGR channel normalization (subtract mean, divide by std)
NHWC → NCHW layout conversion
Convert dtype to FP32
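The last three steps can be sketched in plain Python as follows. This is an illustration only: image decoding and resizing are left out, the function name preprocess_bgr_hwc is hypothetical, and nested lists stand in for the array type a real application would use (typically NumPy or OpenCV buffers). The mean and std values are the calibration_mean / calibration_std values from the configuration above.

```python
# Values taken from calibration_mean / calibration_std in the config above (BGR order).
MEAN = [103.939, 116.779, 123.68]
STD = [58.0, 58.0, 58.0]


def preprocess_bgr_hwc(image):
    """Turn one H x W x 3 BGR image of U8 values into a 1 x 3 x H x W nested list.

    Each value is converted to float, normalized per channel
    (subtract mean, divide by std), and moved from HWC to CHW order.
    """
    height = len(image)
    width = len(image[0])
    chw = [
        [
            [(float(image[y][x][c]) - MEAN[c]) / STD[c] for x in range(width)]
            for y in range(height)
        ]
        for c in range(3)
    ]
    return [chw]  # add the batch dimension: NCHW with N = 1


# A tiny 1 x 2 image: one black pixel and one white pixel.
tensor = preprocess_bgr_hwc([[[0, 0, 0], [255, 255, 255]]])
print(len(tensor), len(tensor[0]), len(tensor[0][0]), len(tensor[0][0][0]))
# 1 3 1 2
```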
9.1.3. Field description
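| Field | Description |
|---|---|
| tensor_format | The channel order used during model training (such as RGB or BGR) |
| src_format | The channel order of the runtime input, usually the same as tensor_format |
| src_dtype | Runtime input dtype. When set to U8, the toolchain inserts a dequantization operator (U8 → FP32) |
| src_layout | Runtime input layout. When set to NHWC, the toolchain inserts a transpose operator (NHWC → NCHW) |
| mean / std | Normalization parameters. By default, calibration_mean / calibration_std are used |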
Note
The combination of tensor_format and src_format does not support RGB ↔ BGR channel swapping, and the compiled model will not reorder channels. Color space conversion is only used in the YUV input scenario.
9.2. YUV input
Cameras usually output YUV formats such as NV12/NV21. Pulsar2 supports embedding YUV → RGB/BGR color space conversion into the model to avoid additional runtime overhead.
9.2.1. NV12 (YUV420SP)
{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "input",
        "calibration_dataset": "./dataset/imagenet-32-images.tar",
        "calibration_size": 32,
        "calibration_mean": [103.939, 116.779, 123.68],
        "calibration_std": [58.0, 58.0, 58.0]
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": false
  },
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "src_format": "YUV420SP",
      "src_dtype": "U8",
      "src_layout": "NHWC",
      "csc_mode": "FullRange"
    }
  ],
  "compiler": {
    "check": 0
  }
}
9.2.2. NV21 (YVU420SP)
Just change src_format to YVU420SP:
{
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "src_format": "YVU420SP",
      "src_dtype": "U8",
      "src_layout": "NHWC",
      "csc_mode": "FullRange"
    }
  ]
}
9.2.3. YUYV422
{
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "src_format": "YUYV422",
      "src_dtype": "U8",
      "src_layout": "NHWC",
      "csc_mode": "LimitedRange"
    }
  ]
}
9.2.4. Parameter description
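| Parameter | Description | Options |
|---|---|---|
| src_format | The YUV format of the runtime input | YUV420SP, YVU420SP, YUYV422 (the values used in the examples above) |
| tensor_format | The expected color space of the model | RGB, BGR |
| csc_mode | Color space conversion mode | FullRange, LimitedRange, Matrix |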
csc_mode details:
FullRange: Full-range YUV conversion coefficients, suitable for most cameras.
LimitedRange: Limited-range (BT.601/BT.709) coefficients, suitable for video streams.
Matrix: user-defined 3×4 conversion matrix, configured via the csc_mat field.
Custom CSC matrix:
{
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "src_format": "YUV420SP",
      "src_dtype": "U8",
      "src_layout": "NHWC",
      "csc_mode": "Matrix",
      "csc_mat": [1.164, 0.0, 1.596, -0.871,
                  1.164, -0.392, -0.813, 0.529,
                  1.164, 2.017, 0.0, -1.082]
    }
  ]
}
Warning
After configuring YUV input, src_layout will be automatically changed to NHWC.
For NV12/NV21 input, the height of the input shape is 1.5× the original height (Y + UV planes).
In csc_mat, the bias values (indices 3, 7, 11) must be in (-9, 8). The other parameters must be in (-524289, 524288).
When validating accuracy on board, if src_format is YUV, it is recommended to use IVE TDP for resize. This preprocessing is aligned with OpenCV bilinear interpolation.
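The csc_mat ranges and the 1.5× height rule above are easy to check before starting a build. The following Python sketch is not part of pulsar2: the function names check_csc_mat and nv12_input_height are hypothetical, the ranges are read as open intervals because the text writes them with parentheses, and the even-height check is an added assumption based on the 2× vertical chroma subsampling of NV12/NV21.

```python
def check_csc_mat(csc_mat):
    """Return a list of range violations for a flattened 3x4 CSC matrix."""
    if len(csc_mat) != 12:
        return ["csc_mat must contain 12 values (3 rows x 4 columns)"]
    problems = []
    for index, value in enumerate(csc_mat):
        if index % 4 == 3:  # indices 3, 7, 11 hold the bias values
            low, high = -9, 8
        else:
            low, high = -524289, 524288
        # Assumption: both bounds are exclusive, as the parentheses suggest.
        if not (low < value < high):
            problems.append(f"csc_mat[{index}] = {value} is outside ({low}, {high})")
    return problems


def nv12_input_height(image_height):
    """Height of the NV12/NV21 input tensor: Y plane plus the half-height UV plane."""
    if image_height % 2:
        # Assumption: 4:2:0 subsampling needs an even number of rows.
        raise ValueError("NV12/NV21 images need an even height")
    return image_height * 3 // 2


# The custom matrix from the example above.
CSC_MAT = [1.164, 0.0, 1.596, -0.871,
           1.164, -0.392, -0.813, 0.529,
           1.164, 2.017, 0.0, -1.082]

print(check_csc_mat(CSC_MAT))    # []
print(nv12_input_height(224))    # 336
```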
9.3. Static batch configuration
The compiler builds the model for the user-specified batch sizes. Weights are shared between batches, so the output model size is much smaller than the sum of individual batch models.
Config file approach — add static_batch_sizes under compiler:
{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "input",
        "calibration_dataset": "./dataset/imagenet-32-images.tar",
        "calibration_size": 32,
        "calibration_mean": [103.939, 116.779, 123.68],
        "calibration_std": [58.0, 58.0, 58.0]
      }
    ],
    "calibration_method": "MinMax"
  },
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "src_format": "BGR",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    }
  ],
  "compiler": {
    "check": 0,
    "static_batch_sizes": [1, 2, 4]
  }
}
Command line approach:
pulsar2 build --target_hardware AX650 --input model.onnx --output_dir output --config config.json --compiler.static_batch_sizes 1 2 4
Hint
Take mobilenetv2 as an example. The original input shape is [1, 224, 224, 3]. After setting static_batch_sizes to [1, 2, 4], the input shape of the compilation output becomes [4, 224, 224, 3].
Attention
Static batch and dynamic batch modes are mutually exclusive and cannot be configured at the same time.
If the model contains the Reshape operator, you may need to use the Constant Data Patch feature to change the batch dimension of shapes to -1 or 0.
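Both points, the resulting input shape from the Hint and the mutual exclusion with dynamic batch (the max_dynamic_batch_size field described in 9.4), can be checked on a loaded config before building. This is a minimal sketch with hypothetical helper names (check_batch_mode, compiled_input_shape). It treats a field as configured whenever the key is present under compiler, which is an assumption about how the toolchain interprets the file.

```python
def check_batch_mode(config):
    """Return 'static', 'dynamic', or 'none'; raise if both modes are configured."""
    compiler = config.get("compiler", {})
    has_static = "static_batch_sizes" in compiler
    has_dynamic = "max_dynamic_batch_size" in compiler
    if has_static and has_dynamic:
        raise ValueError(
            "static_batch_sizes and max_dynamic_batch_size are mutually exclusive"
        )
    if has_static:
        return "static"
    if has_dynamic:
        return "dynamic"
    return "none"


def compiled_input_shape(original_shape, static_batch_sizes):
    """Input shape of the compiled model: batch dimension set to the largest batch size."""
    return [max(static_batch_sizes)] + list(original_shape[1:])


config = {"compiler": {"check": 0, "static_batch_sizes": [1, 2, 4]}}
print(check_batch_mode(config))                                # static
print(compiled_input_shape([1, 224, 224, 3], [1, 2, 4]))       # [4, 224, 224, 3]
```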
9.4. Dynamic batch configuration
The compiler automatically derives a batch-size set that the NPU can run efficiently and that does not exceed max_dynamic_batch_size. At runtime, the inference framework splits the actual batch into multiple runs if needed.
Config file approach — add max_dynamic_batch_size under compiler:
{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "input",
        "calibration_dataset": "./dataset/imagenet-32-images.tar",
        "calibration_size": 32,
        "calibration_mean": [103.939, 116.779, 123.68],
        "calibration_std": [58.0, 58.0, 58.0]
      }
    ],
    "calibration_method": "MinMax"
  },
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "src_format": "BGR",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    }
  ],
  "compiler": {
    "check": 0,
    "max_dynamic_batch_size": 4
  }
}
Command line approach:
pulsar2 build --target_hardware AX650 --input model.onnx --output_dir output --config config.json --compiler.max_dynamic_batch_size 4
Derivation rules:
The compiler starts from batch 1 and doubles the batch size (1 → 2 → 4 → ...). It stops when the batch size exceeds the configured value or when the theoretical inference efficiency gets worse.
Theoretical inference efficiency = theoretical inference time / batch_size, that is, the theoretical time per sample (lower is better).
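The derivation rule can be sketched as a short loop. This only illustrates the rule as stated here, not the compiler's implementation: the function name derive_batch_sizes is hypothetical, and the timing table is made-up data standing in for the compiler's internal estimate.

```python
def derive_batch_sizes(max_dynamic_batch_size, inference_time):
    """Double the batch size from 1 while the time per sample does not get worse.

    inference_time: callable returning the theoretical inference time of a batch
    (hypothetical; the compiler's own estimate is not exposed in this document).
    """
    batches = []
    best_per_sample = None
    batch = 1
    while batch <= max_dynamic_batch_size:
        per_sample = inference_time(batch) / batch
        if best_per_sample is not None and per_sample > best_per_sample:
            break  # efficiency got worse, stop deriving
        batches.append(batch)
        best_per_sample = per_sample
        batch *= 2
    return batches


# Made-up timings (ms): per-sample time is 10.0, 8.0, 7.5, then 8.0 for batch 8.
HYPOTHETICAL_TIMES = {1: 10.0, 2: 16.0, 4: 30.0, 8: 64.0}

print(derive_batch_sizes(4, HYPOTHETICAL_TIMES.__getitem__))  # [1, 2, 4]
print(derive_batch_sizes(8, HYPOTHETICAL_TIMES.__getitem__))  # [1, 2, 4]
```

In the second call the limit would allow batch 8, but its per-sample time (8.0) is worse than that of batch 4 (7.5), so the derivation stops at 4.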
Hint
When max_dynamic_batch_size is set to 4, the compilation output may include three batches: [1, 2, 4].
At runtime, the inference framework automatically splits the workload:
batch=3 → internally runs batch 2 + batch 1 (two inferences)
batch=9 → internally runs batch 4 + batch 4 + batch 1 (three inferences)
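Both examples are consistent with a greedy, largest-first split. The sketch below shows that strategy; whether the inference framework actually uses this exact algorithm is an assumption, and the function name split_batch is hypothetical.

```python
def split_batch(batch, compiled_batches):
    """Split a runtime batch into compiled batch sizes, largest first."""
    remaining = batch
    runs = []
    for size in sorted(compiled_batches, reverse=True):
        while remaining >= size:
            runs.append(size)
            remaining -= size
    if remaining:
        # Only possible when batch size 1 is not among the compiled sizes.
        raise ValueError(f"cannot cover the remaining {remaining} sample(s)")
    return runs


print(split_batch(3, [1, 2, 4]))  # [2, 1]
print(split_batch(9, [1, 2, 4]))  # [4, 4, 1]
```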
9.5. Multi-input configuration
When an ONNX model has multiple inputs (such as stereo vision, image + mask, multi-sensor fusion, etc.), you need to configure input_configs and input_processors for each input separately.
{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "rgb_image",
        "calibration_dataset": "./dataset/rgb_images.tar",
        "calibration_size": 32,
        "calibration_mean": [103.939, 116.779, 123.68],
        "calibration_std": [58.0, 58.0, 58.0]
      },
      {
        "tensor_name": "depth_map",
        "calibration_dataset": "./dataset/depth_maps.tar",
        "calibration_format": "Numpy",
        "calibration_size": 32,
        "calibration_mean": [0],
        "calibration_std": [1]
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": false
  },
  "input_processors": [
    {
      "tensor_name": "rgb_image",
      "tensor_format": "BGR",
      "src_format": "BGR",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    },
    {
      "tensor_name": "depth_map",
      "tensor_format": "GRAY",
      "src_format": "GRAY",
      "src_dtype": "FP32",
      "src_layout": "NCHW"
    }
  ],
  "compiler": {
    "check": 0
  }
}
Key points:
Each input needs an independent input_configs entry and an independent input_processors entry.
Different inputs can use different calibration datasets, data formats, and normalization parameters.
calibration_format supports four formats: Image (default), Numpy, Binary, and NumpyObject.
If all inputs share the same configuration, you can set tensor_name to DEFAULT.
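With several inputs it is easy to forget one of the two entries. The following sketch checks that every tensor_name appears in both lists. The helper name check_input_entries is hypothetical, and the handling of DEFAULT (treated as covering every input) is an assumption based on the last point above. The check does not look at the ONNX model itself, so it cannot tell whether the names match the real model inputs.

```python
def check_input_entries(config):
    """Return a list of problems with per-input entries (empty if none found)."""
    quant_names = [e["tensor_name"] for e in config.get("quant", {}).get("input_configs", [])]
    proc_names = [e["tensor_name"] for e in config.get("input_processors", [])]
    problems = []
    # Each tensor should be listed at most once per section.
    for section, names in (("input_configs", quant_names), ("input_processors", proc_names)):
        for name in sorted({n for n in names if names.count(n) > 1}):
            problems.append(f"{section}: duplicate entry for '{name}'")
    # Assumption: a DEFAULT entry applies to every input on that side.
    if "DEFAULT" not in proc_names:
        for name in quant_names:
            if name != "DEFAULT" and name not in proc_names:
                problems.append(f"input_processors: no entry for '{name}'")
    if "DEFAULT" not in quant_names:
        for name in proc_names:
            if name != "DEFAULT" and name not in quant_names:
                problems.append(f"input_configs: no entry for '{name}'")
    return problems


config = {
    "quant": {"input_configs": [{"tensor_name": "rgb_image"}, {"tensor_name": "depth_map"}]},
    "input_processors": [{"tensor_name": "rgb_image"}],
}
print(check_input_entries(config))
# ["input_processors: no entry for 'depth_map'"]
```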
Mapping between calibration data and inputs:
| Input tensor | Calibration format | Dataset content | Notes |
|---|---|---|---|
| rgb_image | Image (default) | JPEG/PNG packed into a tar file | The toolchain reads images and normalizes automatically |
| depth_map | Numpy | .npy files packed into a tar file | Must be preprocessed into numpy arrays that match the model input shape |
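Packing the calibration files into a tar archive can be done with the Python standard library. This is a sketch under assumptions: the helper name pack_calibration_tar is hypothetical, and the flat, uncompressed archive layout is a guess rather than a documented requirement. The demo uses empty placeholder files only to show the call; a real dataset contains actual images or arrays.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_calibration_tar(source_dir, tar_path, suffixes):
    """Pack files from source_dir whose suffix is in `suffixes` into a tar archive.

    Returns the archived file names in the order they were added.
    """
    files = sorted(
        p for p in Path(source_dir).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )
    with tarfile.open(tar_path, "w") as tar:  # plain, uncompressed tar
        for path in files:
            # Flat layout without directories (an assumption, not a documented rule).
            tar.add(path, arcname=path.name)
    return [p.name for p in files]


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "depth_maps"
    src.mkdir()
    for name in ("0001.npy", "0002.npy", "notes.txt"):
        (src / name).write_bytes(b"")
    packed = pack_calibration_tar(src, Path(tmp) / "depth_maps.tar", {".npy"})
    print(packed)  # ['0001.npy', '0002.npy']
```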
Warning
tensor_name must match the actual input names in the ONNX model. For a simulation run, you need to prepare a bin file for each input, and the file name must match the tensor name.
9.6. Skip onnxslim (onnxsim)
By default, pulsar2 build runs internal graph optimization on the ONNX model using the open source onnxslim tool. In some scenarios (for example, the model has already been manually optimized, it contains custom operators, or the optimization causes compilation failure), you may need to skip these optimization steps.
Command line approach:
pulsar2 build --target_hardware AX650 --input model.onnx --output_dir output --config config.json --onnx_opt.disable_onnx_optimization true
Config file approach — add onnx_opt at the top level:
{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "onnx_opt": {
    "disable_onnx_optimization": true
  },
  "quant": {
    "input_configs": [
      {
        "tensor_name": "input",
        "calibration_dataset": "./dataset/imagenet-32-images.tar",
        "calibration_size": 32,
        "calibration_mean": [103.939, 116.779, 123.68],
        "calibration_std": [58.0, 58.0, 58.0]
      }
    ],
    "calibration_method": "MinMax"
  },
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "src_format": "BGR",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    }
  ],
  "compiler": {
    "check": 0
  }
}