5. Common configuration examples

This section provides configuration examples for common pulsar2 build scenarios, which can be quickly referenced and reused. All examples are based on the AX650 platform.

Note

For the complete definition of configuration fields, refer to Configuration File Detailed Description.
tensor_name must match the actual tensor names defined in the ONNX model. You can check them with onnx inspect --io model.onnx

5.1. RGB input

This is the most common configuration for image models. input_processors declares the runtime input data attributes of compiled.axmodel. The toolchain automatically embeds preprocessing operators into the model based on the configuration (such as dtype conversion, normalization, and layout conversion).

Warning

The combination of tensor_format and src_format does not support RGB ↔ BGR channel swapping. If you set src_format to BGR and tensor_format to RGB (or vice versa), the compiled model will not embed a channel-reorder operator. Color space conversion is only supported in the YUV input scenario.

5.1.1. Preprocessing is done inside compiled.axmodel

Embed preprocessing (normalization and layout conversion) into compiled.axmodel with the following configuration:

{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "input",
        "calibration_dataset": "./dataset/imagenet-32-images.tar",
        "calibration_size": 32,
        "calibration_mean": [103.939, 116.779, 123.68],
        "calibration_std": [58.0, 58.0, 58.0]
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": false
  },
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "tensor_layout": "NHWC",
      "src_format": "BGR",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    }
  ],
  "compiler": {
    "check": 0
  }
}

Key configuration:

Set src_dtype to U8: the input of compiled.axmodel becomes U8, and the toolchain automatically inserts an AxDequantizeLinear dequantization operator at the model input to convert U8 to the FP32 that the model requires.
Set src_layout to NHWC: the toolchain inserts an AxTranspose operator to convert NHWC to the NCHW layout that the model requires.
calibration_mean / calibration_std: because mean and std are not set in input_processors, the toolchain uses these values by default and inserts an AxNormalize operator to perform normalization.

You can confirm that preprocessing operators are embedded from the build log (check the output after Building native):

... | WARNING  | yamain.command.load_model:pre_process:616 - preprocess tensor [input]
... | INFO     | yamain.command.load_model:pre_process:618 - tensor: input, (1, 224, 224, 3), U8
... | INFO     | yamain.command.load_model:pre_process:619 - op: op:pre_dequant_1, AxDequantizeLinear, {'const_inputs': {'x_zeropoint': array(0, dtype=int32), 'x_scale': array(1., dtype=float32)}, 'output_dtype': <class 'numpy.float32'>, 'quant_method': 0}
... | INFO     | yamain.command.load_model:pre_process:618 - tensor: tensor:pre_norm_1, (1, 224, 224, 3), FP32
... | INFO     | yamain.command.load_model:pre_process:619 - op: op:pre_norm_1, AxNormalize, {'dim': 3, 'mean': [103.93900299072266, 116.77899932861328, 123.68000030517578], 'std': [58.0, 58.0, 58.0], 'output_dtype': FP32}
... | INFO     | yamain.command.load_model:pre_process:618 - tensor: tensor:pre_transpose_1, (1, 224, 224, 3), FP32
... | INFO     | yamain.command.load_model:pre_process:619 - op: op:pre_transpose_1, AxTranspose, {'perm': [0, 3, 1, 2]}
... | WARNING  | yamain.command.load_model:post_process:627 - postprocess tensor [output]

Between preprocess tensor [input] and postprocess tensor [output], the log shows three preprocessing operators:

AxDequantizeLinear: U8 → FP32 dtype conversion
AxNormalize: normalization (subtract mean, divide by std)
AxTranspose: NHWC → NCHW layout conversion
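
The same check can be automated when many models are built. The following is a minimal sketch, assuming the log line format shown above; embedded_preprocess_ops is a hypothetical helper and not part of the toolchain, and the attribute dictionaries in the sample log are shortened for readability.

```python
import re

# Matches the "op:" lines emitted by pre_process in the build log, e.g.
# "... pre_process:619 - op: op:pre_norm_1, AxNormalize, {...}".
# The source line number (619 here) may differ between versions, so it is not fixed.
OP_LINE = re.compile(r"pre_process:\d+ - op: op:(?P<name>[^,]+), (?P<type>\w+)")


def embedded_preprocess_ops(log_text):
    """Return [(op_name, op_type), ...] in the order they appear in the log."""
    return [(m["name"], m["type"]) for m in OP_LINE.finditer(log_text)]


sample_log = """\
... | WARNING  | yamain.command.load_model:pre_process:616 - preprocess tensor [input]
... | INFO     | yamain.command.load_model:pre_process:619 - op: op:pre_dequant_1, AxDequantizeLinear, {'quant_method': 0}
... | INFO     | yamain.command.load_model:pre_process:619 - op: op:pre_norm_1, AxNormalize, {'dim': 3}
... | INFO     | yamain.command.load_model:pre_process:619 - op: op:pre_transpose_1, AxTranspose, {'perm': [0, 3, 1, 2]}
... | WARNING  | yamain.command.load_model:post_process:627 - postprocess tensor [output]
"""

for name, op_type in embedded_preprocess_ops(sample_log):
    print(name, op_type)
```

An empty result corresponds to a model without embedded preprocessing operators.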

5.1.2. Preprocessing is NOT done inside compiled.axmodel

To perform preprocessing on the CPU side (normalization, layout conversion, etc.) and feed the result to the NPU for inference, configure input_processors so that the runtime input attributes exactly match the input of the floating-point model:

{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "input",
        "calibration_dataset": "./dataset/imagenet-32-images.tar",
        "calibration_size": 32,
        "calibration_mean": [103.939, 116.779, 123.68],
        "calibration_std": [58.0, 58.0, 58.0]
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": false
  },
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "tensor_layout": "NCHW",
      "src_format": "BGR",
      "src_dtype": "FP32",
      "src_layout": "NCHW",
      "mean": [0, 0, 0],
      "std": [1, 1, 1]
    }
  ],
  "compiler": {
    "check": 0
  }
}

Key configuration:

Set src_dtype to FP32: same as the model input type, no dtype-conversion operator is inserted.
Set src_layout to NCHW: same as the model input layout, no layout-conversion operator is inserted.
Explicitly set mean to [0, 0, 0] and std to [1, 1, 1]: override the default values from calibration_mean / calibration_std so that no normalization operator is inserted.

Attention

You must explicitly configure mean and std. If they are not configured, the toolchain will use calibration_mean / calibration_std by default and will still embed a normalization operator into the model.

You can confirm that no preprocessing operator is embedded from the build log (check the output after Building native):

... | WARNING  | yamain.command.load_model:pre_process:616 - preprocess tensor [input]
... | WARNING  | yamain.command.load_model:post_process:627 - postprocess tensor [output]

If no op: line appears between preprocess tensor [input] and postprocess tensor [output], compiled.axmodel contains no preprocessing operators. At runtime, the application must then perform the following steps itself:

Image decode and resize
BGR channel normalization (subtract mean, divide by std)
NHWC → NCHW layout conversion
Convert dtype to FP32
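
The normalization, layout conversion, and dtype conversion steps can be sketched without any dependencies as shown below. This is only an illustration: preprocess_hwc_u8 is a hypothetical helper, the image is assumed to be already decoded and resized, and a real application would typically use NumPy or OpenCV and pack the result as FP32 bytes.

```python
def preprocess_hwc_u8(image_hwc, mean, std):
    """Convert one HWC uint8 image (nested lists) to CHW float values.

    Applies (pixel - mean[c]) / std[c] per channel, then reorders HWC -> CHW.
    """
    height, width, channels = len(image_hwc), len(image_hwc[0]), len(mean)
    return [
        [
            [(float(image_hwc[y][x][c]) - mean[c]) / std[c] for x in range(width)]
            for y in range(height)
        ]
        for c in range(channels)
    ]


# Values taken from calibration_mean / calibration_std in the config above.
MEAN = [103.939, 116.779, 123.68]
STD = [58.0, 58.0, 58.0]

# A tiny 1x2 BGR image standing in for a decoded and resized frame.
image = [[[103, 116, 123], [255, 0, 58]]]

chw = preprocess_hwc_u8(image, MEAN, STD)
batch_nchw = [chw]  # add the batch dimension: N=1, C=3, H=1, W=2
print(len(batch_nchw), len(chw), len(chw[0]), len(chw[0][0]))
```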

5.1.3. Field description

| Field | Description |
| --- | --- |
| `tensor_format` | The channel order used during model training (`RGB` or `BGR`), used for color space conversion when reading calibration data. |
| `src_format` | The channel order of the runtime input, usually `BGR` (OpenCV default). |
| `src_dtype` | Runtime input dtype. When set to `U8`, a dequantization operator is embedded; when set to `FP32`, it is not. |
| `src_layout` | Runtime input layout. When set to `NHWC`, layout conversion is embedded automatically; when set to `NCHW`, it is not. |
| `mean` / `std` | Normalization parameters. By default, `calibration_mean` / `calibration_std` are used. Setting them explicitly to `[0,0,0]` / `[1,1,1]` disables embedding normalization. |
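
The field rules above can be condensed into a small predictor for the operators that should appear in the build log. This is a sketch that only encodes the table, assuming an FP32, NCHW model input as in the examples of this section. expected_preprocess_ops is a hypothetical helper; the toolchain's real decision logic is not shown in this document.

```python
def expected_preprocess_ops(processor, calibration_mean, calibration_std):
    """Predict which preprocessing operators get embedded for one input.

    ASSUMPTION: the model input is FP32 and NCHW, as in this section's examples.
    """
    ops = []
    if processor.get("src_dtype") == "U8":
        ops.append("AxDequantizeLinear")  # U8 -> FP32
    # mean / std fall back to the calibration values when not set explicitly.
    mean = processor.get("mean", calibration_mean)
    std = processor.get("std", calibration_std)
    if any(m != 0 for m in mean) or any(s != 1 for s in std):
        ops.append("AxNormalize")
    if processor.get("src_layout") == "NHWC":
        ops.append("AxTranspose")  # NHWC -> NCHW
    return ops


calib_mean = [103.939, 116.779, 123.68]
calib_std = [58.0, 58.0, 58.0]

embedded = {"tensor_name": "input", "src_dtype": "U8", "src_layout": "NHWC"}
external = {"tensor_name": "input", "src_dtype": "FP32", "src_layout": "NCHW",
            "mean": [0, 0, 0], "std": [1, 1, 1]}

print(expected_preprocess_ops(embedded, calib_mean, calib_std))
print(expected_preprocess_ops(external, calib_mean, calib_std))
```

Dropping mean and std from the second processor makes the function report AxNormalize again, which mirrors the fallback to calibration_mean / calibration_std.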

Note

The combination of tensor_format and src_format does not support RGB ↔ BGR channel swapping, and the compiled model will not reorder channels. Color space conversion is only used in the YUV input scenario.

5.2. YUV input

Cameras usually output YUV formats such as NV12/NV21. Pulsar2 supports embedding YUV → RGB/BGR color space conversion into the model to avoid additional runtime overhead.

5.2.1. NV12 (YUV420SP)

{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "input",
        "calibration_dataset": "./dataset/imagenet-32-images.tar",
        "calibration_size": 32,
        "calibration_mean": [103.939, 116.779, 123.68],
        "calibration_std": [58.0, 58.0, 58.0]
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": false
  },
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "src_format": "YUV420SP",
      "src_dtype": "U8",
      "src_layout": "NHWC",
      "csc_mode": "FullRange"
    }
  ],
  "compiler": {
    "check": 0
  }
}

5.2.2. NV21 (YVU420SP)

Only src_format changes to YVU420SP; the rest of the configuration is the same as in the NV12 example:

{
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "src_format": "YVU420SP",
      "src_dtype": "U8",
      "src_layout": "NHWC",
      "csc_mode": "FullRange"
    }
  ]
}

5.2.3. YUYV422

{
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "src_format": "YUYV422",
      "src_dtype": "U8",
      "src_layout": "NHWC",
      "csc_mode": "LimitedRange"
    }
  ]
}

5.2.4. Parameter description

| Parameter | Description | Options |
| --- | --- | --- |
| `src_format` | The YUV format of the runtime input | `YUV420SP` (NV12), `YVU420SP` (NV21), `YUYV422`, `UYVY422` |
| `tensor_format` | The expected color space of the model | `BGR`, `RGB` |
| `csc_mode` | Color space conversion mode | `FullRange`, `LimitedRange`, `Matrix` |

csc_mode details:

FullRange: Full-range YUV conversion coefficients, suitable for most cameras.
LimitedRange: Limited-range (BT.601/BT.709) coefficients, suitable for video streams.
Matrix: user-defined 3×4 conversion matrix, configured via the csc_mat field.

Custom CSC matrix:

{
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "src_format": "YUV420SP",
      "src_dtype": "U8",
      "src_layout": "NHWC",
      "csc_mode": "Matrix",
      "csc_mat": [1.164, 0.0, 1.596, -0.871,
                  1.164, -0.392, -0.813, 0.529,
                  1.164, 2.017, 0.0, -1.082]
    }
  ]
}
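
The example values above are numerically close to the standard BT.601 limited-range YUV → RGB coefficients, with each row laid out as [cY, cU, cV, bias] and the bias divided by 256. That layout is an inference from the numbers and is not confirmed by this document, so treat the following as a sketch for sanity-checking a custom matrix rather than as the toolchain's definition. Both function names are hypothetical; the range limits are the ones stated in the warning at the end of this section.

```python
def bt601_limited_csc_mat():
    """Build a row-major 3x4 matrix from BT.601 limited-range coefficients.

    ASSUMPTION: each row is [cY, cU, cV, bias] and the bias is expressed in
    units of 256. This is inferred from the example values only.
    """
    rows = [
        (1.164, 0.0, 1.596),      # R = 1.164*(Y-16) + 1.596*(V-128)
        (1.164, -0.392, -0.813),  # G = 1.164*(Y-16) - 0.392*(U-128) - 0.813*(V-128)
        (1.164, 2.017, 0.0),      # B = 1.164*(Y-16) + 2.017*(U-128)
    ]
    mat = []
    for c_y, c_u, c_v in rows:
        # Fold the (Y-16), (U-128), (V-128) offsets into a single bias term.
        bias = -(c_y * 16 + c_u * 128 + c_v * 128) / 256
        mat.extend([c_y, c_u, c_v, bias])
    return mat


def validate_csc_mat(csc_mat):
    """Check the value ranges stated in this section's warning."""
    if len(csc_mat) != 12:
        raise ValueError("csc_mat must contain 12 values (3 rows x 4 columns)")
    for index, value in enumerate(csc_mat):
        is_bias = index % 4 == 3  # indices 3, 7, 11
        low, high = (-9, 8) if is_bias else (-524289, 524288)
        if not low < value < high:
            raise ValueError(f"csc_mat[{index}] = {value} is outside ({low}, {high})")


csc_mat = bt601_limited_csc_mat()
validate_csc_mat(csc_mat)
print([round(v, 3) for v in csc_mat])
```

The computed biases agree with the example to within about 0.001; the small difference presumably comes from rounding.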

Warning

After configuring YUV input, src_layout will be automatically changed to NHWC.
For NV12/NV21 input, the height of the input shape is 1.5× the original height (Y + UV planes).
In csc_mat, the bias values (indices 3, 7, 11) must be in (-9, 8). The other parameters must be in (-524289, 524288).
When validating accuracy on board with a YUV src_format, it is recommended to use IVE TDP for resize, because this preprocessing is aligned with OpenCV bilinear interpolation.
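
The 1.5× height rule for NV12/NV21 follows from the memory layout: a full-resolution Y plane is followed by an interleaved chroma plane with half as many rows. A minimal sketch for sizing the runtime input buffer, where nv12_buffer_layout is a hypothetical helper:

```python
def nv12_buffer_layout(width, height):
    """Return (rows, total_bytes) of an NV12/NV21 frame for a width x height image."""
    if width % 2 or height % 2:
        raise ValueError("NV12/NV21 requires even width and height")
    # Y plane: `height` rows; interleaved UV (or VU) plane: `height / 2` rows.
    rows = height * 3 // 2
    return rows, rows * width  # one byte per sample (U8)


rows, size = nv12_buffer_layout(224, 224)
print(rows, size)  # 336 75264
```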

5.3. Static batch configuration

The compiler builds the model for the user-specified batch sizes. Weights are shared between batches, so the output model size is much smaller than the sum of individual batch models.

Config file approach — add static_batch_sizes under compiler:

{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "input",
        "calibration_dataset": "./dataset/imagenet-32-images.tar",
        "calibration_size": 32,
        "calibration_mean": [103.939, 116.779, 123.68],
        "calibration_std": [58.0, 58.0, 58.0]
      }
    ],
    "calibration_method": "MinMax"
  },
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "src_format": "BGR",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    }
  ],
  "compiler": {
    "check": 0,
    "static_batch_sizes": [1, 2, 4]
  }
}

Command line approach:

pulsar2 build --target_hardware AX650 --input model.onnx --output_dir output --config config.json --compiler.static_batch_sizes 1 2 4

Hint

Take mobilenetv2 as an example. The original input shape is [1, 224, 224, 3]. After setting static_batch_sizes to [1, 2, 4], the input shape of the compilation output becomes [4, 224, 224, 3].

Attention

Static batch and dynamic batch modes are mutually exclusive and cannot be configured at the same time.
If the model contains Reshape operators, you may need to use the Constant Data Patch feature to change the batch dimension of their target shapes to -1 or 0.
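
Because the two batch modes cannot be combined, a config can be checked before it is passed to pulsar2 build. The sketch below only inspects the compiler section for static_batch_sizes and max_dynamic_batch_size (the dynamic batch field); batch_mode is a hypothetical helper and does not reproduce the toolchain's own validation.

```python
def batch_mode(config):
    """Return "static", "dynamic", or None; reject configs that set both modes."""
    compiler = config.get("compiler", {})
    has_static = "static_batch_sizes" in compiler
    has_dynamic = "max_dynamic_batch_size" in compiler
    if has_static and has_dynamic:
        raise ValueError(
            "static_batch_sizes and max_dynamic_batch_size are mutually exclusive"
        )
    if has_static:
        return "static"
    return "dynamic" if has_dynamic else None


config = {"compiler": {"check": 0, "static_batch_sizes": [1, 2, 4]}}
print(batch_mode(config))  # static
```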

5.4. Dynamic batch configuration

The compiler automatically derives a batch-size set that the NPU can run efficiently and that does not exceed max_dynamic_batch_size. At runtime, the inference framework splits the actual batch into multiple runs if needed.

Config file approach — add max_dynamic_batch_size under compiler:

{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "input",
        "calibration_dataset": "./dataset/imagenet-32-images.tar",
        "calibration_size": 32,
        "calibration_mean": [103.939, 116.779, 123.68],
        "calibration_std": [58.0, 58.0, 58.0]
      }
    ],
    "calibration_method": "MinMax"
  },
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "src_format": "BGR",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    }
  ],
  "compiler": {
    "check": 0,
    "max_dynamic_batch_size": 4
  }
}

Command line approach:

pulsar2 build --target_hardware AX650 --input model.onnx --output_dir output --config config.json --compiler.max_dynamic_batch_size 4

Derivation rules:

The compiler starts from batch 1 and doubles the batch size (1 → 2 → 4 → ...). It stops when the batch size exceeds the configured value or when the theoretical inference efficiency gets worse.
Theoretical inference efficiency is measured as theoretical inference time / batch_size, that is, the time per sample, so a larger value means lower efficiency.
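
The derivation rule can be sketched as follows. The sketch reads "efficiency decreases" as "the time per sample becomes larger than for the previous batch size", which is an interpretation of the rule above. estimate_time is a hypothetical stand-in for the compiler's internal cost model, which is not described in this document.

```python
def derive_batch_sizes(max_dynamic_batch_size, estimate_time):
    """Double the batch size from 1 while per-sample time does not get worse.

    estimate_time(batch) -> theoretical inference time for that batch
    (hypothetical callback; the real cost model is internal to the compiler).
    """
    batches = []
    best_per_sample = float("inf")
    batch = 1
    while batch <= max_dynamic_batch_size:
        per_sample = estimate_time(batch) / batch
        if per_sample > best_per_sample:
            break  # larger batches no longer pay off
        batches.append(batch)
        best_per_sample = per_sample
        batch *= 2
    return batches


# Toy cost model: a fixed overhead of 1.0 plus 0.5 per sample.
print(derive_batch_sizes(4, lambda b: 1.0 + 0.5 * b))  # [1, 2, 4]

# Toy cost model where batch 4 is slower per sample than batch 2.
measured = {1: 1.0, 2: 1.6, 4: 4.0}
print(derive_batch_sizes(4, measured.__getitem__))  # [1, 2]
```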

Hint

When max_dynamic_batch_size is set to 4, the compilation output may include three batch sizes: [1, 2, 4].

At runtime, the inference framework automatically splits the workload:

batch=3 → internally runs batch 2 + batch 1 (two inferences)
batch=9 → internally runs batch 4 + batch 4 + batch 1 (three inferences)
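
Both examples are consistent with a greedy split that always takes the largest compiled batch size that still fits. The sketch below reproduces them under that assumption; split_batch is a hypothetical helper, and the actual strategy of the inference framework is not specified here.

```python
def split_batch(requested, compiled_batches):
    """Greedily split `requested` samples into runs of the compiled batch sizes."""
    plan = []
    remaining = requested
    for size in sorted(compiled_batches, reverse=True):
        count, remaining = divmod(remaining, size)
        plan.extend([size] * count)
    if remaining:
        raise ValueError("compiled batch sizes cannot cover the requested batch")
    return plan


print(split_batch(3, [1, 2, 4]))  # [2, 1]
print(split_batch(9, [1, 2, 4]))  # [4, 4, 1]
```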

5.5. Multi-input configuration

When an ONNX model has multiple inputs (such as stereo vision, image + mask, multi-sensor fusion, etc.), you need to configure input_configs and input_processors for each input separately.

{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "rgb_image",
        "calibration_dataset": "./dataset/rgb_images.tar",
        "calibration_size": 32,
        "calibration_mean": [103.939, 116.779, 123.68],
        "calibration_std": [58.0, 58.0, 58.0]
      },
      {
        "tensor_name": "depth_map",
        "calibration_dataset": "./dataset/depth_maps.tar",
        "calibration_format": "Numpy",
        "calibration_size": 32,
        "calibration_mean": [0],
        "calibration_std": [1]
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": false
  },
  "input_processors": [
    {
      "tensor_name": "rgb_image",
      "tensor_format": "BGR",
      "src_format": "BGR",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    },
    {
      "tensor_name": "depth_map",
      "tensor_format": "GRAY",
      "src_format": "GRAY",
      "src_dtype": "FP32",
      "src_layout": "NCHW"
    }
  ],
  "compiler": {
    "check": 0
  }
}

Key points:

Each input needs an independent input_configs entry and an independent input_processors entry.
Different inputs can use different calibration datasets, data formats, and normalization parameters.
calibration_format supports four formats: Image (default), Numpy, Binary, and NumpyObject.
If all inputs share the same configuration, you can set tensor_name to DEFAULT.

Mapping between calibration data and inputs:

| Input tensor | Calibration format | Dataset content | Notes |
| --- | --- | --- | --- |
| rgb_image | Image (default) | JPEG/PNG packed into a tar file | The toolchain reads images and normalizes automatically |
| depth_map | Numpy | .npy files packed into a tar file | Must be preprocessed into numpy arrays that match the model input shape |

Warning

tensor_name must match the actual input names in the ONNX model. For a simulation run, you need to prepare a bin file for each input, and the file name must match the tensor name.
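
A quick consistency check between the two sections of a multi-input config can catch a missing or misspelled entry before the build. The sketch below treats a DEFAULT entry as covering every input on its side, which is an assumption based on the key points above. find_missing_entries is a hypothetical helper, and it does not compare the names against the ONNX model itself.

```python
def find_missing_entries(config):
    """Return (names lacking an input_configs entry, names lacking an input_processors entry)."""
    quant_names = {c["tensor_name"] for c in config["quant"]["input_configs"]}
    proc_names = {p["tensor_name"] for p in config["input_processors"]}
    # ASSUMPTION: a DEFAULT entry applies to all inputs on its side.
    missing_quant = set() if "DEFAULT" in quant_names else proc_names - quant_names - {"DEFAULT"}
    missing_proc = set() if "DEFAULT" in proc_names else quant_names - proc_names - {"DEFAULT"}
    return sorted(missing_quant), sorted(missing_proc)


config = {
    "quant": {"input_configs": [{"tensor_name": "rgb_image"},
                                {"tensor_name": "depth_map"}]},
    "input_processors": [{"tensor_name": "rgb_image"}],  # depth_map forgotten
}
print(find_missing_entries(config))  # ([], ['depth_map'])
```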

5.6. Skip onnxslim

By default, pulsar2 build runs internal graph optimization on the ONNX model using the open source onnxslim tool. In some scenarios (for example, the model has already been manually optimized, it contains custom operators, or the optimization causes compilation failure), you may need to skip these optimization steps.

Command line approach:

pulsar2 build --target_hardware AX650 --input model.onnx --output_dir output --config config.json --onnx_opt.disable_onnx_optimization true

Config file approach — add onnx_opt at the top level:

{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "onnx_opt": {
    "disable_onnx_optimization": true
  },
  "quant": {
    "input_configs": [
      {
        "tensor_name": "input",
        "calibration_dataset": "./dataset/imagenet-32-images.tar",
        "calibration_size": 32,
        "calibration_mean": [103.939, 116.779, 123.68],
        "calibration_std": [58.0, 58.0, 58.0]
      }
    ],
    "calibration_method": "MinMax"
  },
  "input_processors": [
    {
      "tensor_name": "input",
      "tensor_format": "BGR",
      "src_format": "BGR",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    }
  ],
  "compiler": {
    "check": 0
  }
}
