2. QAT 4W4F support
QAT (Quantization-Aware Training) simulates quantization during model training so that the model adapts to low-precision computation, which reduces the accuracy loss after quantization. QAT usually inserts fake-quantization operations in the forward pass to simulate low-bit quantization, while gradients are still computed in FP32 during backpropagation.
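To make the fake-quantization idea concrete, here is a minimal NumPy sketch of a 4-bit quantize-dequantize step and its backward pass. It is an illustration under stated assumptions, not the quantizer used by this project: the symmetric signed grid, the fixed per-tensor `scale`, the straight-through gradient rule, and the function names `fake_quantize` and `fake_quantize_grad` are all hypothetical choices made for the example.

```python
import numpy as np


def fake_quantize(x, scale, num_bits=4):
    """Quantize then dequantize x on a signed integer grid; the result stays floating point."""
    qmin = -(2 ** (num_bits - 1))      # -8 for 4 bits
    qmax = 2 ** (num_bits - 1) - 1     # 7 for 4 bits
    q = np.clip(np.round(x / scale), qmin, qmax)
    return (q * scale).astype(x.dtype)


def fake_quantize_grad(x, grad_out, scale, num_bits=4):
    """Straight-through estimator (an assumption of this sketch): pass the gradient
    unchanged where x is inside the representable range, zero it where x was clipped."""
    qmin = -(2 ** (num_bits - 1))
    qmax = 2 ** (num_bits - 1) - 1
    inside = (x >= qmin * scale) & (x <= qmax * scale)
    return grad_out * inside.astype(grad_out.dtype)


# Illustrative values: with scale = 0.25 the representable range is [-2.0, 1.75].
x = np.array([-3.0, -0.6, 0.1, 0.3, 1.0, 2.5], dtype=np.float32)
scale = 0.25
y = fake_quantize(x, scale)                            # [-2.0, -0.5, 0.0, 0.25, 1.0, 1.75]
grad = fake_quantize_grad(x, np.ones_like(x), scale)   # [0, 1, 1, 1, 1, 0]
```

The forward output `y` only takes values on the 16-level grid, yet it is still stored as FP32, and the gradient is an ordinary FP32 array. That is the sense in which the quantization is "fake": the model sees the rounding error during training without leaving floating-point arithmetic.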
For 4-bit quantization, refer to the resnet50/config_4w4f configuration, and use simplify_and_fix_4bit_dtype instead of onnxsim/onnxslim.