2. QAT 4W4F support

QAT (Quantization-Aware Training) simulates quantization during model training so that the model adapts to low-precision computation, which reduces the accuracy loss after quantization. QAT usually inserts fake-quantization operations in the forward pass to simulate low-bit quantization, while gradients are still computed in FP32 during backpropagation.
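
As a rough illustration of the fake-quantization idea, the sketch below quantizes and immediately dequantizes a single float value in plain Python. It is not the implementation used by this toolchain: the function names `fake_quantize` and `ste_grad`, the signed symmetric 4-bit range, and the fixed `scale` are all assumptions made for the example. The backward rule shown is the straight-through estimator, a common choice for passing FP32 gradients through the rounding step, which this document does not itself specify.

```python
def fake_quantize(x, scale, num_bits=4):
    """Quantize then dequantize a float, simulating signed low-bit storage.

    Assumption: symmetric signed range, e.g. [-8, 7] for 4 bits.
    """
    qmin = -(2 ** (num_bits - 1))
    qmax = 2 ** (num_bits - 1) - 1
    q = round(x / scale)           # snap to the integer grid (Python rounds ties to even)
    q = max(qmin, min(qmax, q))    # clamp to the representable range
    return q * scale               # return to floating point for the rest of the forward pass


def ste_grad(x, scale, upstream_grad, num_bits=4):
    """Straight-through estimator (assumed backward rule).

    Rounding is treated as the identity, so the gradient passes through
    unchanged when x lies inside the clamp range and is zero outside it.
    """
    qmin = -(2 ** (num_bits - 1))
    qmax = 2 ** (num_bits - 1) - 1
    inside = qmin <= x / scale <= qmax
    return upstream_grad if inside else 0.0


if __name__ == "__main__":
    scale = 0.1
    for value in (0.33, 1.0, -1.0):
        print(value, fake_quantize(value, scale), ste_grad(value, scale, 1.0))
```

The forward value carries the rounding and clamping error that a real 4-bit kernel would introduce, while the gradient stays in floating point, which is what lets the optimizer keep making small updates to the underlying FP32 weights.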