Model quantization is a technique for reducing the size and improving the inference performance of deep learning models by lowering the precision of the model's weights. It converts values from high-precision floating-point numbers to lower-precision floating-point or integer numbers, for example from 32-bit floats to 8-bit integers.
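To make the conversion concrete, here is a minimal sketch of one common scheme, affine quantization to 8-bit integers, written in plain Python. The helper names `quantize_int8` and `dequantize` and the sample weights are illustrative assumptions, not part of any library; real toolkits differ in how they choose ranges and handle rounding.

```python
def quantize_int8(values):
    """Map floats to int8 codes using a scale and a zero point (affine scheme)."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # keep 0.0 inside the representable range
    scale = (hi - lo) / 255.0 or 1.0      # fall back to 1.0 if every value is 0.0
    zero_point = round(-128 - lo / scale)
    # Round each value to the nearest code and clamp to the int8 range.
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover approximate float values from int8 codes."""
    return [(c - zero_point) * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize(codes, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is now stored in one byte instead of four, and `max_error` shows the price: the restored values differ from the originals by a small rounding error that is bounded by the scale.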
There are two main types of quantization:
Post-training quantization: This is applied after the model has been trained, typically when it is converted to a production format.
Quantization-aware training: This is applied during the training process. The model is trained with the effects of quantization simulated on its weights, which helps to reduce the accuracy loss that quantization would otherwise cause in the metrics of interest.
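The difference between the two approaches can be sketched with a "fake quantization" step, which quantizes and immediately dequantizes the weights so that they stay floats but carry the rounding error. Quantization-aware training typically places such a step in the forward pass, so the model learns to tolerate the error, while gradients are usually passed through the rounding unchanged. The code below is a plain-Python illustration under those assumptions; `fake_quantize`, `forward`, and the sample numbers are hypothetical and do not reflect the Keras API.

```python
def fake_quantize(values, num_bits=8):
    """Quantize then dequantize: the result is float but limited to the quantized grid."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for 8 bits
    max_abs = max(abs(v) for v in values) or 1.0   # avoid a zero scale
    scale = max_abs / qmax                         # symmetric range around 0.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def forward(weights, inputs, quantized):
    """Dot product of weights and inputs, optionally with simulated quantization."""
    w = fake_quantize(weights) if quantized else weights
    return sum(wi * xi for wi, xi in zip(w, inputs))


# Hypothetical weights and inputs, chosen only for illustration.
weights = [0.8131, -0.4027, 0.0519]
inputs = [1.0, 2.0, 3.0]

float_out = forward(weights, inputs, quantized=False)
qat_out = forward(weights, inputs, quantized=True)
gap = abs(float_out - qat_out)   # the error the training loss gets to "see"
```

With post-training quantization this gap first appears after training is finished. With quantization-aware training the loss is computed on the quantized output, so the optimizer can adjust the weights to shrink its effect.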
In this talk we will go through the details of a quantization-aware training example with Keras.