11 May 2022

Fundamental Methods of Prediction Speed-Ups

There are four fundamental ways to speed up prediction and reduce the memory footprint of transformer models:
  • Knowledge Distillation
  • Quantization
  • Pruning
  • Graph Optimization
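To make the memory argument concrete, here is a minimal sketch of one of these methods, quantization. It assumes a simple symmetric, per-tensor int8 scheme applied to a toy list of weights. The function names quantize_int8 and dequantize and the sample values are hypothetical, chosen only for illustration, and this is not the API of any particular library. Each float32 weight (4 bytes) is stored as one signed byte plus a single shared scale factor. This cuts storage roughly fourfold at the cost of a small rounding error.

```python
from array import array

# Hypothetical helpers illustrating symmetric per-tensor int8 quantization
# (an assumed scheme for this sketch, not a specific library's method).

def quantize_int8(weights):
    """Map float weights to signed bytes in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights) or 1.0  # guard against an all-zero tensor
    scale = max_abs / 127.0                         # float value of one integer step
    q = array("b", (round(w / scale) for w in weights))  # 1 byte per weight
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values and the scale."""
    return [v * scale for v in q]

# Toy float32 weights (illustrative values only).
weights = array("f", [0.12, -0.5, 0.33, 0.9, -0.87, 0.0, 0.45, -0.21])
q, scale = quantize_int8(weights)

print(weights.itemsize * len(weights), "bytes as float32")
print(q.itemsize * len(q), "bytes as int8 (plus one float scale)")
print("max rounding error:", max(abs(a - b) for a, b in zip(weights, dequantize(q, scale))))
```

Because every value is rounded to the nearest multiple of scale, the reconstruction error per weight is at most scale / 2. Real toolkits typically refine this basic idea, for example with per-channel scales or calibration data.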