anwaar-khalid - Articles by Anwaar Khalid

Early Breakthroughs in Transformer Quantization

By Anwaar Khalid in Quantization on Thu 16 July 2026

In my previous blog post, we discussed the outlier problem in transformers in detail. We realized that outliers are not just random noise: they encode important directional information that lets the model selectively skip updating token...

Why do transformers have outliers?

By Anwaar Khalid in Quantization on Wed 01 July 2026

Modern machine learning models are trained with a large number of parameters, often too many, and this overparameterization is very useful during training because it creates a vast search space for the model to encode rich representations from data...

Integer Quantization: Deep Dive 🤿

By Anwaar Khalid in Quantization on Thu 18 June 2026

A lot has happened in transformer quantization over the past few years, from barely being able to quantize a 7B model to INT8 without destroying accuracy, to routinely fitting a 70B model in 4 bits on a single GPU. But existing guides on the...

Model Compression: A Survey of Techniques

By Anwaar Khalid in Compression on Sat 10 August 2024

Machine Learning (ML) has witnessed a surge in interest in recent years, driven by the availability of large-scale datasets, advances in ML frameworks such as PyTorch and TensorFlow, the rise of hardware accelerators (e.g., GPUs and TPUs) that enable...

Case Study: Compressing DeepMind’s RepNet for Edge Deployment

By Anwaar Khalid in Compression on Mon 10 June 2024

In this case study, we explore compressing neural networks for efficient deployment on resource-constrained edge devices. We cover practical techniques like quantization, pruning, and tensorization using off-the-shelf open-source tools. Our...