Microsoft Research Blog

Advances to low-bit quantization enable LLMs on edge devices

February 5, 2025 | Shijie Cao, Lingxiao Ma, and Ting Cao

Advances in low-bit quantization techniques enable efficient operation of LLMs on resource-constrained edge devices. Discover how innovations like T-MAC, Ladder, and LUT Tensor Core improve computational efficiency and enhance hardware compatibility.

