Microsoft Research Blog

Advances to low-bit quantization enable LLMs on edge devices 

February 5, 2025 | Shijie Cao, Lingxiao Ma, and Ting Cao
Advances in low-bit quantization techniques enable efficient operation of LLMs on resource-constrained edge devices. Discover how innovations like T-MAC, Ladder, and LUT Tensor Core improve computational efficiency and enhance hardware compatibility.

Recent Posts

  1. Advances to low-bit quantization enable LLMs on edge devices

    February 5, 2025 | Shijie Cao, Lingxiao Ma, and Ting Cao

  2. Research Focus: Week of January 27, 2025

    January 31, 2025

    In this issue: A new approach to multimodal pretraining for remote sensing; Managed-retention memory for the AI era; Improving detection of macular telangiectasia type 2; Generalizing symbolic automata.

  3. Research Focus: Week of January 13, 2025

    January 17, 2025

    In this edition: Privacy enhancements for multiparty deep learning; using smaller, open-source models to provide relevance judgments; new tool uses AI, data to automate innovation and development; Yasuyuki Matsushita named IEEE 2025 Computer Society Fellow.

  4. AIOpsLab: Building AI agents for autonomous clouds

    December 20, 2024

    AIOpsLab is an open-source framework designed to evaluate and improve AI agents for cloud operations, offering standardized, scalable benchmarks for real-world testing, enhancing cloud system reliability.

  5. Research Focus: Week of December 16, 2024

    December 18, 2024

    NeoMem: hardware/software co-design for CXL-native memory tiering; Chimera: accurate retrosynthesis prediction by ensembling models with diverse inductive biases; GA4GH task execution API enables multicloud task execution.

  6. Research Focus: Week of December 2, 2024

    December 4, 2024

    Can a new SOS-RMT protocol enable more efficient CL-MPC?; A fair-by-design, cloud-based algorithmic trading platform; LLM2CLIP unlocks richer visual representation; New technique enhances Low-Rank Adaptation’s expressiveness, generalization capabilities.

Explore More

  • Events & conferences

    Meet our community of researchers, learn about exciting research topics, and grow your network.

  • Podcasts

    Ongoing conversations at the cutting edge of research.

  • Microsoft Research Forum

    Join us for a continuous exchange of ideas about research in the era of general AI.