
Wei Cui

Principal Researcher

About

Research Areas:

1. Low-cost inference and cost-efficient post-training on heterogeneous accelerators

2. Accelerating relational databases and query systems

3. Distributed and P2P optimization for LLMs and query processing

4. Cross-platform compilation for Edge & Cloud (CUDA, ROCm, DirectX, Vulkan, SYCL, WebGPU)

5. Coding agents and multimodal inference systems

6. Quantization and compression for standard hardware lacking native ISA support for low-precision formats (e.g., NVFP4, MXFP4, FP8)


Other Works:

microsoft/tutel: Tutel MoE: An Optimized Mixture-of-Experts Implementation

microsoft/antares: Antares: an automatic engine for multi-platform kernel generation and optimization