About
Research Areas:
1. Low-cost inference and cost-efficient post-training on heterogeneous accelerators
2. Accelerating relational databases and query systems
3. Distributed and P2P optimization for LLMs and query processing
4. Cross-platform compilation for Edge & Cloud (CUDA, ROCm, DirectX, Vulkan, SYCL, WebGPU)
5. Coding agents and multimodal inference systems
6. Quantization and compression for standard hardware without native ISA support for low-precision formats (e.g., NVFP4, MXFP4, FP8)
Other Works:
microsoft/tutel: Tutel MoE: An Optimized Mixture-of-Experts Implementation