About
Research Areas (AI infra):
1. Low-cost inference and cost-efficient post-training on heterogeneous accelerators
2. Accelerating relational databases and query systems
3. Distributed and NVLink/XGMI optimization for LLMs and query processing
4. Cross-platform compilation for Cloud & Edge (CUDA, ROCm, DirectX, Vulkan, SYCL, WebGPU)
5. Coding agents, sandbox and multimodal inference systems
6. Advanced quantization and compression for cost-performance hardware (e.g., NVFP4, MXFP4, FP8 for Intel/AMD/A100/..)
Other Works:
microsoft/tutel: Tutel MoE: An Optimized Mixture-of-Experts Implementation