Research Intern – Machine Learning and Optimization
The Machine Learning and Optimization (MLO) group in MSR-Redmond performs research at the intersection of optimization, machine learning and systems. Our focus right now is on combining Large Language Model (LLM) technology with optimization for…
Senior Researcher – AI and Systems Reliability – Microsoft Research
We are seeking a Senior Researcher in areas such as distributed systems and reliability, formal methods and verification, machine learning for system reliability, and reliability of machine learning systems.…
Research Intern – AI System Architecture Modeling and Performance
The Azure Hardware and Systems Infrastructure organization is central to defining Microsoft’s first-party Artificial Intelligence (AI) infrastructure architecture and strategy. This is a dynamic and fast-paced environment that, in close partnership with sister organizations, helps…
Agent Lightning: Adding reinforcement learning to AI agents without code rewrites
By decoupling how agents work from how they’re trained, Agent Lightning turns each step an agent takes into data for reinforcement learning. This makes it easy for developers to improve agent performance with almost zero…
Research Intern – AI Network Observability
As a Research Intern in the Strategic Planning and Architecture (SPARC) group, you will contribute to the research, design, and development of tools to provide insights into multi-path network transports for large-scale Artificial Intelligence (AI)…
Research Intern – AI Frameworks (Network Systems and Tools)
Advances in Artificial Intelligence (AI) increasingly depend on breakthroughs in systems and architecture, where hardware, models, and software must be co-designed to scale efficiently. This Research Internship offers the opportunity to explore next-generation AI systems…
Research Intern – Data Center and AI Networking
As a Research Intern in the Strategic Planning and Architecture (SPARC) group, you will contribute to the research, design, and development of network transport features and telemetry systems for large-scale data center and Artificial…