As large language models and other large-scale AI technologies advance rapidly, we are witnessing a profound societal transformation. These models not only possess general-purpose capabilities, excelling at tasks such as translation, question answering, and programming, but also perform at a level comparable to or even surpassing humans in many scenarios, making them powerful, readily accessible tools across a wide range of domains.
As Brad Smith, President of Microsoft, has said: “The more powerful the tool, the greater the good—or harm—it can do.”
While the immense power of AI brings convenience, it also introduces unprecedented challenges. The complexity of the technology intertwined with its broad societal impact means that without careful guidance, potential risks could be amplified. To ensure AI coexists harmoniously with human society, evolves collaboratively, and possesses sufficient resilience, advancing research in “Societal AI” is critically important. This field emphasizes interdisciplinary integration of computer science and social sciences to systematically address the complex dynamics of AI reshaping the world.
If you are eager to address the societal challenges and opportunities arising from large-scale AI models like LLMs, fostering their harmonious integration with society through multidisciplinary research spanning computer and social sciences, we warmly invite you to join the Microsoft Research Asia StarTrack Scholars Program. Applications are now open for the 2026 program. Last year’s article provided a detailed introduction to the research theme of Societal AI. For more details and to submit your application, visit our official website: Microsoft Research Asia StarTrack Scholars Program – Microsoft Research.
In 2025, two StarTrack scholars joined the Social Computing Group at Microsoft Research Asia to explore how to make AI “more responsible”:
– Yupeng Li, Assistant Professor in the Department of Interactive Media at Hong Kong Baptist University, pursuing the question of “trustworthiness in large language models”;
– Ziang Xiao, Assistant Professor at Johns Hopkins University, pioneering the introduction of psychometrics into AI evaluation.
Though hailing from different regions and following distinct research paths, the two scholars worked in the same group, at the same time, and in the same place, pushing the foundational propositions of Societal AI, “trustworthiness” and “science”, to deeper levels.
StarTrack Scholar Yupeng Li: “I Want to Tackle Problems That Won’t Be Solved in Ten Years”
In 2025, Dr. Yupeng Li, Assistant Professor in the Department of Interactive Media at Hong Kong Baptist University, joined Microsoft Research Asia’s Social Computing Group through the StarTrack Program and collaborated closely with researchers Xing Xie and Fangzhao Wu. This nearly six-month immersive, in-person collaboration allowed Yupeng to experience firsthand the open and friendly research atmosphere at Microsoft Research Asia, while daily interactions with the team helped him identify a core issue in Societal AI: Trustworthy AI for large language models.
From “Hearing About It” to “Joining”: A Natural Collaboration
As early as his PhD years, Yupeng had heard from senior colleagues about Microsoft Research Asia’s visiting scholar programs. Over the past decade, many Asian scholars who participated have become leaders in their fields—living testimonials to the program. This legacy gave him high expectations for the StarTrack Program, and a strong alignment with researcher Xing Xie on the key topic of trustworthy AI ultimately led to his participation.
At the end of 2024, he applied without hesitation. In May of the following year, he arrived in Beijing with his suitcase, spending from early summer to late autumn—half a year—fully immersed in the StarTrack experience at Microsoft Research Asia.

Starting from Scratch: Pursuing Fundamental Questions
From the outset, both sides agreed on a clear yet challenging path: rather than extending existing projects, they would start from zero to address the most fundamental question—How can we make AI truly trustworthy in the era of large models?
They broke this grand question into three progressively deeper sub-directions:
– Systematically detecting and mitigating hallucinations in large language models;
– Determining whether models endowed with “human-like” attributes undergo value shifts or exhibit proactively deceptive behavior;
– Assessing whether, as human assistants, models can remain consistently reliable and safe over long-term interactions, especially in extreme scenarios involving children, ethics, or public safety.
“We deliberately avoided superficial issues that the next large model might easily fix,” Yupeng said. “What we want to do is identify essential problems that will still require human wisdom to answer and solve even after another decade or two of technological iteration.”
Facing the highly complex behavior and rapid iteration cycles of large models, the team cultivated a productive intellectual tension: Xing Xie’s inclusive, cross-disciplinary approach, combined with the group’s diverse membership, enabled them to re-examine problems from the perspectives of communication, psychology, sociology, and beyond, and to jointly explore solutions. This combination of diversity and inclusiveness significantly broadened and deepened Yupeng’s research, further refining his awareness and academic taste regarding “what kind of research to do” and “how to do research.”
In-Person StarTrack: An Immersive Research Experience
Beyond the research outcomes themselves, what truly impressed Yupeng were the countless “accidental” sparks during his visit. Many key breakthroughs did not come from formal meetings but emerged suddenly during casual chats in the cafeteria, corridors, or coffee corners. “Sometimes a casual ‘Why don’t we try this angle?’ could overturn a month’s worth of assumptions in just ten minutes,” he said. “This kind of anytime-anywhere intellectual collision is irreplaceable by online collaboration.”
To sustain deep collaboration, Yupeng recommended one of his PhD students as an intern at the lab, where the student remains today. Weekly meetings, experiment reproduction, and paper discussions have become the most solid bridge after the visit. “The visit lasts only a few months, but students can stay long-term. This is a worthwhile way to transform short-term visits into long-term collaboration.”
Over six months, Yupeng participated in every StarTrack activity—Ignition Talks, StarTrack lectures, group meetings, and offline gatherings—and connected with scholars such as Professor Yu Xie from Princeton University and young researchers from the University of Chicago and Carnegie Mellon University through the team’s platform, significantly broadening his international perspective. He highly praised the experience: “Here you have world-class computing resources, a free and relaxed academic atmosphere, and frequent, deep intellectual exchanges under an international vision.”
In his final interview, Yupeng shared heartfelt advice for young scholars planning to apply for the 2026 StarTrack Program: “If time permits, definitely apply—this is one of the best platforms for deep collaboration between corporate research institutes and young scholars. Once here, don’t just dip your toes in; only by fully immersing yourself and investing substantial time and energy can you reap the greatest rewards. Most importantly, communicate thoroughly with your intended mentor during the application stage to ensure strong alignment of research interests—this is the key to successful collaboration. Here, you will meet a more capable version of yourself.”
Though the visit has ended, collaboration continues. Projects between Yupeng and the StarTrack team are still advancing, and the research sparks continue to burn brightly. In the future, the StarTrack Program at Microsoft Research Asia looks forward to welcoming more outstanding young scholars dedicated to Societal AI to jointly explore a trustworthy and benevolent future for AI in the era of large models.
StarTrack Scholar Ziang Xiao: Bringing Psychometrics into the Scientific Practice of AI Evaluation
In the summer of 2025, Professor Ziang Xiao from Johns Hopkins University arrived at Microsoft Research Asia through the StarTrack Program and conducted in-depth collaboration with researchers Xing Xie and Xiaoyuan Yi in the Social Computing Group, focusing on an emerging yet under-systematized field: the Science of AI Evaluation.
Ziang’s connection with MSRA actually began earlier. During his postdoctoral fellowship at Microsoft Research Montréal, his project overlapped significantly with the research directions of Xing Xie and Xiaoyuan Yi, planting the seeds of collaboration. When he saw the announcement of a new round of StarTrack recruitment on MSRA’s official account, he applied without hesitation.
He arrived with a simple wish: to learn and exchange ideas. By the time he left, he had several large parallel projects underway and a group of long-term collaborators. After hearing about his experience, “I’ll apply next round” became the most common response from his colleagues.

Starting from Problems: Building an Item-Level Response Database for AI Evaluation
“We observed a widespread phenomenon,” Ziang explained. “Although AI evaluation studies are surging, the vast majority report only final scores; almost no one preserves a model’s raw outputs on each item, and even fewer systematically validate the validity and reliability of the evaluations themselves.”
Drawing on his psychology background and noting the human-like attributes of AI, Ziang wondered: could we, as psychometrics has done over the past century, accumulate hundreds of thousands or even millions of complete “answer records” and build a super-large item-level response database for AI? Only with such a database can researchers truly verify whether traditional measurement models apply and explore new variables that may exist in AI’s capability structure.
Thus, the team established their core mission: to truly transplant psychometrics into AI evaluation and establish the Science of Evaluation. They aimed to migrate mature validity and reliability theories and models from psychometrics to AI evaluation scenarios, exploring the applicability of traditional methods and areas requiring innovation.
To address the current absence of preserved item-level evaluation data, the team began systematically collecting and preserving the raw item-level outputs of mainstream models on mainstream benchmarks, building a super-large AI evaluation response repository. They gathered existing data from the open-source community, filled critical gaps, and launched a “raw evaluation data donation” initiative within the academic community, calling for contributions of historical data that is “no longer useful to the original owners but extremely valuable for evaluation science.” Ziang likened this to “picking up all the answer sheets the world has thrown away.”
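To make the idea concrete, here is a minimal sketch in Python of what item-level records and a classical psychometric reliability check might look like when models are treated as “examinees” and benchmark questions as test items. The schema, field names, and the choice of Cronbach’s alpha are illustrative assumptions for this post, not the team’s actual repository format or methodology.

```python
# Illustrative sketch only: a hypothetical item-level record format and a
# classical reliability statistic (Cronbach's alpha) computed over the
# resulting score matrix, with models playing the role of "examinees".
import numpy as np

# Hypothetical records: one entry per (model, benchmark, item), preserving
# the raw output alongside the scored result instead of only a final score.
records = [
    {"model": "model-a", "benchmark": "demo-bench", "item_id": 0,
     "raw_output": "Paris", "score": 1},
    {"model": "model-a", "benchmark": "demo-bench", "item_id": 1,
     "raw_output": "1968", "score": 0},
    {"model": "model-b", "benchmark": "demo-bench", "item_id": 0,
     "raw_output": "Paris", "score": 1},
    {"model": "model-b", "benchmark": "demo-bench", "item_id": 1,
     "raw_output": "1969", "score": 1},
    # ...in practice, hundreds of thousands of such records
]

def score_matrix(records, models, item_ids):
    """Arrange item-level scores into a (models x items) matrix."""
    lookup = {(r["model"], r["item_id"]): r["score"] for r in records}
    return np.array([[lookup[(m, i)] for i in item_ids] for m in models],
                    dtype=float)

def cronbach_alpha(scores):
    """Internal-consistency estimate over an (examinees x items) score matrix."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Toy usage; a real analysis would span many models, benchmarks, and items.
matrix = score_matrix(records, ["model-a", "model-b"], [0, 1])
print(f"Cronbach's alpha: {cronbach_alpha(matrix):.2f}")
```

Only when such raw, per-item responses are preserved does it become possible to ask whether statistics like this, or richer item response theory models, actually transfer from human test-takers to AI systems.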
Ziang and Xiaoyuan Yi had previously co-authored a paper on automatic test item generation as preliminary work for this broader direction. Several sub-topics, such as using model interpretability techniques to mine latent variable relationships in AI measurement, are now progressing steadily. Describing the value of the research, Ziang said with evident pride: “We are not just doing one project; we are fostering the birth of a new research community.” In just a few months, a direction once feared to be “too interdisciplinary and too niche” has attracted growing attention, participation, and data contributions from researchers, and the community is rapidly taking shape.
Collaboration and Integration: Achieving Interdisciplinary Complementarity
Ziang brought measurement perspectives from psychology and human-computer interaction, while Xiaoyuan Yi contributed deep expertise in natural language processing and machine learning. The deep integration of these complementary strengths generated a stream of new ideas and achieved genuine interdisciplinary complementarity in this emerging field, even leading them to complete an additional collaborative study (an analysis of how AI papers cite psychological literature) simply because “it was too interesting not to do.”
Moreover, the diversity of Xing Xie’s group deeply impressed Ziang: despite its small size, the team covers nearly every intersection of AI with the humanities, psychology, sociology, law, and more. A chance encounter in the hallway or tea room could spark new inspiration. “Professor Xie’s group is extremely friendly to short-term visiting scholars; all academic activities are open to us.” Ziang attended almost all of the external lectures hosted by the group as well as the StarTrack forums, maintained frequent exchanges with Yupeng Li (also in Xing Xie’s group), and is currently exploring concrete directions for further collaboration. He never felt like a “guest” but truly a member of the group.
Speaking of the team, Ziang remarked: “A project of this scale might not have been attempted with university resources alone, but in the research institute’s academic atmosphere, everyone decided to push forward, which gave me much greater confidence.” Over the three months, the team’s full collaboration and collective effort made this large-scale, longer-cycle project possible.
When asked for advice for 2026 applicants, Ziang said: “This is a precious opportunity to broaden academic horizons and establish long-term collaborations. More importantly, three months allows one to focus deeply on an important problem without distractions and spar repeatedly with top researchers. Under the daily pressures of teaching and administration in academia, it’s hard to find such a concentrated period. I sincerely encourage young scholars interested in this direction to apply actively.”
Although the visit has ended, the projects continue to advance steadily. The “AI version of a psychometric database” keeps expanding, and community response grows increasingly enthusiastic. The StarTrack Program at Microsoft Research Asia looks forward to welcoming more outstanding young scholars willing to bring rigorous social science methods into the AI era, jointly advancing AI evaluation from empirical practice toward a true scientific paradigm.
To ensure the harmonious, collaborative, and resilient integration of AI with society while minimizing potential negative impacts, a vital path is to vigorously promote Societal AI research. Through multidisciplinary collaboration, we combine the strengths of computer science and social sciences to systematically address the complex dynamics of AI reshaping the world.
We sincerely invite young researchers in computer science and social sciences to join this exciting exploration. Let us work together to find innovative paths, ensuring that AI technology continues to advance in a responsible, fair, and trustworthy manner—truly benefiting human society.
The Microsoft Research Asia StarTrack Scholars Program advocates an open attitude, encouraging dialogue and joint experimentation with researchers from various disciplines to discover viable solutions. Visit our official website to learn more: Microsoft Research Asia StarTrack Scholars Program – Microsoft Research.
Theme Team
- Xing Xie, Assistant Managing Director, Microsoft Research Asia
- Xiaoyuan Yi, Senior Researcher, Microsoft Research Asia
- Jianxun Lian, Principal Researcher, Microsoft Research Asia
- Fangzhao Wu, Principal Researcher, Microsoft Research Asia