Abstract:
Large Language Models (LLMs) have achieved remarkable breakthroughs, yet their growing integration into humans’ everyday lives raises important societal concerns. Incorporating diverse human values into these powerful generative models is critical for enhancing AI safety, respecting cultural and individual values, and potentially boosting the productivity and innovation of future human–AI hybrid collectives. This project adopts an interdisciplinary approach, integrating AI research with philosophical, psychological, and social science perspectives on values, ethics, and culture. We focus on three fundamental research questions. RQ1: What values does AI have? We evaluate the value orientations of LLMs and examine how the internal values of generative models influence their behaviour. RQ2: What values should AI adopt? We investigate whether LLMs exhibit stable value structures and which values best reduce harm and enhance user satisfaction. RQ3: How can AI be aligned with diverse and evolving human values across different cultural contexts? We aim to ensure alignment as models grow more capable and societal norms continue to shift. Through these efforts, we are developing systematic alignment frameworks that address requirements for clarity, adaptability, and transparency. Our ultimate goal is to help build a symbiotic future in which humans and AI coexist, collaborate productively, and co-evolve.
Representative Publications:
- Counterfactual Reasoning for Steerable Pluralistic Value Alignment of Large Language Models. NeurIPS 2025.
- Value Compass Benchmarks: A Comprehensive, Generative and Self-Evolving Platform for LLMs’ Value Evaluation. ACL 2025.
- Unintended Harms of Value-Aligned LLMs: Psychological and Empirical Insights. ACL 2025.
- Towards Better Value Principles for Large Language Model Alignment: A Systematic Evaluation and Enhancement. ACL 2025.
- Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing. ICML 2025.
- CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses. NeurIPS 2024.
- Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches. arXiv 2024.
- On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models. IJCAI 2024.
- Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values. NAACL 2024.
- Denevil: Towards Deciphering and Navigating the Ethical Values of Large Language Models via Instruction Learning. ICLR 2024.
- From Instructions to Intrinsic Human Values — A Survey of Alignment Goals for Big Models. arXiv 2023.
- Unpacking the Ethical Value Alignment in Big Models. Journal of Computer Research and Development, 2023, 60(9).
Open-source contributions:
Other achievements:
More Information: Value Compass Homepage