Insights into the Challenges and Opportunities of Large Multi-Modal Models for Blind and Low Vision Users: CLIP
PARIKSHA: A Scalable, Democratic, Transparent Evaluation Platform for Assessing Indic Large Language Models
Publication | Do Transformers Use their Depth Adaptively? Evidence from a Relational Reasoning Task | A. Curth, Rachel Lawrence, Sushrut Karmalkar, Niranjani Prasad | April 2026
Publication | Discourse Diversity in Multi-Turn Empathic Dialogue | Hongli Zhan, Emma S. Gueorguieva, Javier Hernandez, Jina Suh, Desmond C. Ong, Junyi Jessy Li | April 2026
Publication | Litmus (Re)Agent: A Benchmark and Agentic System for Predictive Evaluation of Multilingual Models | Avni Mittal, Shanu Kumar, Sandipan Dandapat, Monojit Choudhury | April 2026
Publication | Confident in a Confidence Score: Investigating the Sensitivity of Confidence Scores to Supervised Fine-Tuning | Lorenzo Jaime Flores, Cesare Spinoso di-Piano, Jackie Cheung | April 2026
Publication | Do LLMs Follow Their Own Rules? A Reflexive Audit of Self-Stated Safety Policies | Avni Mittal | April 2026
Publication | Human Values Matter: Investigating How Misalignment Shapes Collective Behaviors in LLM Agent Communities | Xiangxu Zhang, Jiaming Wang, Qinlin Zhao, Hanze Guo, Linzhuo Li, Jing Yao, Xiao Zhou, Xiaoyuan Yi, Xing Xie | April 2026
Publication | LLM Reasoning as Trajectories: Step-Specific Representation Geometry and Correctness Signals | Lihao Sun, Hang Dong, Bo Qiao, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan | April 2026
Publication | The Tool Illusion: Rethinking Tool Use in Web Agents | Renze Lou, Baolin Peng, Wenlin Yao, Qianhui Wu, Hao Cheng, Suman Nath, Wenpeng Yin, Jianfeng Gao | April 2026
Publication | Magic, Madness, Heaven, Sin: LLM Output Diversity is Everything, Everywhere, All at Once | Harnoor Dhingra | April 2026
Publication | Identifying Harm in Personalized, Generative AI Systems Require User-Centered Auditing at the Interaction Level | Hannah Cha | HEAL @ CHI ’26 | April 2026