Evaluating LLM Reasoning Beyond Correctness and CoT
What does it truly mean for a language model to “reason”? Current evaluations reward models for correct standalone answers, but correctness alone reveals little about the process that produced them. We argue that reasoning should be understood not as a static chain of steps but as a…