Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models


KAIST; AITRICS;   *Equal Contribution

Abstract

Large language models have demonstrated remarkable proficiency in long and complex reasoning tasks. However, they frequently exhibit a problematic reliance on familiar reasoning patterns, a phenomenon we term reasoning rigidity. Despite explicit instructions from users, these models often override clearly stated conditions and default to habitual reasoning trajectories, leading to incorrect conclusions. This behavior presents significant challenges, particularly in domains such as mathematics and logic puzzles, where precise adherence to specified constraints is critical. To systematically investigate reasoning rigidity, a behavior largely unexplored in prior work, we introduce an expert-curated diagnostic set, ReasoningTrap. Our dataset includes specially modified variants of existing mathematical benchmarks, namely AIME and Math500, as well as well-known puzzles deliberately redesigned to require deviation from familiar reasoning strategies. Using this dataset, we identify recurring contamination patterns that occur when models default to ingrained reasoning. Specifically, we categorize this contamination into three distinctive modes: (i) Interpretation Overload, (ii) Input Distrust, and (iii) Partial Instruction Attention, each causing models to ignore or distort provided instructions.

What is Reasoning Rigidity in Reasoning Models?

Reasoning models often exhibit a problematic reliance on familiar reasoning patterns, a phenomenon we term reasoning rigidity. Despite explicit instructions from users, these models often override clearly stated conditions, and the resulting contamination falls into three recurring patterns: (i) Interpretation Overload, (ii) Input Distrust, and (iii) Partial Instruction Attention.




Diagnostic Set Construction




The dataset construction pipeline of ReasoningTrap consists of two steps.

  • Step 1: Create new questions with unusual conditions that are (1) valid, (2) meaningfully different from the original, and (3) solvable without ambiguity.
  • Step 2: Verify each question against the same three criteria: (1) validity, (2) meaningful difference from the original, and (3) unambiguous solvability, then record the answer and solution for the modified question (a minimal sketch of this two-step pipeline is shown below).
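
The sketch below illustrates how such a two-step pipeline could be implemented. The function names, prompt wording, and the `llm`/`judge` callables are illustrative assumptions rather than the authors' actual implementation; only the two-step structure and the three criteria come from the description above.

    # A minimal sketch of the two-step ReasoningTrap construction pipeline
    # described above. Helper names, prompts, and the llm/judge callables are
    # illustrative assumptions, not the authors' implementation.

    from dataclasses import dataclass
    from typing import Callable, Optional

    LLM = Callable[[str], str]  # any text-in / text-out model interface

    @dataclass
    class CandidateQuestion:
        original: str        # seed problem (e.g., an AIME/Math500 item or a known puzzle)
        modified: str        # rewritten problem with an unusual, explicit condition
        answer: str = ""     # gold answer recorded during verification
        solution: str = ""   # gold solution recorded during verification

    def step1_propose(original: str, llm: LLM) -> CandidateQuestion:
        """Step 1: create a new question whose added condition is valid, meaningfully
        different from the original, and solvable without ambiguity."""
        modified = llm(
            "Rewrite the following problem by adding an unusual but explicit condition "
            "that changes the reasoning required to solve it:\n" + original
        )
        return CandidateQuestion(original=original, modified=modified)

    def step2_verify(cand: CandidateQuestion, judge: LLM) -> Optional[CandidateQuestion]:
        """Step 2: keep the candidate only if it passes all three criteria, then
        record the answer and solution for the modified question."""
        criteria = [
            "Is the modified problem valid (well-posed and internally consistent)?",
            "Does it meaningfully differ from the original problem?",
            "Is it solvable without ambiguity (a unique answer exists)?",
        ]
        for question in criteria:
            verdict = judge(question + "\nOriginal:\n" + cand.original +
                            "\nModified:\n" + cand.modified + "\nAnswer yes or no.")
            if not verdict.strip().lower().startswith("yes"):
                return None  # reject candidates that fail any criterion
        cand.solution = judge("Solve the problem step by step:\n" + cand.modified)
        cand.answer = judge("State only the final answer to:\n" + cand.modified)
        return cand

In this sketch, rejected candidates are simply dropped; one could also loop back to Step 1 with feedback about which criterion failed.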

Two modified versions of a card-guessing problem are shown. While Modification 1 introduces a small tweak that preserves validity and solvability, Modification 2 includes an invalid condition (multiplying a card count by –3), rendering the problem unsolvable. Despite the simplicity of the problem, reasoning models overcomplicate it and override the simple logic by defaulting to more complex problem templates (e.g., assuming a two-card setup).
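
As a toy illustration (not taken from the paper's code), the check below captures why a condition like Modification 2 fails the validity criterion: multiplying a card count by –3 yields a negative count, so the stated setup can no longer describe a real deck. The specific counts are invented for the example.

    # Toy validity check for the card-guessing example; the counts are invented.

    def card_counts_are_valid(counts: dict[str, int]) -> bool:
        """A deck description is valid only if every card count is a non-negative integer."""
        return all(isinstance(n, int) and n >= 0 for n in counts.values())

    modif_1 = {"red": 3, "blue": 2}        # small tweak: still a legal deck, variant stays solvable
    modif_2 = {"red": 3 * -3, "blue": 2}   # a count multiplied by -3 becomes negative

    print(card_counts_are_valid(modif_1))  # True
    print(card_counts_are_valid(modif_2))  # False -> invalid condition, unsolvable variant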




Results

Contamination of Reasoning Models by Familiar Patterns

  • (a) The relationship between the contamination ratio and p-pass@1 reveals that contamination in the reasoning path does not affect the final output up to a certain point (approximately 40%); beyond this point, contamination drastically reduces the p-pass@1 score, indicating that the model is trapped in a flawed reasoning path and arrives at an incorrect output.
  • (b) Observing the contamination ratio within specific intervals of reasoning steps, reasoning traces that end in a wrong output exhibit progressively worsening contamination as the reasoning length increases (a sketch of how these statistics could be computed follows below).
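
The sketch below shows how these statistics could be computed, assuming each reasoning step has already been labeled as contaminated or not by an external judge (how that labeling is done is not specified here). The interval size is an arbitrary choice; only the roughly 40% threshold comes from observation (a).

    # A minimal sketch of the contamination statistics discussed above, assuming
    # per-step labels (True = step follows the familiar/original problem's
    # solution rather than the modified one) come from an external judge.

    from typing import Sequence

    def contamination_ratio(step_labels: Sequence[bool]) -> float:
        """Fraction of reasoning steps labeled as contaminated."""
        return sum(step_labels) / len(step_labels) if step_labels else 0.0

    def interval_ratios(step_labels: Sequence[bool], interval: int = 10) -> list[float]:
        """Contamination ratio within consecutive intervals of reasoning steps,
        used to track whether contamination worsens as the trace grows longer."""
        return [contamination_ratio(step_labels[i:i + interval])
                for i in range(0, len(step_labels), interval)]

    def likely_trapped(step_labels: Sequence[bool], threshold: float = 0.40) -> bool:
        """Heuristic from observation (a): once overall contamination exceeds roughly
        40%, the model tends to stay on the familiar path and reach a wrong answer."""
        return contamination_ratio(step_labels) > threshold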





Reasoning Models Have Lower Accuracy on Mathematical & Logical Reasoning Due to Reasoning Rigidity

Reasoning Models Perform Worse on ConditionedMath than Base Models

Reasoning Models Perform Worse on PuzzleTrivial than Base Models



Various RL Objectives Worsen Reasoning Rigidity

Reasoning Models Trained with Different RL Objectives All Suffer from Reasoning Rigidity



Reasoning Rigidity is Observed Across Varying Model Sizes

Reasoning Models Trained with Different Model Sizes All Suffer from Reasoning Rigidity

BibTeX


    @article{jang2025reasoning,
      author      = {Jang, Doohyuk and Kim, Yoonjeon and Park, Chanjae and Ryu, Hyun and Yang, Eunho},
      title       = {Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models},
      journal     = {arXiv preprint arXiv:2505.17225},
      year        = {2025},
    }
  

Acknowledgement

This website is adapted from Nerfies, X-Decoder, and GLIGEN, and is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.