arxiv:2601.06002

The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning

Published on Jan 9 · Submitted by Qiguang Chen on Jan 12

Abstract

Large language models struggle with long chain-of-thought reasoning due to unstable structural patterns, but a molecular-inspired approach using effective semantic isomers and distribution-transfer-graph methods improves training stability and performance.

AI-generated summary

Large language models (LLMs) often fail to learn effective long chain-of-thought (Long CoT) reasoning by imitating humans or non-Long-CoT LLMs. To understand this, we propose that effective and learnable Long CoT trajectories exhibit stable, molecule-like structures in a unified view, formed by three interaction types: Deep-Reasoning (covalent-like), Self-Reflection (hydrogen-bond-like), and Self-Exploration (van der Waals-like). Analysis of distilled trajectories reveals that these structures emerge from Long CoT fine-tuning, not keyword imitation. We introduce Effective Semantic Isomers and show that only bonds promoting fast entropy convergence support stable Long CoT learning, while structural competition impairs training. Drawing on these findings, we present Mole-Syn, a distribution-transfer-graph method that guides the synthesis of effective Long CoT structures, boosting performance and RL stability across benchmarks.

Community


Glad to share our recent exploratory project:

🧪 Title: The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
🌐 arXiv: 2601.06002

🧐 Why revisit Long CoT?
Recent work often focuses on “making CoT longer,” but longer traces are more likely to derail—e.g., drifting off-track, breaking logical continuity, or amplifying hallucinations—especially when attempting to cold-start genuine long-horizon reasoning from a standard instruction-tuned model.

A key observation is that many trajectories that merely look like long reasoning (e.g., distilling from randomly sampled ICL demonstrations, or using human-written long step-by-step solutions) are not behaviorally stable, and models frequently fail to learn robustly from them.

😭 Why imitation often fails

  • “Long” human CoT is not necessarily effective: Fine-tuning on human-written long CoT does not reliably reproduce the gains achieved by distilling from a strong reasoning model.
  • Distilling from a weak instruct model with random ICL demonstrations largely fails: Using randomly chosen 1-shot ICL examples to “fake” long reasoning for distillation leads to significant degradation, suggesting that superficial formatting is insufficient.
  • Keywords are not the driver: Replacing surface tokens (e.g., “wait”) while preserving the underlying reasoning trajectory and behavioral pattern yields similar performance, indicating that SFT primarily learns structure/behavior rather than prompt-specific keywords.

🔍 Early evidence: Long CoT has stable “structural fingerprints”

  • We observe a stable behavioral transfer graph: across different strong reasoning models and tasks, the induced distributional characteristics appear highly consistent (a toy sketch of estimating such a graph follows this list).
    - In semantic space, we see “linking–folding” patterns: deep reasoning tends to form locally dense structures; self-reflection tends to create backward links for validation/correction; and exploration tends to form weaker cross-cluster connections.

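To make the behavioral transfer graph idea concrete, here is a minimal sketch (our illustration, not the paper's implementation): assuming each step of a trajectory has already been labeled with one of the three behaviors, a transition matrix can be estimated by counting consecutive label pairs. The names `BEHAVIORS` and `estimate_transfer_graph` are hypothetical.

```python
from collections import Counter

# Hypothetical labels for the three interaction types described above.
BEHAVIORS = ["deep_reasoning", "self_reflection", "self_exploration"]

def estimate_transfer_graph(trajectories):
    """Estimate a behavior-to-behavior transition matrix from labeled trajectories.

    `trajectories` is a list of label sequences, one per Long CoT trace.
    Returns a dict mapping (src, dst) -> empirical transition probability.
    """
    counts, totals = Counter(), Counter()
    for labels in trajectories:
        for src, dst in zip(labels, labels[1:]):
            counts[(src, dst)] += 1
            totals[src] += 1
    return {(src, dst): counts[(src, dst)] / totals[src]
            for src in BEHAVIORS for dst in BEHAVIORS if totals[src] > 0}

# Toy usage: two short labeled trajectories.
graph = estimate_transfer_graph([
    ["deep_reasoning", "deep_reasoning", "self_reflection", "deep_reasoning"],
    ["deep_reasoning", "self_exploration", "deep_reasoning", "self_reflection"],
])
print(graph)
```
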
💡 Core hypothesis: effective Long CoT as a “molecular structure”
High-quality Long CoT is not merely a linear chain; it is stabilized by three interaction types—analogous to chemical bonds—that organize and constrain reasoning trajectories (an illustrative data-structure sketch follows this list):
  • Deep Reasoning (covalent-bond-like): forms the main reasoning backbone; if it breaks, the solution collapses.
  • Self-Reflection (hydrogen-bond-like): folds later steps back to earlier ones to verify assumptions, detect errors, and correct the path.
  • Self-Exploration (van der Waals-like): weak but important cross-domain probing that broadens coverage and discovers alternative routes.

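As a purely illustrative data structure (our own naming, not taken from the paper), a Long CoT trajectory could be stored as a graph whose nodes are reasoning steps and whose typed edges mirror the three bond-like interactions:

```python
from dataclasses import dataclass, field

# Edge (bond) types mirroring the analogy above; names are illustrative.
DEEP_REASONING = "covalent_like"    # backbone step-to-step link
SELF_REFLECTION = "hydrogen_like"   # backward link to an earlier step
SELF_EXPLORATION = "vdw_like"       # weak cross-cluster probe

@dataclass
class CoTGraph:
    steps: list = field(default_factory=list)   # step texts, indexed by position
    bonds: list = field(default_factory=list)   # (src_idx, dst_idx, bond_type)

    def add_step(self, text: str) -> int:
        self.steps.append(text)
        return len(self.steps) - 1

    def add_bond(self, src: int, dst: int, bond_type: str) -> None:
        self.bonds.append((src, dst, bond_type))

# Toy trajectory: two backbone steps, then a reflection folding back to step 0.
g = CoTGraph()
s0 = g.add_step("Set up the equation.")
s1 = g.add_step("Solve for x.")
s2 = g.add_step("Wait, re-check the setup from the first step.")
g.add_bond(s0, s1, DEEP_REASONING)
g.add_bond(s1, s2, DEEP_REASONING)
g.add_bond(s2, s0, SELF_REFLECTION)
print(len(g.steps), "steps,", len(g.bonds), "bonds")
```
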
An additional observation is that the softmax in attention has the same functional form as a Gibbs–Boltzmann distribution; hence, the “energy distributions” of the different bond types can be estimated directly from attention, and they exhibit a stable ordering reminiscent of real chemical bond energies.
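
As a sketch of that correspondence under our own simplifying assumptions (not the paper's estimator): since softmax attention has Boltzmann form p ∝ exp(−E/T), a pseudo-energy can be read off each attention edge as E = −T·log p and averaged per bond type.

```python
import math

def bond_energies(attn, edge_types, temperature=1.0):
    """Average pseudo-energy per bond type, read off an attention matrix.

    Softmax attention p_ij = exp(s_ij) / Z has Boltzmann form p ~ exp(-E / T),
    so up to an additive constant per row, E_ij = -T * log(p_ij).
    attn[i][j]: attention weight of step i on step j.
    edge_types: dict mapping (i, j) -> bond type label for the edges of interest.
    """
    sums, counts = {}, {}
    for (i, j), bond in edge_types.items():
        energy = -temperature * math.log(max(attn[i][j], 1e-12))
        sums[bond] = sums.get(bond, 0.0) + energy
        counts[bond] = counts.get(bond, 0) + 1
    return {bond: sums[bond] / counts[bond] for bond in sums}

# Toy numbers (our assumption): the backbone edge gets the strongest attention.
attn = [[1.0, 0.0, 0.0],
        [0.7, 0.3, 0.0],
        [0.2, 0.5, 0.3]]
edges = {(1, 0): "deep_reasoning", (2, 1): "self_reflection", (2, 0): "self_exploration"}
print(bond_energies(attn, edges))
```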

🍎 “Semantic isomers” of Long CoT

  • For the same problem, trajectories can be semantically close yet differ in the distribution and transitions of bonds, yielding distinct “isomers” with dramatically different trainability and downstream performance.
    - Two isomers that appear similar may still be incompatible: mixing them during training can trigger structural conflicts and degrade performance (a toy compatibility check follows this list).
    - ICL is not inherently ineffective; it helps when demonstrations are selected such that their structural distribution aligns with the target high-quality isomer.

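A toy way to quantify structural compatibility between two candidate isomers (our illustration; Jensen–Shannon divergence here merely stands in for whatever criterion the paper actually uses): compare their bond-type distributions and only mix trajectories whose distributions stay close.

```python
import math

BEHAVIORS = ("deep_reasoning", "self_reflection", "self_exploration")

def bond_distribution(bond_labels):
    """Normalized histogram of bond types in one trajectory."""
    total = max(len(bond_labels), 1)
    return [bond_labels.count(b) / total for b in BEHAVIORS]

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Two trajectories that look similar on the surface but mix bonds differently.
iso_a = ["deep_reasoning"] * 6 + ["self_reflection"] * 3 + ["self_exploration"]
iso_b = ["deep_reasoning"] * 3 + ["self_reflection"] * 6 + ["self_exploration"]
gap = js_divergence(bond_distribution(iso_a), bond_distribution(iso_b))
print(f"JS divergence: {gap:.3f}")  # larger gap -> more likely to conflict when mixed
```
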
🔧 Solution: MOLE-SYN
We propose MOLE-SYN: first estimate a behavioral transfer graph from a strong reasoning model, then use it to guide a pure instruct LLM to synthesize Long CoT trajectories (a minimal sketch follows below).
  • Empirically, distilling Qwen2.5 on MOLE-SYN–generated trajectories can approach the effectiveness of distilling directly from QwQ.
  • This initialization also exhibits strong RL potential: it yields more stable RL training curves and sustained improvement headroom.

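Below is a minimal sketch of the guided-synthesis idea at this high level; the prompt snippets, sampling scheme, and function names are our own assumptions rather than the actual MOLE-SYN pipeline.

```python
import random

# Hypothetical prompt snippets nudging an instruct model into each behavior.
PROMPTS = {
    "deep_reasoning": "Continue the derivation with the next logical step.",
    "self_reflection": "Pause and re-check an earlier step for mistakes.",
    "self_exploration": "Briefly consider an alternative approach.",
}

def synthesize_trace(transfer_graph, generate_step, start="deep_reasoning", max_steps=8):
    """Roll out a behavior sequence from `transfer_graph`, generating one step per behavior.

    transfer_graph: dict mapping (src, dst) -> transition probability (as estimated earlier).
    generate_step: wrapper around any instruct LLM; takes an instruction string, returns text.
    """
    behavior, trace = start, []
    for _ in range(max_steps):
        trace.append(generate_step(PROMPTS[behavior]))
        nxt = [dst for (src, dst) in transfer_graph if src == behavior]
        if not nxt:
            break
        weights = [transfer_graph[(behavior, dst)] for dst in nxt]
        behavior = random.choices(nxt, weights=weights)[0]
    return trace

# Toy usage with a stub generator standing in for a real LLM call.
toy_graph = {
    ("deep_reasoning", "deep_reasoning"): 0.6,
    ("deep_reasoning", "self_reflection"): 0.3,
    ("deep_reasoning", "self_exploration"): 0.1,
    ("self_reflection", "deep_reasoning"): 1.0,
    ("self_exploration", "deep_reasoning"): 1.0,
}
print(synthesize_trace(toy_graph, lambda instruction: f"[{instruction}]"))
```

In a real setting, `generate_step` would presumably call the instruct model with the running trace as context rather than the bare prompt snippet.
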
Finally, the different behaviors have distinct global effects: deep reasoning makes the core logic more compact, self-reflection increases overall “folding” tightness, and self-exploration expands the reachable search space.

👀 A practical implication is that when CoT is heavily summarized or compressed, the “molecular structure” distribution can be destroyed, and distilled models may underperform even the original teacher.

If a prior viewpoint treated CoT behaviors as nodes, this work reframes them as edges that link logical states: the training target may not be “longer answers,” but a more stable reasoning skeleton controlled by structured reasoning behaviors.
