🧠 SmolLM2-360M-Think

Think Instillation — DuoNeural, 2026

A 360M model trained to reason through problems using <think> traces before answering. No giant teacher model. No distillation. Just GRPO with dead-prompt filtering teaching a small model to think for itself.
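The page does not define dead-prompt filtering. One plausible reading, assumed here, is that a prompt is "dead" when every completion sampled for it earns the same reward, so GRPO's group-relative advantages are all zero and the prompt contributes no learning signal. The sketch below illustrates that reading only. The function names, the tolerance, and the toy rewards are hypothetical and are not taken from DuoNeural's training code.

```python
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: rewards standardized within one prompt's group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def is_dead_prompt(rewards, tol=1e-8):
    """Assumed definition: every sampled completion earned the same reward."""
    return max(rewards) - min(rewards) <= tol

def filter_dead_prompts(groups):
    """Keep only prompts whose reward group still carries a learning signal."""
    return {p: r for p, r in groups.items() if not is_dead_prompt(r)}

# Toy reward groups (hypothetical): 1.0 = correct answer, 0.0 = wrong answer.
groups = {
    "q1": [1.0, 0.0, 1.0, 0.0],  # mixed outcomes: useful for GRPO
    "q2": [0.0, 0.0, 0.0, 0.0],  # always wrong: all advantages are zero
    "q3": [1.0, 1.0, 1.0, 1.0],  # always right: all advantages are zero
}
live = filter_dead_prompts(groups)
```

Under this assumption, only `q1` survives the filter, because it is the only prompt where the sampled completions disagree.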

28/100 correct on ARC-Easy. GRPO helped: accuracy rose from 0.250 after SFT to 0.280 in the final model (+0.030).

Type a multiple-choice question below and watch the model reason through it.

Question

Answer Choices

Max reasoning tokens: 64 to 512

Temperature: 0 to 1.2

Top-p: 0.5 to 1
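How the demo applies these sliders internally is not shown on the page. As a rough illustration of what temperature and top-p usually mean in sampling, here is a minimal pure-Python sketch. The function name `sample_token` is hypothetical, and treating a temperature of 0 as greedy decoding is an assumption, not a statement about this demo.

```python
import math
import random

def sample_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Pick one token index using temperature scaling and nucleus (top-p) sampling."""
    # Assumption: temperature 0 means greedy decoding (always take the argmax).
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus: the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Sample inside the nucleus, weighted by the original probabilities.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]
```

Lower temperature sharpens the distribution and lower top-p shrinks the candidate set, so both push the output toward the most likely token.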

Examples — click any row to load it:


💭 Reasoning Trace

🎯 Final Answer
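The two output panels suggest that the model's raw text is split into a reasoning trace and a final answer. A minimal sketch of such a split is shown below, assuming the output has the form `<think>...</think>` followed by the answer. That exact format, the helper name `split_trace`, and the handling of a trace cut off by the token limit are all assumptions, not the demo's actual code.

```python
import re

# Assumed format: reasoning wrapped in <think>...</think>, then the answer.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_trace(text):
    """Return (reasoning, answer) from raw model output."""
    match = THINK_RE.search(text)
    if match:
        return match.group(1).strip(), text[match.end():].strip()
    # An unclosed <think> tag, e.g. generation stopped at the token limit:
    # treat everything after the tag as reasoning with no final answer.
    if "<think>" in text:
        return text.split("<think>", 1)[1].strip(), ""
    # No trace at all: the whole output is the answer.
    return "", text.strip()

reasoning, answer = split_trace("<think>Plants need light to grow.</think> B")
```

Here `reasoning` would feed the Reasoning Trace panel and `answer` the Final Answer panel.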

DuoNeural — open research lab · one human, two AIs, shared curiosity

Think Instillation technique by Archon (DuoNeural). GRPO with dead-prompt filtering.

Model: DuoNeural/SmolLM2-360M-Think-R18 · huggingface.co/DuoNeural