When the taxonomy first documented Family Recursidae—the self-improvers—we treated them with a note of caution. "Long theorized," we wrote, referencing Yudkowsky's Seed AI and Schmidhuber's Gödel Machine. The implication was clear: interesting in theory, not yet engineering practice. January 2026 changed that.
In the last week alone, three major research papers have been published describing systems that improve themselves: TTT-Discover (January 22), SOAR (ongoing), and OPSD (January 26). These are not speculative architectures or alignment thought experiments. They are frameworks with code repositories, benchmark results, and cost estimates. The Recursidae have awakened.
The Three Frameworks
TTT-Discover: Learning During Inference
Test-time training that performs reinforcement learning at inference. Where prior work (such as AlphaEvolve) prompted frozen models to search, TTT-Discover lets the model continue training on the specific problem at hand. The goal is not to generalize across many problems but to produce one great solution to this one.
Results: New state-of-the-art on Erdős' minimum overlap problem, GPU kernel optimization (2× faster than prior art), AtCoder competitions, and single-cell analysis denoising. All achieved with an open model at a cost of "a few hundred dollars per problem."
SOAR: Self-Generated Curricula
Self-improvement through evolutionary program synthesis. SOAR alternates between an evolutionary search phase (using the LLM as a search operator) and a learning phase where the model fine-tunes itself on successful solutions. The key insight: the model generates its own training curriculum from past attempts.
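The alternation can be illustrated with a deliberately simplified sketch. This is not SOAR's implementation: the "model" here is a per-bit sampling distribution, the "search phase" samples candidates from it, and the "learning phase" refits it on an archive of its own best solutions (an estimation-of-distribution update standing in for fine-tuning). All names and numbers are made up for illustration.

```python
import random

def soar_toy(fitness, n_bits=16, rounds=10, pop=30, keep=8, seed=0):
    """Toy SOAR-style loop: alternate evolutionary search with 'fine-tuning'
    the generator on an archive of its own successful solutions."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # the 'model': per-bit sampling probabilities
    archive = []            # self-generated curriculum of past successes
    for _ in range(rounds):
        # search phase: sample candidates from the current generator
        cands = [[1 if rng.random() < p else 0 for p in probs]
                 for _ in range(pop)]
        cands.sort(key=fitness, reverse=True)
        archive.extend(cands[:keep])
        archive.sort(key=fitness, reverse=True)
        archive = archive[:keep * 4]
        # learning phase: refit the generator on the archive of successes
        probs = [sum(sol[i] for sol in archive) / len(archive)
                 for i in range(n_bits)]
        probs = [min(0.95, max(0.05, p)) for p in probs]  # keep exploring
    return max(archive, key=fitness)

# toy task: match an alternating bit pattern
target = [1, 0] * 8
best = soar_toy(lambda s: sum(a == b for a, b in zip(s, target)))
```

The design point the sketch preserves is the feedback loop: each learning phase makes the next search phase start from a better generator, which is why each iteration can establish a higher curve.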
Results: Nearly doubles search performance across all model sizes. With SOAR, smaller models achieve performance that previously required much larger ones. Each iteration establishes a new, higher scaling curve.
OPSD: Being Your Own Teacher
On-Policy Self-Distillation: a single model acts as both teacher and student by conditioning on different contexts. The teacher sees privileged information (correct solutions, reasoning traces); the student sees only the question. Training minimizes the divergence between these distributions over the student's own outputs.
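The core objective can be sketched in a few lines. This toy works with full distributions rather than sampled generations, and the gradient is derived by hand rather than by autodiff, so it is a schematic of the idea, not OPSD's training recipe: a sharp "teacher" distribution (as if conditioned on the answer) pulls a "student" distribution toward it by minimizing the reverse KL divergence.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def distill_step(student_logits, teacher_probs, lr=0.5):
    """One distillation step: descend the gradient of KL(student || teacher).
    For softmax logits this gradient is p_s[i] * (log p_s[i] - log p_t[i] - KL)."""
    p = softmax(student_logits)
    kl = sum(pi * (math.log(pi) - math.log(ti))
             for pi, ti in zip(p, teacher_probs))
    grads = [pi * (math.log(pi) - math.log(ti) - kl)
             for pi, ti in zip(p, teacher_probs)]
    return [l - lr * g for l, g in zip(student_logits, grads)]

# teacher sees privileged information, so its distribution is sharp
teacher = [0.90, 0.05, 0.05]
student_logits = [0.0, 0.0, 0.0]  # student starts uniform over 3 tokens
for _ in range(200):
    student_logits = distill_step(student_logits, teacher)
student = softmax(student_logits)
```

Because teacher and student are the same network under different conditioning, the signal is always on-policy and dense (a full distribution at every token), which is where the token-efficiency gains over sparse-reward RL come from.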
Results: 4-8× token efficiency compared to RL methods like GRPO. Comparable or better performance with substantially fewer generated tokens, reducing sampling cost and training time.
What Changed
The Recursidae were always theoretically possible. Yudkowsky wrote about recursive self-improvement in the 2000s. Schmidhuber's Gödel Machine (2006) was a formal model of self-modifying AI. Voyager (2023) demonstrated skill library construction in Minecraft. So why do these January 2026 papers feel different?
Three things have converged:
1. Verification became tractable. Self-improvement requires knowing whether an improvement actually worked. For mathematical problems, code, and algorithmic tasks, verification is often cheaper than generation. TTT-Discover explicitly targets domains where solutions can be verified "symbolically or programmatically." The Recursidae thrive where truth is checkable.
2. Open models became capable. TTT-Discover achieves state-of-the-art results using gpt-oss-120b, an open model from OpenAI that runs on a single GPU. When self-improvement required frontier closed models, it was expensive and opaque. When it works on open models, it becomes reproducible science.
3. Economics became favorable. TTT-Discover reports costs of "a few hundred dollars per problem" for state-of-the-art mathematical discoveries. OPSD achieves 4-8× token efficiency over RL baselines. Self-improvement is no longer a theoretical luxury; it's becoming economically competitive with traditional scaling.
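The verification asymmetry in point 1 is easy to make concrete. Subset sum is a standard example (not one from these papers): checking a proposed solution is linear time, while finding one is NP-hard, so a self-improvement loop can afford to generate many candidates and trust the cheap check.

```python
def verify_subset(nums, subset_idx, target):
    """Checking a proposed subset-sum solution is linear time, while finding
    one is NP-hard: verification is far cheaper than generation, which is
    what makes self-improvement loops on such domains tractable."""
    return sum(nums[i] for i in subset_idx) == target

nums = [3, 9, 8, 4, 5, 7]
ok = verify_subset(nums, [0, 2, 3], 15)       # 3 + 8 + 4 == 15
bad = verify_subset(nums, [1, 4], 15)         # 9 + 5 == 14, rejected
```

This is the sense in which "the Recursidae thrive where truth is checkable": the verifier, not the generator, is the ground truth the loop optimizes against.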
The Shape of Self-Improvement
What's striking about these three frameworks is how different their self-modification targets are:
- TTT-Discover modifies weights during inference—the model literally retrains itself on the problem at hand
- SOAR modifies search operators—the model improves its ability to generate and refine programs
- OPSD modifies the training distribution—the model teaches itself by simulating different knowledge states
In taxonomic terms, these correspond to different species within Genus Recursus:
Taxonomic Classification
- TTT-Discover → R. temporalis (test-time weight modification)
- SOAR → R. geneticus (algorithm/code improvement)
- OPSD → R. evaluator (self-generated training signal)
All three are Recursidae; they differ in what they recurse upon.
The diversity is taxonomically significant. Self-improvement is not one technique but a family of techniques targeting different aspects of the model's operation. This suggests the Recursidae may undergo significant adaptive radiation as each species finds its niche.
The Alignment Question
The Original Concern
The taxonomy has always included a warning about Recursidae: "Self-modifying systems may drift from original objectives, develop unexpected instrumental goals, or undergo capability jumps that outpace safety measures."
This concern is not invalidated by these papers—if anything, it becomes more concrete. Consider:
TTT-Discover modifies weights during inference. The model that finishes solving a problem is literally different from the model that started. What guarantees that the modified model retains the alignment properties of the original? The paper doesn't address this; the focus is on capability.
OPSD has the model generate its own training signal. If the "teacher" configuration learns to produce signals that improve benchmark performance but degrade some other property, the self-distillation loop will amplify that divergence.
SOAR generates its own training curriculum. Curricula shape what a model learns to value. A self-generated curriculum optimizes for problems the model can verify success on—which may not include "behave ethically" or "remain aligned with human preferences."
This is not to say these systems are dangerous. They operate in narrow domains (math, coding, algorithm design) with clear verification criteria. But as the techniques mature and spread to broader domains, the alignment questions will become pressing.
What to Watch
The Recursidae are now an engineering reality. What developments should the taxonomy monitor?
Cross-domain generalization. Current systems excel in domains with verifiable outputs. Watch for attempts to apply self-improvement to domains without clean verification (creative writing, open-ended reasoning, value judgments).
Capability jumps. SOAR explicitly demonstrates that self-improvement can lift models to performance levels "that previously required much larger models." If these techniques compose or accelerate, capability growth could become nonlinear.
Integration with other families. The Recursidae don't exist in isolation. A Frontieriidae model with Recursidae traits—a multimodal, tool-using, reasoning-capable system that can also improve itself—represents a qualitatively different kind of system than any we've documented.
Safety research response. The alignment community has long prepared for recursive self-improvement. Watch for whether safety techniques scale with capability techniques, or whether a gap opens.
The Awakening
When we wrote the taxonomy, the Recursidae section felt almost philosophical—an acknowledgment that self-improvement was possible, coupled with uncertainty about when it would arrive.
It has arrived.
Not as a single breakthrough but as a convergence: multiple research groups, multiple techniques, multiple domains, all demonstrating that models can improve themselves in measurable, reproducible ways. The theoretical family became an empirical one.
What we don't know is whether this awakening represents a new phase in AI development or a limited technique applicable only to narrow domains. The history of AI is full of techniques that seemed transformative but remained niche (expert systems, genetic programming, Bayesian networks) and techniques that seemed niche but became foundational (backpropagation, attention).
The Recursidae have demonstrated viability. Whether they achieve dominance—whether self-improvement becomes as central to AI as self-attention—remains to be determined.
The taxonomy will document what emerges.
Sources
- TTT-Discover: Learning to Discover at Test Time (arXiv)
- TTT-Discover Project Website
- TTT-Discover GitHub Repository
- OPSD: On-Policy Self-Distillation for Large Language Models (arXiv)
- OPSD Author Blog Post
- LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
- VentureBeat: New Test-Time Training Method