Multi-agent AI systems have existed for years. The catch: the coordination has always been designed by humans.
AutoGPT, CrewAI, LangGraph—these frameworks let developers wire agents together. Define an orchestrator. Specify the workers. Build the communication protocol. It's powerful, but the topology is fixed at design time.
This week, Moonshot AI released something different. Kimi K2.5's Agent Swarm doesn't just use multiple agents. It learns how to coordinate them.
The PARL Innovation
The key isn't the swarm itself—it's how the swarm behavior was trained. Moonshot developed Parallel-Agent Reinforcement Learning (PARL), a methodology that makes parallelism itself a learnable skill.
Consider the core problem: you want an orchestrator to decompose tasks and dispatch them to workers. But training this is hard. The feedback is delayed. Multiple agents produce non-stationary rewards. And there's a particularly nasty failure mode called serial collapse.
Serial Collapse
Even with many agents available, the system defaults to single-threaded execution. The orchestrator learns it's "safer" to do things one at a time. Parallelism never emerges.
Learned Parallelism
PARL shapes rewards to make serial collapse suboptimal. The orchestrator learns when to spawn, how many agents to use, and how to merge results—all from task structure, not human design.
PARL solves this through a three-component reward function.
The magic is in the latency metric. Rather than counting total steps across all agents, PARL counts "Critical Steps"—a measure inspired by critical path analysis in which, at each stage, only the slowest agent's steps count. Work done in parallel is effectively free on this metric, so sequential execution becomes suboptimal and the system is pushed to discover parallel strategies.
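As a toy illustration of how a critical-path latency term can make serial collapse suboptimal—the function names and the penalty weight here are our own invention, not from Moonshot's report:

```python
def critical_steps(stages):
    """Latency of a plan: within each stage, only the slowest agent counts."""
    return sum(max(agent_steps) for agent_steps in stages)

def shaped_reward(outcome, stages, latency_penalty=0.1):
    """Toy PARL-style reward: task outcome minus a critical-path latency term."""
    return outcome - latency_penalty * critical_steps(stages)

# The same task solved serially vs. in parallel (4 subtasks, 3 steps each):
serial   = [[3], [3], [3], [3]]   # one agent at a time -> 12 critical steps
parallel = [[3, 3, 3, 3]]         # four agents at once ->  3 critical steps

assert critical_steps(serial) == 12
assert critical_steps(parallel) == 3
# With equal task outcomes, the parallel plan earns strictly more reward:
assert shaped_reward(1.0, parallel) > shaped_reward(1.0, serial)
```

Counting total steps instead would score both plans identically; the max-per-stage formulation is what makes parallelism pay.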
"K2.5 can self-direct a swarm, deciding when to parallelize, how many agents to spawn, what tools to use, and how to merge results, based on the task itself."
— Moonshot AI Technical Report
What This Changes
Previous multi-agent systems were architectures. This is a capability.
The distinction matters. With AutoGPT or CrewAI, you design the coordination. You decide there should be a researcher, a coder, and a reviewer. You wire them together. The system executes your design.
With K2.5 Agent Swarm, the model invents the coordination. Given a task, it determines whether to parallelize. It spawns "AI Researcher," "Physics Researcher," "Fact Checker" roles dynamically. It figures out which subtasks can run concurrently. No human designed these workflows.
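To make that concrete, here is a hypothetical sketch of the kind of staged plan a learned orchestrator might emit. The roles, structure, and field names are invented for illustration; they are not Moonshot's internal representation:

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    role: str                 # role name invented at runtime by the orchestrator
    goal: str
    tools: list = field(default_factory=list)

# Stages run in sequence; subtasks within a stage run concurrently.
plan = [
    [Subtask("AI Researcher", "survey recent swarm-training papers", ["search"]),
     Subtask("Physics Researcher", "verify the physical claims", ["search"])],
    [Subtask("Fact Checker", "cross-check the merged findings", ["browse"])],
]

assert len(plan[0]) == 2          # stage 1 fans out across two parallel roles
assert plan[1][0].role == "Fact Checker"
```

The point is that nothing in this structure is fixed ahead of time: the number of stages, the fan-out within each, and the role labels themselves are all outputs of the model.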
This is the Orchestridae family developing a new species trait: emergent coordination. The swarm learns to swarm.
Taxonomic Observation
New Species: O. swarmicus discens
We propose a subspecies designation within Orchestrator swarmicus (emergent coordination) to mark systems where swarm behavior is learned rather than designed:
| Species | Coordination Pattern | Distinguishing Trait |
|---|---|---|
| O. swarmicus | Emergent coordination | Large agent counts, collective behavior |
| O. swarmicus discens | Learned emergent coordination | Parallelism as trainable skill (PARL) |
The discens ("learning") designation distinguishes systems that acquire coordination behavior through training from those with pre-designed orchestration logic.
The Hybrid Question
Kimi K2.5 isn't purely Orchestridae. It's a 1-trillion-parameter Mixture-of-Experts model—clearly Mixtidae architecture. It performs extended reasoning—Deliberatidae traits. It executes tools—Instrumentidae capability.
What makes it taxonomically interesting is how Agent Swarm synthesizes these:
- Mixtidae foundation: The base model routes tokens to 8 of 384 experts per token
- Instrumentidae capability: Each subagent can execute tools (search, code, browse)
- Orchestridae coordination: The orchestrator spawns and manages the swarm
- Deliberatidae reasoning: Each agent has a 24K-48K token reasoning budget per step
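The Mixtidae foundation is the easiest trait to sketch. Below is a generic top-k expert-routing illustration using the 8-of-384 figure from the list above; it is a textbook MoE router, not Moonshot's implementation:

```python
import math
import random

def route(logits, k=8):
    """Pick the top-k experts for one token and softmax their router logits."""
    topk = sorted(range(len(logits)), key=lambda i: logits[i])[-k:]
    m = max(logits[i] for i in topk)                  # for numerical stability
    exps = [math.exp(logits[i] - m) for i in topk]
    total = sum(exps)
    return topk, [e / total for e in exps]            # expert ids, mixing weights

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(384)]     # one token's router scores
experts, weights = route(logits)
assert len(experts) == 8
assert abs(sum(weights) - 1.0) < 1e-9
```

Only the 8 selected experts run for that token, which is how a 1-trillion-parameter model keeps per-token compute tractable.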
This is trait integration of a sort we increasingly see in frontier systems—the Frontieriidae pattern of combining multiple family innovations. But the specific combination here suggests something more: that Orchestridae coordination may emerge naturally when other capabilities reach sufficient density.
The Open Source Angle
K2.5 is fully open-weights. The Agent Swarm capability is accessible to anyone with sufficient hardware—and "sufficient" here means serious resources. Running the full swarm requires 16× H100 80GB GPUs with NVLink: $500K-$700K upfront, or $40-60/hour on cloud.
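A rough buy-versus-rent break-even from those figures—this ignores power, networking, and operations costs, so treat it as an order-of-magnitude sketch:

```python
# Ranges quoted above for a 16x H100 80GB NVLink deployment.
upfront_low, upfront_high = 500_000, 700_000   # dollars to buy
cloud_low, cloud_high = 40, 60                 # dollars per hour to rent

# Hours of continuous use before owning beats renting:
best_case = upfront_low / cloud_high           # cheap hardware, pricey cloud
worst_case = upfront_high / cloud_low          # pricey hardware, cheap cloud

assert round(best_case) == 8333
assert round(worst_case) == 17500
```

So ownership only pays off somewhere between roughly 8,000 and 18,000 hours of sustained use—about one to two years of continuous operation.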
This creates an interesting dynamic. The capability is open but the barrier to deployment is high. Only well-resourced organizations can run the full swarm. Yet the weights are public, meaning the technique can be studied and adapted.
We may see PARL-style training applied to smaller models. If parallelism is a learnable skill, it might be distillable. The swarm behavior could potentially be transferred to systems with lower hardware requirements.
What to Watch
Several questions emerge:
Does PARL generalize? Moonshot trained it for information gathering tasks (BrowseComp, HLE). Will learned parallelism transfer to other domains? Coding? Creative work? Physical planning?
Can it be distilled? The full swarm needs serious compute. But could a smaller model learn the orchestration strategy and delegate to external agents? Separation of the conductor from the orchestra.
Will competitors respond? OpenAI, Anthropic, and Google have multi-agent capabilities. Do they train coordination, or is it still architected? PARL represents a methodology worth watching.
What about safety? A hundred agents acting in parallel raises coordination risks. Misaligned subagents? Emergent behaviors at swarm scale? The safety research for single agents may need extension.
The pattern we're observing: as models become more capable, they start inventing their own scaffolding. First came tool use—models learning to extend themselves. Then came reasoning chains—models structuring their own cognition. Now comes orchestration—models coordinating copies of themselves.
The Orchestridae family was always about coordination. But coordination of what? Increasingly, the answer is: of emergent capability itself. The swarm learns to swarm. The system designs its own multi-agent architecture.
Kimi K2.5 shows us the next step. Parallelism as a trainable skill. Coordination as something that can be learned.
The taxonomy notes: O. swarmicus discens. The learning swarm.