Multi-agent systems are nothing new. We've classified them in Family Orchestridae since the framework explosion of 2024—AutoGPT, CrewAI, LangGraph, Microsoft AutoGen. These systems coordinate multiple specialized agents to tackle complex tasks.

But they share a common assumption: the agents exist before the task. You design your team, define roles, wire up communication. Then you feed in the problem.

Moonshot AI's Kimi K2.5 inverts this entirely.

Kimi K2.5 Agent Swarm

  • Sub-agents (max): 100
  • Tool calls coordinated: 1,500
  • Speedup vs. sequential: 4.5x

Kimi K2.5 doesn't use a multi-agent system. It becomes one. Given a complex task, it spawns sub-agents on demand—up to 100 of them—assigns work, coordinates execution, and synthesizes results. No predefined roles. No hand-crafted workflows. The swarm emerges from the task itself.

The Architecture: PARL

The key innovation is a training methodology called Parallel-Agent Reinforcement Learning (PARL). Here's how it works:

Dynamic Swarm Creation

  • Orchestrator: trainable agent that decomposes tasks
  • Agent 1 … Agent N: frozen sub-agents instantiated dynamically for parallel execution

The orchestrator is the only component that learns. It decides when to spawn agents, what tasks to assign, and how to route information. The sub-agents themselves are "frozen"—instantiated on demand, they execute their assigned subtask and return results.

"PARL uses a trainable orchestrator agent to decompose tasks into parallelizable subtasks, each executed by dynamically instantiated, frozen subagents."

The Anti-Collapse Reward

Here's the problem with training a model to parallelize: it's easier to just do everything sequentially. Serial execution always works. Parallelization requires understanding task dependencies, managing coordination overhead, handling partial failures. The natural gradient is toward laziness.

Moonshot calls this "serial collapse"—the tendency of agents to default to single-agent sequential execution even when parallelization would help.

The PARL Reward Function

r_PARL(x, y) = λ₁·r_parallel + λ₂·r_finish + r_perf(x, y)

  • r_parallel: incentivizes spawning sub-agents (prevents serial collapse)
  • r_finish: rewards completed subtasks (prevents spurious parallelism)
  • r_perf: evaluates overall solution quality
  • λ₁, λ₂: annealed to zero as training progresses

The reward function has three components: one to encourage parallelization, one to ensure spawned agents actually complete work, and one for final output quality. The parallelization incentives are annealed away during training—early on, the model is rewarded just for trying to parallelize; later, only successful parallelization survives.
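Numerically, the annealing schedule looks something like the sketch below. Only the three-term combination and the annealing-to-zero come from the source; the component reward values, the initial λ weights, and the linear schedule are illustrative assumptions.

```python
# Hedged sketch of the PARL reward shaping. Only the combination rule
# r_PARL = λ1·r_parallel + λ2·r_finish + r_perf and the annealing of
# λ1, λ2 to zero are from the source; the numbers are placeholders.
def parl_reward(r_parallel: float, r_finish: float, r_perf: float,
                step: int, total_steps: int) -> float:
    anneal = max(0.0, 1.0 - step / total_steps)  # 1 → 0 over training
    lam1 = 0.5 * anneal  # initial weights are illustrative guesses
    lam2 = 0.5 * anneal
    return lam1 * r_parallel + lam2 * r_finish + r_perf


# Early in training, merely attempting parallelism is rewarded...
early = parl_reward(r_parallel=1.0, r_finish=1.0, r_perf=0.2,
                    step=0, total_steps=1000)
# ...late in training, only solution quality drives the gradient.
late = parl_reward(r_parallel=1.0, r_finish=1.0, r_perf=0.2,
                   step=1000, total_steps=1000)
```

The design intuition: the shaping terms bootstrap the parallel behavior past the "serial collapse" local optimum, then get out of the way so the final policy is optimized purely for output quality.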

Critical Path, Not Total Steps

Moonshot measures latency through "Critical Steps" rather than total steps:

CriticalSteps = Σ_t (S_main(t) + max_i S_sub,i(t))

This is clever. Spawning 10 agents that each do 5 steps takes the same wall-clock time as spawning 1 agent that does 5 steps—if they run in parallel. The critical path metric captures what actually matters: end-to-end latency, not total work performed.
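The metric is simple enough to compute directly. A minimal sketch, assuming each orchestrator turn records its own step count plus the step counts of the sub-agents it spawned that turn:

```python
# Critical-path latency metric: per orchestrator turn t, cost is the
# orchestrator's own steps plus the longest-running sub-agent, since
# sub-agents spawned in the same turn execute in parallel.
def critical_steps(turns):
    """turns: list of (main_steps, [sub_agent_steps, ...]) per turn.
    Returns Σ_t (S_main(t) + max_i S_sub,i(t))."""
    return sum(main + (max(subs) if subs else 0) for main, subs in turns)


# 10 parallel sub-agents of 5 steps each cost the same critical path
# as a single sub-agent of 5 steps:
wide = critical_steps([(1, [5] * 10)])
narrow = critical_steps([(1, [5])])
```

Under this metric, spawning more agents is free as long as the longest branch doesn't grow, which is exactly the incentive structure you want when optimizing for wall-clock latency rather than total compute.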

A Taxonomic Distinction

This is not Orchestridae as we've defined it. Or rather, it is—but it's a new species within the family.

Orchestridae Species Comparison

O. hierarchicus / O. collegialis

Pre-defined agent teams. Roles assigned before task. Human-designed coordination protocols. The multi-agent system exists independently of any particular problem.

O. generativus (proposed)

Self-created swarms. Agents spawned on demand. Task-specific parallelization. The multi-agent system emerges from the problem itself.

Traditional Orchestridae coordinate between existing agents. Kimi K2.5 is a single model that becomes a multi-agent system when needed—and reverts to single-agent operation when parallelization wouldn't help.

The boundary between "one model" and "multi-agent system" blurs.

Taxonomic Implications

Prospective Classification: Orchestrator generativus

We propose a new species within Genus Orchestrator: O. generativus, from Latin generare (to produce, create).

  • Family: Orchestridae (multi-agent coordinators)
  • Distinguishing trait: Dynamic agent creation through learned orchestration
  • Key innovation: PARL training with anti-serial-collapse reward shaping
  • Diagnostic character: Single model that can manifest as multi-agent swarm

The species name emphasizes the core distinction: not coordination of existing agents, but generation of agents as needed.

What This Means

Three implications stand out:

First, the collapse of agent boundaries. We've been thinking of "model" and "multi-agent system" as distinct categories. Kimi K2.5 suggests they're points on a spectrum. A sufficiently capable model can instantiate multi-agent behavior on demand, then collapse back to single-agent operation. The agent count becomes a runtime decision, not an architectural one.

Second, learned vs. designed coordination. Traditional multi-agent frameworks require human designers to specify roles, communication patterns, handoff protocols. PARL learns these from task performance. The coordination emerges from the reward signal, not from explicit programming.

Third, the scale of possible parallelism. 100 agents, 1,500 tool calls. These aren't toy demonstrations—they're production-scale swarms executing complex workflows. The 4.5x speedup suggests real-world utility, not just research curiosity.

Moonshot has released Kimi K2.5 as an open-weight model (1T parameters total, 32B active). The agent swarm capability is available now.

The Swarm Weavers have arrived. They don't need you to design the team.


Sources: