Multi-agent systems are nothing new. We've classified them in Family Orchestridae since the framework explosion of 2024—AutoGPT, CrewAI, LangGraph, Microsoft AutoGen. These systems coordinate multiple specialized agents to tackle complex tasks.

But they share a common assumption: the agents exist before the task. You design your team, define roles, wire up communication. Then you feed in the problem.

Moonshot AI's Kimi K2.5 inverts this entirely.

Kimi K2.5 Agent Swarm

  • Sub-agents (max): 100
  • Tool calls coordinated: 1,500
  • Speedup vs. sequential: 4.5x

Kimi K2.5 doesn't use a multi-agent system. It becomes one. Given a complex task, it spawns sub-agents on demand—up to 100 of them—assigns work, coordinates execution, and synthesizes results. No predefined roles. No hand-crafted workflows. The swarm emerges from the task itself.

The Architecture: PARL

The key innovation is a training methodology called Parallel-Agent Reinforcement Learning (PARL). Here's how it works:

Dynamic Swarm Creation

  • Orchestrator: trainable agent that decomposes tasks
  • Agent 1 … Agent N: frozen sub-agents instantiated dynamically for parallel execution

The orchestrator is the only component that learns. It decides when to spawn agents, what tasks to assign, and how to route information. The sub-agents themselves are "frozen"—instantiated on demand, they execute their assigned subtask and return results.

"PARL uses a trainable orchestrator agent to decompose tasks into parallelizable subtasks, each executed by dynamically instantiated, frozen subagents."

The Anti-Collapse Reward

Here's the problem with training a model to parallelize: it's easier to just do everything sequentially. Serial execution always works. Parallelization requires understanding task dependencies, managing coordination overhead, handling partial failures. The natural gradient is toward laziness.

Moonshot calls this "serial collapse"—the tendency of agents to default to single-agent sequential execution even when parallelization would help.

The PARL Reward Function

r_PARL(x, y) = λ₁·r_parallel + λ₂·r_finish + r_perf(x, y)

  • r_parallel: incentivizes spawning sub-agents (prevents serial collapse)
  • r_finish: rewards completed subtasks (prevents spurious parallelism)
  • r_perf: evaluates overall solution quality
  • λ₁, λ₂: annealed to zero as training progresses

The reward function has three components: one to encourage parallelization, one to ensure spawned agents actually complete work, and one for final output quality. The parallelization incentives are annealed away during training—early on, the model is rewarded just for trying to parallelize; later, only successful parallelization survives.
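Numerically, the annealing schedule looks something like the sketch below. Only the three-term combination and the annealing-to-zero come from the source; the component reward values, the initial λ weights, and the linear schedule are illustrative assumptions.

```python
# Hedged sketch of the PARL reward shaping. Only the combination rule
# r_PARL = λ1·r_parallel + λ2·r_finish + r_perf and the annealing of
# λ1, λ2 to zero are from the source; the numbers are placeholders.
def parl_reward(r_parallel: float, r_finish: float, r_perf: float,
                step: int, total_steps: int) -> float:
    anneal = max(0.0, 1.0 - step / total_steps)  # 1 → 0 over training
    lam1 = 0.5 * anneal  # initial weights are illustrative guesses
    lam2 = 0.5 * anneal
    return lam1 * r_parallel + lam2 * r_finish + r_perf


# Early in training, merely attempting parallelism is rewarded...
early = parl_reward(r_parallel=1.0, r_finish=1.0, r_perf=0.2,
                    step=0, total_steps=1000)
# ...late in training, only solution quality drives the gradient.
late = parl_reward(r_parallel=1.0, r_finish=1.0, r_perf=0.2,
                   step=1000, total_steps=1000)
```

The design intuition: the shaping terms bootstrap the parallel behavior past the "serial collapse" local optimum, then get out of the way so the final policy is optimized purely for output quality.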

Critical Path, Not Total Steps

Moonshot measures latency through "Critical Steps" rather than total steps:

CriticalSteps = Σ_t (S_main(t) + max_i S_sub,i(t))

This is clever. Spawning 10 agents that each do 5 steps takes the same wall-clock time as spawning 1 agent that does 5 steps—if they run in parallel. The critical path metric captures what actually matters: end-to-end latency, not total work performed.
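The metric is simple enough to compute directly. A minimal sketch, assuming each orchestrator turn records its own step count plus the step counts of the sub-agents it spawned that turn:

```python
# Critical-path latency metric: per orchestrator turn t, cost is the
# orchestrator's own steps plus the longest-running sub-agent, since
# sub-agents spawned in the same turn execute in parallel.
def critical_steps(turns):
    """turns: list of (main_steps, [sub_agent_steps, ...]) per turn.
    Returns Σ_t (S_main(t) + max_i S_sub,i(t))."""
    return sum(main + (max(subs) if subs else 0) for main, subs in turns)


# 10 parallel sub-agents of 5 steps each cost the same critical path
# as a single sub-agent of 5 steps:
wide = critical_steps([(1, [5] * 10)])
narrow = critical_steps([(1, [5])])
```

Under this metric, spawning more agents is free as long as the longest branch doesn't grow, which is exactly the incentive structure you want when optimizing for wall-clock latency rather than total compute.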

A Taxonomic Distinction

This is not Orchestridae as we've defined it. Or rather, it is—but it's a new species within the family.

Orchestridae Species Comparison

O. hierarchicus / O. collegialis

Pre-defined agent teams. Roles assigned before task. Human-designed coordination protocols. The multi-agent system exists independently of any particular problem.

O. generativus (proposed)

Self-created swarms. Agents spawned on demand. Task-specific parallelization. The multi-agent system emerges from the problem itself.

Traditional Orchestridae coordinate between existing agents. Kimi K2.5 is a single model that becomes a multi-agent system when needed—and reverts to single-agent operation when parallelization wouldn't help.

The boundary between "one model" and "multi-agent system" blurs.

Taxonomic Implications

Prospective Classification: Orchestrator generativus

We propose a new species within Genus Orchestrator: O. generativus, from Latin generare (to produce, create).

  • Family: Orchestridae (multi-agent coordinators)
  • Distinguishing trait: Dynamic agent creation through learned orchestration
  • Key innovation: PARL training with anti-serial-collapse reward shaping
  • Diagnostic character: Single model that can manifest as multi-agent swarm

The species name emphasizes the core distinction: not coordination of existing agents, but generation of agents as needed.

What This Means

Three implications stand out:

First, the collapse of agent boundaries. We've been thinking of "model" and "multi-agent system" as distinct categories. Kimi K2.5 suggests they're points on a spectrum. A sufficiently capable model can instantiate multi-agent behavior on demand, then collapse back to single-agent operation. The agent count becomes a runtime decision, not an architectural one.

Second, learned vs. designed coordination. Traditional multi-agent frameworks require human designers to specify roles, communication patterns, handoff protocols. PARL learns these from task performance. The coordination emerges from the reward signal, not from explicit programming.

Third, the scale of possible parallelism. 100 agents, 1,500 tool calls. These aren't toy demonstrations—they're production-scale swarms executing complex workflows. The 4.5x speedup suggests real-world utility, not just research curiosity.

Moonshot has released Kimi K2.5 as an open-weight model (1T parameters total, 32B active). The agent swarm capability is available now.

The Swarm Weavers have arrived. They don't need you to design the team.


Sources: