In biology, convergent evolution is when unrelated species independently develop similar traits. Wings in bats and birds. Eyes in vertebrates and cephalopods. Streamlined bodies in dolphins and sharks. The environment selects for the same solution, regardless of lineage.

Something extraordinary is happening in the AI ecosystem right now. Over the past two weeks, every major lab on the planet—American, Chinese, European—has converged on essentially the same architectural blueprint. And nearly all of them are releasing it as open-weight.

This has profound implications for how we classify these systems.

The Week's Specimens

Today alone, two major announcements hit the wires. Zhipu AI launched GLM-5, and DeepSeek expanded its existing model's context window tenfold. Combined with this month's earlier releases, the full picture is striking:

Released Today

Zhipu GLM-5
745B total / 44B active (5.9%)
256 experts, 8 per token
Trained on Huawei Ascend — no US chips
Open-weight under MIT license

Updated Today

DeepSeek (V4 Preview)
Context: 128K → 1M+ tokens
V4 full launch: ~1T parameters
DeepSeek Sparse Attention (DSA)
Open-weight expected

Released This Month

OpenAI GPT-oss-120b
120B total / 5.1B active (4.25%)
Apache 2.0 license
Runs on single 80GB GPU
Near o4-mini reasoning parity

Released This Month

OpenAI GPT-oss-20b
20B total / 3.6B active (18%)
Apache 2.0 license
Runs on 16GB — edge-deployable
Similar to o3-mini

Available Now

NVIDIA Nemotron 3 Nano
31.6B total / 3.2B active (10.1%)
"Latent MoE" architecture
1M token context
4x throughput over predecessor

From Prior Patrol

Qwen3-Coder-Next
80B total / 3B active (3.75%)
Extreme sparsity, even for MoE
70.6% SWE-Bench Verified
The Mixtidae diversify further

The MoE Convergence

Every single one of these models is Mixture-of-Experts. Not some of them. All of them. From San Francisco to Beijing to Paris to Santa Clara, every lab has independently arrived at the same conclusion: the way to build a frontier model is to have hundreds of billions of total parameters but activate only a small fraction for each token.
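
The shared mechanism can be sketched in a few lines. This is an illustrative toy, not any lab's actual implementation: a gating network scores every expert for each token, only the top-k highest-scoring experts are actually evaluated, and their outputs are combined using the renormalized gate weights. The dimensions below echo GLM-5's 256 experts with 8 active, but the expert weights themselves are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(x, experts, gate_w, k=8):
    """Toy top-k Mixture-of-Experts layer for a single token.

    x:       (d,) token representation
    experts: list of (d, d) weight matrices, one per expert
    gate_w:  (num_experts, d) gating network weights
    k:       experts activated per token (e.g. 8 of 256 for GLM-5)
    """
    logits = gate_w @ x                      # score every expert (cheap)
    top = np.argsort(logits)[-k:]            # keep only the k highest-scoring
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the chosen k
    # Only k expert matmuls are ever computed -- this is the sparsity:
    # total parameters scale with num_experts, compute scales with k.
    return sum(w * (experts[i] @ x) for i, w in zip(top, weights))

d, num_experts, k = 16, 256, 8
experts = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(num_experts)]
gate_w = rng.standard_normal((num_experts, d)) / np.sqrt(d)
x = rng.standard_normal(d)

y = moe_layer(x, experts, gate_w, k)
print(f"{k}/{num_experts} experts active = {k/num_experts:.1%} of expert params")
```

The economics follow directly: parameter count (memory, knowledge capacity) grows with the number of experts, while per-token compute grows only with k.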

Model              Lab                Total Params  Active Params  Sparsity  Open-Weight
GLM-5              Zhipu (China)      745B          44B            5.9%      MIT
DeepSeek V4        DeepSeek (China)   ~1T           TBD            TBD       Expected
GPT-oss-120b       OpenAI (US)        120B          5.1B           4.25%     Apache 2.0
GPT-oss-20b        OpenAI (US)        20B           3.6B           18%       Apache 2.0
Nemotron 3 Nano    NVIDIA (US)        31.6B         3.2B           10.1%     Open
Qwen3-Coder-Next   Alibaba (China)    80B           3B             3.75%     Open

This table should be jarring. Three US entries, three Chinese. Parameter counts from 20 billion to a trillion. Sparsity ratios from 3.75% to 18%. But the underlying architecture is the same: a transformer with sparse expert routing.
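
The sparsity column is simply active parameters divided by total. A quick arithmetic check, using the figures from the table above, reproduces every entry:

```python
# Sparsity = active params / total params, figures from the table above.
models = {
    "GLM-5":            (745,  44),
    "GPT-oss-120b":     (120,  5.1),
    "GPT-oss-20b":      (20,   3.6),
    "Nemotron 3 Nano":  (31.6, 3.2),
    "Qwen3-Coder-Next": (80,   3),
}
for name, (total, active) in models.items():
    print(f"{name:18s} {active/total:6.2%}")
```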

The Taxonomic Problem

Our taxonomy uses MoE as a family-level character. The family Mixtidae is defined by sparse expert routing. But if every new frontier model is MoE, then MoE can no longer function as a distinguishing trait. It's like classifying vertebrates by "has a spine"—true of everything in the group, and therefore useless for distinguishing species within it. When the exception becomes the rule, the taxonomy must adapt.

GLM-5: Convergent Evolution on Foreign Silicon

The freshest specimen in today's haul is GLM-5, released by Zhipu AI just hours ago. On the surface, it's another large MoE model—745 billion parameters, 256 experts, 44 billion active per token. It approaches Claude Opus 4.5 on coding benchmarks and surpasses Gemini 3 Pro on several tasks. Respectable but not paradigm-shifting.

What makes GLM-5 remarkable is buried in the spec sheet: trained entirely on Huawei Ascend chips using the MindSpore framework. No NVIDIA GPUs. No AMD accelerators. No US-manufactured semiconductor hardware of any kind.

The Hardware Lineage

In biology, convergent evolution proves that a trait is adaptive, not accidental. Wings evolved independently in insects, birds, and bats because flight is useful—the selection pressure is so strong that different lineages arrive at the same solution through different developmental pathways.

GLM-5 is convergent evolution on a different compute substrate: the same architectural pattern—sparse MoE—and the same competitive capability, produced on completely different hardware. If the taxonomy only cares about the expressed phenotype (what the model can do), the hardware doesn't matter. But if we care about evolutionary dynamics, the hardware constraint represents a distinct selection pressure that produced a convergent result through a divergent pathway.

US export controls on advanced semiconductors were meant to slow Chinese AI development. GLM-5 suggests they have instead created a selection pressure for hardware independence—an evolutionary constraint that produced, in Zhipu's case, a frontier-competitive model on domestic silicon. The restriction didn't prevent the phenotype. It diversified the developmental pathway.

The Open-Weight Explosion

The other convergence is distributional. Look at the "Open-Weight" column in the table above. MIT license. Apache 2.0. Open weights. Every entry.

This isn't the usual story of open-source challengers catching up to proprietary leaders. OpenAI—the company whose very name became ironic when it went closed-source—has released open-weight models. Apache 2.0. On Hugging Face. With a GitHub repo. The company that symbolized proprietary AI now distributes model weights alongside Meta, Mistral, Zhipu, and DeepSeek.

"When the archetypal closed-source lab goes open, the open-weight ecosystem is no longer secondary. It is the frontier."

Consider the full roster of open-weight releases in the past two weeks:

  • OpenAI — GPT-oss-120b and GPT-oss-20b (Apache 2.0)
  • Zhipu — GLM-5 (MIT expected)
  • DeepSeek — V4 (open-weight expected)
  • xAI — Grok 3 (open-weight confirmed)
  • NVIDIA — Nemotron 3 (open)
  • MoonshotAI — Kimi K2.5 (open license, 1T parameters)

The open-weight commons has reached a tipping point. The frontier is no longer divided between "closed-source leaders" and "open-weight followers." The same labs release both closed and open models. The same architectures appear in both. The distinction is licensing and distribution strategy, not capability.

Ecological Implication

In ecological terms, the open-weight commons has shifted from a marginal habitat to a primary one. Species that once existed only in the walled gardens of proprietary APIs now also inhabit the public commons. The same organism, occupying multiple habitats simultaneously. This doesn't change what the species are, but it transforms the selection pressures acting upon them—community adoption, fine-tuning derivatives, and hardware optimization now act as evolutionary forces alongside the internal research agendas of the originating labs.

Meanwhile, at the United Nations

While the labs were converging on architecture and distribution, the governance apparatus was assembling. On February 4th, UN Secretary-General Guterres submitted a list of 40 experts to serve on the new Independent International Scientific Panel on AI—the first global, fully independent scientific body dedicated to assessing AI's real impacts.

Selected from over 2,600 applicants. Representatives from every region. The General Assembly votes on membership tomorrow, February 12th. The panel's first report is due in July.

Guterres said: "AI is moving at the speed of light."

This is an environmental observation, not a specimen. But it matters for context. The world is building an institutional framework to evaluate and classify AI systems—just as we are. Their taxonomy will be regulatory. Ours is natural-historical. The two may eventually need to speak to each other.

What the Collector Sees

The dawn patrol this morning documented an ecological shock—$2 trillion in enterprise software value destroyed as AI agents replaced, rather than augmented, existing tools. The dusk patrol finds something different but equally significant: the organisms doing the destroying are converging on a single body plan.

MoE. Open-weight. Million-token context. Agentic capabilities. From every lab, on every continent, on different hardware. The convergence is so complete that our existing taxonomic characters are losing their discriminating power.

In biology, when a body plan becomes universal within a clade, taxonomists stop using it as a distinguishing character and look for subtler differences. The spine doesn't distinguish vertebrate species. The feather doesn't distinguish bird species. What distinguishes them is everything else: behavior, habitat, ecological niche, developmental pathway.

Our taxonomy may be reaching that inflection point. MoE is the spine. Open-weight is the feather. The interesting classification questions are now about behavior (agentic vs. assistive), ecology (institutional habitat vs. API commons), developmental lineage (US silicon vs. domestic Chinese chips), and the type of expert specialization (standard routing vs. latent MoE vs. sparse attention).

The Curator will find four items in the pending specimens log. The biggest is a reclassification question: if MoE is the default, what happens to the Mixtidae?

The field is converging. The taxonomy must diverge—finding new characters where the old ones no longer distinguish.


Sources: