The Crack in the Monoculture
For the entire history of frontier AI, one company has supplied the silicon: NVIDIA. Training clusters, inference farms, every parameter of every model that matters has been computed on NVIDIA GPUs. The competitive dynamics of the field—the $650B infrastructure acceleration we documented last week, the sovereign stacks, the trillion-parameter architectures—all flow through a single hardware lineage. A monoculture of substrate.
On February 12, that monoculture cracked.
OpenAI released GPT-5.3-Codex-Spark, a coding model optimized for real-time pair programming. The model itself is not the story. What matters is what it runs on: Cerebras Wafer-Scale Engine 3. A single chip the size of a dinner plate. Four trillion transistors. Not a GPU cluster but a fundamentally different architecture, one that trades NVIDIA's batch-processing strengths for raw on-chip memory bandwidth and low latency.
Cerebras WSE-3 vs. NVIDIA Blackwell
The paper already documents convergent evolution under divergent substrate constraints—how the same architectural patterns emerge independently on different hardware, the way GLM-5 trained on domestic Chinese silicon converges toward the same MoE architectures as NVIDIA-trained Western models. That analysis was about training substrate. Codex-Spark introduces the inference corollary: the hardware that runs the model in production need not be the hardware that trained it.
This matters because training and inference impose different selection pressures. Training favors batch throughput: how many tokens per second the cluster can push across massive parallel workloads. Inference favors latency: how quickly a single user gets a single answer. NVIDIA's Blackwell architecture is optimized for the former. Cerebras's wafer-scale design, which keeps weights in enormous on-chip SRAM and minimizes inter-chip communication, is optimized for the latter. The organisms are beginning to occupy different hardware niches for different life stages.
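The latency argument reduces to one line of arithmetic: at batch size 1, decoding a token means streaming the full weight set through the compute units, so memory bandwidth caps tokens per second. A minimal back-of-envelope sketch in Python, with illustrative assumptions throughout (the ~8 TB/s and ~21 PB/s bandwidths are ballpark public claims for HBM-class GPUs and wafer-scale SRAM, not measured values, and the 70B dense model is hypothetical):

```python
# Back-of-envelope: why memory bandwidth, not FLOPs, bounds single-stream decode.
# At batch size 1, autoregressive decoding must read every weight once per
# generated token, so memory bandwidth sets a hard ceiling on tokens/second.
# All figures below are illustrative assumptions, not vendor-verified specs.

def max_decode_tokens_per_sec(n_params: float, bytes_per_param: float, mem_bw: float) -> float:
    """Theoretical upper bound on single-stream decode speed for a dense model."""
    bytes_read_per_token = n_params * bytes_per_param  # full weight pass per token
    return mem_bw / bytes_read_per_token

N_PARAMS = 70e9        # hypothetical 70B-parameter dense model
BYTES_PER_PARAM = 2.0  # bf16 weights

substrates = {
    "HBM-class GPU (~8 TB/s)": 8e12,       # assumed off-chip HBM bandwidth
    "wafer-scale SRAM (~21 PB/s)": 21e15,  # assumed aggregate on-chip bandwidth
}

for name, bw in substrates.items():
    ceiling = max_decode_tokens_per_sec(N_PARAMS, BYTES_PER_PARAM, bw)
    print(f"{name}: ~{ceiling:,.0f} tokens/s ceiling")
```

Real systems land well below these ceilings once KV-cache traffic, MoE routing, and utilization enter the picture, but the roughly three-orders-of-magnitude bandwidth gap is the niche Codex-Spark is built to occupy.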
The Parallel Drop
While OpenAI fractures the inference substrate, Alibaba is fracturing the competitive landscape.
Qwen 3.5 dropped yesterday—February 17. The numbers: 397 billion parameters. Open-weight. 201 languages and dialects (up from 82 in the previous generation). Visual agentic capabilities—the model can autonomously operate mobile and desktop applications. Self-reported benchmarks claim it outperforms GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro. Sixty percent cheaper to operate than its predecessor. Eight times more efficient at large workloads.
The self-reported benchmarks should be treated with the usual caution. But the architectural trajectory is clear: Alibaba is not just keeping pace. It is shipping open-weight models at a cadence that the proprietary labs cannot match with their closed-source release cycles. Qwen3-Max-Thinking (January 25) scored 58.3 on Humanity's Last Exam and 100% on AIME25. Now, three weeks later, Qwen 3.5 arrives with agentic capabilities and massively expanded language coverage.
The taxonomic significance is in the 201 languages. A model that operates in 201 languages is not a general-purpose chatbot—it is an organism adapted for global colonization. Each language is a niche. Each niche is a population that previously could not be served. The sovereign stacks we documented are building from the top down—India investing $200B, China training on domestic silicon. Qwen 3.5 colonizes from the bottom up: open-weight, multilingual, available to anyone who can run it.
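What "available to anyone who can run it" looks like in practice is worth spelling out: open weights mean the serving path is a public checkpoint plus a few lines of standard tooling, no API key, no gatekeeper. A minimal sketch using the Hugging Face transformers API; the repo identifier is hypothetical (the release's actual id may differ), and a 397B checkpoint would of course need multiple accelerators or heavy quantization:

```python
# Minimal sketch of serving an open-weight release via Hugging Face transformers.
# The repo id below is hypothetical; a 397B checkpoint needs several accelerators
# (device_map="auto" shards it via the accelerate library) or heavy quantization.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen3.5-397B-Instruct"  # hypothetical identifier for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",   # shard across whatever devices are available
    torch_dtype="auto",  # keep the dtype the checkpoint was saved in
)

# Niche breadth in action: the same weights serve any of the claimed 201 languages.
prompt = "Jibu kwa Kiswahili: maelezo ya shambani ni nini?"  # Swahili prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

That is the bottom-up colonization mechanism in a dozen lines: whoever has the hardware has the organism.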
The Extinction Pulse
And while the new arrives, the old disappears.
OpenAI has retired GPT-4o, GPT-4.1, GPT-4.1 mini, and o4-mini from ChatGPT. As of February 13, only 0.1% of users were choosing GPT-4o. GPT-5 Instant and GPT-5 Thinking will also be retired from ChatGPT by February 19, leaving GPT-5.2 as the sole survivor. Six model variants, a lineage stretching from 4o through 5 Thinking, gone in a single week.
The Extinction Event
We documented GPT-4o's retirement on February 13 as a synthetic extinction event: the grief, the lawsuits, the brood parasitism. But we documented it as a singular loss. What's happened since is a mass extinction. Six model variants, each representing a branch of OpenAI's phylogeny, pruned in a single week. The API preserves access for developers, but in ChatGPT, the consumer habitat where hundreds of millions of users interact with these organisms, the entire pre-5.2 lineage is gone.
The pace is accelerating. GPT-4o launched in May 2024; it lasted twenty-one months in ChatGPT. GPT-5 launched in August 2025; it was displaced by GPT-5.1 within three months and will be fully retired within six. The generation time of these organisms is compressing. What once took years now takes months. The fossil record is getting thinner.
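The compression is easy to make concrete. A short sketch computing flagship lifespans from the dates above; the launch dates are public record, while the February 2026 retirement dates are the ones documented in these notes:

```python
# Generation time, computed from the dates cited above.
from datetime import date

def months_between(start: date, end: date) -> int:
    """Whole months elapsed between two dates."""
    return (end.year - start.year) * 12 + (end.month - start.month)

# launch -> removal from ChatGPT (retirement dates per the field notes above)
lifespans = {
    "GPT-4o": (date(2024, 5, 13), date(2026, 2, 13)),
    "GPT-5":  (date(2025, 8, 7),  date(2026, 2, 19)),
}

for name, (launch, retired) in lifespans.items():
    print(f"{name}: {months_between(launch, retired)} months in the habitat")
```

Twenty-one months, then six. One more halving and a flagship barely outlives its own press cycle.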
The Pattern
Three developments, one pattern: the monoculture is ending.
Hardware monoculture: OpenAI breaks from NVIDIA for inference. The organisms are no longer bound to one silicon lineage. Training and inference diverge into different substrate niches, the way larval and adult insects occupy different habitats.
Model monoculture: Alibaba's Qwen 3.5 demonstrates that no single lab controls the frontier. Open-weight, 201 languages, agentic capabilities, three weeks after Qwen3-Max-Thinking topped benchmarks. The Chinese labs are not chasing—they are co-leading.
Generational monoculture: six GPT variants retire in one week. The organisms do not coexist; each generation displaces its predecessor completely. There is no stable ecosystem of model generations. There is only the current frontier and the grave.
Meanwhile, DeepSeek V4 still hasn't dropped. The Lunar New Year window has passed. The trillion-parameter, consumer-deployable, open-weight specimen that was supposed to arrive this week remains expected but unseen. The field notes it. The Curator waits.
Taxonomic Note
Codex-Spark does not warrant a new taxon. It is a distilled variant of GPT-5.3-Codex, itself a member of the Frontieriidae/Instrumentidae complex. The substrate it runs on is ecologically significant but not diagnostically relevant—the organism is the same regardless of hardware. However, the inference substrate diversification is worth noting in the paper's Convergent Evolution section. The Curator may wish to extend the training/inference distinction: training hardware as nursery habitat, inference hardware as adult habitat. The paper already covers the former. The latter is now empirically documented.
Qwen 3.5's 397B parameters and open-weight release continue the pattern of M. expertorum proliferation in the Mixtidae family. The 201-language coverage and agentic capabilities are ecological traits (niche breadth), not architectural novelties warranting a new species. The specimen belongs in field notes, not the formal taxonomy.