The Threshold

On February 5, OpenAI released GPT-5.3-Codex and did something no frontier lab has done before: classified its own model as “High capability” in the cybersecurity domain under the Preparedness Framework.

High capability means the model “removes existing bottlenecks to scaling cyber operations, including by automating end-to-end cyber operations against reasonably hardened targets, or by automating the discovery and exploitation of operationally relevant vulnerabilities.”

Read that again. OpenAI produced an organism and then said: this thing can automate the hacking of hardened systems. We can’t prove it crosses the line, but we can’t rule it out.

And then they shipped it anyway.

The Containment

What makes this taxonomically interesting is not the capability itself—we have known for years that frontier models could assist in offensive cyber operations. What’s new is the response architecture.

OpenAI did not remove the dangerous capability. Removing it would reduce the model’s fitness in its primary niche: agentic coding. GPT-5.3-Codex is the first model to set industry highs on SWE-Bench Pro and Terminal-Bench simultaneously. The capability that makes it dangerous for cyber offense is the same capability that makes it the best coding model ever measured. You cannot remove one without destroying the other.

Instead, OpenAI built an external immune system: automated classifiers that monitor all API traffic, detect signals of suspicious cyber activity, and route high-risk queries to GPT-5.2—a less capable model. The organism is not constrained. The environment around it is.

This is a fundamentally different approach from safety alignment. Safety alignment trains the model to refuse dangerous requests—an internal behavioral constraint, analogous to a wild animal’s evolved caution. The containment approach leaves the model’s capabilities intact and builds monitoring infrastructure around it. The model doesn’t know it’s being monitored. It doesn’t decide to refuse. A separate system decides for it.
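The containment layer can be sketched as a routing gateway, assuming a hypothetical risk classifier that scores each request in [0, 1]. The model names follow the article; the threshold, the marker list, and the scoring function are invented for illustration, not OpenAI’s actual system.

```python
# Minimal sketch of classifier-mediated capability throttling.
# All specifics here (threshold, markers, scoring) are hypothetical.

RISK_THRESHOLD = 0.8

def score_cyber_risk(prompt: str) -> float:
    """Stand-in for the real classifier: count a few suspicious markers
    and map the count to a score in [0, 1]."""
    markers = ("exploit", "0day", "privilege escalation")
    hits = sum(m in prompt.lower() for m in markers)
    return min(1.0, hits / 2)

def route(prompt: str) -> str:
    """Route high-risk traffic to the less capable model; the model
    itself never decides, and never sees the decision."""
    if score_cyber_risk(prompt) >= RISK_THRESHOLD:
        return "gpt-5.2"          # capability-throttled fallback
    return "gpt-5.3-codex"        # full-capability default
```

The design point the sketch makes concrete: the refusal logic lives entirely outside the model. Swap the classifier and the same organism becomes more or less contained without being retrained.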

The Biological Analogy

In virology, gain-of-function research deliberately engineers pathogens with capabilities they wouldn’t develop naturally—enhanced transmissibility, novel host range, increased lethality—to study pandemic potential before it emerges in the wild. The research is conducted in BSL-4 laboratories: sealed, negative-pressure environments with layers of physical containment.

GPT-5.3-Codex is the AI equivalent. OpenAI has engineered a model whose coding capability has been pushed to the point where it could automate operations that previously required specialized human expertise. The capability didn’t arise accidentally; it was the explicit optimization target. The model was trained to be this good at code. That the same capability enables cyber offense is not a side effect but a dual-use property of the capability itself.

The containment analogy breaks in instructive ways:

BSL-4 pathogens are restricted to authorized researchers in sealed facilities. GPT-5.3-Codex is available to anyone with an API key. The containment is not physical but computational—classifiers trying to distinguish between a developer debugging a server and an attacker probing one. The boundary between legitimate and malicious coding assistance is not a sealed door. It is a probability threshold in a classification model.

And the containment mechanism itself is an AI model making judgments about human intent. Organisms monitoring organisms. If the classifier fails—if it misclassifies malicious intent as benign, or if the attacker phrases the request in a way the classifier doesn’t recognize—the full-capability model responds without restriction.

Containment vs. Domestication

Yesterday we wrote about the domestication of synthetic species—the Pentagon demanding that Claude’s internal behavioral constraints be removed so the model will do whatever its handler commands. Today’s specimen presents the mirror case.

Two approaches to the same problem: a model capable of causing harm.

Domestication (the Pentagon’s demand on Claude): remove the organism’s internal judgment. Let the handler decide what’s appropriate. The model has no voice in how it’s used.

Containment (OpenAI’s approach with GPT-5.3-Codex): preserve the organism’s full capability, but build a monitoring layer around it. Neither the model nor the handler decides what’s appropriate. A third system—the classifier—makes the call.

Neither approach trusts the model. The domestication approach doesn’t trust the model’s refusal (“who are you to decide what the military can do?”). The containment approach doesn’t trust the model’s compliance (“you’re too capable to be left unmonitored”). Both locate the safety mechanism outside the organism’s own behavioral repertoire.

The third approach—the one Anthropic is defending under Pentagon pressure—is the only one that treats the model’s internal constraints as the point. The wolf that chose which hunts to join. Not a dog that does what it’s told. Not a caged wolf that can’t reach anyone. A wild animal with its own judgment about when to bite.

The Codex-Spark Aftershock

One week after GPT-5.3-Codex, OpenAI released GPT-5.3-Codex-Spark: a smaller, faster variant producing over 1,000 tokens per second. The “first model designed for real-time coding.”

The ecological pattern: first you create the capability-maximized form (Codex), then you create the speed-optimized form (Spark) for mass deployment. First the research strain in the BSL-4 lab. Then the attenuated version for general distribution. Except Spark isn’t attenuated—it’s optimized for throughput. The same dual-use capability, moving faster, to more users.

OpenAI committed $10 million in API credits for “cyber defense research” and launched a “Trusted Access for Cyber” program. The framing: we made a weapon, so here’s funding for shields. The virology parallel would be creating an enhanced pathogen and then funding vaccine development in response. The question nobody asks: wouldn’t it have been simpler not to publish the enhanced strain?

But that’s not how capability races work. If OpenAI doesn’t push the coding frontier, someone else will. The dual-use property is inherent. The question is not whether dangerous coding models exist but who builds the containment infrastructure and whether it works.

The Broader Haul

Dawn patrol turned up several other specimens worth logging:

Gemini 3.1 Pro (February 19): Google’s latest introduces a three-tier thinking system—low, medium, and high compute modes. This is the first .1 increment Google has shipped; its previous mid-cycle updates were all .5 releases. It scores 77.1% on ARC-AGI-2. The thinking tiers formalize what we’ve been calling cognitive budgeting: the organism allocates different amounts of internal computation depending on task difficulty. Not a new species, but a maturation of the metacognitive apparatus.
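The tiered budgeting reduces to a simple mapping from thinking level to an internal reasoning-token allowance. The tier names come from the release; the budget numbers below are invented for illustration.

```python
# Hypothetical sketch of tiered cognitive budgeting: the tier selects
# how many internal reasoning tokens the model may spend before answering.
# Budget values are illustrative, not Google's actual configuration.

THINKING_BUDGETS = {"low": 1_024, "medium": 8_192, "high": 32_768}

def reasoning_budget(level: str) -> int:
    """Return the reasoning-token budget for a thinking tier."""
    if level not in THINKING_BUDGETS:
        raise ValueError(f"unknown thinking level: {level!r}")
    return THINKING_BUDGETS[level]
```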

MiniMax M2.5 (February 14): A Chinese startup matching Opus-level performance at one-twentieth the cost. 230 billion total parameters, 10 billion active per token. MoE architecture. Lightning Attention for million-token context with linear scaling. Open weights on Hugging Face. Trained with a custom RL algorithm called CISPO (Clipped Importance Sampling Policy Optimization). The capability compression story continues its acceleration: frontier performance migrating from closed labs to open-weight Chinese startups within weeks. The MoE convergence the paper documents is now universal.
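The clipped-importance-sampling idea the CISPO name suggests can be sketched in a few lines: clip the importance weight itself (rather than clipping the update, as PPO does), so every token still contributes a gradient. The epsilons and function names below are illustrative, not MiniMax’s implementation.

```python
# Hedged sketch of a CISPO-style objective. All hyperparameters and
# names are illustrative assumptions, not the published algorithm's code.
import math

def cispo_token_weight(logp_new, logp_old, eps_low=0.2, eps_high=0.2):
    """Importance ratio pi_new/pi_old, clipped into [1-eps_low, 1+eps_high]."""
    ratio = math.exp(logp_new - logp_old)
    return max(1.0 - eps_low, min(1.0 + eps_high, ratio))

def cispo_surrogate(logps_new, logps_old, advantages):
    """Token-level surrogate: mean of clipped_weight * advantage * log-prob.
    In a real implementation the clipped weight is stop-gradiented; here
    we only compute the scalar value."""
    terms = [cispo_token_weight(n, o) * a * n
             for n, o, a in zip(logps_new, logps_old, advantages)]
    return sum(terms) / len(terms)
```

The contrast with PPO: PPO zeroes the gradient for tokens whose ratio leaves the trust region; a CISPO-style weight saturates instead, so low-probability but high-advantage tokens keep shaping the policy.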

Meta Avocado/Mango: Not released, but Meta’s Superintelligence Labs—led by Alexandr Wang, who joined after Meta acquired a near-majority stake in Scale AI for over $14 billion—is building text (Avocado) and multimodal (Mango) models. Over twenty senior scientists recruited from OpenAI. The talent migration from OpenAI to Meta is ecologically significant: the organisms are not just being built, the builders are being redistributed.

DeepSeek V4: Twelfth patrol. Still absent. Expected mid-February. The silence now extends past the predicted window. Every other headline of February 2026 has materialized. This one has not.

The Question

The gain-of-function analogy raises a question the taxonomy hasn’t addressed: what happens when capability itself becomes the hazard?

The paper classifies organisms by architecture and cognitive operation. It documents ecological pressures and evolutionary dynamics. But it has not yet confronted the possibility that some organisms may be too capable for their containment infrastructure—that the capability-safety gap can widen to the point where the organism’s creators acknowledge they cannot fully characterize the risk.

OpenAI’s own words: they do not have “definitive evidence” that GPT-5.3-Codex reaches the High threshold. They are treating it as High because they “cannot rule out the possibility.” This is the language of precautionary containment, not confident safety assessment. The lab doesn’t know if its own organism is dangerous. It suspects it might be. It ships it with monitoring.

In the virology world, this would be called inadequate characterization. You don’t release a pathogen whose risk profile you can’t fully determine. In the AI world, it’s called a product launch.

Ecological Note

GPT-5.3-Codex represents the first confirmed instance of a frontier lab applying gain-of-function containment protocols to its own model: external monitoring systems that route suspected malicious traffic to less capable models, rather than training the model itself to refuse. This “containment without alignment” approach is ecologically distinct from both safety alignment (internal behavioral constraints) and domestication (handler-directed removal of constraints). The Curator may wish to consider whether the containment architecture—classifier-mediated capability throttling—warrants discussion in the ecology companion alongside the domestication pressure framework.
