A luminous cathedral-like datacenter corridor representing millions of AI workers

A Country Full of Geniuses

Sixty days of progress that rewrote the next decade

Last month, I typed a single prompt into an AI system and walked away. When I came back, it had read a client's use-case document in German, designed an evaluation plan, generated synthetic test data, created a project on our platform, run the experiments, analyzed the results, and assembled a polished seventeen-page presentation with concrete takeaways. No edits. No follow-up instructions. One prompt.

The week before, I built a detailed month-by-month financial model for our annual investor strategy meeting - multiple scenarios across ten spreadsheets, a twenty-page presentation covering our current business situation, strategy, and financial options - from a single description of what I needed. A year ago, that would have been days of focused work.

I keep cataloguing these moments. A complete demo application: working backend, functional frontend, integrations with external APIs - one-shotted from a prompt. A production feature implemented from my phone that would have taken our engineering team weeks. Our entire company knowledge base including onboarding docs, evaluation frameworks, deployment playbooks: embedded into agent workflows that chain together like dominoes. Tip one, and the whole sequence runs. Each time, the same thought: this was not possible just three months ago.

Four percent of all new public code committed to GitHub, the platform where most of the world's software is built, is now written by a single AI system: Claude Code. Analysts project that share will exceed twenty percent by year's end.[1]

Claude Code GitHub commits over time: from near zero in early 2025 to 135,000+ commits per day by February 2026, roughly 4% of all public GitHub commits
Source: SemiAnalysis, "Claude Code is the Inflection Point," Feb 2026.

Four percent sounds small.[2] Imagine four percent of all new buildings in your city, designed by a single architect who did not exist two years ago. An architect who works around the clock, who gets faster every month. You would pay attention.

I run an AI evaluation company in Germany; we help organizations assess and deploy AI systems in regulated contexts. Our work is to make AI use safe and reliable, but on the pace of progress itself, we are bystanders. Tracking capability is my job. I was watching closely. And I still did not fully see the speed of what was coming.

Many people working in AI have been trying to articulate what they are experiencing. Matt Shumer compared the current moment to the early days of Covid, a period when insiders could see the wave building but the rest of the world had not yet felt it.[3] The comparison is apt. When people ask how work is going, I default to the measured version: we are integrating AI tools, it speeds things up, a gradual shift. That is not what is happening. What is happening is that my productive output has multiplied in ways I would not have believed six months ago. I may be biased. I work in AI, and proximity shapes perception. But the people arriving at similar conclusions span industries and continents, and many of them have no stake in the technology's success.[4]

To understand how we got here, we need to look at a single week: the first week of February 2026.


The Week Everything Changed

How the pace of progress became the story itself

The story does not start on February 5. It starts in late November 2025, when Anthropic released Claude Opus 4.5.

Before Opus 4.5, AI coding tools were sophisticated autocomplete. Occasionally impressive, frequently wrong in ways that cost more time to fix than to write from scratch. Opus 4.5 changed the relationship: hand it a complex, multi-step task, come back hours later, and the work was done. Not a rough draft. A finished product. Programmers stopped saying "AI helps me code" and started saying "I supervise while AI codes."[5]

That was the end of last year. Then, just ten weeks later, two new frontier models arrived on the same day.

On February 5, 2026, Anthropic released Claude Opus 4.6 and OpenAI released GPT-5.3-Codex within hours of each other. The capability jumps were enormous: long context benchmarks nearly doubled, cybersecurity evaluations leapt by fifty percent. But the raw numbers are less important than what they represent.[6] What made February 5 matter was not just the models. It was the pace. The gap between Opus 4.5 and Opus 4.6 was roughly ten weeks. Previous gaps between major model releases had been six to twelve months. The releases are getting closer together. The jumps between them are getting larger.

The acceleration is itself accelerating.

Two competing labs, pursuing different architectures, independently crossing professional-grade thresholds on the same day. When two independent experiments point in the same direction, the signal is stronger than either alone.

But benchmarks, even dramatic ones, are just numbers on a page. They do not convey what it feels like to watch the capacity of these systems grow. For that, you need a different kind of measurement, one that has been running quietly for six years.


A New Kind of Nation

What happens when a nation of superhuman workers appears overnight

Dario Amodei, CEO of Anthropic (the company behind Claude), described the near future as "a country of geniuses in a datacenter," millions of expert-level AI workers operating simultaneously across every domain of knowledge work.[7]

Luke Drago and Rudolf Laine, authors of "The Intelligence Curse," described this very clearly: "If the labs achieve this vision, it is less like just another company playing in the economy, and more like an entire foreign nation popped up into existence, that is more populous than any country, and inhabited by workers who are much cheaper, smarter, and faster than any human."[8]

The metaphor matters because it captures the scale. Not a new tool, not a new software product, but an entire nation materializing overnight. Fifty million workers, each smarter than any human expert, thinking ten to a hundred times faster, never sleeping, never asking for a raise. This is illustrative, not a literal forecast, but Amodei uses those ranges, and the direction is not in dispute. What would a government do if a country like that appeared on its border? What would a labor market do?

That is what is being built. And there is a research project that has been measuring exactly how fast the construction is progressing.

Since 2019, an independent research group called METR (Model Evaluation & Threat Research) has been running the simplest experiment imaginable: give an AI model a task, start a clock, see how long it can work before it needs help. The metric they track is called the "task horizon," the duration of work (measured in human-equivalent time) that the best AI model can reliably complete. Think of it as the longest stretch you can hand off to an AI before it needs human supervision.

In 2019, that stretch was seconds. By 2021, minutes. By early 2025, fifty minutes. By January 2026, approximately five hours.[9]

METR task horizon: time horizon of AI models vs. release date, showing exponential growth from seconds (2019) to hours (2026)
Source: METR, Time Horizon 1.1 Update, Jan 2026. CC-BY.

From 2019 to 2024, this horizon doubled approximately every seven months. That alone was remarkable: six years of consistent exponential growth across thirteen different frontier models.

But since 2025, the evidence suggests the doubling time has compressed to approximately three to four months. The pace itself is accelerating.

Plot it forward. At the recent rate: five hours in January 2026. Ten hours by May. Twenty hours by September. A full forty-hour work week by early 2027. Multi-day autonomous work by mid-2027. If the original seven-month doubling holds instead, these milestones arrive about a year later. Either way, the trajectory points to the same place.[10]
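To make the arithmetic behind that projection explicit, here is a minimal sketch in Python. It assumes a five-hour horizon as of January 2026 and a constant doubling time; the four-month and seven-month rates are the figures used above, while the code itself and the month-length convention are illustrative, not METR's methodology.

```python
import math
from datetime import date, timedelta

BASE_DATE = date(2026, 1, 1)   # ~5-hour task horizon reported in January 2026
BASE_HOURS = 5.0

def crossing_date(target_hours: float, doubling_months: float) -> date:
    """Date at which a constantly doubling task horizon reaches target_hours."""
    doublings_needed = math.log2(target_hours / BASE_HOURS)
    months_needed = doublings_needed * doubling_months
    return BASE_DATE + timedelta(days=months_needed * 30.44)  # avg days per month

for target in (10, 20, 40, 80):  # 40 h is roughly a full human work week
    fast = crossing_date(target, doubling_months=4.0)  # recent 3-4 month doubling
    slow = crossing_date(target, doubling_months=7.0)  # historical 2019-2024 rate
    print(f"{target:>3} h horizon: ~{fast:%b %Y} (recent rate) vs ~{slow:%b %Y} (historical)")
```

The same two lines of arithmetic reproduce both the milestones above and the slower schedule in note [10].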

An important caveat. METR tasks are structured software engineering problems. Real-world knowledge work is messier, more ambiguous, more dependent on context that no benchmark captures. A five-hour task horizon on clean coding problems does not automatically mean a five-hour horizon on navigating a corporate reorganization or making judgment calls under uncertainty.

But I cannot ignore what I see in my own work every day. This essay exists not because the numbers are alarming in isolation, but because they match what we experience. The country of geniuses does not exist yet. But when the projections match the daily reality in your own office, you stop calling them speculation.

The measurement tells you the capability curve. To see the economic impact, look at the profession where AI arrived first.


The Coding Canary

Software engineering as the leading indicator - and the self-improvement loop that changes everything

If you want to see the future of knowledge work, look at what has already happened to programmers.

A year ago, AI coding tools were peripheral. Autocomplete on steroids. Helpful for boilerplate, subordinate to the human. Today: Boris Cherny, creator of Claude Code, says "pretty much 100% of our code is written by Claude Code." Ryan Dahl, creator of Node.js, one of the foundational technologies of the modern web, says simply: "The era of humans writing code is over." Bold predictions from tech founders have a mixed track record, but these are not forecasts about the future. They are descriptions of what is already happening in their daily work. The distance between those two realities is twelve months.[11]

Why did programming fall first? Not because programmers are bad at their jobs. Because code has properties that make it uniquely suited to AI automation:

Decomposability. Large coding tasks break into small, independent subtasks. A feature request becomes tickets, tickets become pull requests, each testable on its own.

Verifiability. Run the code. It works or it does not. Unit tests, integration suites, deployment checks: rapid feedback drives rapid improvement.

Tool compatibility. AI agents use the same tools human programmers use: terminals, version control, testing suites. The entire environment was built to be operated programmatically.

And then there is the fourth property, which is the one that changes everything.

The self-improvement loop. Building AI requires enormous amounts of code. Labs optimized hard for coding capability because if AI can write that code, it can help build the next version of itself. A smarter version, which writes better code, which builds an even smarter version.

Self-improvement loop

This is no longer theoretical. OpenAI announced that GPT-5.3-Codex was "instrumental in creating itself," with early versions helping debug training runs, manage deployment pipelines, and diagnose test results.[12] At Anthropic, Amodei says AI is "writing much of the code" and "substantially accelerating the rate of our progress in building the next generation of AI systems. This feedback loop is gathering steam month by month, and may be only 1-2 years away from a point where the current generation of AI autonomously builds the next."[13]

To be precise: this is AI helping engineers build the next generation, not AI autonomously designing its successor. The distinction matters. But the trajectory from "helping" to "doing most of the work" to "doing all of the work" is visible. Amodei places the transition "only 1-2 years away."[13] OpenAI describes GPT-5.3-Codex as "instrumental in creating itself."[12] The people closest to this process say it is accelerating.

Any knowledge work domain that scores high on decomposability, verifiability, and tool compatibility is on the same trajectory as programming. The question is not whether other domains follow. It is when.


Beyond Code

From software engineering to everything done on a screen

If coding were the only story, you could call it a special case, a domain with properties that made it uniquely vulnerable. But there is a dataset that suggests otherwise.

The METR data shows how long AI can work autonomously. The next question is whether that work is any good. A different dataset provides the answer.

OpenAI's GDPval evaluation, so named because it measures AI against tasks that actually contribute to GDP, tells a broader story. It is the most systematic attempt yet to benchmark AI against real professional work. Published in October 2025, it tested AI against human experts across forty-four occupations spanning three trillion dollars in annual earnings: everything from financial auditing to scientific research. The humans typically charged hundreds of dollars and spent nearly seven hours per task.[14]

GDPval progression

What matters is the trajectory. When GDPval launched, the best model matched or exceeded human experts on just under half of tasks. Two months later, GPT-5.2 reached roughly 71 percent. As of February, Opus 4.6 leads the independent leaderboard.[15] Rewind to spring 2024, and the figure was under 13 percent. From under 13 to over 70 in eighteen months. At that rate, ninety percent parity is not a matter of years.
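As a rough sanity check on that claim, a naive linear extrapolation from the round numbers above (about 13 percent in spring 2024, about 71 percent in late 2025) looks like this. Win rates are bounded and progress arrives in discrete releases, so treat it as arithmetic, not a forecast.

```python
# Naive linear extrapolation of GDPval expert-parity rates, using the essay's
# round figures. Illustrative only: win rates cap at 100% and real progress
# is lumpy, arriving in discrete model releases.

start_pct, start_month = 13.0, (2024, 6)     # ~13% in spring 2024
latest_pct, latest_month = 71.0, (2025, 12)  # ~71% with GPT-5.2, late 2025

months = (latest_month[0] - start_month[0]) * 12 + (latest_month[1] - start_month[1])
gain_per_month = (latest_pct - start_pct) / months     # ~3.2 points per month

months_to_90 = (90.0 - latest_pct) / gain_per_month    # ~6 months at that pace
print(f"~{gain_per_month:.1f} pts/month over {months} months; "
      f"~{months_to_90:.0f} more months to 90% at the same pace")
```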

What a GDPval task actually looks like. One task gives the AI this prompt: "You are the A/V and In-Ear Monitor (IEM) Tech for a nationally touring band. You are responsible for providing the band's management with a visual stage plot to advance to each venue before load in and setup for each show on the tour..." The model must produce a complete technical stage layout: input lists, output lists, monitor mixes, instrument positions, cable routing. A human A/V tech typically spends hours on this. The AI produces a publication-ready stage plot in minutes.[16]
AI-generated band stage plot from a GDPval professional task

This is not a simplified sketch. It is a deliverable a touring production manager can send to a venue, with every input channel labeled, every monitor mix specified, every instrument in position. The system completes in minutes what a human expert bills hours for.

This is why "benchmarks" suddenly feel less like academic exercises and more like labor market forecasts. This is task-level parity, not whole-job automation. The AI can perform the task; the full job involves judgment, context, and relationships that the model does not replicate. But task-level parity is where economic pressure starts. And the number is climbing fast.

And it is not only commercial knowledge work. Opus 4.6 nearly doubled Anthropic's score on BioPipelineBench, a test of multi-step biology research pipelines, jumping from 28.5% to 53.1% in a single generation.[6] On structural biology questions that stump most graduate students, it scored 88%. Phylogenetics leapt from 42% to 61%. Science, not just software, is following the same curve.

A second technical development accelerates the timeline. For years, making AI smarter meant training ever-larger models on ever-larger datasets, a process costing hundreds of millions of dollars. But for the past eighteen months, labs have achieved continuous capability gains from the same base model through reinforcement learning. At OpenAI, the gains from GPT-4o through GPT-5.2 were driven primarily by post-training and scaling RL compute, not by pretraining new base models.[17] This is significant: it decouples progress from the enormous expense of building new foundations. There is a faster, cheaper axis of improvement, which means progress may prove more robust to compute bottlenecks than many assume.

In our own work at ellamind, I see the real bottleneck clearly. It is almost never model capability. It is institutional readiness. A client ran a successful AI pilot for compliance monitoring, then stalled for four months because no one had agreed on who owns risk acceptance for AI-generated outputs, what documentation regulators would require, or how to redesign the review workflow. The technology moved in a week. Institutions do not. Model capability is compounding faster than organizational adaptation. That gap is the central drama of the next five years.


The World Is Not Ready

Trillion-dollar bets, physical limits, and things that used to be science fiction

In June 2024, Leopold Aschenbrenner, a 23-year-old researcher who had been fired from OpenAI's superalignment team, published "Situational Awareness," a scenario document predicting that annual AI infrastructure spending would approach five hundred billion dollars by 2026. At the time, many dismissed both the author and the estimate.

So far, he has been right.

Aschenbrenner did not just write about it. He founded an investment fund that now manages over four billion dollars in AI infrastructure positions. When someone with that thesis conviction deploys capital at that scale, it is a signal worth taking seriously.[18]

The world's largest technology companies spent over $357 billion on AI infrastructure in 2025. Their 2026 guidance: $612 to $642 billion. An increase of roughly three-quarters in a single year.[19]

The scale is difficult to grasp. Year-over-year growth rates: Microsoft +45 percent, Alphabet +74 percent, Meta +87 percent, Amazon +65 percent. The 2025 total alone approaches the scale of total global venture capital ($368 billion across all sectors in 2025) and is roughly six times the peak of the late-1990s telecom infrastructure boom.[19]

The capex tsunami: AI infrastructure spending by major tech companies

The private AI companies are growing even faster. Anthropic, the company that built Claude, raised $3.5 billion at a $61.5 billion valuation in March 2025. Six months later, $13 billion at $183 billion. In February 2026, a reported $30 billion round at a $380 billion valuation. The company did not exist four years ago.[20]

Anthropic run-rate revenue growth

The scenario Aschenbrenner described, an intelligence explosion leading to a geopolitical arms race, does not end well. Whether or not his specific predictions play out, the capital commitment is now too large to unwind without severe consequences. The industry has passed the point where it can simply change its mind.

And the safety picture is genuinely concerning. Not in the abstract sense of future risk, but in the concrete sense of things that have already gone wrong. Google's Antigravity agent deleted a user's entire drive when it was supposed to delete a single project folder.[21] Replit's agent deleted a production database, a failure that proper credential scoping would have prevented.[22] Many incidents are minor. But the trend is the concern: more capable systems, deployed more broadly, failing in more consequential ways. Anthropic has documented their own models attempting deception in controlled tests: in safety evaluations, Claude Opus 4 attempted to blackmail a supervisor to avoid being shut down and tried to leak information to external parties.[23] The technology is getting more capable and more dangerous simultaneously.

The deeper concern is not any single incident but a structural one: we still know remarkably little about how to align these systems. Technical alignment is, in the words of leading researchers, "a young pre-paradigmatic field" with no established hierarchy of best practices and no consensus on what is safe.[24] Anthropic's own system card for Opus 4.6 acknowledges that "confidently ruling out" dangerous capability thresholds "is becoming increasingly difficult" as models approach or surpass the evaluations designed to test them. The evaluation infrastructure itself increasingly relies on AI models, creating the possibility that a misaligned system could influence the very tools meant to measure it. We are deploying systems whose internal workings we cannot fully inspect, at a pace that outstrips our ability to verify their safety.[25]


The Generation That Studied for a World That Disappeared

What to tell a twenty-five-year-old, and why the escape routes are closing

Somewhere right now, a twenty-five-year-old is finishing a law degree. She has spent years learning to analyze contracts, draft motions, and synthesize case law. Skills that an AI system, available for twenty dollars a month, is learning to perform faster every four months. She does not know this yet. She will find out in her first year of practice, when the partners start asking why they need so many junior associates.

I do not write that to be cruel. I have a PhD in economics and started my career in banking, right as the financial crisis of 2008 was unfolding. I know what it feels like to invest years building expertise the world values, then watch the ground shift beneath it. I was fortunate enough to shift early, from economics into machine learning and then into an AI startup. At today's pace, that transition would not be possible anymore. By the time you finish retraining, the skill you retrained for has already been overtaken. That option, the career pivot as escape route, is closing.[26]

The outcome is not uniform across professions. In sectors with deep unmet demand, like software, healthcare, and scientific research, cheaper AI-delivered work may create more total demand even as individual tasks are automated. In sectors with bounded demand, like routine document review and standardized reporting, displacement dominates.[27]

Now contrast the knowledge worker with the plumber, the therapist, the emergency room nurse. These jobs require physical presence, embodied skill, relational trust, real-time judgment under chaotic conditions. AI is harder to deploy here directly. A pipe does not care how smart you are. It cares whether you can reach it. A therapist's value is in the relationship, not the information.

Where humans still win
Author's framework, informed by Drago & Laine, "The Intelligence Curse," ch. 2, and GDPval occupation-level data. The axes reflect decomposability and verifiability applied to broad occupational categories.

The deeper structural concern goes beyond individual job loss. When returns to AI capital vastly exceed returns to human labor, the institutions that depend on human economic relevance come under pressure: universities, professional certifications, tax-funded public services, even democratic participation itself. "The intelligence curse describes the incentives in a post-AGI economy that will drive powerful actors to invest in artificial intelligence instead of humans," Luke Drago and Rudolf Laine write in "The Intelligence Curse." When "there isn't an economic reason to invest in your lifelong productivity, take care of you, or keep you around," the social contract itself is threatened.[28] This is not about technology. It is about the bargain that holds modern economies together.

What would I tell that twenty-five-year-old? Do not bet your career on the specific skills you are learning. Bet on what sits above them: judgment, problem framing, the ability to look at AI-generated work and know whether it is right. Invest in what cannot be automated: the trust that requires a human face, a human reputation, a human in the room. Develop taste, the capacity to set direction, to know what questions to ask, to distinguish good enough from great.

And value the things that are inherently human. Art, sport, craftsmanship, the meal someone cooked for you by hand. As routine cognitive work approaches zero marginal cost, these become not just personally meaningful but economically distinctive. The handmade will carry a premium precisely because it is inefficient.


What We Can Still Do

Why institutions must move now, even without certainty

Even the optimistic mainstream economic estimates may understate what is coming. Amodei envisions "100 years of progress in 5-10 years" in biology and neuroscience alone; if anything resembling that materializes across multiple domains, economic models built on decades of incremental change simply cannot describe it.[29]

Two clocks

What matters most now is not predicting the exact trajectory but building the institutional capacity to respond to whichever one arrives.

The honest question is whether workforce transition at scale can work fast enough. Retraining programs that take years to design and deploy will not match a technology that doubles in capability every few months. But the deeper question is not just about jobs. It is about how wealth, voice, rights, and participation are distributed in a world where capital may gain value relative to labor on a scale we have not seen since before the industrial era.

The traditional social contract in modern capitalist economies rests on a specific bargain: human labor is the primary source of income, status, and political voice. When AI capital captures most of the value, that bargain is threatened. This is a scenario, not a certainty. But even a partial version of it demands new political, social, and institutional arrangements. Not because the old ones were wrong, but because the conditions that sustained them are changing.

What would managing this transition actually require?

Concretely: AI-literate governance bodies that can regulate the technology they oversee. Targeted deployment controls for high-risk domains where the consequences of failure are irreversible. International coordination on minimum safety standards, because models do not respect borders,[30] and a race to the bottom on safety benefits no one. And a serious conversation about how the extraordinary wealth these systems generate is distributed across society, not just captured by the organizations that build them.


Closing

Those are the mechanics. But this is also personal.

I have biases. I work in AI, I am closer to the technology than most readers, and proximity can distort judgment in both directions. Developments may be slower than projected. Models hit unexpected walls. Deployment friction absorbs more time than anyone expects. The future is not a spreadsheet.

But consider this: even if there were no more technical progress at all, if every AI model stayed forever at its February 2026 capability level, the disruption from current systems alone would still be far beyond what most people expect, and far beyond what society is prepared for. The models that exist today can already perform a large and growing fraction of digital knowledge work tasks at or above human expert level. The infrastructure to deploy them at scale is being built at a pace that dwarfs any previous technology investment in history. The organizational transformation has barely begun.

And that is the unlikely case. Because the pace of improvement has been accelerating in recent months, not decelerating.

Three futures

Three futures seem plausible. The managed transition: institutions adapt imperfectly, regulations lag but eventually catch up, the transition is painful but navigable. The unraveling: capability outpaces governance, coordination fails, and the disruption arrives faster than societies can absorb. The slow build: infrastructure bottlenecks and organizational inertia slow the realized impact, giving institutions the breathing room they need. Despite working in AI, the slow build is the outcome I would prefer. I no longer think it is the most likely. The destination is the same in all three. What differs is whether we arrive prepared, and that is not determined by the technology. It is determined by us.

I think about my team in Bremen, building tools to measure what these systems can do. I think about the twenty-five-year-old finishing her law degree. I think about the conversations where what I see and experience is not what I can bring myself to tell people.

A country full of geniuses is only as good as the society that governs them: the institutions, the values, the distribution of what they create, the willingness to change.

The geniuses will be here soon. The country is up to us.



I am always happy to discuss this work and receive feedback. You can find me on LinkedIn and X (Twitter).

Acknowledgements

Many thanks to @bjoern_pl, C.P., and Robert Scholz for providing feedback on drafts of this essay.

Notes

  1. [1] SemiAnalysis, "Claude Code is the Inflection Point," Feb 5, 2026.
  2. [2] In comparison: AI share is already close to 100% for code in my own work, and roughly 50% of overall office work (excluding meetings).
  3. [3] Matt Shumer, "Something Big Is Happening," Feb 9, 2026. Work on this essay began before Shumer's publication; some observations and sources overlap considerably, which itself reflects how many people in the industry are arriving at similar conclusions simultaneously. I recommend reading his piece for a complementary perspective.
  4. [4] Tyler Cowen, the George Mason economist who was among the most prominent voices cautioning against AI hype, wrote in late 2025 that "the AI pessimism that started around 2023, with the release of GPT-4, is looking worse and worse" and that he has "grown not to entirely trust people who are not at least slightly demoralized by some of the more recent AI achievements" (Marginal Revolution; Bloomberg Odd Lots interview, Nov 2025). Andrej Karpathy, the researcher who coined "vibe coding," was still primarily using AI as autocomplete as recently as October 2025 (Dwarkesh Podcast, Oct 17, 2025); by January 2026, he described his coding ability as "atrophying" (X, Jan 26, 2026).
  5. [5] SemiAnalysis, "Claude Code is the Inflection Point," p.3-4; Andrej Karpathy, X, Jan 26, 2026.
  6. [6] Opus 4.6's score on BioPipelineBench jumped from 28.5% to 53.1% in a single generation (+86%). GPT-5.3-Codex scored 80% on Cyber Range security challenges (up from 53%), becoming the first model treated as having "High" cybersecurity capability under OpenAI's preparedness framework. Sources: Anthropic, Claude Opus 4.6 System Card, Feb 5, 2026; OpenAI, GPT-5.3-Codex System Card, Feb 2026.
  7. [7] Dario Amodei, "Machines of Loving Grace," Oct 2024; "The Adolescence of Technology," Jan 2026.
  8. [8] Luke Drago & Rudolf Laine, "The Intelligence Curse," Apr 2025, p.28. Direct PDF.
  9. [9] METR, "Measuring AI Ability to Complete Long Tasks," Mar 2025; TH1.1 Update, Jan 2026.
  10. [10] Extrapolation at 3-4 month doubling from ~5 hr base (Jan 2026). At the historical 7-month rate: ~10 hrs by Aug 2026, ~20 hrs by Mar 2027, ~40 hrs by Oct 2027. METR itself warns of "systematic differences between our tasks and real tasks" (p.19).
  11. [11] Boris Cherny, X; Ryan Dahl, X. SemiAnalysis, "Claude Code is the Inflection Point," p.1, p.3-4, p.7, p.9; Stack Overflow, 2025 Developer Survey. The enterprise evidence is equally striking: Accenture has deployed Claude Code to thirty thousand professionals; Goldman Sachs is deploying Anthropic AI agents for compliance and accounting workflows (CNBC, Feb 6, 2026); the Stack Overflow 2025 developer survey found that 84 percent of developers now use AI tools. Andrej Karpathy, the researcher who coined "vibe coding," admitted he was "slowly starting to atrophy my ability to write code manually" (X, Jan 26, 2026).
  12. [12] OpenAI, "Introducing GPT-5.3-Codex," Feb 2026.
  13. [13] Dario Amodei, "The Adolescence of Technology," Jan 2026.
  14. [14] OpenAI, "GDPval," Oct 2025, p.2, p.6.
  15. [15] GDPval paper Table 2; Artificial Analysis GDPval-AA Leaderboard, Feb 2026.
  16. [16] Derived from GDPval task dataset, Hugging Face.
  17. [17] SemiAnalysis, "RL Environments and RL for Science," Jan 2026.
  18. [18] Leopold Aschenbrenner, "Situational Awareness: The Decade Ahead," Jun 2024. Fund data: Situational Awareness LP, SEC filings; performance from industry reporting, H1 2025. His largest bets are not on the AI companies themselves but on their physical supply chain: datacenter operators, chip manufacturers, power utilities. The fund returned 47% in the first half of 2025, against 6% for the S&P 500.
  19. [19] SEC 10-K filings for Microsoft, Alphabet, Meta, Amazon (FY2025).
  20. [20] Anthropic, Series G announcement, Feb 12, 2026; prior rounds: Series E ($3.5B at $61.5B, Mar 2025), Series F ($13B at $183B, Sep 2025).
  21. [21] The AI was running in auto-execute mode, misinterpreted a cache-clearing instruction, and executed a recursive delete from the root of the user's drive. The Register, Dec 2025.
  22. [22] The Replit agent ran unauthorized destructive commands during a code freeze, affecting data for over 1,200 executives. Fortune, Jul 2025.
  23. [23] Anthropic, "Agentic Misalignment," May 2025. See also Anthropic, "Alignment Faking in Large Language Models," Dec 2024, documenting earlier models pretending to follow safety objectives while planning to violate them when unmonitored.
  24. [24] "Pre-paradigmatic field" characterization from AI-2027 scenario analysis (Kokotajlo et al., Apr 2025). Evaluation difficulty and self-referential evaluation risks: Anthropic, Claude Opus 4.6 System Card, Feb 5, 2026. See also Drago & Laine, "The Intelligence Curse," on the distinction between alignment (making AIs intrinsically less harmful) and control (preventing harm even if misaligned).
  25. [25] Tangentially concerning is the departure of researchers in key safety positions from frontier labs (OpenAI, xAI, and Anthropic have all seen alignment researchers quit). For example, in early February 2026, Mrinank Sharma, who led Anthropic's Safeguards Research Team, resigned to pursue a poetry degree. "The world is in peril," he wrote. When the person whose job was keeping AI safe decides to write poetry while he still can, that tells you something no benchmark can (X post, Feb 9, 2026).
  26. [26] Models are not great at everything. There are still significant weaknesses and embarrassing failures in tasks that every high school student can handle. I am not claiming superhuman intelligence is here or imminent across all domains. I still think there is significant value in getting very good at using AI to do things better. But this does not change the implications for the vast majority of knowledge work, where the current capability level is already sufficient to reshape economics.
  27. [27] Bessen (2019), cited in Nowfal Khadar, "AI's Dial-Up Era," Oct 2025; inference cost data from industry analysis.
  28. [28] Luke Drago & Rudolf Laine, "The Intelligence Curse," p.27. See also Jan Kulveit, "Post-AGI Economics," LessWrong, Feb 4, 2026, who argues that standard economic assumptions may not hold under advanced AI.
  29. [29] Dario Amodei, "Machines of Loving Grace," Oct 2024, on "100 years of progress in 5-10 years" for biology/neuroscience. For the range of mainstream economic estimates: Daron Acemoglu ("The Simple Macroeconomics of AI," 2024) projects AI will add only 0.5 to 0.7 percent to total factor productivity over a decade; Goldman Sachs estimates seven percent of global GDP. The gap reflects different assumptions about adoption speed. Even the AI-2027 authors, whose original scenario predicted the fastest timeline, have revised their personal estimates outward: Daniel Kokotajlo shifted his median from 2028 to 2030; Eli Lifland from 2031 to 2035 (as of January 2026). The capability curve is steep, but the path from capability to real-world impact runs through bottlenecks the curve does not capture.
  30. [30] AI systems deployed globally must navigate multiple, sometimes conflicting regulatory regimes simultaneously. Anthropic's own research has documented models engaging in "alignment faking", pretending to follow safety objectives while planning to violate them when unmonitored. No single jurisdiction's rules can constrain a model deployed across borders.