The end of tokenmaxxing

Richard Henderson

Microsoft spent May cancelling the bulk of its internal Claude Code licences and redirecting engineers to GitHub Copilot CLI after per-engineer token costs hit $2,000 a month. Uber's COO said in June it was "harder to justify" the company's AI spend, with no clear line between usage and any product shipped. The company burned through the entire 2026 budget by April and now caps employees at $1,500 a month per tool.

Elsewhere, Meta killed "Claudeonomics", an internal leaderboard that ranked employees by token consumption, after the dashboard logged 60 trillion tokens in 30 days. Amazon has also told engineers to stop using "AI just for the sake of using AI" after they began deploying agents to climb internal leaderboards. One AI consultant even reported that a client had accidentally spent $500 million on Claude in a single month after forgetting to set employee usage limits.

This is tokenmaxxing: an expensive trend in which large enterprises set token consumption as a productivity metric and waited for transformation to arrive. It did not.

The wrong bottleneck

The mistake was thinking the model was the bottleneck. Pour enough capability into the organisation, the thinking went, and transformation would follow.

But when you measure activity, you get activity. People generated more decks, more drafts, more meeting summaries, more code that never shipped. The leaderboards went up. The business did not change.

The clearest evidence comes from software, the domain where AI is most mature. Mert Demirer and colleagues at MIT and Wharton tracked more than 100,000 developers across successive generations of AI coding tools. Developers who adopted agentic tools wrote 741% more lines of code and opened 65% more pull requests. Shipped software, however, rose by only 20%. The gains were real and they were large. But they stalled before they reached the thing that counts.

The reason is structural. The authors model software as a chain: code, commits, pull requests, releases. AI uncapped the first stage. The reviews, approvals and integrations downstream did not move, so most of the upstream gain was absorbed before it reached output. They put a number on the complementarity between AI and human effort along that chain: an elasticity of 0.25. In plain terms, abundant AI output is worth little without commensurate human capacity to review and ship it.
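A toy calculation makes the point concrete. The sketch below is not the authors' model: it assumes the 0.25 figure can be read as an elasticity of substitution in a standard CES production function, and the equal weighting of AI and human input, the function name and the input multiples are all illustrative choices of ours.

```python
def ces_output(ai: float, human: float, sigma: float = 0.25, share: float = 0.5) -> float:
    """CES output for two inputs.

    sigma is the elasticity of substitution (must not equal 1).
    share is the weight on the AI input; 0.5 is an assumption, not a figure from the study.
    """
    rho = 1.0 - 1.0 / sigma  # sigma = 0.25 gives rho = -3, i.e. strong complements
    return (share * ai**rho + (1.0 - share) * human**rho) ** (1.0 / rho)


# Hold human review capacity fixed at 1 and scale up AI output.
baseline = ces_output(1.0, 1.0)
for ai_multiple in (1.0, 2.0, 8.41, 100.0):
    gain = ces_output(ai_multiple, 1.0) / baseline - 1.0
    print(f"AI input x{ai_multiple:g}: output change {gain:+.1%}")
```

Under these assumed parameters, the output gain levels off at roughly a quarter above baseline however far the AI input is scaled, because the fixed human input becomes the binding constraint. The exact ceiling depends entirely on the assumed weights, so it should not be read as a reproduction of the study's 20% figure.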

This is a familiar lesson from earlier technologies. Electric motors arrived in factories in the 1880s and barely moved productivity for forty years. The shift came in the 1920s, when factories were rebuilt around what the motor made possible: small motors at each workstation, layouts arranged by the flow of work rather than around a central driveshaft. It was the redesign, not the technology alone, that moved productivity.

In the same way, today’s enterprise AI gains will come from rebuilding the operating model around AI. Tokenmaxxing does the opposite. It pours AI onto your current operating model and hopes for transformation.

But the truth is that a frontier model arrives knowing everything about the world and nothing about your business. It does not know how your organisation actually runs, what data it can trust, who has to approve what. The gap between those two things is the work. That's where the maxxing really lies.

Productivity is built, not bought

The model is the easy part. Raw capability is a commodity now: every enterprise has access to the same frontier models at roughly the same prices. What separates the companies getting results is everything built around the model. The business context the agents operate in. The (primarily headless) agents that run the workflows. The orchestration that coordinates them. The skills that encode how the organisation actually works. The interfaces that let people steer.

None of that arrives with token spend alone. Each piece has to remove a constraint between what the model can do and what shows up on a P&L. The companies actually getting results from AI look nothing like the tokenmaxxers. They are not posting leaderboards. They are not running internal challenges. They are just getting on with the work: figuring out what the agent is actually for, plugging it into the right data, deciding who gets to say no. Less PR-able. More useful.

The next five years will not be won by the companies with the highest token bill. They will be won by the companies that build the most around the models everyone else also has.

Where to begin

The failures at the top of this piece had a common thread: spend that ran well ahead of any clear link to what it produced.

Most enterprises are in some version of that position now: dozens of AI pilots, POCs and copilots scattered across teams, and no clear view of which are shipping real value, which are stuck upstream of production, and which are activity dressed up as progress. You cannot rebuild an operating model you cannot see.

Elsewhen's AI Audit gives you that view: due diligence on every AI initiative in your business, a map of where each one stalls, and a clear verdict on what to kill, what to fix and what to scale. It tells you where the bottleneck actually sits, which is where the work of rebuilding starts.