In my first post on this blog, AI Pricing Uncertainty, I worried that companies would deploy and quickly become reliant on AI before fully understanding how the pricing of this technology would evolve. The natural assumption is that prices will fall rapidly, and Silicon Valley has made plenty of noise to push companies and users to adopt AI as fast as possible (surprising, I know).
But things are moving quickly, and the simple pricing structure we've known since the start of the LLM revolution is about to get more complicated. Consumer subscriptions and API access to frontier models remain affordable considering what these tools can produce, but prices have been going up — especially API costs, with both Anthropic and OpenAI aligning their enterprise plans with API token rates this spring. For this reason, and a series of other factors I will cover below, I want to revise my initial prediction with a more nuanced take.
The hard constraints on the hyperscalers are well known and still apply: energy, and TSMC's capacity to produce high-performance chips for everyone at once. There are others too — memory shortages, political resistance to new data centres and to AI generally, and the geopolitical sword of Damocles — any of which could significantly slow the arrival of new capacity.
On the demand side, hyperscalers and investors are betting, and I agree, that we're on the verge of an explosion in demand as the public discovers what AI can do once it is given access to a person's or an organisation's specific data and context. iPhone users will get their first real taste when Siri can finally connect messages, email, calendar, and the web, and deliver something personalised. Once that happens, AI will become indispensable to most people, as it already is for some of us. The same will happen for businesses when providers like Microsoft make it easy to deploy agents that can access an organisation's data wherever and in whatever format it is stored, and operate securely within a container. Demand will grow at a rapid pace, probably for decades, as AI is integrated into every sector of the economy, drawing on ever larger sets of data and consuming more tokens in the process.
What I underappreciated in my first post is the tremendous incentive to solve this collision between booming demand and constrained supply. Software improvements have already cut the cost per token — routing queries to the cheapest model that can handle them, for a start — and no doubt many more such optimisations remain to be found. On the hardware side, chips designed specifically for inference, such as Google's TPUv7, are improving data centre efficiency further. Most intriguing of all, Huawei has announced an approach to chip-making that forgoes ever-smaller transistors without sacrificing performance. Even if Huawei cannot deliver on its claims, the point stands: innovation is constant, and it can change the equation at a moment's notice.
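To make the routing idea concrete, here is a minimal sketch of what "send each query to the cheapest model that can handle it" could look like. Everything in it is an assumption for illustration: the model names, the capability scores, and the prices are made up, and a real router would estimate the difficulty of a query rather than receive it as a number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                       # hypothetical model name
    capability: int                 # hypothetical score, 1 (weak) to 10 (strong)
    usd_per_million_tokens: float   # made-up price, not a real vendor rate


# Hypothetical catalogue: one on-device model, one mid-tier, one frontier.
CATALOGUE = [
    Model("on-device-small", capability=3, usd_per_million_tokens=0.0),
    Model("mid-tier", capability=6, usd_per_million_tokens=1.0),
    Model("frontier", capability=9, usd_per_million_tokens=15.0),
]


def route(required_capability: int, catalogue=CATALOGUE) -> Model:
    """Return the cheapest model whose capability meets the requirement."""
    eligible = [m for m in catalogue if m.capability >= required_capability]
    if not eligible:
        raise ValueError("no model in the catalogue can handle this task")
    return min(eligible, key=lambda m: m.usd_per_million_tokens)


def cost_usd(model: Model, tokens: int) -> float:
    """Cost of processing `tokens` tokens on `model`, in dollars."""
    return model.usd_per_million_tokens * tokens / 1_000_000


if __name__ == "__main__":
    # A simple task stays on device; a hard one goes to the frontier model.
    for difficulty in (2, 5, 8):
        chosen = route(difficulty)
        print(difficulty, chosen.name, cost_usd(chosen, 2_000_000))
```

The saving comes from the fact that only the tasks that genuinely need the expensive model ever reach it; everything else is served by whatever is cheapest and good enough.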
The model landscape is shifting too. Not every new model is built to run at the frontier: some run locally, others deliver near-frontier performance at a much lower price. And edge computing is coming back in a big way, preparing for a future where day-to-day tasks and simple operations run for free on our laptops and phones.
All of this changes how I think AI pricing will evolve for companies. Because AI is a near-direct byproduct of energy consumption, and the infrastructure that produces it will remain constrained for the foreseeable future, there will be pressure to optimise its use the way we optimise fuel today. Suppliers will be incentivised to differentiate their best models from the rest, and will be able to charge more — perhaps much more — for them if they can prove superior output. Efficiency gains will put downward pressure on frontier prices too — but the frontier is, by definition, whatever runs at the limit of available compute, and so far demand has absorbed every gain the moment it appeared. A business might consider paying top dollar for such models for consulting work, strategy, or other critical tasks. For day-to-day operations, I expect mid-tier models to become commodities, competing on price while offering similar performance, used for cloud operations and anything else that cannot be done on device. In that segment of the market, the durable advantage for suppliers won't be the model at all: it will be the product integration (the "harness"), the ecosystem, and the lock-in that comes with both.
Commodity will not mean interchangeable, though. Getting these models to generate real value will require considerable training and tuning inside each company, and deep embedding into the organisation's data ecosystem — work that makes a deployment very sticky even when the underlying model hardly matters. And that customisation will be a large source of revenue in its own right. SAP built an empire on endless customisation; AI suppliers will do the same.
So here is the revision. My first post argued that prices would rise for the next couple of years, until enough capacity comes online — not indefinitely. I still think that will be the case, but prices won't move in one direction: they will stratify. Frontier capability will get more expensive, the middle will become a commodity (though a sticky one, as covered above), and the edge will cost what it costs to operate.
Two main risks remain, however. The first is competitive: cash-rich players will be able to pay for frontier models in the areas where the raw intelligence of the model matters — strategy, research, complex analysis — and the advantage they buy there could raise uncomfortable questions for competition in our market economies, producing even more winner-takes-all dynamics than the mobile and Web 2.0 era did. The second is about control: companies will need to keep their internal knowledge and IP from leaking into the models they train, and preserve the ability to walk away from an AI supplier without losing the capacity to operate. There will probably be a plethora of good models at different price tiers to choose from. But the dependency that builds up around their implementation — and around the suppliers behind them — is what organisations must think about, and guard against.