Who Captures the Value If AI Inference Becomes Cheap?
The economic question hiding behind AI's falling prices
Like the name of my publication, this is not investment advice and usual disclaimers apply.
I’ve seen a lot of the discourse regarding AI and it seems like people either aren’t really asking the right questions, or they’re turning a blind eye due to AI being the next big thing.
It’s pretty obvious that valuations are stretched relative to historical norms, but not many people other than Gary Marcus, Ed Zitron, Jim Chanos and maybe a few others have been questioning the economics of AI.
Whether we like it or not, AI is a technology that’s here to stay. It’s been far too disruptive in such a short period of time for it to just disappear into the background. But that doesn’t mean I don’t have questions around inference economics and the general business model of AI, as you’ll see throughout the article.
For those who are unaware and/or don’t follow tech, AI is much more than just LLMs. AI is delivering real benefits where it makes marginal improvements to efficiency and productivity, although it remains to be seen whether every sector will share in these gains. Companies like Google, Adobe and Salesforce are implementing all of this successfully at scale because they’ve got the user base and they can incrementally add value to already sticky products.
On the flip side, the market has gotten ahead of itself. The percentage of companies in the Nasdaq that have thin revenues and/or aren’t profitable is the highest it’s been in a while. Speculation is inevitable among those wanting to find the next Nvidia; the quantum sector, for example, is pure speculation as of now. There are companies like Oklo (a nuclear energy company) that went public with a $17bn market cap and no revenue… it’s things like this that fuel speculation. One could say “but Google went public and didn’t make a profit yet.” In fact, Google was already a profitable company before going public in 2004. It’s not a requirement for companies to make a profit before going public, but at the very least they should have a viable pathway to profit like Amazon did.
Everyone sees the tip of the iceberg: generative AI and AI agents, which people have been talking about the most since the release of ChatGPT in late 2022 and other LLMs over the past 3-4 years. AI is much bigger and includes the following: machine learning, deep learning, neural networks, computer vision, natural language processing, predictive analytics, speech recognition, agentic AI…
Everyone else still seems to be going on about the circularity in AI, which is a valid concern, but it is getting tiring. Just to put it to bed: it’s by design, if you know and understand the semiconductor value chain.
As a reminder: inference is the process through which a model creates an output.
Bloomberg reported on 21st December that OpenAI’s compute margins improved from 35% in January 2024 to 70% as of October 2025. This reflects genuine progress within the AI space in such a short period of time, especially in the face of rising compute costs due to building out their frontier models.
As per Bryce Elder at FT Alphaville:
If the above data is accurate (and there are no guarantees, just to reiterate) then it’s only right to question the business model of OpenAI and all the other general-purpose LLM vendors. Eventually one of two things will happen: either running costs will have to collapse or customer charges will rise dramatically.
What they do will become less unique as Google and Microsoft continue creating and improving their respective LLMs. At the current pace it is not a stretch to imagine that OpenAI’s competitive and differentiating advantage will soon dissipate.
In contrast, The Information reported that OpenAI’s operating losses in H1 2024 reached $7.8bn on $4.3bn revenue. Company projections suggested that losses could reach $14bn in 2026, which is roughly 1.8x the H1 2024 figure. Something to be mindful of is that this is an apples-to-oranges comparison, since the 2026 projection covers a full fiscal year whereas the H1 2024 figure covers only half a year. The technology has developed drastically since its first iterations in late 2022 and so have the unit economics of serving individual queries. Notwithstanding, the losses continue to accelerate instead of contract…
This contrast reveals the current state of inference economics and AI companies today. This is the theme I’m going to explore throughout the article. One side (inference economics) appears to be solving itself via rapid technological breakthroughs whereas the other side (business model) is a problem that remains unsolved, even as the technology itself advances. The improving margins from OpenAI and Anthropic imply that core technology constraints are being overcome. The widening losses indicate many well-funded AI companies might not survive long enough to reap the benefits of such improvements.
When we look at the scale of these losses with historical context, it makes more sense. Amazon burned through ~$3bn from 1994-2002 before achieving sustained profitability. Uber accumulated losses of ~$31bn from 2009-2022. OpenAI’s projected path, which covers a shorter time period than Amazon, Spotify, Tesla and Uber, shows cumulative losses projected to reach ~$143bn by 2029, according to the snapshot from Deutsche Bank Research Institute. Even if we account for inflation and the capital-intensive nature of AI infrastructure, these numbers represent levels of cash burn that I’ve never seen. All in the name of demonstrating paths to profitability. I’d honestly love to know what OpenAI’s burn rate is…
The above creates the premise for observing AI (inference + deal) economics in 2026. In my previous analyses, the AI Ouroboros piece highlighted the circularity of tech investments whereby Nvidia invests in AI companies that then spend said money on Nvidia H100, H200, B200 & A100 chips, constructing self-referential loops that juice up valuations across the sector. My primer on conduit debt structures presented how even profitable hyperscalers use complex off-balance-sheet financing to fund the AI capex boom. The natural next step is to understand the apparent contradiction between improving margins and growing losses.
The Technology vs Business Story
The quoted compute margin of 70%, while exciting, needs careful interpretation. It’s far too easy to lie with statistics, especially when looking at them in isolation. This metric specifically measures margins on compute infrastructure costs, meaning for every $1 that OpenAI collects from API usage or ChatGPT subscriptions, 30¢ goes towards GPU time, energy consumption and direct infrastructure overhead. The other 70¢ is not net profit; it is what remains to fund everything else the company does.
J.P. Morgan reports that the cost efficiency frontier has dropped by 99.7%, from $37.50 per million tokens for GPT-4 in March 2023 to $0.14 for GPT-5 Nano by August 2025. The balanced frontier, which optimizes both capability and cost, has improved just as much. SemiAnalysis finds that algorithmic improvements alone can boost efficiency by four to ten times each year. Nvidia, using SemiAnalysis’s InferenceMax v1 benchmarks, shows that software updates for the B200 cut inference costs fivefold in just two months, from $0.11 to $0.02 per million tokens for open-source models. This highlights how quickly efficiency can improve, even after new hardware is released.
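For anyone who wants to sanity-check the price declines quoted above, here is a minimal Python sketch. The function names are my own, and the only inputs are the per-million-token prices cited in this section. Note that the rounded prices give a decline of roughly 99.6%, marginally below the 99.7% that J.P. Morgan reports, which is presumably down to rounding in the underlying figures.

```python
def pct_decline(old_price: float, new_price: float) -> float:
    """Percentage drop from old_price to new_price."""
    return (old_price - new_price) / old_price * 100


def fold_reduction(old_price: float, new_price: float) -> float:
    """How many times cheaper new_price is than old_price."""
    return old_price / new_price


# Prices per million tokens, as quoted in the text.
frontier_drop = pct_decline(37.50, 0.14)  # GPT-4 (Mar 2023) vs GPT-5 Nano (Aug 2025)
b200_fold = fold_reduction(0.11, 0.02)    # B200 software updates over two months

print(f"Cost efficiency frontier decline: {frontier_drop:.1f}%")
print(f"B200 cost reduction: {b200_fold:.1f}x")
```

The "fivefold" B200 figure works out closer to 5.5x on the quoted prices, so the headline numbers are, if anything, slightly conservative.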

The chart also shows an important trend. As costs drop along the efficiency frontier, models are becoming more capable, which often means they need more compute for each query. The gap between the cost efficiency frontier (green dotted line) and the capability frontier (orange dotted line) shows that delivering top performance is still expensive, even as basic capabilities become cheaper. This matters because customers now want and expect the latest performance, not just last generation’s efficiency.
Compute margins also fail to capture the infrastructure layer, which creates additional economic gravity. Chris Zeoli (a partner at VC fund ‘Wing’ & author of Data Gravity on Substack) stated that customer-level utilisation typically sits between 10% and 40% for a lot of AI companies, even when systems are well engineered. This happens because capacity must be provisioned for peak traffic instead of averages. Demand is sporadic and diurnal, and latency requirements prioritise headroom over cost efficiency.
Zeoli’s analysis separates customer-level utilisation from platform-level utilisation, and this difference explains a lot about which AI business models can be profitable. Individual customers usually have low utilisation because their workloads are unpredictable. In contrast, platforms that serve hundreds or thousands of customers can reach 60% to 85% utilisation by spreading out demand over time. Hyperscalers and multi-tenant APIs benefit most from this, unlike AI companies that serve only direct customers or run their own dedicated infrastructure.
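The gap between customer-level and platform-level utilisation can be illustrated with a toy simulation. This is a sketch under made-up assumptions, not Zeoli’s model: each hypothetical customer runs a baseline load of 1 unit with a single 4-hour burst of 10 units at a random time of day, and utilisation is defined as average demand divided by peak demand (since capacity has to be sized for the peak). The customer count, burst size and seed are all illustrative.

```python
import random


def utilisation(demand: list[float]) -> float:
    """Average demand divided by peak demand (capacity is sized for the peak)."""
    return (sum(demand) / len(demand)) / max(demand)


def bursty_day(rng: random.Random, hours: int = 24) -> list[float]:
    """Hypothetical customer: low baseline load plus one 4-hour burst at a random time."""
    start = rng.randrange(hours)
    burst_hours = {(start + h) % hours for h in range(4)}
    return [10.0 if h in burst_hours else 1.0 for h in range(hours)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
customers = [bursty_day(rng) for _ in range(200)]

# Each customer provisions for its own peak.
individual = [utilisation(c) for c in customers]

# A platform provisions once, for the peak of the combined load.
pooled_demand = [sum(c[h] for c in customers) for h in range(24)]
pooled = utilisation(pooled_demand)

print(f"Single-customer utilisation: {sum(individual) / len(individual):.0%}")
print(f"Pooled platform utilisation: {pooled:.0%}")
```

Every customer here sits at 25% utilisation on its own, while the pooled figure comes out far higher because the bursts rarely line up. The peak of a sum can never exceed the sum of the peaks, which is the whole aggregation advantage in one line.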
OpenAI’s arrangement with Microsoft likely involves a mix of reserved capacity for more predictable workloads and elastic capacity for less predictable ones. As Zeoli documents, any system with strict service level agreements and latency requirements must provision for spikes instead of medians, which builds in inefficiency. Sophisticated batching and scheduling cannot fully eliminate such waste when serving user-facing applications with arbitrary demand patterns. If OpenAI achieves 70% compute margins but operates at ~40% infrastructure utilisation because of the irregularity of consumer and enterprise workloads, a simplified view implies effective margins drop to ~28% (70% compute margins × 40% infrastructure utilisation). Reality may paint a different picture since infrastructure costs are mostly fixed irrespective of utilisation. Operating at ~40% capacity implies paying full costs while generating only ~40% of potential revenue. The fixed-cost burden highlights why improving compute margins do not automatically convert to overall profitability.
For AI companies to reach overall profitability, infrastructure utilisation should be high, and revenue should grow large enough that the ~28% margin covers all operating expenses. The math becomes more challenging when one considers what sits outside those compute margins.
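To make the margin arithmetic concrete, here is a rough Python sketch of the two views. The first function is simply the 70% × 40% shortcut. The second is my own illustrative assumption, not a sourced model: it treats infrastructure cost as fixed at 30% of the revenue the fleet would earn at full utilisation, so the cost is paid in full even when only part of that revenue shows up.

```python
def simplified_effective_margin(compute_margin: float, utilisation: float) -> float:
    """Back-of-envelope view: compute margin scaled by utilisation."""
    return compute_margin * utilisation


def fixed_cost_margin(cost_share_at_full_load: float, utilisation: float) -> float:
    """Margin when infrastructure cost is fixed but revenue scales with utilisation.

    Assumption (illustrative only): cost equals `cost_share_at_full_load`
    of the revenue the fleet would earn at 100% utilisation.
    """
    revenue = utilisation           # as a fraction of potential revenue
    cost = cost_share_at_full_load  # paid regardless of utilisation
    return (revenue - cost) / revenue


print(f"Simplified view: {simplified_effective_margin(0.70, 0.40):.0%}")
print(f"Fixed-cost view: {fixed_cost_margin(0.30, 0.40):.0%}")
```

Under these assumptions the two views land in a similar place at 40% utilisation, but the fixed-cost view deteriorates much faster as utilisation falls: at 30% utilisation it is already at zero.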
What Good Unit Economics Cannot Fix
AI companies rushing to develop AGI can’t lower spending on research just because the newer models became more efficient to serve. If only it was that simple, they’d be profitable already. OpenAI employs a lot of researchers, engineers and safety specialists whose objective is to develop future models and their capabilities rather than to run current operations. It’s no surprise they announced “Code Red” recently. As their ambition scales, so do their costs and the pressure from their competitors, especially when rivals raise billions in new funding while fast-tracking their own research programs. Irrespective of improving unit economics, the pressure to match or exceed competitors’ spending remains strong.
As per J.P. Morgan data, 44% of US businesses now maintain paid AI subscriptions as of September 2025, up from minor adoption two years prior. Despite that, the same data discloses that only 9% of businesses report actual AI adoption after experimentation. A gap between experimentation and deployment also shows up among firms with more than $1bn in yearly revenue: 55% are piloting AI agents but only 42% have deployed them in production. Are firms more cautious than expected? Is it a case of misunderstanding how to apply AI in the workplace for meaningful results? Or is it something else entirely?
The gap between subscription spending and production deployment indicates that current revenue more likely reflects experimentation budgets than committed operational spending. This is where the uncertainty about whether growth rates can sustain the capital requirements comes from. If the data were modelled out further, we could see the gap between subscriptions and deployment narrow over the next 2-3 quarters.
The same J.P. Morgan analysis also suggests that AI systems are now outperforming humans in various areas like image recognition, language understanding and reading comprehension, as well as predictive reasoning, handwriting recognition and speech recognition. Delivering such capabilities profitably and at scale requires costs that customers can afford as well as margins that investors can justify. The question is twofold: (1) how can AI companies deliver new capabilities profitably and at scale whilst maintaining affordability for customers (which sounds somewhat contradictory, although not impossible), and (2) when does this happen, if ever?
The Test-Time Compute Trap
Improving compute margins are also masking a shift in how AI models achieve superior performance. As per J.P. Morgan’s analysis of METR benchmarks, the length of tasks that AI models can complete autonomously, measured by how long those tasks take humans, has been doubling around every 7 months. Frontier models like GPT-5 can now handle more complex tasks, as evidenced by a slow and then sudden increase in task length. In 2019, it was basically 0 and now, in 2025, models can handle tasks that take humans 27 minutes to complete. Claude 4 Sonnet and o3 also show comparable improvements in task completion.
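A 7-month doubling time compounds quickly. The sketch below is a naive extrapolation, assuming the doubling rate simply continues unchanged (which is a big assumption and not a forecast); the function name and the chosen horizons are mine, and the 27-minute starting point is the figure quoted above.

```python
def task_horizon_minutes(current_minutes: float, months_ahead: float,
                         doubling_months: float = 7.0) -> float:
    """Extrapolate task length, assuming a constant doubling time."""
    return current_minutes * 2 ** (months_ahead / doubling_months)


for months in (7, 14, 28):
    horizon = task_horizon_minutes(27, months)
    print(f"{months} months ahead: ~{horizon:.0f} minutes")
```

On these assumptions, a 27-minute task horizon becomes roughly a full working day in a little over two years. Longer tasks mean more tokens per task, which is where the cost problem in this section comes from.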

Such progress comes with a large computational cost that some customers don’t notice and that standard pricing metrics obscure. AI models end up using a lot more inference compute per query, producing extensive internal reasoning chains before generating final outputs. Internal reasoning is the bit where an AI model will “think” before giving you an answer; the longer and more complex your query, the more inference compute it uses and the longer it takes to give you an answer. For example, when a customer gives o3 or GPT-5 a complex query (let’s say a long multi-step math problem or writing a long letter), the model performs expensive internal computation during inference to improve its output on difficult problems. This results in 10x to 100x more token consumption than a simpler query (like a question about using a financial metric), even though the user only sees the end result. You know what is recursively self-learning? A person. You want to hire a person.
The tech industry is celebrating falling inference costs based on achieving the equivalent of GPT-3.5-level performance, but the frontier has moved. Users, whether free or paid, now expect GPT-4-level quality and are demanding the better reasoning capabilities of o1-class models. Stanford HAI’s 2025 AI Index Report provides evidence that inference costs dropped 280-fold for GPT-3.5-level performance between November 2022 and October 2024. Then again, models that employ test-time scaling end up consuming many more tokens to solve complex problems. This can make newer models more computationally expensive per query than previous generations.
This means the cost per query may be increasing for frontier models even as the cost per token declines, which creates a dynamic whereby improving technology makes AI economics more challenging, defying traditional business logic. AI companies now face a dilemma: they either maintain current prices while increasing inference costs to deliver better results, or hold inference costs constant and risk dropping out of the race against competitors who invest more computation per query. Competitive dynamics push towards the former, which puts pressure on already challenging unit economics.
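The cost-per-token versus cost-per-query distinction is easy to show with a few lines of Python. All the numbers here are hypothetical and chosen only for illustration: an older model answering in 1,000 tokens at $10 per million tokens, and a newer reasoning model that is 5x cheaper per token but burns 50x the tokens (inside the 10x to 100x range mentioned above).

```python
def cost_per_query(tokens_used: int, price_per_million_tokens: float) -> float:
    """Dollar cost of one query given its token count and the per-token price."""
    return tokens_used / 1_000_000 * price_per_million_tokens


# Hypothetical figures, for illustration only.
old_model = cost_per_query(tokens_used=1_000, price_per_million_tokens=10.0)
reasoning_model = cost_per_query(tokens_used=50_000, price_per_million_tokens=2.0)

print(f"Older model:     ${old_model:.3f} per query")
print(f"Reasoning model: ${reasoning_model:.3f} per query")
print(f"Per-token price fell 5x, cost per query rose {reasoning_model / old_model:.0f}x")
```

The per-token headline improves while the bill for serving the query gets worse, which is exactly the trap.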
The VC Treadmill, Or A Subsidy Window
Venture capital investors can point to improving compute margins as tangible progress on unit economics and as evidence that AI technology is inching closer toward profitability (even though this isn’t applicable to all AI startups). It’s this narrative that has somewhat supported such exuberant valuations, so much so that we’re in what I’m going to call a “shoot first, ask questions later” kind of market. The WSJ reported on 18th December 2025 that OpenAI is in talks to raise even more money: this time up to $100bn at a valuation of $830bn.
Such a valuation hinges on the belief that OpenAI will eventually reach and sustain profitability at scale. If their compute margins carry on improving, at 70% or even more, eventually revenue will grow large enough to cover their operating costs and generate net profits. There are 2 questions worth asking here:
when does this inflection point arrive?
how much capital will they consume before reaching it? / What is their burn rate?
OpenAI’s competitor, Anthropic, will likely need to continue raising money just to keep current operations at scale. This is in spite of them already raising money from Amazon and VC funds. Smaller but still well-funded AI startups like Cohere, Mistral and others face a similar fate. It looks like they’ve got to burn even more money in this race to profitability. Now you see why I’ve been asking about OpenAI’s burn rate.
This is how the “treadmill” is created. Every funding round has to be just large enough to extend runway for another 18-24 months. Then the round succeeding that requires even higher valuations to avoid diluting existing investors. In order for these valuations to be justified, the startup in question has to show sufficient progress. In the case of AI startups, that means accelerating spending on research, products and/or market development. Accelerating at such a fast rate inevitably (1) consumes the newly raised capital faster and (2) shortens the runway for the next raise.
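The treadmill mechanics can be sketched as a simple runway calculation. The figures are hypothetical (a $24bn raise and a $1bn starting monthly burn) and the function is my own illustration: it counts how many full months the cash lasts when monthly burn compounds at a given growth rate.

```python
def runway_months(cash_raised: float, monthly_burn: float,
                  monthly_burn_growth: float = 0.0) -> int:
    """Full months of runway when burn compounds monthly (burn must be positive)."""
    months = 0
    cash = cash_raised
    burn = monthly_burn
    while cash >= burn:
        cash -= burn
        burn *= 1 + monthly_burn_growth
        months += 1
    return months


# Hypothetical round: $24bn raised, $1bn monthly burn (figures in $bn).
print("Flat burn:", runway_months(24, 1), "months")
print("Burn growing 5% a month:", runway_months(24, 1, 0.05), "months")
```

With flat spending the round lasts the full 24 months. Let burn grow 5% a month to justify the next valuation and the same round is gone in 16, which is how accelerating to show progress shortens the very runway the raise was meant to buy.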
This only works so long as capital remains available at buoyant valuations. Tech innovations and improvements from new iterations alongside market growth can continue to support investor enthusiasm, but the system is essentially unstable. What if investor sentiment shifts? What if the public market equivalents decline significantly? What if the competition intensifies so much that OpenAI is now in a precarious position where being ‘the leader’ requires unsustainable spending?
Multiple scenarios may trigger funding constraints, including but not limited to the following: rising interest rates or general macroeconomic stress; multiple startups engaging in prolonged price wars, which could delay profitability timelines; and technology developments hitting (unexpected) hurdles, which may push profitability further back and in turn test investor patience. Valuations of public market comparables, if worse than expected, could also reset private market valuations, and that may fuel dilution or down-round dynamics.
Google’s Singularity
Google is the only tech company that uses its own products across the entire AI stack and doesn’t require the use of outside products. This is what makes Google unique compared to other tech companies, in my opinion. It designs its own TPUs (with Broadcom and TSMC manufacturing them), operates Google Cloud for compute in its data centres, draws training data from apps within its software ecosystem like Search, YouTube, etc., has a leading model in Gemini 3 as of writing this, and can deploy its AI capabilities across its own apps like Google Search, Gmail, Maps, Workspace and more. The advantage now comes from owning every layer from silicon to training infrastructure to runtime to distribution, and even device. Fewer dependencies mean more margin captured per user and per developer. And this is why I believe Google is very well positioned.
The TPU architecture, now in its 7th generation, has made waves in the AI space, so much so that I’m even writing about it. This has been discussed ad nauseam so I won’t get into the overly technical side of it; Babbage, SemiAnalysis and Rihard Jarc all explain this in much more detail than I could. The purpose-built approach that Google is taking enables it to optimize its AI stack in a way other tech companies can’t (mainly those purchasing external GPUs from Nvidia), per Bloomberg Intelligence. Because Google optimizes Gemini 3 for TPU chips built for inference, it controls both the silicon design and the model architecture via Titans, enabling co-optimisation that delivers superior performance per watt and performance per dollar.

People are now understanding why OpenAI called “Code Red” against Google. Gemini has consistently gained market share relative to ChatGPT over the last 3 months. Gemini’s market share is now 21.5%, up from 12.9% 3 months ago and just 5.7% a year back. As people in this TMT space are finding out, it is far easier for Google to build a ChatGPT competitor than it is for OpenAI to build a competitor to the Google ecosystem.
As Rebound Capital noted in his Google deep dive:
Foundation Models: The latest Gemini 3 models are either better or as good as OpenAI’s models (based on workloads).
Infrastructure: Google has its own GPUs (called TPUs), which give it a cost advantage. This allows Google to offer services at a lower cost and, in some cases, free versions with more features (e.g., Gemini’s free versions have a larger context window than ChatGPT’s) to users.
Distribution: Google has multiple touch points with customers - something that OpenAI cannot replicate in a hurry. Users can be introduced to Gemini on their new Android phone or through Google Workspace. AI Overviews in Google Search also reduce reliance on ChatGPT.
Two things I haven’t seen many finance people who cover the TMT sector (or Google more specifically) talk about, though perhaps I haven’t read older tech-focused posts yet, are (1) Google’s acquisition of DeepMind and the role it plays in Google’s current lead, and (2) that Google has been quietly implementing Titans over the past 11-12 months, a new AI architecture built alongside a new MIRAS framework.
Whilst a lot of people keep talking about how Google is taking market share from OpenAI (as shown by the chart above), I don’t see/hear enough people giving credit to DeepMind. When people talk about the “best tech acquisitions ever,” I believe that Google’s acquisition of DeepMind will be in that conversation if it isn’t already. Google’s ~$650m acquisition of DeepMind back in 2014 is a 4D chess play; it is the reason why they’re currently leading in AI. OpenAI used DeepMind breakthroughs in an area (large language models) that looks unpromising for AGI and threw a lot of computing power at it. DeepMind had understood the limitations of LLMs and therefore did not pursue that research aggressively. Perhaps a bit naive from a share price perspective, but it ended up being the correct call for the long-term goal of developing AGI (if we ever get there). If Google had not published the transformer paper and had kept it as a trade secret... there would probably be no ChatGPT.
As Maria, author of AI Realist, states:
This architecture integrates a separate neural memory directly into the Transformer. Unlike standard models, this neural memory learns on the fly, meaning it can memorize new information instantly and use it for subsequent answers. Previously, this was the third rail of deep learning: any attempt to fine-tune a model live would shift the weights and damage the model’s core abilities, causing “catastrophic forgetting.”
Google’s work signals a new era for domain adaptation. Titans’ memory approach allows networks to learn during inference, theoretically overcoming the limitation of frozen weights and knowledge cut-offs.
The problem is: the research is still in the lab and is not directly adaptable for us yet. The approach requires a surgical decomposition of the base model, projecting attention layers of memory onto every layer of the transformer. Because the base model must train together with this memory module, it is computationally expensive and impractical for most real-world applications today.
This is the major limitation: it’s not a plug-and-play solution; it demands deep, expensive architectural integration.
Their research paper was released on 4th December 2025 which I’ve linked above. Titans is around a year old now, which is the crazy part, and they’ve since followed it up with Hope (which is similar due to some shared mechanisms but is lighter computationally and more flexible). This is big if it’s true. Memory and continuous learning are perhaps some of the biggest bottlenecks holding back strong AI, among other things. Current systems are narrowly capable, but still brittle. Solving continuous learning and memory seems non-negotiable if they want to shift to high-level machine intelligence.
As mentioned earlier in relation to Zeoli’s framework on utilisation, the cost advantages from better hardware compound with the advantages from better aggregation. Google already serves billions of queries daily across its platforms. This “singularity” comes with massive scale, which in turn creates utilisation dynamics that pure-play AI companies cannot replicate easily.
As of writing this, Alphabet hit $100bn in quarterly revenue for the first time in Q3 2025 and is projected to hit north of $350bn in revenue in FY 2025. Google, like fellow hyperscalers Microsoft and Amazon, can afford to lose money on inference for a longer period than the neoclouds or pure-play AI firms because their other business units (Office + Azure for Microsoft, AWS + retail operations for Amazon) generate a lot of revenue. Google’s vertical integration from chips to end-user products puts it in a very interesting position.
All of this is not to say that Google doesn’t face challenges. Their biggest one, in relation to AI, is maintaining pole position, which isn’t easy. We’ve seen how fast things can change in the AI race within a year. In the latter stages of last year and earlier this year too, people were hardly talking about Google in the AI race, so much so that people called Google “a dead company” a few months back… I couldn’t believe my ears when I heard that. Google’s historical approach of not being the first mover, and of prioritising research over product velocity, has meant that it has started from subpar positions, but that has worked out in its favour, so the impact has been negated. Another understated risk (and also one to watch) is the possible integration of advertisements in Gemini. If it’s banners around a screen or something, it’s not the end of the world, but if it’s adverts integrated in the model? I can see subscription rates dropping. Another one would be integrating AI into existing products that don’t need it, like Gmail or another Google app, which may end up cannibalising the advertising model that funds Google’s operations. This is also applicable to all LLM providers like OpenAI, Anthropic and more.
Nvidia’s Deal Economics and the Acquihire Acceleration
As I stated in Big Tech’s AI Consolidation By Other Means:
While these deals are worth keeping an eye on, it is not obvious to me that they will be stopped anytime soon. The regulatory apparatus is still catching up to these new structures, and by the time they do, the AI landscape may have already been reshaped.
The deal below, broken down succinctly by Gennaro Cuofano, shows how such a deal is structured; I’ll then discuss what I believe the rationale is.

Deal Structure (and other financial details):
Deal value: $20bn (based on 2025 projected revenue of ~$500m, this has an implied EV/revenue multiple of ~40x)
Non-exclusive license agreement (not an acquisition on paper)
Groq continues as standalone entity (~10% of employees stay)
90% of employees join Nvidia (the real “acquihire”)
No equity changes hands (avoids acquisition reporting thresholds)
Raised $750m at a $6.9bn valuation [in Sep 2025] then achieved a $20bn exit [in Dec 2025] for a return multiple of 2.9x in 3 months, and an annualized return of ~1,400%.
AI Deal Comparisons (2024-2025):
Groq-Nvidia: $20B (licensing + acquihire) / $500M target revenue = 40x
Meta-Scale AI: $14.8B (49% stake, also an acquihire) / Scale’s estimated revenue ~$750M = 19.7x
Microsoft-Inflection: $650M (acquihire only) / negligible revenue = N/A
Amazon-Anthropic: $8B total (investment) / Anthropic’s estimated $1-2B revenue = 4-8x
Target as a Percentage of Acquirer Market Cap:
Groq-Nvidia: 0.44% of ~$4.5T
Meta-Scale: 0.5% of ~$3.0T
Microsoft-Inflection: 0.02% of $3.2T
Amazon-Anthropic: 0.4% of $2.0T
Time from Last Raise to Exit:
Groq: 3 months
Inflection: ~12 months
Scale: Deal announced during operation
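The ratios in the lists above are straightforward to reproduce. The Python sketch below uses only the deal values, revenues and market caps quoted there; the helper names are my own. I’ve included two common ways of annualising the Groq return, because the answer depends heavily on the convention: neither of them lands on the ~1,400% quoted above, so that figure presumably rests on assumptions the breakdown doesn’t spell out.

```python
def revenue_multiple(deal_value_bn: float, revenue_bn: float) -> float:
    """Deal value divided by the target's revenue."""
    return deal_value_bn / revenue_bn


def share_of_market_cap(deal_value_bn: float, market_cap_bn: float) -> float:
    """Deal value as a percentage of the acquirer's market cap."""
    return deal_value_bn / market_cap_bn * 100


def simple_annualised_return(multiple: float, months: float) -> float:
    """Gain scaled linearly to 12 months, in percent."""
    return (multiple - 1) * (12 / months) * 100


def compound_annualised_return(multiple: float, months: float) -> float:
    """Gain compounded to 12 months, in percent."""
    return (multiple ** (12 / months) - 1) * 100


groq_multiple = 20 / 6.9  # $20bn exit over the $6.9bn Sep 2025 valuation

print(f"Groq EV/revenue: {revenue_multiple(20, 0.5):.0f}x")
print(f"Scale EV/revenue: {revenue_multiple(14.8, 0.75):.1f}x")
print(f"Groq as share of Nvidia market cap: {share_of_market_cap(20, 4500):.2f}%")
print(f"Groq return multiple: {groq_multiple:.1f}x in 3 months")
print(f"Simple annualised: {simple_annualised_return(groq_multiple, 3):.0f}%")
print(f"Compound annualised: {compound_annualised_return(groq_multiple, 3):.0f}%")
```

Whichever convention you prefer, the point stands: annualising a 3-month, 2.9x move produces a number so large that it says more about the speed of the deal than about any repeatable rate of return.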
First, what did Groq actually announce? They shared that they have a non-exclusive licensing agreement with Nvidia for Groq’s inference technology. In the same announcement, they said that Jonathan Ross, the founder, Sunny Madra, the president, and some other team members are moving to Nvidia as part of the deal. This is the acquihire part. Groq also said it will stay independent, named Simon Edwards as CEO, and confirmed that Groq Cloud, one of their products, will continue. This isn’t a typical acquisition. Instead, it’s more like a transfer of capability, almost like a brain transplant, but without a clear change of control.
Let’s break down what’s happening, since the details matter here. A license means one company pays to use another company’s technology. Non-exclusive means the seller can offer the same technology to others. So, on paper, this isn’t a takeover. But there’s another part that makes it feel like one: the acquihire. An acquihire is when the real asset being acquired is the team: key leaders and key engineers matter more than the company’s revenue or its product. Historically, such a transfer of talent has mostly occurred through a full acquisition.
As highlighted above, the $20bn valuation represents an astounding premium, even by historical standards. To understand the magnitude of this, one should contextualise it against Groq’s capital history. In Aug 2024, Groq raised $640m at a valuation of $2.8bn. Fast forward 13 months to Sep 2025, when they raised $750m at a post-money valuation of $6.9bn.
Investors receive 85% of the $20bn (or $17bn) upfront, another 10% (or $2bn) gets paid by mid-2026, and the final 5% (or $1bn) gets paid by end-2026.
Nvidia’s $20bn offer represents a 2.9x multiple on the Sep 2025 valuation and a ~7x multiple on the Aug 2024 valuation. It’s not often you get appreciation like this so fast; it may well be the fastest 2.9x in VC history and is one of the most prominent VC deals from a returns perspective. Such a deal indicates two acute market realities: firstly, it emphasises the intense scarcity value of proven, production-grade AI silicon teams, and secondly, it highlights Nvidia’s willingness to deploy its cash reserves to build and/or preserve its moat by neutralizing threats before they can achieve escape velocity.
This deal is, perhaps coincidentally, 2.9x bigger than Nvidia’s previous largest acquisition, Mellanox Technologies, in 2019. Unlike this deal, the Mellanox purchase was a traditional merger aimed at acquiring interconnect technology.
It’s not surprising to me that Nvidia used the [reverse] acquihire structure to explicitly avoid a full corporate merger. This structure allows Nvidia to reframe its rationale from ‘we acquired a competitor’ to ‘we are now entering a commercial partnership.’ By allowing GroqCloud to continue operations and Groq to remain an independent entity, Nvidia can sidestep the Hart-Scott-Rodino (HSR) antitrust review process, which can hold up contested mergers for 18 months or more.

As this article is highlighting, there’s a major shift in the economics of AI. Analysts even have a term for this shift: the “Inference Flip.” In its first phase, from 2020 to 2025, generative AI was mainly focused on training models. Top AI companies like Anthropic, Google and OpenAI were buying billions of dollars’ worth of Nvidia’s H100 & B200 GPUs to train massive foundation models. Training requires great parallel compute capability & memory bandwidth to process petabytes of data over weeks or even months. Nvidia’s architecture, optimized for high-throughput parallel processing, was naturally suited to this kind of workload, which granted the company a near monopoly.
However, as these models move into production, the workload shifts to inference, which is characterized by different constraints:
Latency Sensitivity: Users expect instantaneous responses. A delay of even 500 milliseconds can degrade the user experience in voice agents or real-time coding assistants.
Sequential Processing: Generative AI models (LLMs) produce text one token (word part) at a time. The generation of the next token depends on the previous one. This sequential dependency makes it difficult to parallelize the workload effectively on traditional GPUs.
Cost Per Token: As businesses deploy AI at scale, the operational cost (OpEx) of running the model becomes more critical than the capital cost (CapEx) of training it.
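To make the three constraints above a bit more concrete, here is a rough Python sketch. Every number in it is hypothetical and chosen only for illustration (none of them are measurements of any real chip or provider), and the function names are my own. It shows two things: why sequential token generation makes response time grow with the length of the output, and how the cost per token depends on how busy the hardware is kept.

```python
def response_latency_ms(num_tokens: int, first_token_ms: float, per_token_ms: float) -> float:
    """Latency of one response when tokens are generated one after another.

    Each token depends on the previous one, so the per-token times add up
    instead of overlapping.
    """
    if num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    return first_token_ms + (num_tokens - 1) * per_token_ms


def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float, utilisation: float) -> float:
    """Operating cost per one million generated tokens.

    utilisation is the fraction of each hour (greater than 0, at most 1)
    that the hardware spends serving requests; idle time still costs money.
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be greater than 0 and at most 1")
    tokens_per_hour = tokens_per_second * 3600 * utilisation
    return hourly_cost_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical: 200 ms until the first token, then 20 ms for each further token.
    print(response_latency_ms(num_tokens=100, first_token_ms=200, per_token_ms=20), "ms")
    # Hypothetical: hardware costing $2/hour that generates 1,000 tokens/s when busy.
    for utilisation in (1.0, 0.5, 0.25):
        cost = cost_per_million_tokens(2.0, 1000, utilisation)
        print(f"utilisation {utilisation:.0%}: ${cost:.2f} per million tokens")
```

In this simplified model, halving utilisation doubles the cost per token, which is why keeping hardware busy matters so much once OpEx becomes the main constraint.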
Groq’s LPU technology was specifically designed for the inference phase. Nvidia is using this deal as a hedge against a threat to its near monopoly: the risk that its general-purpose GPUs may be undercut by specialized inference chips on speed and efficiency. The deal signals that Nvidia’s “training era” is maturing, and the “inference era” is just beginning.
To understand the Nvidia-Groq deal, it’s important to look at the current regulatory climate in Washington. Starting under Lina Khan, who chaired the agency back in 2022, the Federal Trade Commission (FTC) has taken a tough approach to consolidation in the tech sector, especially when it comes to ‘vertical mergers’ where a dominant platform buys a company it depends on or supplies.
Nvidia has been affected by this regulatory environment before. In 2022, it dropped its $40 billion bid to buy Arm Holdings after facing heavy scrutiny from the FTC, the UK’s CMA, and the European Commission. This experience showed Nvidia that buying a company outright, especially in a “mega deal,” can be risky.
The “reverse acqui-hire” model is a direct adaptation to this reality. By technically leaving Groq alive as an independent company, Nvidia argues that the transaction does not meet the threshold of a merger that substantially lessens competition. The Hart-Scott-Rodino (HSR) Act requires companies to file pre-merger notifications for deals over a certain value. While the $20 billion payment certainly triggers a filing, the argument is that because it is a “license” and not a “merger,” the review process is different.
If There Is a Bubble, Where Does It Reside?
I’ve been alluding to this in a few comments and even a note on December 23rd, where I said that there’ll be “valuation dispersion”. I’m not the only one who’s been saying it either.
Strategists at Morgan Stanley have noted (by DeItaOne on X):
Companies in the AI field could face ‘valuation dispersion’ over the coming years. Companies that use AI to improve their productivity and lower costs are likely to “re-rate higher while others fall behind,” the strategists say.
In simpler terms, if gains from the AI theme extend into 2026 – which they likely will – it won’t be a rising tide lifts all boats situation. It’s likely we’ll see the winner(s) and loser(s) more clearly, whoever they may be.
I do agree with Morgan Stanley’s strategists to a degree, as I do think that the wheat will eventually have to be separated from the chaff; the companies that struggle to find legitimate business use cases and/or implement AI poorly will struggle. Where I disagree is that I also think the firms that take shortcuts and go about this in a lazy manner (cutting junior roles and hoping AI will make up the difference) will face a day of reckoning when they need mid-level talent in a few years’ time. At least in the finance space, the top firms may be able to get rid of a graduate analyst program (or its equivalent entry-level role) and be fine. But most companies aren’t the top firms and there’ll be some damage. Whether this pattern plays out in other sectors remains to be seen.
As stated in the ‘Google’s Singularity’ section, albeit quite briefly, some of the hyperscalers (Google, Microsoft, Amazon) occupy unique positions where they can rely on other business units to subsidise their AI buildout. This is why they are less concerned about AI being slow to reach profitability than companies like Oracle, which is now seen as the “weakling” in this AI theme.
This advantage, in turn, influences how they approach markets. They can price aggressively to increase market share without having to worry about near-term losses. They can invest in infrastructure at scales that pure-play AI companies cannot match, and they can bundle AI capabilities into their products in a way that creates switching costs and customer lock-in. The probable result is that the hyperscalers emerge as winners from the AI transition regardless of whether AI companies achieve profitability.
Neoclouds/specialized cloud providers are the area to watch in my opinion, as I think this is where the bubble resides if there is one. These are companies like Applied Digital and CoreWeave, plus one or two others, but mainly the ones mentioned. Ever since I published my ‘AI Ouroboros’ piece, I’ve maintained (and still do maintain) the belief that some kind of shakeout will come, although no one knows when or how soon. In ‘AI Ouroboros,’ I mentioned that:
“the growth was enabled by lots of debt, which Michael Intrator (their CEO) called “the fuel for this company” in a CNBC interview. To finance its chip purchases, CoreWeave turned to Blackstone, its biggest financing partner, and other lenders eager to jump into the AI space.”
Regarding the use of debt, CoreWeave had to ask for extraordinary covenant relief from its banks. This is a sign of a company under stress. On top of that, there’s significant insider selling. I’m not implying that CoreWeave as a company will disappear, but people shouldn’t be glossing over this either. They can’t afford to pay 15% interest, but they also can’t afford to pay 9%. A relatively young company shouldn’t have debt like that.
They have a fragile revenue base for now, which may not matter since we’re in a “shoot first, ask questions later” type of market, but it will eventually get called into question. CoreWeave does not have a contract with Google as of writing this, but does have ones with Nvidia, Meta and Microsoft (on behalf of OpenAI). If any of these companies, especially Nvidia, reports lower earnings or ends up losing its monopoly on GPUs for whatever reason, CoreWeave’s revenue base will be questioned. This is even more important as the company operates on a model where it secures multi-year, hardware-backed contracts with customers. On top of that, it often uses contracted revenue as collateral for debt financing to build out more data centre capacity. This could easily unravel if one of the hyperscalers decides to withhold investment for various reasons.
The smaller pure-play companies face the most precarious positions. Cohere, Mistral, and numerous other venture-backed AI startups lack the scale advantages of larger competitors and the strategic value that makes hyperscalers willing to sustain losses. They must either achieve differentiation that supports premium pricing, find acquisition exits before capital markets tighten, or reach profitability quickly enough to avoid down rounds. As far as I know, there are no more than 5 companies that make enough money (referring to both revenue & net profit) from AI at scale. One of them has a section dedicated to it within this article (Google), and another is selling the shovels (Nvidia). We shall see if any more join Google & Nvidia in profiting from AI at scale.
Closing Thoughts & Important Questions for AI in 2026 (and beyond)
Holden Spaht, a managing partner at technology private equity firm Thoma Bravo, has also questioned the economics of AI in an opinion piece in the FT. He mainly argues that AI startups are poorly positioned and that established software providers that can integrate AI into business operations efficiently are better placed to benefit. He states:
These start-ups typically use standard AI models (like ChatGPT or Claude) and distinguish themselves by writing software on top in order to target specific tasks. Their value proposition is that specialised solutions can compete without worrying much about how businesses function as complete systems.
But this is a vulnerable strategy. Like dotcom companies that burned cash without sustainable business models, many of today’s AI start-ups burn money paying for access to AI models while lacking advantages that competitors can’t copy. Their core bet is that AI can solve specific business problems so efficiently that companies will willingly go beyond their integrated systems to get superior performance in one area — in short, “best-in-class” beats “good-enough-but-all-in-one-place”.
[…]
These structural challenges show why we believe established software platforms such as Salesforce, SAP, Microsoft — and portfolio companies like Anaplan and Coupa that we invest in at Thoma Bravo — will be more durable despite the risks of AI disruption to their businesses. While AI start-ups struggle to defend narrow territory, broader platforms build advantages that grow stronger over time. Rather than just making one specific area better, they work on strengthening the broader business of a customer. This model produces intelligence for both the company and the provider, building capabilities. And SaaS companies have years of making improvements.
[…]
The idea would be to force a battle of commoditisation: the foundation model companies want to commoditise software and the software companies want to commoditise the AI ingredients of the software platforms. But this competition story, too, reveals why I believe established platforms will ultimately win.
I do think his argument has merit, but at the same time he may be talking his own book, so I’m taking this with a grain of salt. Proprietary platforms are at extremely high risk of being ‘enshittified’. The ‘enshittification’ includes but is not limited to the implementation of adverts. A brief look and you’ll find endless threads already alleging this of OpenAI, Claude and Gemini. It’s almost guaranteed to happen as they seek to claw back their investment by monetising LLMs. What becomes increasingly attractive is running an optimised open source model locally on your computer or phone for complete control of your data and consistency of experience (apps/front ends like LM Studio make this technically trivial now). You know you’ll never be enshittified, it’s free(!) and you don’t have to share your source code with companies.
Enterprise AI chatbots and platforms are currently marginally useful but largely disappointing because they fall short of reliably creating true business transformation (ref: MIT pilot study). The issue is that they don’t sufficiently understand specific business processes, and the wider business transformation required to enable AI takes time. Based on what I’ve written, I’m led to believe that the larger direction of AI is going to be single-subject applications built by startups. Hopefully these won’t just be wrappers around one of the large frontier LLMs, but will be fed by their own specialized AI models. Who needs an AI that knows everything and can do everything? The future could well be hundreds of thousands of specialized edge models that know only ONE thing - and know it well. Branded by startups. Such models don’t require the energy and compute necessary to be “the everything” model. They will live comfortably as edge models, likely in reduced format in our pockets on our smartphones.
As subsidies end and true costs surface, AI services will become expensive and eventually become commoditised. The current race to the bottom on simple tasks will continue, but complex reasoning and giant contexts will carry premium prices that reflect real compute costs.
As James Wang stated in Ten Years of AI in Review:
Most startups die and are doomed.
However, the real question is whether or not we will see some kind of enduring advantage for, say, AI model companies. Google suddenly causing “Code Red” at OpenAI is just another day in AI headlines. The next time OpenAI or one of the Chinese companies releases their next frontier model, it’ll suddenly be “Code Red” at every other company.
Ultimately, without some kind of true barrier to entry, most AI companies will not enjoy some kind of amazing future where they dominate. OpenAI may be a strong consumer brand in ChatGPT. Anthropic has a pretty good API business and focuses on coding. But as the Chinese AI companies IPO’ing have shown, there’s actually not as much revenue as you’d think in their businesses and certainly far less earning potential. They all have the same models, data, and compute.
The above from James aligns heavily with my views on AI inference and what it means for the overall theme going forward. But even if most startups die or don’t make it for whatever reason, I do think the pending shakeout (or whatever you want to call it) will ultimately be a good thing, as most of the potential within AI lies in the “boring” part: the part where all the dust settles and tech (and even non-tech) companies have to really think about legitimate use cases and proper implementation.
I do have some questions.
If achieving high utilisation rates via customer aggregation is the key to profitable inference, and only hyperscalers can reach these levels, does the pure-play AI business model exist at all?
If or when inference costs approach zero and models eventually become commoditised, what creates sustainable pricing power? Brand, distribution, proprietary data, or vertical integration?
If Nvidia (or Big Tech in general) can acquire competitors for amounts that are essentially a rounding error, why would late-stage investors continue funding pure-play AI infrastructure companies rather than treating them as acquisition targets early on?
At what point do investors (if they haven’t already) demand proof of profitability over proof of technological innovation? Can companies continue raising at increasing valuations while compute margins improve and losses widen?
In an environment where hyperscalers possess structural utilisation advantages, can use acquisitions to eliminate competitors and may not need immediate AI profitability due to the strategic value of their other business unit(s), how many independent AI companies can exist profitably by 2028? Two, five, twenty or a different number?
Coming back to the title, who captures the value of AI inference when it becomes cheap? And why [chosen AI company/ies]? The company/ies that can answer this question will most likely be the one(s) who win in this race to profitability.
We’ll see who will capture the value of AI inference if/when it becomes cheap. The ‘who’ is the easier part of the question, despite it being a not-so-easy question to answer. The harder part is the why. As the days, weeks and months go by, the answer will become clearer.

Further reading (and listening):
Who wins if AI models commoditise? — With Mistral CEO Arthur Mensch (Big Technology Podcast)
How should we think about the economics of AI? (Paul Krugman - it’s a video podcast)
Is it a bubble? (Howard Marks)
The “28%” is back-of-the-napkin math. The point is to illustrate what low utilisation margins mean for AI profitability. It’s an oversimplified view, not a fact. The actual numbers most likely differ.

















