March 13, 2026
My dive into the Western AI ecosystem. Lengthy because deep and thorough.
This AI boom looks totally different from any other. Is it really?
What is perhaps the most transformative technology investment boom ever is assembled by a handful of powerful players: one EUV supplier (ASML), one dominant leading-edge fab (TSMC), four chip suppliers (Nvidia, AMD, Google-TPU and Amazon-Trainium) and a small group of hyperscalers (Amazon/AWS, Microsoft, Alphabet-Google and Oracle) and model labs (mainly OpenAI-ChatGPT, Google-Gemini and Anthropic-Claude).
A monopoly supplies a near monopoly which supplies another near monopoly feeding an oligopoly serving another oligopoly.
There are two groups of builders (spenders), both involved in a race to quickly grab AI market share:
1. The LLM builders are outdoing each other to become the model of choice for corporates and individuals. Not only must their LLM be among the best, they must also offer potent applications to secure user adoption.
2. The hyperscalers are in a footprint race to secure a share of exponentially expanding compute needs.
The enablers (earners) must keep pace with the builders as a group, but, because everybody is supply-constrained amid a fast-paced race, they can also influence how each individual builder grows through product allocations.
1. The chip suppliers are scrambling to ship enough chips to light data centers up as soon as possible. So far, there are no dark GPUs, meaning that compute/chip demand is still outpacing data center/chip supply. Nvidia is the dominant supplier with best-in-class chips and services (90% share of the GPU market). AMD tries to remain competitive on the compute side and is used by builders to keep Nvidia “reasonable”. Broadcom is a design and fabless partner for major hyperscalers’ alternatives to Nvidia GPUs (e.g. Google’s TPUs, Meta’s MTIA).
2. TSMC is the only fab at the leading edge for AI GPUs (3nm/2nm chips) and CoWoS (Chip-on-Wafer-on-Substrate—an advanced semiconductor packaging technology developed by TSMC). TSMC controls prices and allocations.
3. ASML has an absolute monopoly being the only supplier of EUV systems needed to make sub-7nm chips.
While technical compute is traditionally measured in FLOPs (floating-point operations), gigawatts (GW) have become the unit of choice for the following reasons:
● Because energy is the bottleneck for scaling, power capacity has become the standard unit for discussing infrastructure capacity.
● A data center’s total power capacity (its “IT load”) dictates how many GPUs it can hold. For example, a 1 GW (1,000 megawatts) facility can support between 500,000 and 1.2 million NVIDIA GPUs, depending on the specific model and the efficiency of the data center’s cooling and power infrastructure. Because 1 GW represents the total facility power needs (including cooling and networking), the number of GPUs is calculated based on the “power density” of the server racks.
For example, a single Blackwell GB200 NVL72 rack draws roughly 120 kW and contains 72 GPUs. At 1 GW, you could physically power approximately 8,300 of these racks, totaling around 600,000 GPUs. According to SemiAnalysis, the total cost for a data center operator often rises to $3.9 million per rack when accounting for the high-speed networking, storage, and specialized infrastructure required for liquid cooling. So of a 1 GW data center’s roughly $50B total cost, about $32B goes to equipping the racks.
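The rack math above can be sanity-checked in a few lines. The inputs are the rough estimates quoted in the text, not exact vendor specifications:

```python
# Back-of-envelope check of the 1 GW rack math; inputs are the rough
# estimates quoted above, not exact specifications.
facility_power_w = 1e9      # 1 GW facility
rack_power_w = 120e3        # ~120 kW per GB200 NVL72 rack
gpus_per_rack = 72
cost_per_rack_usd = 3.9e6   # SemiAnalysis all-in estimate per rack

racks = facility_power_w / rack_power_w       # ~8,333 racks
gpus = racks * gpus_per_rack                  # ~600,000 GPUs
equip_cost_usd = racks * cost_per_rack_usd    # ~$32.5B

print(f"{racks:,.0f} racks, {gpus:,.0f} GPUs, ${equip_cost_usd / 1e9:.1f}B to equip")
```

Note that this sketch ignores the share of facility power consumed by cooling and networking, so real-world rack counts per GW would land somewhat lower.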
AI currently accounts for roughly 25% to 30% of total global data center power, or about 30–35 GW. The remainder is used for traditional cloud services, storage, and corporate networking.
The energy demand of AI has 2 vectors: training (building the brain) and inference (using the brain). Training for GPT-4 (2023) required an estimated 50 to 60 GWh of energy.
The energy used per inference query is tiny, but scaled to billions of daily uses, it rivals the consumption of entire nations. An AI-generated Google search answer uses roughly 10x to 30x more energy than a traditional keyword search. If ChatGPT were to handle every search currently done on Google (roughly 9 billion per day), it would require approximately 10 TWh (terawatt-hours) of electricity per year.
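The ~10 TWh figure can be reproduced with a simple assumption of about 3 Wh per AI-generated answer (the per-answer energy is my assumption, broadly consistent with the 10x-30x multiple over a traditional keyword search):

```python
# Rough reconstruction of the ~10 TWh/year claim.
# The per-answer energy is an assumed mid-range figure, not a measured one.
searches_per_day = 9e9     # Google's approximate daily search volume
wh_per_ai_answer = 3.0     # assumption: ~3 Wh per LLM-generated answer

annual_twh = searches_per_day * wh_per_ai_answer * 365 / 1e12
print(f"{annual_twh:.1f} TWh/year")   # ~9.9, i.e. roughly 10 TWh
```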
While training currently consumes a large share of AI power, inference is projected to represent more than 40% of global data center power demand (over 90 GW) by 2030.
S&P Global estimates total capex by the top five US hyperscalers will jump 38% to exceed $600B in 2026. Growth in 2027 is now seen at 20%, to $725B. That is probably conservative: most forecasts were off by 20%+ in 2025, and hyperscalers consistently raised their forward capex guidance each quarter in 2025 as demand kept exceeding expectations.
Data center demand for training AI is expected to grow at a CAGR of 22% over the next five years, reaching more than 60 GW by 2030. But as inference workloads become more dominant—expected to grow at a CAGR of 35% over the next five years and reach more than 90 GW by 2030—data centers are adapting to support inferencing at scale, focusing on real-time, low-latency processing.
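Those CAGRs imply starting points of roughly 20 GW each today, which can be checked by discounting the 2030 targets back five years. This is a simple compounding exercise on the quoted figures, not an independent estimate:

```python
# Back out the implied 2025 bases from the 2030 targets and CAGRs quoted above.
def base_from_cagr(target_gw: float, cagr: float, years: int = 5) -> float:
    """Present value that reaches target_gw after `years` of compound growth."""
    return target_gw / (1 + cagr) ** years

training_base = base_from_cagr(60, 0.22)   # ~22 GW today
inference_base = base_from_cagr(90, 0.35)  # ~20 GW today
print(f"implied 2025 bases: training ~{training_base:.0f} GW, "
      f"inference ~{inference_base:.0f} GW")
```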
Large inference demand is quickly coming from healthcare, automotive, retail & e-commerce, finance, manufacturing, IT & telecom, aerospace & defense, and others.
Manufacturing is projected to grow at the highest CAGR due to the increasing implementation of AI-powered quality control, predictive maintenance, computer vision and robotics.
The robotics segment will account for 28% of inference demand in 2026 as it heavily relies on real-time decision-making, computer vision, and sensor data interpretation, all of which require robust inference capabilities. The proliferation of automation in industrial and service sectors supports this dominance.
Natural Language Processing (NLP) is expected to witness high CAGR due to surging demand for voice assistants, chatbots, and language translation tools.
Estimating inference compute demand implies forecasting adoption and usage, not only by corporations and individuals but, most importantly, by agents.
Ciena explains how inference explodes:
As AI becomes more multimodal, more context-aware, and more deeply embedded across digital platforms, inference is emerging as a dominant driver of future network demand. (…)
Beyond new AI apps, the bigger story is what happens when AI is embedded into existing digital platforms. Search, email, productivity software, maps, advertising, and social media already reach billions of users. Adding AI to them doesn’t create gradual adoption—it creates instant scale.
Google’s rollout of AI Overviews is a clear example. Within a year of introducing Gemini-powered capabilities into search, the feature was being used by more than 2 billion people every month. AI enhancements are now spreading across most of the product portfolios at Google, Microsoft, and Meta, as well as other leading digital platforms. The result is explosive growth in inference volumes.
Google reported that the number of AI tokens (basic unit of text) it processes monthly increased 50x year over year in early 2025—and then doubled again just two months later. (…)
That compute growth and geographic distribution alone increase the importance of resilient, high-capacity connectivity between sites, even before considering how inference workloads themselves are evolving.
Inference workloads aren’t just getting bigger—they’re getting smarter and significantly more complex. Reasoning models break complex tasks into multiple internal steps before producing an answer. That hidden reasoning can require 3–10x more compute per query than traditional “instant” models.
These models also rely heavily on deep search. A single user query may trigger dozens of background retrievals—pulling in web pages, PDFs, images, or videos to support multi-step reasoning and research tasks. Even when users only see a short response, the network may be moving megabytes of data behind the scenes. Platforms currently set usage limits to contain the cost of these advanced capabilities, but demand is growing quickly. As they scale, they will materially increase inference-related data movement.
Another quiet but powerful trend is the rapid expansion of model context windows. Context windows define how much information a model can process in a single inference session—conversation history, documents, instructions, and retrieved content. Over the past two years, frontier models have expanded context sizes at an extraordinary pace, roughly 30x per year.
As inference scales, AI compute is becoming more geographically distributed. Models must be synchronized across regions. Usage data and learning signals must be shared. Complex inference workflows increasingly span multiple sites with complementary capabilities. All this drives substantial growth in DCI bandwidth, while also increasing the number of AI inference data centers to be connected.
Today, typical inference data center interconnect (DCI) links already operate at multiple terabits per second per route. Over the next five years, conservative assumptions from our analysis suggest these requirements could grow 3–6x, pushing per-route capacity into the tens—or even hundreds—of terabits per second.
The macro effect is usually Jevons paradox: as per-token cost falls and tools get better, people deploy agents to do more tasks, which still pushes total inference demand higher.
Agentic AI will increase inference demand sharply because it turns “one prompt → one response” into multi-step loops that call models (often multiple models) repeatedly, keep larger context, and invoke tools, so tokens, latency-sensitive requests, and network traffic all rise.
● An agent decomposes a goal into steps (plan → search → read → decide → act → verify), and each step can trigger one or more model calls, so tokens per “task” rise vs a single chat turn.
● Agents often run with RAG/long-context so they can ground actions in documents and tool outputs, which increases “prefill” compute and memory footprint per task.
● Once agents sit inside workflows (support, coding, operations), they run continuously in the background, so inference becomes a persistent load rather than episodic Q&A.
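A toy token-accounting model illustrates the multiplication the bullets above describe. Every parameter below is hypothetical, chosen only to show the mechanics:

```python
# Toy model of agentic token inflation; all parameters are hypothetical.
def tokens_per_task(steps: int, calls_per_step: int,
                    context_tokens: int, output_tokens: int) -> int:
    """Total tokens processed (prefill + generation) across all model calls."""
    return steps * calls_per_step * (context_tokens + output_tokens)

# One chat turn: a single call with a modest context.
chat_turn = tokens_per_task(1, 1, context_tokens=2_000, output_tokens=500)

# One agent task: plan -> search -> read -> decide -> act -> verify,
# several calls per step, large RAG/long-context prefill each time.
agent_task = tokens_per_task(6, 3, context_tokens=30_000, output_tokens=1_000)

print(f"agent task uses ~{agent_task / chat_turn:.0f}x the tokens of a chat turn")
```

Even with these modest assumptions, one agent task consumes a couple of hundred times the tokens of a single chat turn, which is the mechanism behind the demand forecasts that follow.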
Why AI’s next phase will likely demand more computational power, not less
Deloitte predicts that “inference” will account for two-thirds of all AI computing power by 2026. (…)
On average, market estimates suggest that the autonomous AI agent market could reach $8.5 billion by 2026 and $35 billion by 2030. Deloitte predicts that if enterprises orchestrate agents better and thoughtfully address the associated challenges and risks, this market projection could increase by 15% to 30%—or as high as $45 billion by 2030. (…)
Deloitte predicts that the global cumulative installed capacity of industrial robots could reach 5.5 million by 2026 (…). We could see an inflection point by 2030, with annual new robot shipments doubling from current levels to reach one million a year, driven by the following growth catalysts:
● (i) labor shortages in specialized industrial applications in developed countries and
● (ii) exponential advancements in computing power and the emergence of specialized foundational AI models.
Physical AI (robots, vehicles, industrial systems, and sensor-driven machines), agentic AI and generative video are poised to transform compute industries with massive acceleration in growth and usefulness.
Edge inference is running a trained AI model on compute close to where the data is generated or where the user/device is, instead of sending the data to a far-away centralized cloud data center for inference.
In practice, “edge” can mean two common deployment patterns:
● On-device: inference runs directly on the endpoint hardware (phone, PC, camera, sensor, car/robot controller), enabling real-time responses and sometimes working with limited/no connectivity.
● Near-user edge servers: inference runs on servers in the same city/region or at a nearby point of presence to reduce latency vs a hyperscale region.
People use edge inference mainly to get lower latency, reduce bandwidth/round-trips to the cloud, and keep sensitive data local.
Edge inference increases total inference volume while shifting some compute away from hyperscale data centers—but it usually does not reduce overall compute demand; it spreads it across devices, near-edge sites, and the cloud.
Deloitte explicitly argues that moving to inference at scale won’t mean “less in the data center” in 2026; both edge and data-center compute rise.
● Low-latency, privacy-sensitive tasks (phones, PCs, robots, vehicles, cameras) can only work if inference runs locally or near the user, so edge enables demand that wouldn’t exist if everything had to round-trip to a cloud region.
● When inference is cheap/instant on-device, products call models more often (autocomplete everywhere, background assistants, always-on perception), pushing total inference operations up.
● Edge adds a second demand curve—Neural Processing Units in PCs/phones and embedded GPUs in robots/vehicles.
Think of total inference demand as:
● Cloud inference (big models, batching, enterprise back-ends).
● Near-edge inference (regional low-latency points of presence).
● On-device inference (personalized, offline, privacy-critical).
Many compare the growth in compute demand to the internet but there are key differences:
● Inference demand can scale faster than adoption because each user can generate many queries per day, and “agentic” workflows can generate far more than humans do.
● The internet S-curve was connectivity adoption, the number of users. The inference S-curve is workflows automated × queries per workflow × model size/latency targets. Users are not only humans but also autonomous agents, potentially magnifying usage well beyond human needs. Think of robots or cars “conversing” non-stop among themselves, or your personal agent(s) staying abreast of your activities.
Initially, the internet was a nice-to-have tool. It took a while, large investments and several key applications for it to become a must-have service.
AI is already seen as a must-have, recognized for its potential as a competitive weapon for both revenue generation and cost reduction on the corporate side and for job/wage retention for employees. At a minimum, everybody is in survival mode to protect against savvier adopters.
Highly user-friendly AI applications such as Claude Cowork and Claude Code (autonomous coding) are helping create revenue-enhancing and cost-control tools, fueling and magnifying inference as capabilities increase and broaden.
(Citadel Securities)
Last December, Oaktree’s Howard Marks, in his Is It a Bubble? memo, asked: “how can anyone say how many data centers will be needed? And how can even successful companies know how much computing capacity to contract for?”
Well, two of the most successful AI enablers recently felt the need to address this very question to confidently budget their own heavy, long lead-time capex.
In its 2025 annual report published February 25, 2026, ASML management revised its previous cautiousness.
At first, we believed that AI would drive demand from only a limited portion of our customer base. At the end of the year, we saw that new and significant demand for AI was starting to fuel capacity build-up across our broad customer base – a powerful trend that we believe will continue in 2026 and beyond.
TSMC also needed conviction before boosting its own capex, as Yahoo Finance reported:
For TSMC, worrying about an AI bubble makes a lot of sense. Building new leading-edge semiconductor foundries takes years and consumes many billions of dollars in capital. If TSMC is overly optimistic and overbuilds, its profitability could suffer tremendously.
“I’m also very nervous about it. You bet,” said [CEO C.C.] Wei in response to an analyst question about the trajectory of AI demand. Wei continued, saying that if the company wasn’t careful with its capital spending, it would be a “big disaster” for TSMC.
While TSMC continues to take a conservative approach, Wei has spent the past few months talking to TSMC’s customers and the customers of those customers. The aim of these discussions was to gauge whether AI demand was real. In other words, is AI actually helping these businesses?
Wei’s conclusion is that “AI is real,” calling it an “AI megatrend.” TSMC’s customers are chip designers like Nvidia and AMD, while its customers’ customers are hyperscale cloud providers and other buyers of AI accelerators. The financial status of those hyperscalers also gave Wei some confidence. “They are very rich,” Wei said.
Another data point is TSMC’s own use of AI. Wei noted that the company is using AI to improve the productivity of its fabs, achieving 1% to 2% gains at essentially no cost.
Given the company’s newfound confidence in the durability of AI demand, TSMC said it expects to spend between $52 billion and $56 billion in capital expenditures this year. For comparison, TSMC’s capex in 2025 was around $40 billion. The company also expects capital spending to increase further over the next few years. (…)
Wei noted that strong demand from the AI industry looks like it will go on for many years, calling it “endless.” (…)
Incidentally, ASML is also using AI extensively as explained in its annual report:
Artificial intelligence is presenting new ways to extend our innovation roadmap. As well as enhancing lithography efficiency and precision, we expect it to accelerate R&D and help streamline customer support through smarter diagnostics.
Our AI strategy in R&D spans software and hardware development and is designed to enable faster innovation and improve quality. In hardware development, we use AI with the aim to shorten development cycles. For example, deep learning surrogates using advanced neural network models can accelerate physical simulations.
This has the potential to reduce the simulation time for computational fluid dynamics for a particular component in the EUV source from 24 hours to just seconds, which drastically increases the number of design configurations to be explored for optimization.
AI is already embedded in our products, particularly in computational lithography and metrology and inspection – it’s used to improve both the speed and accuracy of our optical proximity correction products, for example.
Our lithography systems generate an enormous stream of real-time data from over 100,000 actuators and sensors. Harnessing this data is essential for predictive control strategies, for example by using AI to better and faster predict compensation schemes for reticle heating.
In software development, AI tools have shown significant productivity gains and have been rolled out to thousands of developers. Working closely with Mistral AI, we’re adopting customized solutions, trained with ASML source code and documentation.
The automation of workflows and repetitive tasks is helping to free up engineers for higher-value activities. For example, a new AI agent system now updates over 90,000 work instructions used by around 15,000 engineers, improving turnaround and quality.
We’re also expanding AI into fields such as legal and compliance, human resources and finance with the goal of improving efficiency – for example, by streamlining workflows and assisting in the creation, management and analysis of documents.
We see meaningful potential for AI to strengthen resilience, quality control and efficiency across our supply chain. For example, robotic visual inspection uses AI to detect defects on critical modules before they impact production. Predictive analytics help improve planning and logistics, demand forecasting and risk management.
We believe the successful migration of our enterprise resource planning system is key to preparing our data and processes to fully leverage AI in the future.
From other various sources:
● We are currently seeing AI in action during the US/Israel war with Iran. Reporting on recent US-Israel operations describes AI-enabled analytics (including “digital twin”/real-time decision products and AI used to analyze drone/sensor data) as part of how forces identify and prioritize targets. AI is also being used for “non-trigger” tasks that still matter: scenario planning, intelligence briefings, and other decision-support workflows that help commanders process rapidly changing conditions.
● AI-related autonomy shows up most clearly in drones: research discussing the Iranian–Israeli confrontation describes Iran deploying suicide drones that use AI for autonomous target detection/engagement via image processing and pattern recognition.
● That same trend pushes both sides toward faster detection, tracking, and counter-drone workflows, because the operational tempo rises when platforms can navigate and identify targets with less human input.
● OpenAI revenues were $3.6B ARR (annualized run rate) in 2024, $12.5B in mid-2025, $21.4B at the end of 2025 and $25B currently.
● Anthropic revenues were $1B ARR in 2024, $5B in mid-2025, $9B at the end of 2025 and reportedly $20B currently.
● Anthropic added $6 billion in run-rate revenue in February-March 2026 alone, largely driven by the explosive adoption of Claude Code, which reached $2.5 billion in annualized billings within months of launch.
● As of early 2026, OpenAI serves 900 million weekly active users and 1 million business customers and operates 7 million ChatGPT workplace seats.
● The standalone Gemini app surpassed 750 million monthly active users by February 2026. Revenue from products built on Google’s Gemini, Imagen, and Veo generative AI models surged more than 4x year over year in 2025. Nearly 350 enterprise customers were processing more than 100 billion tokens each per month by December 2025.
● Anthropic’s Claude hit a record 11.3 million active users on March 2, according to Similarweb, more than doubling so far in 2026.
● Mistral AI scaled from $20M in early 2025 to over $400M ARR by February 2026, with a target of $1.2 billion by year-end.
● Cohere reached $240 million ARR in 2025, up 287% YoY.
● Perplexity AI hit $148 million ARR by late 2025 and projects $656 million for 2026.
In the same December 2025 memo, Howard admitted:
Perhaps most importantly, the growth of demand for AI seems totally unpredictable. As one of my younger advisers explained, “the speed and scale of improvement mean it’s incredibly hard to forecast demand for AI. Adoption today may have nothing to do with adoption tomorrow, because a year or two from now, AI may be able to do 10x or 100x what it can do today.”
On February 26, he penned an addendum memo to address “significant changes that have taken place in AI over the [last] three months” (my emphasis):
First, there’s the pace at which developments in AI are occurring. That speed is unlike anything we’ve seen before now, and this has implications that have never existed. AI is growing at speeds that greatly outpace the technological innovations of the past. (…)
Nothing has ever taken hold at the pace AI has. It’s able to change the world at a speed that approaches instantaneous, outpacing the ability of most observers to anticipate or even comprehend.
In the past, infrastructure was built for a new technology, and it often took years for that infrastructure to be fully utilized. In the case of AI inference, however, demand already exists and is growing rapidly, and I’m told AI is supply constrained.
The second important thing that’s happened has been an incredible leap ahead in AI’s capabilities. (…)
The most significant thing that distinguishes AI is something we’ve never dealt with in connection with prior technological developments: AI’s ability to act autonomously. (…) AI was at Level 1 [Chat AI] in 2023 and Level 2 [tool-using AI] in 2024, but it’s now at Level 3 [autonomous agents]. And the difference is a big one:
The distinction between Level 2 and Level 3 might sound subtle. It isn’t. It’s the difference that determines whether AI is a productivity tool or a labor substitute. And that difference is what separates a $50 billion market from a multi-trillion-dollar one. (…)
Let me make the pace of improvement concrete, because I think this is the part that’s hardest to believe if you’re not watching it closely.
In 2022, AI couldn’t do basic arithmetic reliably. It would confidently tell you that 7×8 = 54.
By 2023, it could pass the bar exam.
By 2024, it could write working software and explain graduate-level science.
By late 2025, some of the best engineers in the world said they had handed over most of their coding work to AI.
On February 5, 2026, new models arrived that made everything before them feel like a different era.
That day, OpenAI released GPT-5.3 Codex. The technical documentation included this line: “GPT-5.3-Codex is our first model that was instrumental in creating itself. The Codex team used early versions to debug its own training, manage its own deployment, and diagnose test results and evaluations.”
Read that again. The AI helped build itself.
AI is different from other technological innovations not only in magnitude, but in kind. In addition to its remarkable capabilities and speed of development, AI has an element of autonomy that no other technology has ever had. (…)
So, is it a bubble?
● Is the technology a fad or an illusion? Here I say with conviction that it’s a very real thing, with the potential to vastly alter the business world and change much of life as we know it.
● Is application of the technology a distant dream? Clearly, the technology is already in demand and being applied on a large scale. Since AI seems amorphous and little understood, I think its potential is more likely to be underestimated today than exaggerated.
So, no bubble on compute demand. If anything, most forecasts will prove conservative.
Deloitte’s 2026 State of AI in the Enterprise survey captured insights from more than 3,200 business and IT leaders around the world. Across surveyed companies, the share of workers equipped with sanctioned AI tools rose from fewer than 40% to around 60%. (my emphasis)
● 25% of leaders now report that AI is having a transformative effect on their companies, more than double the 12% of a year ago. Trust and investment are also surging, with 84% of organizations increasing their AI investments and 78% of leaders reporting greater confidence in the technology. Yet most companies are only at the edge of large-scale AI-driven transformation.
● While only 25% of respondents said their organization has moved 40% or more of their AI experiments into production to date, 54% expect to reach that level in the next three to six months.
● Today, 34% of companies are starting to use AI to deeply transform their businesses, 30% are redesigning key processes around AI and the remaining 37% are only using AI at a surface level with little or no change to underlying business processes.
● Nearly 3 in 4 companies plan to deploy agentic AI within two years.
● Physical AI is rapidly becoming integral to operations worldwide, with 58% of companies already using it to some extent and adoption projected to hit 80% within two years. While manufacturing, logistics, and defense lead the way globally, markets in Asia Pacific are leading adoption, driving widespread integration of robotics, autonomous vehicles, and drones—setting the pace for the next wave of industrial automation.
Howard then asked: “Are the people building AI infrastructure behaving unwisely? As I pointed out in December, in every example of sweeping technological innovation, the headlong rush to build infrastructure has vastly accelerated the adoption of the innovation and caused a lot of capital to be “malinvested” and destroyed. There’s no reason to assume this time will be different.”
Maybe, but not yet:
1- Previous “sweeping technological innovations” all required investing well ahead of usage. Now, it is no longer “if you build it, they will eventually come” but rather “they are coming, so build asap”.
● Amazon CEO (Q4’24 conf. call): “AWS could be growing faster” if not for “capacity constraints” across its data centers.
● Alphabet CFO Anat Ashkenazi (Q1’25 conf. call) “I’ve stated on the Q4 call that we exited the year in cloud specifically with more customer demand than we had capacity, and that was the case this quarter as well.” In the Q2’25 call: “We expect to remain in a tight demand-supply environment going into 2026.”
● Amazon CEO Andy Jassy (July 31, 2025): “…we have more demand than we have capacity right now. I don’t believe that we will have fully resolved the amount of capacity we need for the amount of demand that we have in a couple quarters. I think it will take several quarters.”
● Amy Hood, Microsoft CFO (January 28, 2026): “Our customer demand continues to exceed our supply. Therefore, we must balance the need to have our incoming supply better meet growing Azure demand.”
● OpenAI CFO Sarah Friar (January 18, 2026): “Looking back on the past three years, our ability to serve customers—as measured by revenue—directly tracks available compute: Compute grew 3X year over year or 9.5X from 2023 to 2025: 0.2 GW in 2023, 0.6 GW in 2024, and ~1.9 GW in 2025. While revenue followed the same curve growing 3X year over year, or 10X from 2023 to 2025: $2B ARR in 2023, $6B in 2024, and $20B+ in 2025. This is never-before-seen growth at such scale. And we firmly believe that more compute in these periods would have led to faster customer adoption and monetization.”
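Dividing the revenue figures by the compute figures in Friar's statement shows why OpenAI treats compute as the binding constraint: revenue per gigawatt has held roughly constant at about $10-11B per GW across all three years.

```python
# Revenue per GW implied by the OpenAI figures quoted above.
compute_gw = {2023: 0.2, 2024: 0.6, 2025: 1.9}
revenue_b_usd = {2023: 2, 2024: 6, 2025: 20}

for year in compute_gw:
    ratio = revenue_b_usd[year] / compute_gw[year]
    print(f"{year}: ~${ratio:.0f}B ARR per GW of compute")
```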
2- SemiAnalysis keeps a tally of announced global data center projects (ex-China). Their numbers show compute supply growth of 38%, 55% and 42% in 2026, 2027 and 2028 respectively, a 44% CAGR.
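Compounding those year-by-year figures reproduces the quoted CAGR:

```python
# Reproduce the ~44% CAGR from SemiAnalysis's annual supply growth figures.
supply_growth = {2026: 0.38, 2027: 0.55, 2028: 0.42}

cumulative = 1.0
for g in supply_growth.values():
    cumulative *= 1 + g          # ~3.04x cumulative over three years

cagr = cumulative ** (1 / len(supply_growth)) - 1
print(f"cumulative {cumulative:.2f}x, CAGR {cagr:.1%}")
```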
SA’s current supply growth numbers for 2029 (+23%) and 2030 (+14%) could prove low if more projects are launched this year and next.
SA’s assumed demand forecast (+37% through 2028) shows supply exceeding demand by nearly 20% (30 GW) in 2028 and by 14% (32 GW) in 2030.
But this takes no account of likely delays due to power, chip or memory shortages, supply chain bottlenecks for electrical equipment/labor, and intensifying local opposition to data center construction during the next 2-3 years. Some of the announced projects are already delayed:
● Oracle has reportedly pushed back completion dates for some data centers being developed for OpenAI to 2028 from 2027, citing labor and material shortages.
● Microsoft reportedly pulled back or delayed development sites in the U.K., Australia, North Dakota, Wisconsin, and Illinois. The company also paused construction on portions of a $3.3 billion AI hub in Mount Pleasant, Wisconsin, to review future plans.
● Amazon faces court-ordered delays of at least one year for a major campus in Fauquier County, Virginia, due to local lawsuits. Multiple other Amazon proposals in Virginia have been withdrawn or postponed following community pushback.
● In late 2025, capacity under construction fell by 29% in Northern Virginia, 15% in Oregon, and 14% in Silicon Valley as developers ran into power and permitting hurdles.
Hyperscalers are developing ways and means to bypass grid bottlenecks, pursuing “behind-the-meter” solutions, acquiring energy firms or investing in nuclear power restarts to generate their own on-site electricity. But these are either marginal or longer-term solutions.
Additionally, SA sees the overall constraint shifting from power to silicon by 2027 with developing shortages of memory and logic.
While supply gets constrained, adoption/demand will keep growing, creating a favorable pricing environment for model providers. But hyperscalers and enablers will need to navigate a politically delicate and shifting supply chain environment to keep pace.
In all, from a demand/supply balance point of view, the next 2-3 years look very favorable thanks to very strong growth in compute demand coupled with challenging supply chains. Demand beyond 2028 is “unpredictable” but it will need to stay above 30% per year given current hyperscaler plans.
In reality, the principal AI bottleneck will always be advanced chip availability.
As the global leader in chip manufacturing (90%+ market share), TSMC has far-reaching influence over the entire supply chain. Its most advanced production nodes and packaging lines are currently on allocation, with demand for AI-related hardware well above TSMC’s total available capacity.
Various sources report that:
● 2nm (N2) node capacity is completely booked through the end of 2026.
● The 3nm (N3) node’s monthly capacity is at its limit as it tries to satisfy massive orders from Nvidia (Rubin/Blackwell Ultra) and Apple (iPhone 17).
● For Advanced Packaging (CoWoS), Nvidia has reportedly “gobbled up” approximately 60% of total CoWoS capacity for 2026, leaving Google’s TPU v7 delivery at only 50%–75% of its target due to these constraints.
● TSMC has notified clients of single-digit price increases every year through 2029 due to persistent tight supply.
● Clients like Apple, Nvidia, and Qualcomm have been securing their allocations through advance contractual commitments that often involve upfront financial guarantees.
In its February 2026 10-K, Nvidia writes:
To secure future supply and capacity, we have paid premiums, provided deposits, and entered into long-term supply agreements and capacity commitments, which have increased our product costs and this may continue. These risks have increased and may continue to increase as our purchase obligations and prepaids have grown and are expected to continue to grow and become a greater portion of our total supply.
The 10-K reveals purchase obligations of $95.2B, up nearly 6x from the previous year, “of which substantially all will be paid through fiscal year 2027” and warns that these are “expected to continue to grow and become a greater portion of supply.”
The AI world is one where bigger is very much better. Nvidia’s dominance secures it the best allocations and opportunities. Hyperscalers’ size and financial strength allow them to secure sites and supplies (power, equipment, labor) through long-term commitments at favorable terms.
The current limitation on data center compute is not the silicon die itself, but the CoWoS packaging that stitches the GPU to the memory.
TSMC CEO C.C. Wei has committed to ~60% annual growth in packaging capacity to eliminate the rationing phase by the end of 2027.
Not that this will completely solve the AI bottlenecks: in 2027-28, scarcity shifts to High Bandwidth Memory (HBM), where manufacturers like SK Hynix and Micron are struggling to match TSMC’s pace.
Not to mention the “power ceiling” that will overhang the industry through 2030.
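A back-of-the-envelope model shows why ~60% annual packaging growth could end rationing around 2027. The starting shortfall (capacity at ~70% of demand) and the 30% demand growth rate are my illustrative assumptions, not TSMC figures:

```python
# Toy catch-up model: CoWoS packaging capacity vs. compute demand.
# Assumptions (illustrative): capacity starts at 70% of demand in 2025;
# capacity grows ~60%/yr (TSMC's stated target), demand grows 30%/yr.
capacity, demand, year = 0.70, 1.00, 2025
while capacity < demand:
    year += 1
    capacity *= 1.60
    demand *= 1.30
print(f"Capacity catches up with demand around {year}")
```

Under these assumptions the crossover lands in 2027; a faster demand ramp or a slower capacity build pushes it out a year or more.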
The risk for everybody in this small circle of AI leaders is that compute demand falls short of expectations, leaving most players with excess capacity post-2027.
Everybody being at risk means that everybody is focused on making compute demand happen which also requires compute supply to happen. The proverbial chicken and egg situation.
The weak links are a few important players with little or no cash flow of their own:
● OpenAI and Anthropic, major model providers competing with Google’s Gemini.
● Oracle and various smaller data center builders.
● AMD, the primary counterweight to Nvidia’s dominance.
The apparent circularity in AI-related orders, financing and ownership reflects not only the tightness of both demand and supply and the sheer size of the commitments required to be a meaningful player, but also the interest of the stronger players in keeping as many options open as possible while ensuring that the weaker players stay healthy and keep contributing to the supply chain, so the train keeps moving swiftly.
AI is a world for giants, as the BIS recently put it:
US giants are now designing and producing chips, running cloud infrastructure, building data tools, training AI models and operating user facing AI applications.
Publicly listed US-based giants have also expanded their venture financing and deal-making activity in the five key markets. Deal-level data and descriptions of target companies allow us to measure giants’ deal-making activity relative to all AI producers in each market.
In the latest periods, these giants accounted for nearly 70% of all deals in the market for AI models and 33% of all deals in the market for AI applications. In the market for data tools, infrastructure and compute, their share was also sizeable, at about 30%, 32% and 15%, respectively. (…)
Moreover, these companies are making deals across a wider range of AI markets than before, particularly in the market for user-facing AI applications.
(BIS)
Nvidia is particularly active in deal-making. Its savvy CEO is using its huge annual free cash flow to invest VC-style in many companies to strengthen and shape the AI ecosystem, from LLMs (OpenAI) to other model labs (Grok, Thinking Machines Lab), data centers (Nebius) and its own suppliers (Coherent, Lumentum), among many others.
The most recent OpenAI equity raise has significantly solidified the AI ecosystem:
● OpenAI got a large cash infusion, including a $50B investment by Amazon, which is now financially backing both OpenAI and Anthropic, two of the above-mentioned cash-burning weak links.
● Amazon got a “massive infrastructure commitment, $138B over 8 years, that positions AWS as a primary compute provider for OpenAI’s training and inference workloads. Under the expanded agreement, OpenAI has committed to consuming approximately 2 gigawatts of Trainium capacity through AWS infrastructure. That commitment spans both current Trainium3 chips and next-generation Trainium4 silicon, expected to begin delivery in 2027.” (Forbes)
● Amazon also gets to distribute both OpenAI and Anthropic models.
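Some hedged arithmetic on the Forbes figures above (the even spread across eight years is my simplification; actual payment schedules are not disclosed):

```python
# Back-of-envelope on the reported OpenAI-AWS commitment.
total_b = 138      # $138B total commitment over 8 years (Forbes)
years = 8
capacity_gw = 2    # ~2 GW of Trainium capacity (Forbes)

print(f"≈ ${total_b / years:.2f}B per year if spread evenly")
print(f"≈ ${total_b / capacity_gw:.0f}B per GW over the term")
```

That works out to roughly $17B a year, a material share of AWS's annual revenue base, which helps explain why Amazon is willing to fund OpenAI's equity round.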
AMD struck its own structural deals with OpenAI and Meta: each has agreed to buy 6 gigawatts’ worth of AI chips from AMD in deals valued at more than $100B apiece that could eventually result in OpenAI and Meta each owning 10% of AMD.
The deals are meant to counter Nvidia: AMD locks OpenAI and Meta into using its chips for as long as possible, while OpenAI and Meta underpin AMD’s revenues and bet that AMD shares will appreciate enough to make the arrangement financially worthwhile.
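A stylized break-even on the OpenAI/Meta side of that bet. AMD's market cap and the appreciation scenarios below are illustrative assumptions of mine; the deals' actual strike prices and vesting terms are not in the text:

```python
# Stylized sketch: value of an eventual ~10% AMD stake under different
# share-price scenarios, vs. the >$100B chip commitment each buyer makes.
stake = 0.10          # eventual ownership (from the text)
amd_mcap_b = 400      # assumed current AMD market cap ($B), illustrative

for uplift in (0.5, 1.0, 2.0):   # shares up 50%, 100%, 200%
    value_b = stake * amd_mcap_b * (1 + uplift)
    print(f"AMD +{uplift:.0%}: stake worth ≈ ${value_b:.0f}B")
```

Under these toy numbers, even a tripling of AMD's shares leaves the stake worth only a fraction of the chip commitment, so the equity upside sweetens the deal rather than paying for it.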
Oracle, the other weak link, has built its AI participation on debt. On March 6, Bloomberg reported that
Oracle Corp. and OpenAI have scrapped plans to expand a flagship artificial intelligence data center in Texas after negotiations dragged over financing and OpenAI’s changing needs.
The collapsed talks created an opening for Meta Platforms Inc. to step in and consider leasing the planned expansion site in Abilene, Texas, from developer Crusoe, according to people familiar with the matter.
Nvidia Corp., the leading AI chipmaker, helped facilitate Meta’s discussions with the developer, said the people, who asked not to be identified because the talks are private. (…)
Nvidia became involved to ensure its products would still fill the expanded data center rather than that of rival Advanced Micro Devices Inc., said the people. Nvidia paid a $150 million deposit to Crusoe and began helping court Meta as a tenant for the expansion, the people said.
The financially strong players are at the ready to protect the ecosystem and their own market shares.
Howard Marks had two more questions:
1- “Will the investment in AI infrastructure produce an adequate return? Since we don’t have full knowledge of AI’s business potential or its impact on profitability, this question can’t be answered. As I wrote in my December memo, there’s certainly great enthusiasm for AI businesses. We’ll know in 10 years whether the resulting profits justified it.”
2- “Are the valuations assigned to AI businesses irrational? The so-called hyperscalers, for whom AI is one important part of a great business, may be overvalued or undervalued, but it’s unlikely that today’s prices for enormously profitable companies like Microsoft, Amazon, and Google are going to turn out to have been ruinously excessive. Established pure AI plays like OpenAI and Anthropic have yet to be listed publicly; we’ll see what kind of valuations their IPOs result in. Finally, the startups to which multi-billion-dollar valuations are being assigned – some of which have yet to describe their strategies or announce products – can only be viewed as lottery tickets. Most people who participate in lotteries end up with worthless tickets, but the few winners get very rich.”
On the first question, the biggest uncertainty lies with the models: actual demand, but also pricing and margins. The bigger risks are with the two pure plays, OpenAI and Anthropic; the others have diversified sources of profits and cash flows. What kind of moat and margins will each secure, particularly against eventual Chinese competition in both models and applications?
Data center providers’ business models are also unproven so far, while enablers such as Nvidia, TSMC and ASML only have their current fat margins at risk.
Howard is spot on regarding valuations. Also on his conclusion: “No one should stay all-out and risk missing out on one of the great technological steps forward. A moderate position, applied with selectivity and prudence, seems like the best approach.”
This is a very high-growth sector with mostly strong and smart players. The numerous supply-chain impediments should prevent significant overbuilding for several years, keeping prices solid and margins healthy overall.
For my money and risk profile, it seems best to focus one’s AI investments on the proven, well-managed, well-financed companies with a solid moat and strong pricing power: ASML, TSMC, NVDA, GOOG and AMZN, provided their multiples make sense for one’s particular situation and risk profile.
EV/EBITDA, P/E and EBITDA margins as of March 13, 2026 (via Koyfin):
[Valuation table not reproduced: TSM, NVDA, AMD, AVGO (Broadcom), GOOG, AMZN, MSFT, META, ORCL]
3 thoughts on “The AI Supercycle: A Deep Dive”