Inference Is the New Cloud. Baseten’s $1.5 Billion Round Is the First Real Proof.

The race everyone thought was about training turned out to be about inference. Baseten raised $1.5 billion in a Series F on June 22 at a $13 billion valuation. Altimeter Capital, Conviction, and Spark Capital led the round, with Sands Capital, Wellington Management, IVP, Greylock, and Nvidia joining. The company processes more than one billion AI inference calls every day across 87 clusters and 18 cloud providers. Revenue grew 20-fold year over year.

It is no accident that Baseten led its announcement with one billion calls per day. The figure quantifies the same thing AWS leads with: usage at the infrastructure layer. Infrastructure businesses don’t lead with product features. They lead with traffic. Baseten is not positioning itself as an AI company. It is positioning itself as a cloud.

The inference market is structurally different from training in ways that matter for where value eventually settles. Training happens once, expensively, in a handful of hyperscale data centers. Inference happens billions of times per day, at the edge of every application that uses AI. Every chatbot response, every code completion, every fraud signal, every document summary is an inference call. The training market is already consolidating around players with the capital to acquire frontier model capability. The inference market is open. The bottleneck isn’t building the model. It’s serving it reliably, affordably, and at scale.

The valuation trajectory makes the repricing visible. Baseten’s $300 million Series E in January 2026 valued the company at $5 billion. Five months later, the Series F lands at $13 billion. That $8 billion step-up reflects one thing: the market is repricing inference as infrastructure, not tooling. Infrastructure multiples are durable in ways that tooling multiples are not. The model layer will commoditize. The serving layer won’t.
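The step-up is simple arithmetic, and it is worth seeing laid out. The sketch below uses only the two valuations reported above; the function names are illustrative, and the five-month interval is an approximation of the January-to-June gap, so the implied monthly rate is a rough figure rather than a precise one.

```python
# Valuations as reported in this article, in US dollars.
SERIES_E_VALUATION = 5_000_000_000   # January 2026
SERIES_F_VALUATION = 13_000_000_000  # June 22
MONTHS_BETWEEN_ROUNDS = 5            # assumption: approximate interval


def step_up(previous: int, current: int) -> tuple[int, float]:
    """Return the absolute increase and the multiple between two valuations."""
    return current - previous, current / previous


def implied_monthly_growth(multiple: float, months: int) -> float:
    """Compound monthly growth rate that would produce `multiple` over `months`."""
    return multiple ** (1 / months) - 1


increase, multiple = step_up(SERIES_E_VALUATION, SERIES_F_VALUATION)
monthly = implied_monthly_growth(multiple, MONTHS_BETWEEN_ROUNDS)

print(f"Increase: ${increase / 1e9:.0f} billion")
print(f"Multiple: {multiple:.1f}x")
print(f"Implied compound monthly growth: {monthly:.1%}")
```

The $8 billion increase is a 2.6x multiple on the Series E valuation. Expressing it as a compound monthly rate is only one way to frame the pace of repricing; it says nothing about whether that pace continues.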

For founders building in Brazil and the rest of LatAm, this matters practically. Every AI application in the region, from credit scoring agents and customer service systems to document processors and fraud models, is an inference workload. The cost and latency of that workload determine whether the application is economically viable. Baseten’s platform running across 18 cloud providers means regional builders now have access to inference infrastructure at a scale and reliability threshold that didn’t exist two years ago. The application-layer opportunity is real. The question was always whether the infrastructure underneath it was ready. This week’s round suggests the answer is yes.