About

A public notebook for the AI memory thesis.

The premise is simple: high-bandwidth memory, not just accelerator compute, may be the binding constraint in the next phase of AI infrastructure. The work here tries to size that claim honestly enough that you can argue with the assumptions, not just the conclusion.

What this is

Memory Analyst is a living research site about AI memory demand, HBM supply, context growth, model size, and the bottlenecks that show up when inference gets bigger and more stateful.

It’s a thesis essay built on a public model and a live calculator. Every number in the essay can be opened, changed, and broken in the calculator. Nothing is meant to be taken on faith.

The argument, in one breath

AI demand isn’t one curve. It’s three engines climbing together: more usage, more state per task, and bigger models. Some of them multiply. Measured as annual rates, yearly HBM demand grows about 5× this decade while the world’s annual output barely doubles. By 2030, demand runs roughly 3× what the industry can build in a year, and no plausible efficiency win closes a production-rate gap of that size.

A sold-out, high-fixed-cost market is how that shortage becomes pricing power and margin. Whether the tightness breaks the cycle or just delays the next one is a separate, open bet. The site is careful about which is which.

Why it exists

Most AI-infrastructure talk collapses into one vague idea: more AI means more chips. Directionally true, not precise enough to trade on.

The harder question is where the bottleneck actually lands. Memory demand depends on tokens, resident weights, context state, KV cache, routing, optimization, utilization, and supply quality. Those pieces behave differently, so the model keeps them apart instead of mashing them into one multiplier.

How the calculator is built

  • Weights — the resident model footprint held across serving replicas.
  • KV cache — the live memory of prior tokens, kept so the model doesn’t recompute the whole conversation each step.
  • Context bucket — a single dial rolling up active sessions, resident context, KV bytes, residency, and KV efficiency.
  • Supply split — rest-of-world modern HBM kept separate from China gross supply, adjusted by a modern-equivalent share.

Demand adds across buckets; the terms only multiply inside each one. Tokens × model × context is a category error, and the calculator refuses to do it.
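
As a minimal sketch of that rule in Python (an illustration, not the calculator’s actual code): each bucket multiplies only its own terms, and the buckets are then summed. The class and function names, the units, and every number below are hypothetical placeholders. The sketch also assumes that residency and KV efficiency act as plain multiplicative factors, and it reads the supply split as discounting China gross supply by the modern-equivalent share.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Bucket:
    """One demand bucket. Its terms multiply with each other only."""
    name: str
    terms: dict[str, float]  # hypothetical term names, placeholder values

    def demand_bytes(self) -> float:
        # Multiplication happens inside the bucket and nowhere else.
        return prod(self.terms.values())


def total_demand_bytes(buckets: list[Bucket]) -> float:
    """Buckets add. There is no cross-bucket multiplication."""
    return sum(b.demand_bytes() for b in buckets)


def effective_supply_bytes(row_modern: float, china_gross: float,
                           modern_equivalent_share: float) -> float:
    """Assumed reading of the supply split: rest-of-world modern HBM counts
    in full; China gross supply is scaled by a modern-equivalent share."""
    return row_modern + china_gross * modern_equivalent_share


# Placeholder inputs for illustration only. None of these are estimates.
weights = Bucket("weights", {
    "bytes_per_replica": 2e12,
    "serving_replicas": 1_000.0,
})
context = Bucket("context", {
    "active_sessions": 1e6,
    "resident_tokens_per_session": 1e5,
    "kv_bytes_per_token": 1e5,
    "residency": 0.5,       # assumed: fraction of context kept in HBM
    "kv_efficiency": 0.5,   # assumed: multiplier, lower means less memory
})

demand = total_demand_bytes([weights, context])
supply = effective_supply_bytes(row_modern=3e15, china_gross=1e15,
                                modern_equivalent_share=0.5)

for bucket in (weights, context):
    print(f"{bucket.name}: {bucket.demand_bytes():.3e} bytes")
print(f"total demand: {demand:.3e} bytes, effective supply: {supply:.3e} bytes")
```

Changing a term inside the context bucket moves only that bucket’s contribution. The weights bucket is untouched, which is the reason for keeping them apart.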

Disclaimer

Memory Analyst is an independent research site for discussion and education. Nothing here is investment, legal, tax, accounting, or procurement advice, and nothing should be read as a recommendation to buy or sell any security, private investment, memory product, GPU, contract, or related asset.

The essay, calculator, charts, and outputs are scenario analysis built from public information, estimates, simplifications, and user-selected assumptions. They may be wrong, stale, incomplete, or inconsistent. HBM supply, AI demand, model architecture, export controls, pricing, yields, packaging, and serving efficiency can all change quickly. Do your own work and check primary sources before making any decision.