Treating language models as commodities

The first version of almost every LLM-backed application is the same: one model, some system prompting, enough to get a demo working. It’s a reasonable way to start. The prototype is a question; the architecture can come later.

The trouble is that “later” arrives faster than expected. You want to benchmark a new model and realize your prompts contain provider-specific formatting that doesn’t port cleanly. A team exceeds its API quota and there’s no way to know who consumed it or why. Your costs double when a provider reprices, and the visibility to understand why doesn’t exist yet. You try to add a fallback during an outage and find that calling two different providers requires two entirely different integrations.

None of that is accidental.

The provider incentive

LLM providers have strong reasons to make switching expensive. The obvious tools are proprietary APIs and bundled services: tie your users to your fine-tuning, your embeddings, your function-calling format, your streaming semantics, and switching stops being a one-day project.

The subtler form is pace. The field moves fast enough that every provider can credibly claim their new capability has no equivalent elsewhere. The enterprise chases features rather than building on a stable abstraction. This isn’t unique to AI. It’s the standard playbook for any infrastructure company in a hot market. But the rate of change makes the lock-in more expensive than usual.

Most lock-in punishes you if you want to change. AI lock-in punishes you because you will have to.

If you built tightly around a specific model in early 2024, the model you should be running in late 2025 is probably not the same one, and may not even be from the same provider. The window of “this is the best option for my use case” has been shortening every quarter. Your architecture should know that.

What actually breaks

The problems that accumulate from provider coupling aren’t usually architectural crises. They’re slower than that.

The first sign is testability. When running the same prompt against three models to see which performs best on your actual data takes a day of engineering work, you stop doing it. Which means you stop improving. A team that can’t benchmark continuously will drift toward whatever was working six months ago.

The second is observability. Without a central layer capturing every call, you can’t answer “which team, which feature, which model version is responsible for this cost spike?” by the time you’re asking. You end up with a bill and a mystery.

The third is brittleness. When a provider has an outage (and they will), you discover whether you built routing logic or relied on hope. Fallback logic that was never written stays unwritten until it’s urgent.

The control plane

This is the problem model gateways are solving. They give you a stable surface to program against while the market underneath it shifts.

The pattern is straightforward. Your application talks to one API. That layer (whether you run LiteLLM yourself, or use Portkey, OpenRouter, or Helicone) translates the call to whatever providers you’re routing to. On top of that translation you get routing rules, fallback logic, key management that doesn’t require distributing raw provider credentials to every service, and the observability that tells you what’s actually happening.

The operational consequence is that switching a model becomes a configuration change, not a code change. Benchmarking a new provider becomes an afternoon, not a project. Staying within budget becomes a dashboard, not a quarterly audit.

The objection I hear is that gateways add a hop, add latency, add another thing to maintain. All true. The question is what you’re comparing it to. The alternative isn’t zero overhead: the accumulated weight of bespoke integrations, undifferentiated engineering work, and the renegotiation with a vendor who knows the switching cost is high.

Treating models as interchangeable infrastructure, rather than products you’re coupled to, isn’t a prediction about which model or provider will win. It’s a structural bet that the field will keep moving faster than any single vendor relationship can keep up with.

That bet has been right every year so far.

Treating language models as commodities

The provider incentive

What actually breaks

The control plane

Related

Beating the benchmark was the easy part

Scientific coding is the frontier

The world has to grade itself