When the second copy stopped being free

Ask why software became the most profitable business in modern history and you will get a lot of answers. Network effects, recurring revenue, defensibility, the talent. They are all real, and they are all downstream of one fact so basic it usually goes unmentioned. Once you have written a piece of software, serving it to one more person costs almost nothing.

That fact is the foundation under everything else. It is why a software company can earn an 80% gross margin while a grocery store earns 26%. It is why a startup can give its product away to millions of people and still raise money on the giveaway. It is why, in software, getting bigger usually means getting more profitable, not less. For forty years that property held through every change in how software was built and delivered. It is now, for the first time, in question, and the reason is the most celebrated software of the moment.

Every answer an AI model produces costs real money to produce. Not a rounding error. Not a fixed cost you pay once and forget. A variable cost, paid per use, that scales with how much the thing gets used. Software spent four decades escaping exactly this, and AI has quietly brought it back.

What the second copy cost

The cleanest statement of the old economics is twenty-eight years old. In Information Rules (1998), the economists Carl Shapiro and Hal Varian put it in nine words: “information is costly to produce but cheap to reproduce.” The first copy of a program absorbs enormous fixed cost in salaries, years, and false starts. Every copy after that is close to free. Erik Brynjolfsson and Andrew McAfee later compressed the same idea for the digital era: software is “free, perfect, and instant.” Free to copy, identical every time, delivered anywhere at once.

Almost everything people admire about software economics falls out of that one property.

Gross margins run high because the cost of delivering the product to the next customer is tiny. The standard benchmark for a healthy software company is a gross margin somewhere between 70% and 85%, and the publicly reported averages bear it out, with system and application software around 72% against 38% for hardware and 26% for grocery retail. Each new dollar of revenue arrives as mostly profit, available to spend on building more or selling harder.

Freemium works because giving the product away to someone who will never pay costs the company almost nothing. The free tier is a near-costless funnel. Winner-take-most dynamics emerge because there is no diseconomy of scale to slow the leader down: no factory that gets more expensive to run as it grows, no shelf space to fight over. And the property investors prize most, operating leverage, is the same fact seen from the side. Spread a large fixed cost over more and more users whose marginal cost is near zero, and margins expand as you grow. When Marc Andreessen argued in 2011 that software was eating the world, the phrase doing the quiet work in his essay was “high-margin, highly defensible businesses.” The margin came first.

It is worth being precise. The marginal cost was never literally zero. Real software has support costs, payment processing, some compute per request. That is why the margin is 75% and not 100%. But it was near zero, and near zero was enough to build an industry on.
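The operating-leverage arithmetic is easy to sketch. The figures below are invented for illustration and are not taken from any company: a fixed cost of ten million dollars, a hundred dollars of revenue per user, and a marginal cost of twenty-five dollars per user, which corresponds to the 75% gross margin mentioned above.

```python
def operating_margin(users, price, fixed_cost, marginal_cost):
    """Operating margin as a fraction of revenue."""
    revenue = users * price
    total_cost = fixed_cost + users * marginal_cost
    return (revenue - total_cost) / revenue


# Hypothetical figures, chosen only for illustration.
PRICE = 100.0              # dollars of revenue per user per year
FIXED_COST = 10_000_000.0  # dollars per year, independent of user count
MARGINAL_COST = 25.0       # dollars per user per year, i.e. a 75% gross margin

for users in (200_000, 1_000_000, 10_000_000):
    margin = operating_margin(users, PRICE, FIXED_COST, MARGINAL_COST)
    print(f"{users:>10,} users: operating margin {margin:.0%}")
```

Under these assumptions the margin climbs from 25% to 65% to 74% as the user base grows fifty-fold. It approaches the 75% gross margin but cannot pass it, so the lower the marginal cost, the higher the ceiling that scale can reach.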

The cloud moved the bill, not the rule

The obvious objection is that software has always cost something to run. Servers are not free; bandwidth is not free. True. But follow where that cost sat, because the history matters for understanding why AI is different.

In the mainframe era, beginning with IBM’s System/360, announced in 1964, compute was a capital asset. You bought or leased the machine, and the cost was yours, on your balance sheet, upfront. Client-server computing in the 1980s and 1990s kept the same shape: the buyer racked the servers and staffed the room. Then in 2006 Amazon launched S3 and EC2, and the model inverted. Capital expense became operating expense. You stopped buying the machine and started renting it by the hour.

What that shift did not do is give software a meaningful marginal cost. It relocated the fixed cost, from the buyer’s data center to the vendor’s cloud bill, and made it elastic. The cost of serving one additional request stayed minuscule. A serverless function on AWS Lambda is billed at twenty cents per million requests, which is two ten-millionths of a dollar per call. Storage and bandwidth for an ordinary web app are measured in fractions of a cent. So even after the cloud transition, SaaS gross margins held right where they had always been, in the 70-to-85 percent range. The cloud changed who paid the fixed cost and when. It left the central quirk untouched: the next user was still essentially free.
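To put the Lambda figure in per-user terms, here is a minimal sketch. It covers only the request charge quoted above. Lambda also bills for execution time and memory, which this ignores, and the usage level of ten thousand requests per user per month is a made-up assumption.

```python
REQUEST_PRICE_PER_MILLION = 0.20  # dollars; the Lambda request charge cited above


def request_cost(requests):
    """Dollar cost of the per-request charge alone for a number of calls."""
    return requests * REQUEST_PRICE_PER_MILLION / 1_000_000


# Hypothetical usage level, for illustration only.
REQUESTS_PER_USER_PER_MONTH = 10_000

print(f"one call:       ${request_cost(1):.7f}")
print(f"one user-month: ${request_cost(REQUESTS_PER_USER_PER_MONTH):.4f}")
```

Even at that assumed usage, the request charge for a month of one user's activity comes to a fifth of a cent.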

This is the part to hold onto. The zero-marginal-cost property survived every platform shift the industry went through, from mainframe to client-server to cloud. It was robust. It felt like a law of nature because nothing had managed to break it.

The one place it already broke

One category broke it before AI did, and looking at that category tells you what the broken version looks like.

Video streaming never had software’s margins. Every hour a subscriber watches consumes real, metered bandwidth, and the content itself is an enormous recurring cost. Netflix’s per-subscriber contribution margin has run somewhere in the mid-forties percent, roughly half what a pure software company earns, because more than half of each subscriber’s payment goes back out as content and delivery cost. The delivery piece is the instructive one. Serving video over commercial content-delivery networks cost Netflix on the order of cents per gigabyte, and at Netflix’s scale those cents compounded into one of its largest variable expenses. So Netflix did the thing companies do when marginal cost actually bites: it vertically integrated. It built its own delivery network, Open Connect, and put its boxes inside internet providers to claw the per-stream cost back down.

The lesson is not that streaming is a bad business. It is that the moment each use consumes metered infrastructure, three things happen together. Gross margins compress toward the 40-to-60 percent range. The cost of serving your heaviest users becomes a strategic problem. And you start spending capital to own the thing that drives the marginal cost. For two decades that pattern was confined to media, a special case, easy to file away as “well, video is different.”

AI takes that same pattern and brings it to everything else.

A meter on every answer

When a model answers a question, it runs a forward pass across billions of parameters on a GPU somewhere, and that compute is metered. The ten-thousandth query costs about what the first did. There is no point at which the work of answering becomes free, because the answering is the product, and the answering always costs.

The numbers are already visible in the businesses built on top of models. ICONIQ’s 2026 survey of around three hundred software companies put the average gross margin of AI products at 52%, up from 41% the year before, and well under the 80%-plus that defined mature SaaS. Inference alone ate about 23% of revenue. For every dollar these companies brought in, roughly a quarter went to paying for the model to think before any other cost was counted. That is a cost line traditional software simply did not have.
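The two survey figures can be combined in a rough back-of-the-envelope check. This assumes the 52% margin and the 23% inference share describe the same set of products and can simply be added, which the survey summary above does not state, so treat the result as indicative only.

```python
GROSS_MARGIN = 0.52     # average AI-product gross margin, as cited above
INFERENCE_SHARE = 0.23  # inference cost as a share of revenue, as cited above

# Everything that is not gross margin is cost of revenue.
cost_of_revenue = 1.0 - GROSS_MARGIN

# Cost of revenue left over once inference is set aside (hosting, support, and so on).
other_cost_of_revenue = cost_of_revenue - INFERENCE_SHARE

# Gross margin the same products would show if inference cost nothing.
margin_without_inference = GROSS_MARGIN + INFERENCE_SHARE

print(f"cost of revenue:          {cost_of_revenue:.0%}")
print(f"of which not inference:   {other_cost_of_revenue:.0%}")
print(f"margin without inference: {margin_without_inference:.0%}")
```

On that reading, inference accounts for about half of the cost of revenue, and taking it away would put these products at roughly 75%, back inside the range traditional software has occupied.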

The extreme cases make the shape vivid. When OpenAI’s o3 model posted its breakthrough scores on the ARC-AGI reasoning benchmark in late 2024, the efficient configuration cost on the order of twenty dollars per task. The maximum-compute configuration was estimated to run into the thousands of dollars per task, burning tens of millions of tokens chewing through possibilities. A human, for comparison, solves one of those puzzles for about five dollars. The model can win, but it pays by the question, every time, and sometimes it pays a great deal.

Software’s superpower was never the code. It was that the second copy was free.

You can watch the old pricing model break against this in real time. Cursor, the AI coding tool, launched at a flat twenty dollars a month with generous access to premium models. As it grew, the underlying model costs did not fall the way software costs are supposed to. They rose, as users moved to each newer, more capable model in turn. A flat fee against a rising variable cost is a bet you lose on your best customers, and in 2025 Cursor was forced onto usage-based pricing, with heavy users now told to expect somewhere between sixty and two hundred dollars a month. The detail that matters is the one flat pricing cannot survive: the customers who got the most value were the same customers who ran up the largest bills. Value and cost moved together. In old software they were decoupled, which is exactly what let you charge everyone the same price and profit on the heavy users. That decoupling is gone.
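A small sketch makes the flat-fee problem concrete. The twenty-dollar fee is the launch price cited above. The per-user inference costs and the 25% markup are hypothetical, invented to show the shape of the problem, and are not Cursor's actual figures.

```python
FLAT_FEE = 20.0  # dollars per month, the launch price cited above

# Hypothetical monthly inference cost per user, in dollars.
USERS = {"light": 4.0, "typical": 12.0, "heavy": 60.0, "power": 200.0}


def flat_margin(inference_cost):
    """Monthly margin on one user under the flat fee."""
    return FLAT_FEE - inference_cost


def usage_margin(inference_cost, markup=0.25):
    """Monthly margin when the user is billed cost plus a markup."""
    return inference_cost * markup


for name, cost in USERS.items():
    print(f"{name:>8}: flat {flat_margin(cost):+8.2f}   usage {usage_margin(cost):+7.2f}")

print(f"   total: flat {sum(flat_margin(c) for c in USERS.values()):+8.2f}"
      f"   usage {sum(usage_margin(c) for c in USERS.values()):+7.2f}")
```

With these made-up numbers the flat plan earns money on the two lightest users and loses far more on the two heaviest, for a combined loss of $196 a month. Under cost-plus billing the same heavy users become the most profitable ones.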

The race that hasn’t been called

Here is where it would be easy to overreach, and where the honest answer is more interesting than the dramatic one. It is tempting to say software’s golden economics are finished. They are not finished. They are in a race, and the race is genuinely undecided.

On one side is deflation, and it is ferocious. The cost of a given amount of machine intelligence has been falling faster than almost anything in the history of technology. Epoch AI, which tracks this carefully, finds the price of reaching a fixed capability dropping by a median of roughly fifty times a year, and faster than that for many tasks since the start of 2024. The venture firm a16z, describing the same phenomenon, noted that a model of GPT-3’s quality fell from sixty dollars per million tokens to six cents in three years, a thousandfold drop. If that continues, the meter on each answer keeps ticking down toward the rounding error software used to enjoy.
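The a16z figure can be converted into a yearly rate, which makes it easier to set beside the Epoch number. This is a sketch that assumes a constant rate of decline across the three years. The real path was almost certainly uneven.

```python
def annual_factor(total_factor, years):
    """Constant yearly decline factor that compounds to total_factor."""
    return total_factor ** (1.0 / years)


START_PRICE = 60.0  # dollars per million tokens, as cited above
END_PRICE = 0.06    # dollars per million tokens, as cited above
YEARS = 3

total = START_PRICE / END_PRICE
print(f"total decline: {total:,.0f}x")
print(f"per year:      {annual_factor(total, YEARS):.1f}x")
```

A thousandfold drop over three years works out to about tenfold a year. That is well below Epoch's median of roughly fifty times a year, though the two figures come from different sources and methods, so the comparison is loose.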

On the other side is inflation, and it is just as real. The models that are getting the attention are reasoning models, and they reason by spending tokens, not a few more but orders of magnitude more, as they think through a problem before they answer. Cheaper tokens invite more token use, which is the oldest pattern in resource economics, known as the Jevons paradox: when something gets cheaper, we consume so much more of it that total spending rises rather than falls. And the frontier keeps moving toward harder problems that cost more to solve, not less. So while the price per token collapses, the tokens per task climb, and the total bill climbs with them.

The precise version, then, is this. The cost per unit of intelligence is racing toward zero. The cost per completed task, and the aggregate amount everyone is spending on inference, are going the other way. Which curve wins, and when, nobody can honestly say yet. There is real evidence the margin problem is solvable. By one analysis, Anthropic pushed the gross margin on its inference from the high thirties to above seventy percent through software and hardware optimization alone, generating the same answers far more cheaply. That looks like a path back toward software-like economics. It is also not yet the whole industry, and the frontier keeps raising the cost of the questions worth asking.
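The race can be written as a single product: cost per task equals price per token times tokens per task. The sketch below projects that product under two scenarios. Every number in it is hypothetical, including the starting price, the starting token count, and both growth rates. It shows only how the outcome depends on which rate is larger.

```python
def cost_per_task(price_per_million_tokens, tokens_per_task):
    """Dollar cost of one task at a given token price and token count."""
    return price_per_million_tokens * tokens_per_task / 1_000_000


def project(years, price, tokens, price_decline, token_growth):
    """Yearly cost per task when price falls and token use grows at fixed rates."""
    costs = []
    for _ in range(years + 1):
        costs.append(cost_per_task(price, tokens))
        price /= price_decline   # tokens get cheaper each year
        tokens *= token_growth   # each task consumes more tokens each year
    return costs


# Hypothetical start: $10 per million tokens, 2,000 tokens per task,
# with the token price falling tenfold a year in both scenarios.
deflation_wins = project(3, 10.0, 2_000, price_decline=10, token_growth=3)
inflation_wins = project(3, 10.0, 2_000, price_decline=10, token_growth=30)

for year, (down, up) in enumerate(zip(deflation_wins, inflation_wins)):
    print(f"year {year}: tokens x3/yr ${down:.5f}   tokens x30/yr ${up:.5f}")
```

When token use grows more slowly than price falls, the cost of a task shrinks every year. When it grows faster, the same tenfold price decline still leaves each task more expensive than the year before.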

What is no longer true is the thing that used to be automatic. Marginal cost in software is no longer assumed to be zero. It is now a number you have to manage, a number that might converge back toward nothing or might not, depending on engineering you have to actually do.

What comes apart

Treat near-zero marginal cost as the load-bearing assumption it was, and you can see what strains when it weakens.

Pricing is the first thing to move, and it is already moving. Per-seat licensing was built for a world where a user, once signed up, cost nothing more to serve, so you might as well let them use the product as much as they liked. When every use has a cost, unlimited use at a flat price is a structural loss. The industry is shifting toward usage- and outcome-based pricing, with forecasts putting at least 40% of enterprise software spending on that model by 2030. The shift is uneven and contested, and some large vendors are holding the seat-based line for now. But the direction follows directly from the economics. Cursor was simply early to feel it.

Freemium gets more expensive, because a free user is no longer almost free. Every person you give the product to now carries a real cost of goods, which makes giving it away to millions a different decision than it was. The growth playbook that assumed distribution was costless has to be re-underwritten line by line.

Moats move. When the marginal cost is real, being able to serve each unit more cheaply than your competitor becomes a durable advantage, the way Netflix’s own delivery network was. Efficiency per token, and owning more of the stack that produces the answer, start to matter alongside the quality of the product itself. The model layer, as I have argued before, looks increasingly like a commodity input, and the advantage accrues to whoever turns that input into useful work most efficiently.

And the assumption that scale means margin expansion weakens. More users can now mean more cost, not just more revenue spread over the same fixed base. This is the question underneath Sequoia’s much-discussed “$600 billion question”: the gap between the hundreds of billions being spent on AI infrastructure and the revenue that has to eventually justify it. Hyperscaler capital spending is running somewhere between six hundred billion and eight hundred billion dollars in 2026, and the majority of AI compute now goes to inference rather than training, to serving the product rather than building it. That is a lot of fixed cost betting that the marginal cost of each answer falls far enough, fast enough, for the old economics to reassert themselves.

For forty years, “scale” and “margin” were effectively the same word in software. You grew, and because the next user was free, you grew more profitable in the same motion. AI has pried those two words apart. They may come back together, if deflation outruns the appetite for more expensive thinking. But they are no longer the same thing by default, and a generation of intuitions about why software is such a good business was built on a default that no longer holds.