The Jevons Paradox: Why My AI Costs Exploded

AI lets you and your team do a lot more. Even if the unit cost of AI falls, your total cost will not come down when usage grows 10x.


I built a chatbot for my team to polish client emails. I used Gemini Flash — free tier, daily refresh, no cost. For twenty to thirty queries a day across five people, the free allocation would last forever.

Within a week, my team had found a hundred uses I never planned for. Summarising long email threads. Rewriting internal memos. Checking grammar on casual messages before sending. The free allocation was gone within hours of the working day starting. I had to move to paid — or find an alternative.

I investigated alternatives. Ran tests within free API limits before committing to anything. Settled on Llama via Groq as the primary model: $0.05 per million input tokens, $0.08 per million output tokens. Flash Lite as a backup for edge cases: $0.10 per million input tokens, $0.30 per million output tokens. I never moved to paid Flash.
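To see how those per-token prices turn into a daily bill, here is a rough sketch. The prices are the ones quoted above; the workload (500 queries a day, 800 tokens in, 400 tokens out) is a hypothetical figure chosen for illustration, not a measurement.

```python
# Prices in USD per million tokens, as quoted in the text above.
PRICES = {
    "llama-groq": {"input": 0.05, "output": 0.08},
    "flash-lite": {"input": 0.10, "output": 0.30},
}


def daily_cost(model, queries, in_tokens, out_tokens):
    """Estimated USD per day for `queries` calls of the given size."""
    p = PRICES[model]
    per_query = (in_tokens * p["input"] + out_tokens * p["output"]) / 1_000_000
    return queries * per_query


# Assumed workload (hypothetical): 500 queries/day, 800 tokens in, 400 out.
for name in PRICES:
    print(name, round(daily_cost(name, 500, 800, 400), 4))
```

Swap in your own query counts and token sizes; the point is only that the bill scales with volume, not with the number of people on the team.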

The output quality difference between Llama and Flash for email polishing? Noticeable but irrelevant. And that second part — irrelevant — is the thing worth understanding.


My team went from poor emails to above-average emails. That jump was the entire value of the tool. Flash might produce output fifty percent better than Llama. Flash Lite maybe twenty percent better. But when the baseline improvement is already that large, the marginal difference between models stops mattering. The team does not need the best email. They need an email that is good enough to send without embarrassment.

The right model is not the most capable one. It is the cheapest one that clears the quality bar the task actually needs.
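That rule can be written down as a small selection function. This is a minimal sketch: the `Model` class, the `quality` scores, and the bar values are all hypothetical and would have to come from your own spot checks. Only the input prices are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Model:
    name: str
    cost_per_m_input: float  # USD per million input tokens
    quality: float           # hypothetical 0-10 score from your own spot checks


def cheapest_that_clears(models, quality_bar) -> Optional[Model]:
    """Return the lowest-cost model whose quality meets the bar, or None."""
    good_enough = [m for m in models if m.quality >= quality_bar]
    return min(good_enough, key=lambda m: m.cost_per_m_input, default=None)


candidates = [
    Model("llama-groq", 0.05, 6.0),  # prices from the text; scores are made up
    Model("flash-lite", 0.10, 7.0),
]
print(cheapest_that_clears(candidates, quality_bar=5.5).name)
```

The function filters by the bar first and only then sorts by price, so a better model never wins just because it is better.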

Most operators never ask what quality bar the task actually needs. They default to the model they know, assume free tiers will hold, and discover the real cost only when the allocation runs out or the bill arrives.


There is a 19th-century reason why the free tier burned so fast.

In 1865, economist William Stanley Jevons wrote about steam engines and coal in The Coal Question. As engines became more efficient, the natural expectation was that coal consumption would fall. Instead it exploded. Cheaper energy meant more people found reasons to use energy. Efficiency created demand faster than it reduced cost.

AI follows the same pattern. Inference costs, meaning what you pay every time your team runs a query, fell roughly 280-fold between late 2022 and late 2024 for GPT-3.5-level output. Did spending drop? No. Usage grew faster than costs fell. Google’s monthly token processing jumped from 480 trillion to 1.3 quadrillion tokens in about five months. The cheaper the tool, the more uses people find for it.

This is not a management failure. It is predictable. When you make something frictionless, your team will find uses you never planned for. The only protection is knowing what the task actually needs before the allocation runs out.
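The arithmetic behind the paradox is short: total spend is unit cost times usage, so the bill only falls if usage grows more slowly than the price drops. A minimal sketch, with a hypothetical function name and made-up example numbers:

```python
def spend_multiplier(unit_cost_drop, usage_growth):
    """Change in total spend when unit cost falls by `unit_cost_drop`x
    and usage grows by `usage_growth`x. Above 1 means the bill went up."""
    return usage_growth / unit_cost_drop


# Hypothetical: the per-query price halves, but the team runs 10x the queries.
print(spend_multiplier(unit_cost_drop=2, usage_growth=10))  # 5.0
```

In that example the price fell by half and the bill still grew five times over.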



I used to think AI decisions involved three variables — the same triangle that governs every project. Time, cost, quality. Pick any two. AI collapses that triangle. Time drops out. Across cloud models, response speed is similar enough that it rarely changes what you choose. You are left with one decision: how much quality does this specific task require, and what is the cheapest way to get there? Here is the rule I now use.

  • Costly models for complex work. Coding, deep research, reasoning tasks where a wrong output creates rework that costs more than the model bill. I use Claude for O9X development and anything requiring multi-step judgment. The annual cost is Rs 36,000. A bad output here creates downstream problems that cost far more to fix.
  • Cheap models for volume work. Anything a capable intern could do with clear instructions. Drafting reports from templates. Summarising long documents. Polishing routine emails. Llama on Groq handles this. The quality difference between Llama and a premium model for these tasks is real, but it rarely matters, because the baseline improvement is already doing the work.
  • Self-hosted for confidential data. If the data cannot leave your machine, the model cannot either. I run small local models on my laptop for one-off reports involving sensitive information. Slower. No cost. Secure. This one required no experimentation — the logic was obvious from the start. The third path came naturally. The first two took months of wrong assignments before the rule became clear.
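The three paths above amount to a two-question routing rule. A minimal sketch follows; the function name and tier labels are hypothetical. Checking confidentiality first is an assumption on my part, based on the point that confidential data cannot leave the machine whatever the task's complexity.

```python
def pick_tier(confidential: bool, complex_work: bool) -> str:
    """Map a task to one of the three tiers described above."""
    if confidential:
        return "local"    # data must not leave the machine
    if complex_work:
        return "premium"  # a wrong output costs more than the model bill
    return "cheap"        # volume work a capable intern could do


print(pick_tier(confidential=False, complex_work=False))  # cheap
```

The premium tier is reached only when a task is explicitly flagged as complex; everything else falls through to the cheap tier by default.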

The mistake most operators make is treating all AI work as one category. It is not. Premium models are not a default. They are a last resort after you have asked whether something cheaper clears the bar.

For email polishing, the bar is: good enough to send without embarrassment. Llama clears it at $0.05 per million input tokens. Flash would clear it better at six times the cost. The extra quality buys nothing because the task does not need it.

Ask what the task actually requires. Then find the cheapest model that delivers it. The gap between those two numbers is where most operators are losing money without knowing it.


What I am still figuring out: how long I can hold the Rs 36,000 line for Claude before the volume of complex work forces a tier upgrade. Usage compounds. The email tool taught me that.


The discipline is not finding the right model once. It is resisting the pull to route everything through the most capable one because it is familiar. That instinct is expensive. And it will feel invisible until the free tier runs out at ten in the morning.