
Gemini 3 Is Impressive. The Pricing Might Kill It.

Google's Gemini 3 family brings real benchmark gains, but the new API price floor looks painful for bootstrapped, high-volume products built around Flash economics.

May 22, 2026

AI, Analysis, Google

Google’s Gemini 3 family is here, and the benchmarks are real. Gemini 3 Pro tops LMArena at 1501 Elo. Gemini 3.5 Flash, announced at I/O 2026, combines frontier intelligence with speed. Deep Think mode hits 41% on Humanity’s Last Exam without tools. The capabilities are not in question.

The pricing is.

The core question is whether Gemini 3’s capabilities justify its new API price floor.

I run two production products on the Gemini API. nwslyr (this site) runs on Gemini 2.5 Flash. portada-engine, a 76-market news network, runs on both Gemini 2.5 Flash Lite and Gemini 2.5 Flash. Between the two products, I process tens of thousands of API calls per month. My combined API costs sit between $5 and $8 per month. That number is the result of deliberate cost engineering: deterministic categorization instead of AI classification, decoupled caching, split API keys by region. Every dollar matters because these are bootstrapped products with zero outside funding.

When Google announced the Gemini 3 family, I looked at the pricing page the way any production API user would: not with excitement about new features, but with a calculator.

Here is what I found.

The Numbers, Side by Side

All prices are per 1 million tokens, standard tier, from Google’s official Gemini API pricing page as of May 22, 2026.

Flash Lite (the budget tier):

Gemini 2.5 Flash Lite: $0.10 input / $0.40 output

Gemini 3.1 Flash Lite: $0.25 input / $1.50 output

That is a 2.5x increase on input and a 3.75x increase on output. The budget tier got 2.5 to nearly 4 times more expensive overnight. Meanwhile, OpenAI’s GPT-4.1 Nano still sits at $0.10/$0.40, exactly where Gemini 2.5 Flash Lite is today.

Flash (the workhorse tier):

Gemini 2.5 Flash: $0.30 input / $2.50 output

Gemini 3 Flash: $0.50 input / $3.00 output

Gemini 3.5 Flash: $1.50 input / $9.00 output

Going from 2.5 Flash to 3.5 Flash is a 5x increase on input and a 3.6x increase on output. For context, Gemini 3.5 Flash at $1.50/$9.00 is now more expensive than Claude Haiku 4.5 ($1.00/$5.00) and GPT-5.4 Mini ($0.75/$4.50). Google’s “fast and affordable” tier now costs more than the competition’s budget models.

Pro (the heavy tier):

Gemini 2.5 Pro: $1.25 input / $10.00 output

Gemini 3.1 Pro: $2.00 input / $12.00 output (and $4.00/$18.00 beyond 200K context)

Pro got a more modest bump (1.6x on input, 1.2x on output), but it was already expensive. At $12 per million output tokens, it is closing in on Claude Sonnet 4.6 ($15). At the long-context rate of $18 output, it exceeds both Sonnet 4.6 and GPT-5.4 ($15 output each), while offering no free tier.
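To make the multipliers in this section easy to check, here is a minimal Python sketch that recomputes them from the standard-tier prices listed above. The dictionary keys are plain-English labels, not official API model IDs, and the Pro entries use the under-200K-context rates.

```python
# Standard-tier prices in USD per 1M tokens, as quoted above: (input, output).
# Pro entries are the <=200K-context rates.
PRICES: dict[str, tuple[float, float]] = {
    "2.5 Flash Lite": (0.10, 0.40),
    "3.1 Flash Lite": (0.25, 1.50),
    "2.5 Flash": (0.30, 2.50),
    "3 Flash": (0.50, 3.00),
    "3.5 Flash": (1.50, 9.00),
    "2.5 Pro": (1.25, 10.00),
    "3.1 Pro": (2.00, 12.00),
}


def price_ratio(old: str, new: str) -> tuple[float, float]:
    """Return (input multiplier, output multiplier) when moving from old to new."""
    old_in, old_out = PRICES[old]
    new_in, new_out = PRICES[new]
    return round(new_in / old_in, 2), round(new_out / old_out, 2)


UPGRADES = [
    ("2.5 Flash Lite", "3.1 Flash Lite"),
    ("2.5 Flash", "3 Flash"),
    ("2.5 Flash", "3.5 Flash"),
    ("2.5 Pro", "3.1 Pro"),
]

for old, new in UPGRADES:
    inp, out = price_ratio(old, new)
    print(f"{old} -> {new}: {inp}x input, {out}x output")
```

Running it reproduces the 2.5x/3.75x Flash Lite jump and the 5x/3.6x jump from 2.5 Flash to 3.5 Flash, and also shows the smaller step to 3 Flash (1.67x input, 1.2x output).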

Google Has Not Explained Why

Google has not published any statement explaining the price increase across the Gemini 3 family. No blog post, no developer FAQ, no pricing rationale document.

The closest thing to an explanation comes from outside Google. Simon Willison noted that “all three of the major AI labs appear to be probing the price tolerance of their API customers.” He is right. GPT-5.5 launched at 2x the price of GPT-5.4. Claude Opus 4.7 is roughly 1.46x Opus 4.6 per token. Every lab is moving prices up. Google just moved the furthest, fastest.

One analysis from NxCode framed it as the cost of progress: “the price increase pays for the thinking budget and the agentic posttraining. Whether you accept that depends on your unit economics.” That is probably the most honest read of the situation. The models are more expensive to run. Whether the improvements justify the cost to the people paying for them is a separate question, and Google has not bothered to make that case.

The Deprecation Clock

This is not a hypothetical concern. The deprecation dates are published.

Gemini 2.0 Flash and 2.0 Flash Lite shut down June 1, 2026. Ten days from today.

Gemini 2.5 Flash and 2.5 Pro are scheduled for deprecation on June 17, 2026. Twenty-six days from today.

Gemini 2.5 Flash Lite has a July 22, 2026 shutdown date. Two months from today.

Every affordable Gemini model will be gone by midsummer.
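The countdown arithmetic above, as a minimal Python sketch using the standard library's `datetime.date`. It assumes the shutdown dates quoted in this section and measures from this post’s publication date; `days_left` is just an illustrative helper name.

```python
from datetime import date

# Publication date of this post; every countdown is measured from here.
TODAY = date(2026, 5, 22)

# Published shutdown dates, as quoted above.
SHUTDOWNS = {
    "Gemini 2.0 Flash / 2.0 Flash Lite": date(2026, 6, 1),
    "Gemini 2.5 Flash / 2.5 Pro": date(2026, 6, 17),
    "Gemini 2.5 Flash Lite": date(2026, 7, 22),
}


def days_left(shutdown: date, today: date = TODAY) -> int:
    """Whole calendar days from today until the shutdown date."""
    return (shutdown - today).days


for model, when in SHUTDOWNS.items():
    print(f"{model}: {when.isoformat()}, {days_left(when)} days out")
```

The last entry comes out to 61 days, which is the “two months” above.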

Developers saw this coming and asked Google directly. In December 2025, a developer posted on Google’s AI forum asking about the pricing impact of the 2.5 Flash deprecation. Google’s community response was that “there is no indication that the planned deprecation will affect the pricing of other models.” The developer pushed back: “As a developer of a service that relies on Flash 2.5, I need to know the future of the Gemini pricing strategy. Flash 3.0 is pricier than 2.5. When 2.5 is deprecated, it means my costs to operate the service will have to go up? Will there be any alternative?” No further response from Google.

In March 2026, another developer asked which stable production models would replace 2.5 Flash and 2.5 Pro, noting the available replacements were still in preview. No substantive answer from Google.

There is no Gemini 3 model at the old 2.5 Flash Lite price point. The cheapest option in the 3.x family is 3.1 Flash Lite at $0.25/$1.50, which is 2.5x/3.75x what 2.5 Flash Lite costs today. The floor went up, and Google is not talking about it.

What This Means for Real Workloads

My nwslyr API bill on 2.5 Flash runs $1 to $3 per month. Four daily editions, 35 sources scanned per cycle, every headline and deck rewritten to strip clickbait. If I migrated to 3.5 Flash today, that bill would jump to roughly $5 to $11 per month. For a single site, that is survivable. But it is a 3.6x to 5x increase, depending on the mix of input and output tokens, for capabilities I do not need. My editorial pipeline works. The briefings are accurate. The sources are curated. Nothing about Gemini 3.5 Flash solves a problem I actually have.
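Why a range and not a single multiplier? 3.5 Flash raises input prices 5x but output prices only 3.6x, so the blended increase depends on how a pipeline splits its tokens. A minimal Python sketch, using the 2.5 Flash and 3.5 Flash prices listed earlier; the helper names and the token counts in the example are hypothetical, not measured traffic from either product.

```python
# Standard-tier prices per 1M tokens (USD), as listed earlier: (input, output).
FLASH_25 = (0.30, 2.50)  # Gemini 2.5 Flash
FLASH_35 = (1.50, 9.00)  # Gemini 3.5 Flash


def monthly_cost(input_tokens: int, output_tokens: int, prices: tuple[float, float]) -> float:
    """USD cost of one month of traffic at the given per-1M-token prices."""
    price_in, price_out = prices
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def migration_multiplier(input_tokens: int, output_tokens: int) -> float:
    """How many times more the same traffic costs on 3.5 Flash than on 2.5 Flash."""
    return monthly_cost(input_tokens, output_tokens, FLASH_35) / monthly_cost(
        input_tokens, output_tokens, FLASH_25
    )


# Hypothetical mixes: one input-heavy, one output-heavy.
print(round(migration_multiplier(3_000_000, 100_000), 2))
print(round(migration_multiplier(500_000, 1_000_000), 2))
```

Whatever the mix, the multiplier stays between 3.6x (all output) and 5x (all input). Applied to a $1 to $3 bill, that band spans roughly $3.60 to $15, and the $5 to $11 estimate above sits inside it.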

Portada-engine is where the math gets worse. Seventy-six markets, three brand families, three languages. The entire architecture was designed around Flash Lite and Flash economics. Monthly Gemini spend went from $300 on broken calls down to $4.80 at 34 markets after I replaced AI categorization with deterministic rules and restructured the caching layer. At 76 markets, the cost model holds because both Flash Lite and Flash are cheap. If I had to move to 3.1 Flash Lite and 3.5 Flash at 2.5x to 5x the cost, the math stops working. Not catastrophic in absolute terms, but it breaks the unit economics that justified the architecture.

And this is the critical point: these are optimized workloads. I already did the cost engineering. I already stripped out unnecessary AI calls. I already moved categorization to deterministic rules. The savings came from architecture decisions, not from having a cheaper model. When the model itself gets 3x to 5x more expensive, architecture cannot save you.
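For readers wondering what “deterministic categorization” means in practice, here is a minimal, hypothetical Python sketch of keyword-rule categorization: a headline is matched against an ordered rule table, so no API call (and no token cost) is involved. The category names and keyword patterns are invented for illustration; this is not portada-engine’s actual rule set.

```python
import re

# Hypothetical rule table: (category, pattern). Order matters:
# the first category whose pattern matches wins.
RULES: list[tuple[str, re.Pattern[str]]] = [
    ("sports", re.compile(r"\b(match|league|goal|championship)\b", re.IGNORECASE)),
    ("economy", re.compile(r"\b(inflation|market|tariff|gdp)\b", re.IGNORECASE)),
    ("politics", re.compile(r"\b(election|senate|minister|parliament)\b", re.IGNORECASE)),
]


def categorize(headline: str, default: str = "general") -> str:
    """Assign a category with keyword rules instead of a model call."""
    for category, pattern in RULES:
        if pattern.search(headline):
            return category
    return default


print(categorize("Senate passes budget after late-night vote"))  # politics
print(categorize("Local bakery celebrates 50 years"))  # general
```

The trade-off is that the rule table has to be maintained by hand, but every classification it handles is one fewer billed request.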

Where Is the ROI?

The AI industry talks about capabilities like they exist in a vacuum. Benchmarks go up. Scores improve. New features ship. But the question that matters for anyone running a business on these APIs is simpler: does the new model do something my current model cannot, at a price that makes the upgrade worth it?

For most production workloads I can think of, the answer right now is no.

If your product runs on 2.5 Flash and works, 3.5 Flash does not offer a capability that justifies a 5x input price increase. Better reasoning is nice. Improved multimodal understanding is nice. But “nice” does not show up on a P&L statement. The features in Gemini 3 are incremental improvements for the vast majority of API use cases: text processing, classification, summarization, content generation, data extraction. These tasks worked fine on 2.5. They worked fine on 2.0. The models were good enough for production six months ago.

The businesses that actually need Gemini 3 Pro’s PhD-level reasoning or Deep Think mode are a tiny fraction of the API customer base. Most API usage is high-volume, cost-sensitive work: the exact workloads that just got priced out of the upgrade path.

Two Markets, One Price Sheet

If you have the budget, of course you want Gemini 3. Better reasoning, better multimodal understanding, better agentic capabilities. Nobody is arguing the technology is not worth having. The question is who can actually afford to have it.

The AI conversation right now is dominated by companies with funded AI teams and research budgets. They will absorb these prices. They will run Gemini 3 Pro and Deep Think and build agentic workflows and write conference talks about how transformative it all is. That is real, and good for them.

But that is not where mass API adoption lives.

Most businesses trying to integrate AI right now are cash-strapped. Their budgets are slim. Many of them can barely justify hiring one person to work on AI, let alone absorbing a 3x to 5x increase in the API costs that person’s work depends on. These are the companies that actually drive API volume: small teams, lean products, tight margins. They are not choosing between Gemini 3 Pro and Claude Opus. They are choosing between Gemini 2.5 Flash Lite at $0.10/$0.40 and not using AI at all.

Google built its developer base on being the affordable, reliable API. Flash Lite at $0.10/$0.40 was a competitive weapon. It let indie builders, startups, and bootstrapped products run real AI workloads without venture capital. That pricing created an ecosystem. Developers built architectures around those economics. Products launched. Businesses formed.

Pulling that floor up by 2.5x to 5x is not just a pricing change. It is a signal. It says the era of accessible API pricing was a customer acquisition strategy, not a permanent position. And developers who planned their cost models around those prices are now holding the bill.

I am not moving to Gemini 3. Not at these prices, not for these workloads. When the 2.5 deprecation hits in June, I will do the math and compare against Claude Haiku, GPT-4.1 Nano, and whatever else exists at that point. Platform loyalty is a luxury reserved for companies with margins to spare.

The capabilities in Gemini 3 are real. The benchmarks are real. But the ROI at these prices, for the workloads that actually drive API volume, is not there. And if the numbers are not there, the features do not matter.