Pimp My SaaS
Adoption is real. So is the gap between AI that changes how the work gets done and AI that just added a chatbot.
What you need to know: The $500 million Claude bill, Starbucks pulling its inventory tool, and what actually separates the two.
If you watched Pimp My Ride [1], you remember the formula. Someone hands over a beat-up car. The crew gives it a paint job you can see from orbit, a row of screens in the dash, maybe a popcorn machine in the trunk. The reveal is spectacular. What you found out later was that a lot of those builds barely ran, because the budget went to the parts that filmed well and not to the engine.
Go back and watch one of those reveals now. The paint that looked wild in 2004 looks cheap. The screens are the size of a calculator. The whole thing is cringey in a way it was not at the time. It was passable for its moment. The moment passed.
A good amount of AI product work in 2026 is running the same play, and it is still in its moment. Take a product that already exists, bolt on a chatbot or an “agent” or an auto-summary button, then publish a launch post saying the whole thing has been transformed. Sometimes the work underneath really did get better. Often it is the same van with a louder stereo and a new badge that says AI. Right now, in May 2026, the badge still impresses. The interesting question is how long that lasts, and what these builds look like once the reveal is over and someone has to explain the spend.
The adoption numbers make it more complicated than a blanket dismissal. Stanford’s 2026 AI Index found generative AI reached 53 percent of the global population within three years, quicker than the personal computer or the internet managed at the same point after launch. [2] McKinsey’s most recent State of AI survey put organizational use at 88 percent, up from 78 a year earlier. [3] People are using these tools. The real split is how much of what got shipped this year will age like an engine swap and how much will age like a dashboard TV in a 2004 Escalade.
Two stories from this month draw the line.
The dashboard was glowing. Nobody watched the gauges.
Start with the one that broke today. An AI consultant told Axios that one of their clients spent roughly half a billion dollars on Claude in a single month. [4] Not on a failed acquisition. On usage. The company handed employees access to Anthropic’s tools and never set spending limits, so people ran whatever they wanted and the meter ran with them. The company has not been named. The number is large enough that the speculation has narrowed to a handful of very big firms.
It would be easy to read that as a story about Claude being expensive, and that is the wrong lesson. I build on these tools every day, and Claude Code is good at the work, strong enough on real software tasks that engineers reach for it constantly. The failure here was not the tool. It was that nobody built the system around it. No usage policies. No routing rules for which model handles which task. No constraints on what counts as a valid use case versus someone checking the weather on the most expensive API available. One CTO told Axios their staff were doing exactly that. [4] The tool worked. The integration architecture did not exist.
It is not an isolated case. Microsoft has been pulling back internal Claude Code licenses as per-engineer costs climbed into the hundreds or thousands of dollars a month. [5][6] Uber reportedly burned through its entire 2026 AI budget by April [6], and its operations chief said the spending is getting harder to justify when the return is hard to measure. [5] Amazon shut down an internal leaderboard that ranked employees by AI usage after people started running pointless tasks to climb it, a habit that picked up the nickname “tokenmaxxing.” [7] The common thread is not that the technology does not work. It is that nobody set up the governance layer: the policies, the monitoring, the cost controls, the basic question of what this tool is supposed to be doing and for whom. The AI was present. The integration was not.
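For a sense of how small the missing piece can be, here is a minimal sketch of a governance layer with two of the controls named above: a routing rule that decides which model tier handles which task, and a per-user monthly spending cap. Everything in it is an assumption made for illustration: the tier names, the placeholder prices, the task types, and the UsageGovernor class are hypothetical and do not describe any vendor's API or any company's actual setup.

```python
from dataclasses import dataclass, field

# Hypothetical price table in USD per million tokens. The figures are
# placeholders for illustration, not real vendor pricing.
PRICE_PER_MILLION_TOKENS = {"small": 1.0, "large": 15.0}

# Hypothetical routing rule: which model tier handles which kind of task.
ROUTES = {"lookup": "small", "summarize": "small", "code_review": "large"}


class BudgetExceeded(Exception):
    """Raised when a request would push a user past the monthly cap."""


@dataclass
class UsageGovernor:
    monthly_cap_usd: float
    # Maps user name to estimated dollars spent so far this month.
    spent_usd: dict = field(default_factory=dict)

    def route(self, task_type: str) -> str:
        # Unknown task types fall back to the cheapest tier, not the priciest.
        return ROUTES.get(task_type, "small")

    def authorize(self, user: str, task_type: str, est_tokens: int) -> str:
        """Pick a model tier and record the estimated cost, or refuse."""
        tier = self.route(task_type)
        cost = est_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[tier]
        already = self.spent_usd.get(user, 0.0)
        if already + cost > self.monthly_cap_usd:
            raise BudgetExceeded(
                f"{user} would exceed the ${self.monthly_cap_usd:.2f} monthly cap"
            )
        self.spent_usd[user] = already + cost
        return tier
```

With a cap of 20 dollars, a call such as UsageGovernor(monthly_cap_usd=20.0).authorize("ana", "lookup", 1_000_000) is sent to the small tier and logged against the user's total, and any request that would cross the cap raises BudgetExceeded instead of running. A real deployment would also need actual metering, a monthly reset, and alerting, but the point stands: the weather question goes to the cheap model, and the meter has a ceiling.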
A tool that looked the part and could not do the job
The other failure mode is quieter. On May 21, Reuters reported that Starbucks retired the AI tool its workers had been using to count inventory, nine months after rolling it out across North America. [8] The system, built by a company called NomadGo, used LiDAR sensors and a tablet camera to scan shelves and tally syrups, milks, and other beverage stock. On paper it was exactly the kind of thing a turnaround CEO wants: a piece of the “Back to Starbucks” plan, technology aimed straight at the product shortages that had been dragging on sales.
In practice it kept miscounting. It confused similar kinds of milk, or missed items altogether. A promotional video Starbucks itself put out managed to show the tool failing to register a peppermint syrup bottle while it counted the ones next to it. The internal memo announcing the end of it was blunt: automated counting was being retired, and staff would go back to counting it the way they count everything else. [8] Starbucks framed the move as standardizing for consistency at scale, which is the corporate way of saying the fancy version was not reliable enough to keep.
Pulling the tool was the right call. Deploying a computer-vision system across North America before it could reliably tell oat milk from whole milk was the mistake; retiring it once that became clear was the correction. The half-billion-dollar company did not get that far. That number did not come from internal controls catching the problem. It came from an AI consultant describing a client to Axios. [4] Without that conversation, the meter would presumably still be running, and there is no telling what else was going unchecked before it surfaced. One company looked at a bad result and acted on it. The other was not looking at all. Only one of those is governance.
The numbers behind the vibe
The survey data tells the same story as the case studies. McKinsey found that while 88 percent of organizations use AI somewhere, the majority are still piloting or experimenting, only about a third have begun to scale, and roughly 6 percent qualify as high performers actually capturing meaningful enterprise value. [3] A widely cited MIT study last year was harsher, estimating that 95 percent of enterprise generative AI pilots produced no measurable impact on profit and loss. [9]
McKinsey’s own read on what separates that 6 percent is the part worth sitting with. The companies seeing real value are the ones that redesigned how work gets done around the tool, rather than adding a feature and moving on. [3] They changed the process, set the constraints, built the monitoring, and stuck with it long enough to measure what happened. The bottleneck is rarely the model. It is organizational: integration, governance, and follow-through.
That is the difference between the paint job and the engine swap. It is measurable. It just does not photograph well.
The regulators are starting to grade the paint
Overstating what your AI does is no longer only a credibility problem. The SEC began bringing “AI washing” cases back in 2024, settling with two investment advisers who told clients they were using AI in ways they were not, and the agency has said it intends to keep policing it. [10] The FTC ran its own sweep, Operation AI Comply, and its chair at the time said there is “no AI exemption from the laws on the books.” [11] Claiming a capability you cannot back up has moved from embarrassing to actionable.
So the incentives are finally lining up against pure theater. If you say the product does something, you had better be able to show that it does, to a customer, an auditor, or a regulator who asks a follow-up question. The reveal is not enough anymore. Somebody is going to look under the hood.
What the next phase rewards
None of this is an argument against AI. The adoption numbers are real, the tools are genuinely useful, and the companies that have worked out the operating model are pulling away from the ones that have not. It is an argument about which work matters now.
The launch post and the demo are mostly finished as differentiators. Everyone has the chatbot. What matters now is the work that makes it actually run: deciding what a tool is for, setting up the integration so it fits a workflow instead of floating next to one, putting limits and monitoring around it so a single month does not cost half a billion dollars, measuring whether it changed the job or only the screenshot, and being willing to pull it when it did not, the way Starbucks did. That discipline is not new and it is not exotic. It has older names: quality assurance, compliance, governance. AI did not retire any of them. It made them load-bearing.
A well-built product can look as good as it wants. That was never the issue. The issue is whether the build accounted for the whole car or only the parts that film well. The Pimp My Ride reveals looked unbelievable in 2004. Go watch one now. The ones that were only built for the camera became cringey the moment the camera left. A lot of AI product work shipped in 2026 is on the same clock.
Sources
[1] MTV UK, “Pimp My Ride’s Most Outrageous Modifications | Pimp My Ride,” YouTube, September 2, 2019. https://www.youtube.com/watch?v=nCz22KQPin8
[2] Stanford HAI, “The 2026 AI Index Report,” April 2026. https://hai.stanford.edu/ai-index/2026-ai-index-report
[3] McKinsey & Company, “The State of AI in 2025: Agents, Innovation, and Transformation,” November 2025. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
[4] Fast Company, “One company spent half a billion dollars on Claude in a single month: Report comes as AI costs climb,” May 29, 2026. https://www.fastcompany.com/91550884/claude-ai-costs-climb-company-spent-half-a-billion-dollars-in-a-single-month-report
[5] The Decoder, “One company reportedly spent $500 million on Claude in one month after failing to cap AI usage,” May 29, 2026. https://the-decoder.com/one-company-reportedly-spent-500-million-on-claude-in-one-month-after-failing-to-cap-ai-usage/
[6] Investing.com, “The AI Token Pricing Crisis Behind OpenAI and Anthropic’s Revenue Race,” May 2026. https://www.investing.com/analysis/the-ai-token-pricing-crisis-behind-openai-and-anthropics-revenue-race-200680777
[7] Tom’s Hardware, “Mystery company accidentally blew $500 million on Claude AI in a single month, failed to put usage limit on licenses for employees,” May 29, 2026. https://www.tomshardware.com/tech-industry/artificial-intelligence/mystery-company-accidentally-blew-usd500-million-on-claude-in-a-single-month-failed-to-put-usage-limit-on-licenses-for-employees
[8] Reuters, via CNBC, “Starbucks scraps AI inventory tool across North America,” May 21, 2026. https://www.cnbc.com/2026/05/21/starbucks-scraps-ai-inventory-tool-across-north-america.html
[9] Boston University Questrom School of Business, “Moving Beyond AI Pilots: What Organizations Get Wrong,” April 2, 2026, citing the 2025 MIT Project NANDA study. https://www.bu.edu/questrom/?p=26501
[10] Thomson Reuters Institute, “AI washing meets marketing rule, as SEC fines two advisers for their AI claims,” March 26, 2024. https://www.thomsonreuters.com/en-us/posts/investigation-fraud-and-risk/ai-washing-enforcement/
[11] The Race to the Bottom, “Artificial Intelligence or Illusions: The SEC’s Crackdown on Misleading AI Claims,” March 31, 2025. https://www.theracetothebottom.org/rttb/2025/3/31/artificial-intelligence-or-illusions-the-secs-crackdown-on-misleading-ai-claims