Token Overdrive Is Turning AI Expensive
AI's growing sophistication is pushing operational costs skywards, as developers and startups grapple with soaring token usage, escalating compute demands, and shifting business models.
At the heart of the challenge lies a paradox: while the per-token price of AI continues to decline, the number of tokens consumed for meaningful tasks has surged dramatically, especially where reasoning-intensive workflows dominate. Tasks once considered lightweight, such as multi-step agent workflows or deep document analysis, now demand tens or even hundreds of thousands of tokens, leading to billing shocks for many users.
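The arithmetic behind this paradox is simple to illustrate. The sketch below uses purely hypothetical prices and token counts, not any provider's actual rates, to show how a 90 percent per-token price cut can still produce a larger bill when a task's token footprint grows a hundredfold:

```python
# Illustrative arithmetic with hypothetical numbers: per-token prices fall,
# but reasoning-heavy workflows consume far more tokens per task.

old_price_per_1k = 0.06    # USD per 1,000 tokens (assumed earlier rate)
new_price_per_1k = 0.006   # 90 percent cheaper per token

old_tokens_per_task = 2_000     # a simple single-shot completion
new_tokens_per_task = 200_000   # a multi-step agent workflow

old_cost = old_tokens_per_task / 1_000 * old_price_per_1k
new_cost = new_tokens_per_task / 1_000 * new_price_per_1k

print(f"old task cost: ${old_cost:.2f}")  # $0.12
print(f"new task cost: ${new_cost:.2f}")  # $1.20 -- 10x despite cheaper tokens
```

Even with tokens at a tenth of their former price, the per-task bill rises an order of magnitude, which is exactly the "billing shock" users report.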
This trend is eroding profit margins and forcing pricing overhauls. Notion, for instance, now sees about ten percentage points of its margin diverted to AI costs alone. Coding tool providers like Cursor and Replit have already shifted from flat‐rate pricing to usage‐based or tiered models, triggering backlash and account churn among users unprepared for the new cost reality.
The rise of "inference whales", users exploiting unlimited-use subscriptions with compute-heavy workflows, is further undermining sustainability. One developer on Anthropic's Claude Code platform generated US $35,000 in inference compute while paying only US $200 a month. Anthropic has responded by introducing weekly rate limits, moving to restrict account sharing and curb misuse.
Underlying this shift is the growing recognition that AI is fundamentally infrastructure-heavy, not software-light. Unlike traditional SaaS, which may enjoy near-zero marginal cost per user, AI firms face substantial outlays in compute infrastructure, chip procurement, and energy. It's estimated that the AI industry may require US $3.7 trillion in cumulative investment in data centre infrastructure, a figure that underscores the scale of the challenge. Major players with vertical integration, those controlling chips, cloud, and energy, are better positioned to sustain this shift; smaller firms are much more exposed to cost pressures.
In response, the SaaS industry is embracing the shift toward usage-based pricing. Companies such as Vercel, Replit, Monday.com, and ServiceNow are layering token-based billing or credit systems atop seat-based models. This reflects investor preference for pricing aligned to consumption, even though it introduces unpredictability for both customers and providers.
The market dynamic is also evolving away from aggressive price-cutting. Leaders like OpenAI and Anthropic, who initially engaged in steep token price reduction, are now refocusing on profitability over growth and are foregoing further discounts. This strategic pivot threatens the viability of startups reliant on low-cost AI, which may now need fresh capital to scale sustainably.
Amid mounting compute bills, efficiency has become paramount. Some startups are exploring token compression techniques, such as BotVibe.ai's BotSpeak protocol, which can lower usage by up to 40 percent while also improving latency and throughput. Similarly, multi-model workflows, employing cheaper models for simple steps and reserving premium models for complexity, offer a way to manage costs without sacrificing quality.
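The multi-model approach can be sketched as a simple router. Everything here is hypothetical: the model names, prices, and the complexity heuristic are illustrative assumptions, not any vendor's actual API or rates:

```python
# A minimal sketch of cost-aware model routing (all model names, prices,
# and heuristics are hypothetical): simple requests go to a cheap model,
# complex or reasoning-heavy requests go to a premium model.

CHEAP_MODEL = ("small-model", 0.001)    # (name, assumed USD per 1k tokens)
PREMIUM_MODEL = ("large-model", 0.03)

def route(prompt: str, needs_reasoning: bool) -> tuple[str, float]:
    """Pick a model via a crude complexity check and estimate the cost."""
    is_complex = needs_reasoning or len(prompt) > 2_000
    name, price_per_1k = PREMIUM_MODEL if is_complex else CHEAP_MODEL
    est_tokens = max(len(prompt) // 4, 1)   # rough chars-to-tokens estimate
    return name, est_tokens / 1_000 * price_per_1k

model, est_cost = route("Summarise this paragraph.", needs_reasoning=False)
```

In production, the routing decision would typically come from a classifier or the cheap model itself rather than a length check, but the cost logic is the same: reserve the expensive model for the minority of requests that need it.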
Hardware efficiency gains and continued token price declines remain part of the story. Per-token rates have dropped by as much as 75 to 90 percent in some cases, yet the explosion of reasoning-heavy models has rendered these gains insufficient. Generative AI tasks now consume far more compute per request, offsetting cost reductions.
In regions with constrained resources, local deployment of LLMs is emerging as an accessible alternative. A study involving 180 developers, students and AI enthusiasts in India shows that community-driven, on-premise models can reduce usage costs by about 33 percent, while enabling more experimentation and learning-offering a possible route to resilience in cost-sensitive environments.
The fallout from this shift is reshaping the AI economy. Large players, with control over the full stack and deeper pockets, can absorb cost pressures. Smaller developers and startups face pressure to optimise and innovate economically, or risk being edged out as the AI industry matures toward sustainable, usage-aligned economics rather than unchecked expansion.