SPOTLIGHT: AI tokens
What they are, what they cost, and why it matters
A token is the basic unit an AI model uses to read what you're asking and to write what it says back.
Tokens are the building blocks of AI language: They can be whole words, partial words, or just punctuation. The sentence "Hello, how are you?" breaks into six tokens: "Hello", ",", "how", "are", "you", "?".
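The split above can be approximated with a simple sketch. This is an illustrative word-and-punctuation splitter only; production models use learned subword schemes such as byte-pair encoding, so real token counts will differ (for instance, a long or unusual word may become several tokens).

```python
import re

def naive_tokenize(text):
    """Split text into word and punctuation chunks.

    Hypothetical illustration only: real tokenizers use learned
    subword vocabularies, so actual counts differ from this split.
    """
    return re.findall(r"\w+|[^\w\s]", text)

tokens = naive_tokenize("Hello, how are you?")
print(tokens)       # ['Hello', ',', 'how', 'are', 'you', '?']
print(len(tokens))  # 6
```

Even this toy version shows the key point: punctuation counts, so a prompt's token total is always larger than its word count.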
AI models don't "skim" anything: Unlike a human reader who might skim a long document for the relevant parts, an AI model reads everything you send it: every word, every comma, every extra space. (Imagine if, every time you got dressed in the morning, you had to take out and review every article of clothing you owned!)
The types of AI tokens: There are two kinds of tokens, and both cost money. Input tokens are what you send: your question, your instructions, any documents you paste in. Output tokens are what comes back. You pay for both directions.
There's a third type, called cached tokens: When an AI model processes a prompt, it stores a representation of the tokens it has already worked through. On a subsequent request that includes the same content, the model can pull from that stored state instead of reprocessing it from scratch. Cached tokens cost significantly less than fresh input tokens.
Tokenization is how AI models translate data into tokens: An efficient tokenizer represents the same text in fewer tokens, which reduces the computing power, and therefore the cost, required to process it.
Some models generate far more tokens than others: Newer "reasoning" AI models think through problems step by step before answering, generating extra tokens behind the scenes as they work. That thinking process can require more than 100 times the compute of a standard response.
Every AI token processed by an AI model costs money. Often, the costs associated with AI tokens add up faster than most organizations expect.
Token rates are not fixed: All major AI APIs, including OpenAI and Anthropic, charge per token on both sides of the conversation. The rate isn't fixed. More powerful models and more complex requests cost more.
The price range is wide: Simple tasks on basic models can run a few cents per million tokens. Premium models handling complex work can cost anywhere from $20 to over $100 per million. Anthropic's flagship Claude Opus 4.6 is priced at $25 per million output tokens.
Output tokens are more expensive than input tokens: What you send in and what comes back are not priced equally. Output tokens typically run 3x to 4x more expensive than input tokens, which means a "bloated" prompt costs you twice: once for the unnecessary input, and again for the rambling response it tends to trigger.
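Because both directions are metered at different rates, the cost of a single request is simple arithmetic. A minimal sketch, using placeholder rates of $3 and $12 per million tokens rather than any vendor's actual prices:

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Dollar cost of one request, given per-million-token rates.

    The rates passed in below are illustrative placeholders,
    not any vendor's published pricing.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# A lean prompt vs. a bloated one, at $3/M input and $12/M output
# (a 4x output multiplier, in line with typical ratios):
lean = request_cost(1_000, 500, 3.00, 12.00)
bloated = request_cost(8_000, 2_000, 3.00, 12.00)
print(f"lean: ${lean:.4f}  bloated: ${bloated:.4f}")
# lean: $0.0090  bloated: $0.0480
```

Note that the bloated request costs more than five times as much, even though the "work" being asked for is the same.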
More sophisticated models are pricier: Running a query through OpenAI's reasoning model o1 costs six times more than running the same query through GPT-4o.
Cached token pricing can get complicated: Cached tokens are often priced on their own schedule. Anthropic, for example, distinguishes between writing content into the cache and reading it back, and prices each differently, so the same tokens can carry several different rates within one workload.
AI vendors are charging for their products in increasingly varied and complex ways. Knowing the difference matters when signing contracts and drawing up forecasts.
There's no single standard for how AI gets priced: Models in use today include subscription, usage-based, outcome-based, performance-based, freemium, tiered, hybrid, and agentic seat pricing, among others.
Freemium pricing is commonly used to drive user adoption and upsell to premium plans: “Freemium” pricing offers basic access to an AI product for free with advanced features, usage limits, or business capabilities gated behind a paid tier – like the basic versions of ChatGPT or Claude.
Pricing can be opaque, and hard to compare: Hybrid pricing layers a base subscription on top of usage charges, which means buyers are often managing more than one billing structure per tool. Microsoft Copilot appears to cost $30 per user per month, but that price only applies if the user already has a Microsoft 365 subscription, making the real all-in cost substantially higher. Agentic seat pricing charges per AI agent deployed, the way traditional software charges per employee login.
Some vendors are walking away from token pricing entirely: Adobe is moving its new enterprise AI suite to a model where customers pay based on what the AI actually accomplishes, such as how many ad campaigns it completes, rather than how many tokens it processes. That kind of outcome-based pricing puts more risk on the vendor, who now has to demonstrate real results, and requires more rigorous measurement on both sides of the contract.
AI is pushing up software prices overall: Across vendor categories, AI has pushed software prices up by 20% to 37%, regardless of whether customers are getting meaningfully more functionality. Some researchers are calling it an "AI tax." Vendors are folding AI features into existing products and raising prices without giving customers a way to opt out, then crediting the increase to AI innovation.
"The promise of enterprise AI has always been clear: automate more, serve customers better, reduce cost. What was never advertised in the fine print was the bill."
Token bloat is what happens when an AI system processes far more information than it actually needs. It is one of the most common (and least visible) sources of runaway AI costs.
Where does token bloat come from? Token bloat is the extra cost generated by unnecessary information fed to a model that doesn't make the output any better. It's often the result of anxiety. AI users and developers worry that if they don't give the model everything, it will get things wrong, so they paste in entire documents, database files, and old error logs, most of which the model doesn't need.
The problem can compound itself: A small amount of waste compounds fast. A prompt that is 20% less efficient than it should be can produce costs 200% higher than expected, as the agent hallucinates, self-corrects, and burns through tokens just trying to make sense of the noise.
Token bloat can be a surprise to organizations: Most organizations don't see token bloat until the monthly invoice arrives, by which point there's no way to trace which tool, feature, or prompt was responsible.
At scale, the cost of token bloat can become staggering: Wasting a few hundred tokens per interaction barely registers at small usage volumes. At enterprise scale, the same inefficiency can become a serious financial liability.
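The scale effect is easy to quantify. A rough sketch with hypothetical figures (300 wasted input tokens per call at a placeholder rate of $3 per million) shows how the same per-call waste goes from a rounding error to a real line item as volume grows:

```python
def annual_waste(wasted_tokens_per_call, calls_per_day,
                 price_per_m_tokens, days=365):
    """Yearly dollar cost of per-call token waste at a given volume.

    All figures are hypothetical; substitute your own rates and volumes.
    """
    return (wasted_tokens_per_call * calls_per_day * days
            * price_per_m_tokens) / 1_000_000

# Same 300-token inefficiency, two very different scales:
print(f"${annual_waste(300, 100, 3.00):,.2f}/yr at 100 calls/day")
print(f"${annual_waste(300, 1_000_000, 3.00):,.2f}/yr at 1M calls/day")
# $32.85/yr at 100 calls/day
# $328,500.00/yr at 1M calls/day
```

The inefficiency itself never changed; only the request volume did.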
Token costs have become a real budget line item, with results ranging from competitive advantage to blowing past budget before the year is half over.
Organizations are seeing unexpected costs associated with AI usage: In 2025, organizations spent an average of $1.2 million on AI-native applications, more than double what they spent the year before, according to Zylo's 2026 SaaS Management Index. Nearly 8 in 10 IT leaders say they have been hit with unexpected charges tied to consumption-based AI pricing, according to Zylo.
What happens when AI spending increases, but revenue doesn't? Venture capitalist Chamath Palihapitiya says his software startup's AI spending has more than tripled since late 2025, heading toward $10 million annually. The problem, he says, isn't the spending itself but that costs are growing 3x every three months while revenue isn't keeping pace.
Some companies are blowing through their annual AI budgets: Uber burned through its entire planned AI budget for 2026 within the first few months of the year, after encouraging engineers to use AI coding tools aggressively and tracking usage on internal leaderboards. The company's R&D spend rose 9% to $3.4 billion in 2025 and is expected to keep climbing.
Uber's experience illustrates how fast adoption can outrun budgets: The company actively encouraged engineers to embrace AI tools and measured their usage publicly, which drove widespread adoption and an equally widespread spike in costs.
If you build it, the revenue will (hopefully) come? Finance leaders are currently more permissive than usual about AI spending, willing to absorb costs they can't fully justify yet on the bet that efficiency gains will eventually catch up.
Experts warn AI token costs won't stay in the engineering department for long: As more workers in legal, sales, and other knowledge-work functions start using AI agents, their compute costs will climb right alongside those of the IT and engineering departments.
"Token pricing looks manageable in a demo. A few cents per thousand tokens. A flat monthly subscription that seems reasonable per seat. But the math at enterprise scale tells a different story."
At a number of tech companies, how much an engineer spends on AI tokens has become a badge of honor, with the underlying belief that more tokens means more productivity.
Running up token charges could get you a … round of applause? Databricks CEO Ali Ghodsi held up a single engineer who ran up more than $7,000 in token charges over two weeks as a model employee, and got a round of applause from the entire engineering department for it.
Or, your token consumption might mean extra vacation time: Customer support startup Sendbird has built a formal ranking system around token consumption, from "Beginner" up to "AI God," a title reserved for engineers burning through at least 100 million tokens per day. Perks for heavy spenders include gift cards, swag, and eventually extra vacation time.
Experts warn that "token volume alone doesn't measure productivity": While token volume signals AI-generated work, what matters most is how much of the AI-generated work actually makes it into a finished product.
Finance leaders are starting to notice: The creator of one developer tool recently observed that CFOs are waking up to the reality that some engineers are now generating an extra $2,000 a month in AI charges on top of their salaries.
"That $500,000 engineer, at the end of the year, I'm going to ask him, 'How much did you spend in tokens?' … If that $500,000 engineer did not consume at least $250,000 worth of tokens, I am going to be deeply alarmed."
Token costs can be managed, but it requires intention. Most organizations are not yet approaching this systematically.
Send models only what they need: The core principle of token cost management is sending models only what they actually need. That means pulling in relevant information dynamically rather than dumping entire documents into the context, asking for specific output formats rather than open-ended responses, and using lighter models for tasks that don't require heavy reasoning.
Give the model parameters: Simply telling the model how to format and how long to make its response can cut token usage by 60% to 80%.
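In practice, those parameters go in two places: the prompt itself, and the request's output cap. A hedged sketch of an OpenAI-style chat request follows; the field names mirror common chat-completion APIs, but check your provider's documentation before relying on them, and the model name is a placeholder.

```python
# Illustrative request only: field names follow OpenAI-style chat
# APIs, and the model name is a placeholder, not a recommendation.
request = {
    "model": "gpt-4o-mini",   # a lighter model for a simple task
    "max_tokens": 150,        # hard cap on billable output tokens
    "messages": [
        {"role": "system",
         "content": "Answer in at most three bullet points, "
                    "one sentence each. No preamble."},
        {"role": "user",
         "content": "Summarize the attached meeting notes."},
    ],
}
print(request["max_tokens"])  # 150
```

The `max_tokens` cap is a backstop, not a substitute for instructions: a model cut off mid-sentence still bills for every token it produced, so the format constraint in the system message does most of the saving.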
Prompt caching is one of the most effective cost controls available: When a system prompt or document stays the same across many requests, the major AI providers can store it so it doesn't have to be reprocessed each time. Prompt caching can cut costs by 50% to 90%, depending on the model.
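The savings are straightforward to estimate. A minimal sketch, assuming a shared 10,000-token system prompt, a 200-token user message, and a hypothetical cache-read rate of one tenth the base input price (multipliers and cache-write surcharges vary by provider and are ignored here):

```python
def cost_with_cache(system_tokens, user_tokens, requests,
                    base_price_per_m, cached_read_multiplier=0.1):
    """Input-token cost when a shared system prompt is cached.

    Hypothetical model: request 1 pays full price for everything;
    later requests read the system prompt at a discounted rate.
    Cache-write surcharges, which some providers add, are ignored.
    """
    first = (system_tokens + user_tokens) * base_price_per_m / 1_000_000
    rest = (requests - 1) * (
        system_tokens * base_price_per_m * cached_read_multiplier
        + user_tokens * base_price_per_m
    ) / 1_000_000
    return first + rest

# 1,000 requests at a placeholder $3/M base input rate:
no_cache = 1_000 * (10_000 + 200) * 3.00 / 1_000_000
cached = cost_with_cache(10_000, 200, 1_000, 3.00)
print(f"no cache: ${no_cache:.2f}  cached: ${cached:.2f}")
# no cache: $30.60  cached: $3.63
```

Under these assumptions the cached run costs roughly 88% less, which is consistent with the 50% to 90% range cited above; the savings grow with the share of the prompt that repeats across requests.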
The Token Tax: Why AI's Hidden Cost Problem Is Becoming a Board-Level Issue, via LinkedIn
Understanding Token Bloat: How Poor Prompts Increase Bills?, via Cloudatler
Token Bloat: Managing LLM Output Size and Cost in Cloud Environments, via Medium/Saif Ali
What Are AI Tokens? The Language and Currency Powering Modern AI, via NVIDIA
Prompt Caching with OpenAI, Anthropic, and Google Models, via PromptHub
Adobe Plans Outcome-Based Pricing for New AI Product Suite, via PYMNTS
The 'AI Gods' Spending As Much As They Can On AI Tokens, via Forbes
Box CEO Flags Soaring AI Costs As Usage Expands Beyond Engineers, via Yahoo Finance
The cost of compute: A $7 trillion race to scale data centers, via McKinsey
Uber's Anthropic AI Push Hits A Wall, via Yahoo Finance
AI Data Centers Are Sending Power Bills Soaring, via Bloomberg
Rising AI software costs put CFOs in the middle, via Yahoo Finance/CFO Dive
AI Pricing: What's the True AI Cost for Businesses in 2026?, via Zylo
Chamath Palihapitiya said his software company is moving away from Cursor, via Business Insider
What are input and output tokens in AI?, via engineering.com
Understand tokens, via Microsoft
What are cached tokens?, via Anthropic
Prompt Caching Documentation, via Anthropic
Cached Tokens Explainer, via MindStudio