Large language models (LLMs), the technologies that power most generative and agentic AI solutions, are powerful. But they can also be very expensive.
To make matters worse, predicting and tracking LLM spending is challenging, largely because there is typically no way to know exactly what a query will cost until it completes.
The good news is that there are effective ways for IT leaders to rein in unnecessary LLM costs. CIOs must identify how LLM spending can bloat AI budgets and learn how to spot the signs that their business is paying more for LLMs than it needs to. Only then can they take actionable steps to mitigate unwarranted LLM expenditures.
What paying for an LLM gets you
LLMs are the life force powering virtually every modern generative or agentic application.
When a chatbot needs to respond to a user's question, it submits the question to an LLM to generate a response. When an AI agent is tasked with implementing a feature within a software application, it uses an LLM to evaluate existing application code, then produce new code compatible with it. When an employee uses AI-powered search to find information in a knowledge base, an LLM is working behind the scenes to interpret the user's search terms and create a response that identifies relevant documents.
From an operational perspective, the ability of LLMs to handle open-ended tasks or queries like these is a great thing. It's what makes a single AI product capable of addressing a wide range of use cases in a flexible, scalable way.
From a financial perspective, however, LLM activity can present some real challenges. This is because every time an AI application or agent interacts with an LLM, there is a cost — and when your business’s AI applications and services are engaging with LLMs millions of times per day, the spending adds up.
How much does an LLM cost?
The cost of using an LLM is determined by two main factors:
- Token price: Businesses that sell access to LLMs (like OpenAI and Google) price their services based primarily on how many tokens their customers consume when interacting with their LLMs. Currently, major AI vendors charge anywhere from about $0.25 to several dollars per million tokens consumed, with more advanced models having higher token prices. Some vendors price input tokens (meaning tokens associated with data fed into an LLM) separately from output tokens (which are consumed when LLMs generate data).
- Tokens consumed: Every time an LLM handles a request, it processes a certain number of tokens. Longer, more complex queries require more tokens. A rule of thumb is that every 75 words of text processed by an LLM requires about 100 tokens; however, this is a very rough guideline, and it doesn't account for non-textual processing work by AI models, like image and video interpretation or generation.
So, to figure out how much you’ll pay to use an LLM, you have to know both your per-token cost and how many tokens you’re using. The former variable is easy enough to ascertain in most cases because AI vendors usually are transparent about their token pricing. Predicting how many tokens you’ll consume is where things get tricky because it’s often impossible to know ahead of time exactly how many tokens an AI application will expend when completing a given task.
If your estimate is off by even a small amount, that error multiplies across thousands of daily AI tasks. Just like that, a planned budget can become obsolete.
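To make the arithmetic concrete, here is a back-of-the-envelope estimator in Python. The per-million-token prices in it are placeholder assumptions, not any vendor's actual rates, and the word-to-token conversion uses the rough 75-words-per-100-tokens rule described above:

```python
# Back-of-the-envelope LLM cost estimator. The per-million-token prices
# below are hypothetical placeholders, not any vendor's actual rates.
WORDS_PER_100_TOKENS = 75  # rough rule of thumb: 75 words ~ 100 tokens

def estimate_tokens(word_count: int) -> int:
    """Convert a word count to an approximate token count."""
    return round(word_count * 100 / WORDS_PER_100_TOKENS)

def estimate_cost(input_words: int, output_words: int,
                  input_price_per_m: float = 1.25,    # assumed $/1M input tokens
                  output_price_per_m: float = 10.0    # assumed $/1M output tokens
                  ) -> float:
    """Estimate the dollar cost of a single LLM call."""
    input_tokens = estimate_tokens(input_words)
    output_tokens = estimate_tokens(output_words)
    return (input_tokens / 1_000_000) * input_price_per_m + \
           (output_tokens / 1_000_000) * output_price_per_m

# A 50-word prompt producing a 1,000-word document:
per_call = estimate_cost(50, 1_000)
print(f"${per_call:.4f} per call")
print(f"${per_call * 100_000:,.2f} per 100,000 calls")
```

Note how the per-call figure looks trivial while the volume-adjusted figure does not; a 20% error in the token assumption shifts the projected bill by the same 20%, which is why small estimation errors matter at scale.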
Real-world examples of LLM costs
Despite this unpredictability, it’s possible to get a very rough sense of how much LLMs cost for various tasks.
Here are some examples, based on pricing data tracked by YourGPT:
- Producing a 1,000-word document in response to a 50-word prompt costs around $1.35 using popular general-purpose models, like OpenAI's GPT-5.
- Generating 100 lines of code costs roughly $2.00.
- Creating a 1,000×1,000-pixel image (which requires around 1,300 tokens) costs about $0.20.
These fees are small on an individual basis. But you don’t need to be a CFO to understand that they can add up quickly within an organization that uses LLMs all day long to produce text, code and multimodal media.
On top of this, businesses are increasingly deploying AI agents, which can lead to even higher LLM spending because it’s common for an agent to interact with an LLM multiple times to complete a single task. For instance, a software development agent might use an LLM to interpret an initial prompt, then generate code in response to the prompt, test the code, generate additional code to fix the bugs discovered during testing, and finally validate the code again.
Each of these engagements requires token usage, and the total cost could easily climb into the hundreds of dollars for generating just a small amount of code. At scale, that spending can become staggering; reports are already circulating of individual developers racking up LLM bills as high as $150,000 per month when using AI agents to help them produce code.
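To see why agent workflows inflate costs, consider a sketch that tallies token usage across the multiple LLM calls a single task can trigger. It uses the OpenAI Python SDK; the model name and step prompts are placeholders, and a real agent would pass context and artifacts between steps rather than issue independent prompts:

```python
# Illustrative sketch of why agent workflows multiply token spend: each
# step is a separate LLM call, and usage accumulates across all of them.
# Model name and prompts are placeholders, not a real agent's workflow.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

steps = [
    "Interpret this feature request: ...",
    "Generate code for the interpreted request.",
    "Write tests for the generated code.",
    "Fix the bugs the tests revealed.",
    "Validate the fixed code.",
]

total_input = total_output = 0
for prompt in steps:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
    )
    # The API reports actual token consumption on every response.
    total_input += response.usage.prompt_tokens
    total_output += response.usage.completion_tokens

print(f"One task consumed {total_input} input and {total_output} output tokens")
```

Five calls for one task means roughly five times the token spend of a single-shot prompt, before accounting for the extra context each later step must carry.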
What about private or self-hosted LLMs?
It’s important to note that not all AI applications depend on third-party LLMs. Businesses can, if they choose, develop and deploy their own self-hosted LLMs. In that case, there are no token charges because there is no third-party AI vendor to impose them.
That said, deploying private LLMs is a relatively uncommon practice due to the complexity of creating and operating LLMs, not to mention the massive infrastructure necessary to run a powerful, large-scale LLM.
Even when companies can and do run their own LLMs, instead of connecting to third-party models, they still face major costs. They have to pay for the servers that host the models, as well as the electricity consumed by those servers (and the cooling systems that keep the servers from overheating).
The point here is that even if your company were to deploy a private LLM — which is probably not practical in the first place — it would still end up facing a large bill. The only difference between this approach and using a third-party LLM is that the bill would take the form of infrastructure and power spending, rather than token costs.
The challenges of managing LLM spending
Beyond the relatively high prices of LLMs, businesses face several challenges specific to LLMs and AI usage that further complicate their ability to rein in LLM spending:
- Cost unpredictability: As noted above, it's typically very difficult to estimate exactly how many tokens it will take to complete a given task using an LLM, so you often don't know the cost until you've already incurred it.
- Dynamic pricing: Token pricing can change at any time, making it challenging to forecast LLM costs over the long term.
- Limited user spending awareness: AI end users within an organization often have a limited understanding of how LLMs are priced or how their activities affect total spending.
- Lack of FinOps tools for LLMs: While FinOps (the practice of managing cloud spending in general) offers mature solutions for tracking and optimizing spending on other types of services, FinOps tooling tailored specifically to LLMs remains primitive.
Given these challenges, even companies that have a solid track record of managing technology costs in other domains might struggle to avoid unnecessary or unexpected LLM spending.
Effective tactics for controlling LLM costs
Fortunately, although there is no simple formula for managing and optimizing LLM costs, there are actionable steps for reducing spending without undermining the value that LLMs create.
Key tactics include:
- Choosing lower-cost LLMs: Token costs can vary widely between different LLMs, with more powerful models typically costing more. Not every task requires the latest, greatest model, however. To save money, organizations can submit prompts to lower-cost models when prompt complexity is limited or when there is greater tolerance for inaccurate responses (see the routing sketch after this list).
- Comparing LLM vendor pricing: Pricing for LLMs can also vary between AI vendors, even when the models are comparable in quality (especially at present, when AI companies vying for market share may underprice some of their models in a bid to attract users). Shopping around to find the best pricing for the type of model you require can help cut costs.
- Response caching: Response caching is the practice of storing an LLM's response to a given query, then reusing the response when the LLM receives similar queries. This avoids paying the output token cost of generating a new response each time (see the caching sketch after this list).
- Prompt libraries: Prompt libraries are collections of validated or "approved" prompts that are known to be efficient in terms of token costs and that human users or AI agents can draw from when interacting with LLMs.
- Prompt compression: External tools can compress or "trim" prompts by stripping out extraneous information before submitting them to an LLM. By reducing input tokens, this practice can save businesses money, especially when users are not adept at optimizing prompts on their own.
- Query batching: Some LLMs offer discounts of as much as 50% off standard token costs when customers submit queries in batches. This approach isn't viable for LLM use cases that require immediate responses to prompts, but it can be a great way to save money when it's feasible to submit a series of queries to an LLM at the same time. For example, to generate documentation, you could submit a batch of prompts (one for each topic you wish to document) instead of submitting them one by one.
- Limiting token allowances: When interacting with LLMs via APIs, it's typically possible to configure the maximum number of output tokens a model is allowed to use when serving a request. This creates the risk that the model generates an incomplete response because it hits the token limit, but it also prevents spending on an individual response from running out of control (see the final sketch below).
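To illustrate the first tactic, here is a minimal cost-aware routing sketch. The model names and the complexity heuristic are assumptions for illustration only; production routers typically use a classifier or scoring model rather than word counts:

```python
# Minimal sketch of cost-aware model routing: send simple prompts to a
# cheaper model and reserve the premium model for complex ones. Model
# names and the "looks simple" heuristic are illustrative assumptions.
CHEAP_MODEL = "gpt-4o-mini"   # placeholder low-cost model
PREMIUM_MODEL = "gpt-4o"      # placeholder high-end model

def pick_model(prompt: str, error_tolerant: bool = False) -> str:
    """Route to the cheap model when complexity or accuracy demands are low."""
    looks_simple = len(prompt.split()) < 200 and "code" not in prompt.lower()
    return CHEAP_MODEL if (looks_simple or error_tolerant) else PREMIUM_MODEL

print(pick_model("Summarize this meeting note in two sentences."))  # cheap model
```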
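Response caching can be as simple as an exact-match lookup table, as in the sketch below. Production caches are often semantic, matching similar rather than identical queries; this minimal version only reuses literal repeats:

```python
# Minimal exact-match response cache: identical (normalized) prompts reuse
# a stored answer instead of paying output token costs again. Only literal
# repeats hit the cache; semantic caching is out of scope for this sketch.
import hashlib

_cache: dict[str, str] = {}

def cached_completion(prompt: str, generate) -> str:
    """`generate` is any function that calls the LLM and returns text."""
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in _cache:
        _cache[key] = generate(prompt)  # pay for tokens only on a cache miss
    return _cache[key]

# Example usage, where my_llm_call is your existing LLM wrapper:
# answer = cached_completion("What is our refund policy?", my_llm_call)
```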
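Finally, a sketch of capping output tokens per request. The max_tokens parameter shown is part of the OpenAI chat completions API (some newer endpoints name it max_completion_tokens); checking finish_reason reveals whether the cap truncated the response:

```python
# Capping output tokens per request bounds the worst-case spend per call,
# at the risk of a truncated answer. Model and prompt are placeholders.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Summarize our refund policy."}],
    max_tokens=300,  # the model stops generating once this limit is hit
)
print(response.choices[0].message.content)
print("Truncated?", response.choices[0].finish_reason == "length")
```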
Bottom line
Ultimately, LLMs only create business value if the productivity gains they enable outweigh the cost of accessing or operating them. That's why it's critical for enterprises to be strategic and cost-conscious about how they select and use LLMs.

