Many companies use more than one AI on the enterprise side, yet consumer software applications typically embed only one. For example, Microsoft Office applications on personal and family subscription plans offer only Copilot, yet the company includes OpenAI, DeepSeek and other AI models in its Azure AI Foundry model catalog. Microsoft recently announced that people will soon be able to run DeepSeek R1 locally on Copilot+ PCs, too. Oddly, it made that announcement while in the midst of investigating DeepSeek’s potential abuses of Microsoft’s and partner OpenAI’s services. But it’s not just Microsoft that appears conflicted about distributing AI models and tools. Many other companies are, too. What the derp is going on here?
“As tech giants race to build larger language models, enterprises are quietly revealing an uncomfortable truth: LLMs are becoming commoditized workhorses, not differentiated solutions,” says Brooke Hartley Moy, CEO and founder of Infactory, a generative AI-based fact-checking firm.
So, what does that mean in the scheme of things? Companies are using large language models (LLMs) as utilities instead of as panaceas.
“Companies are building sophisticated AI stacks that treat general-purpose LLMs as foundational utilities while deploying specialized AI copilots and agents for coding, design, analytics, and industry-specific tasks. This fragmentation exposes the hubris of incumbent AI companies marketing themselves as complete solutions,” Moy adds.
Meanwhile, AI tools embedded in consumer software are commonly and quietly beefed up with additional AI models under the hood in the quest to deliver a true brand differentiator.
And together that’s why using or offering multiple AI models is trending across tools and applications. But why isn’t one AI model enough?
LLMs Getting Better or Smarter?
One would think that LLMs are improving or getting smarter with each new whirlwind release of new features. But are these models really getting smarter or are they illusions under wrap — uh, wrappers?
Wrappers are code or programs that are literally wrapped around other programs. There are a variety of reasons for doing that. In the case of AI tools, wrappers typically add functionalities to the underlying application like a generative AI chatbot. In some cases, wrappers work so well that they appear to be smarter AIs when actually they just have more or better features.
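The wrapper pattern described above can be sketched in a few lines. This is a minimal illustration, not any vendor's actual implementation: `base_model` is a stub standing in for a call to an underlying LLM, and the features the wrapper adds (a system prompt and conversation memory) are hypothetical examples of the kind of functionality wrappers layer on.

```python
# Minimal sketch of the wrapper pattern: the wrapper adds features
# (a system prompt, conversation memory) around an underlying model
# without changing the model itself.

def base_model(prompt: str) -> str:
    """Stub standing in for a call to an underlying LLM."""
    return f"[model reply to: {prompt}]"

class ChatWrapper:
    """Wraps a bare completion function with chat-style features."""

    def __init__(self, model, system_prompt: str):
        self.model = model
        self.system_prompt = system_prompt
        self.history: list[str] = []  # added feature: conversation memory

    def ask(self, user_message: str) -> str:
        # Build a richer prompt than the bare model alone would see.
        prompt = "\n".join([self.system_prompt, *self.history, user_message])
        reply = self.model(prompt)
        self.history.extend([user_message, reply])
        return reply

chat = ChatWrapper(base_model, system_prompt="You are a helpful assistant.")
first = chat.ask("What is a wrapper?")
```

The underlying model never changes; the wrapper simply feeds it more context and remembers the exchange, which is why a well-built wrapper can feel like a smarter AI.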
LLMs themselves are not getting very much smarter with each new upgrade or model release although they are getting better at what they do. Even so, one is quite often not enough to get work done at professional levels.
“The only time it makes sense to use a single, giant, monolithic GenAI model is when you do not know what you are doing because the inputs and goals of the end user, and the outputs and actions to be taken are extremely varied,” says Kjell Carlsson, PhD, head of AI strategy at Domino Data Lab.
“In almost all instances, you can get better performance — cheaper, faster and potentially more secure and more accurate — by leveraging multiple models in tandem. This can take the form of using multiple GenAI models together,” Carlsson adds.
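One common way to use multiple models in tandem, as Carlsson describes, is a router that sends each request to the cheapest model likely to handle it. The sketch below is a hedged illustration under assumed conditions: both "models" are stubs, and the routing heuristic (prompt length plus a few keyword hints) is a deliberately naive placeholder for whatever classifier a real orchestration layer would use.

```python
# Hedged sketch of multi-model routing: send each request to the
# cheapest model likely to handle it, escalating complex-looking
# requests to a larger model. Both "models" are stubs; a real system
# would call different hosted or on-premise model APIs.

def small_model(prompt: str) -> str:
    return f"small:{prompt}"

def large_model(prompt: str) -> str:
    return f"large:{prompt}"

# Hypothetical hints that a request needs the stronger model.
COMPLEX_HINTS = ("code", "function", "bug", "stack trace", "prove")

def route(prompt: str) -> str:
    """Naive router: long or complex-looking prompts go to the larger model."""
    looks_complex = len(prompt) > 200 or any(
        hint in prompt.lower() for hint in COMPLEX_HINTS
    )
    model = large_model if looks_complex else small_model
    return model(prompt)

summary = route("Summarize this memo")          # handled by the small model
fix = route("Fix this bug in my function")      # escalated to the large model
```

The payoff is exactly the trade Carlsson names: most traffic runs on the cheaper, faster model, while the expensive one is reserved for requests that need it.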
This inconvenient truth isn’t lost on incumbent generative AI providers. Take the AI search engine Perplexity, for example. It launched on its own models, later added a fine-tuned model combining the speed of GPT-3.5 with the capabilities of GPT-4, and later still added open-source models. Today it offers GPT-4 Omni, Claude 3.5 Sonnet, Sonar Large, Grok-2, and both OpenAI’s o1 and DeepSeek’s R1 reasoning models.
Offering a mix of LLMs tends to establish differentiation in solutions more so than a single model can muster. But there’s a price to pay for mixing and matching LLMs too.
“While there’s a benefit to harnessing multiple models, it can also be challenging without the right orchestration. Companies need holistic tools for training, governing, and securing their AI — or risk getting lost in the weeds,” says Maryam Ashoori, senior director of product management, watsonx at IBM.
Multimodal Models to the Rescue – or Not
But what of multimodal models like ChatGPT (GPT-4o), Sora, Gemini, and Claude 3.5 Sonnet — the Swiss Army knives of the AI world? These models can work with different types of inputs and outputs — text, code, images, video, and voice, alone or in combination — like newfangled multitools. Can’t they do everything?
“Multimodality may sound like a remedy for generative AI’s shortcomings in multifaceted processes, but this, too, is more effective in the context of purpose-specific models,” says Maxime Vermeir, senior director of AI strategy at ABBYY. “Multimodality doesn’t imply an AI multitool that can excel in any area, but rather an AI model that can draw insights from various forms of ‘rich’ data beyond just text, such as images or audio. Still, this can be narrowed for businesses’ benefit, such as accurately recognizing images included in specific document types to further increase the autonomy of a purpose-built AI tool. While having multiple generative AI tools may sound more cumbersome than a single catch-all solution, the difference in ROI is undeniable,” Vermeir adds.
But that’s not to say that the behemoth LLMs aren’t useful.
“A big one like Claude, Gemini, or ChatGPT is usually good enough for more tasks, but they can be expensive. It is typically easier to have smaller specialized models that are cheaper to operate, and that you can run on a single machine on-premise,” says RelationalAI’s VP of research ML, Nikolaos Vasiloglou.
“You can always merge two or more specialized LLMs to solve a more complex problem. On the other hand, in many tasks, especially the ones that require complex reasoning, the small ones cannot reach the performance of the bigger ones, even if you combine them,” Vasiloglou adds.
Why Employees and Other Users Are Using More Than One AI
Employees and consumers may or may not be aware of multiple models underneath their favorite generative AI chatbot. But either way, the savvier users are going to mix AIs on their end of things too.
“It’s common because different models have been trained differently and excel at different tasks,” says Oriol Zertuche, CEO at Cody AI. “For example, Anthropic’s Claude is exceptional at writing and coding, ChatGPT is great for general purpose tasks and speaking to the internet, while Gemini is multimodal with an impressive context length of over 2 million tokens, enabling it to handle video, audio, PDFs and more. Others, like Gemini 1.5, are just okay at everything, so can be used as general purpose GenAIs.”
“This mirrors how businesses use different tools for different tasks, where each one serves a specific purpose. For example, email can be used for internal communication, but there are now many collaboration platforms that enable more immediate and effective communication,” Zertuche adds.
Then there’s the need to pull outputs from specialized models and combine them in other software to produce a unified work such as a research paper, an advertisement, or an ebook.
There’s also a business case for choosing AIs according to how well they are suited to specific domains. For example, models and tools specialized in medicine, academic research, film production, finance, or marketing are optimized for tasks, rules, and vocabularies unique to those domains. Even so, one model or tool isn’t likely to be enough.
“By combining models like OpenAI’s o1 for strategy, Anthropic’s Claude for creative writing and Google’s Gemini Deep Research, marketers can achieve a balance of creativity, precision, adaptability, and innovation to scale their impact. Using multiple models also avoids vendor lock-in, ensures access to cutting-edge advancements, and allows for task-specific optimization, which can enhance both efficiency and impact,” says Lisa Cole, CMO at 2X.
Serving a Mess of AIs Daily
Oh, how quickly the AIs pile up after all this activity! In the South, the saying “make a mess of something” comes to mind. It means combining whatever you have on hand to make a meal. AI being embedded in everything is leading to a “mess of something” in companies, but the result doesn’t necessarily satisfy everyone’s hunger.
“In every CRM or Event Platform or CMS there seems to be their own generative AI that leads to a different LLM. Some of the issues that arise have to do with convenience. The other issue is data age. AI models can start and end with data that differs per the model. Some have information that is over 3 years old, some have information from the last 6 months,” says Dan Gudema, co-founder of PAIGN AI, a tool which “uses seven AI models to create blogs, images, social posts for lead generation for small businesses.”
Adding to the mess is that all the embedded AIs may be using the same models — or not.
“It’s important to distinguish between using multiple models in the same generative AI tool — for example, switching between GPT-4 and o1 models within ChatGPT — and using different generative AI tools,” says Verax AI CEO Leo Feinberg.
“Using the different language models in the same tool has multiple reasons, the main ones being that every model has its strengths and weaknesses and therefore different types of queries to ChatGPT may be handled better or worse depending on the model. Using multiple Generative AI tools — which are often powered by different models behind the scenes as well — has somewhat different reasons,” Feinberg adds.
The different reasons behind using different generative AI tools range from user preference to project needs. In any case, there are a lot of AIs lurking about and being used here and there in almost every home, vehicle, and company.
A mess of AI somethings, indeed. So, what happens next?
“We have seen a consolidation in the market with a view of one supermodel; now we are seeing fragmentation and the introduction of purpose-specific models,” says Cobus Greyling, chief evangelist at Kore.ai, an AI agent platform and solutions producer. “For instance, smaller models focused specifically on reasoning or coding, or models following a more structured approach. That’s why model orchestration will become increasingly important in the near future.”