Have you ever found yourself on a website, unsure how to navigate through the pages, links, forms and menus to get what you want done? Maybe it’s the front page of your healthcare provider’s site. There’s a lot of stuff there, but maybe it’s not so clear which thing you’re supposed to click on to book an appointment.
Or maybe it’s your favorite entertainment site. There are lots of movies listed by genre, and some recommendations based on past viewing, but how do you find that 80s Brat Pack movie with the title you can’t quite remember?
For three decades now — since Tim Berners-Lee first gave us the World Wide Web, with its Hypertext Markup Language — the paradigm of web usage has remained largely the same. We read through pages. We click on links. We fill out forms and navigate menus. This interface has defined how we access information, buy products and communicate with each other. But that era is ending — not because the web is disappearing, but because the web front end is about to be rewritten.
We now have something better: AI agents.
The advantages of multimodal AI
These agents are not the fully autonomous, do-anything agents of some science-fiction future. We’re not talking about Tony Stark’s J.A.R.V.I.S. from Iron Man. We’re talking about agents that are readily buildable with today’s technology. Rather than being do-anything agents, these are task-specific, designed and engineered to accomplish a single task or a small, related family of tasks. Rather than being fully autonomous, these agents are assistive and conversational with a human — us — in the loop.
For example, to book that appointment with your doctor, you would engage in a conversation with a doctor’s appointment-booking AI agent. You would let it know your preferences, and it would offer available slots. After some back and forth, you’d settle on a date and time, and it would book the appointment and send you a confirmation.
Rather than navigating a website, you have something like a phone conversation with a person. But better.
It’s better because while a phone conversation is limited to voice, your conversation with the agent could include graphics. For example, the agent might show you a calendar with available slots highlighted, maybe superimposed on your personal calendar.
And the movie-selection agent — with a short conversation, maybe with some actor images — would have no trouble finding that Brat Pack movie from the 80s for you.
So these agents are not only assistive and conversational; they are also multimodal, using speech, text, graphics and interactive elements. They can also be hyper-personalized. For example, that appointment-booking agent might already know that you prefer to schedule appointments on Tuesday mornings.
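To make the idea concrete, the assistive, human-in-the-loop loop described above can be sketched in a few lines of code. This is a purely illustrative sketch: the names (`Slot`, `BookingAgent`) are hypothetical, and a production agent would layer a language model on top of this logic to handle free-form conversation, speech and graphics.

```python
from dataclasses import dataclass

# Illustrative sketch of a task-specific, human-in-the-loop booking agent.
# All names here are hypothetical; a real agent would add a language model
# for free-form conversation and a multimodal UI for calendars and graphics.

@dataclass
class Slot:
    day: str
    time: str
    booked: bool = False

class BookingAgent:
    def __init__(self, slots, preferred_day=None):
        self.slots = slots
        self.preferred_day = preferred_day  # hyper-personalization: a remembered preference

    def offer(self):
        """Offer open slots, preferring the user's usual day when possible."""
        open_slots = [s for s in self.slots if not s.booked]
        preferred = [s for s in open_slots if s.day == self.preferred_day]
        return preferred or open_slots

    def book(self, slot):
        """Confirm the slot the human chose -- the decision stays with the user."""
        slot.booked = True
        return f"Booked: {slot.day} at {slot.time}. Confirmation sent."

agent = BookingAgent(
    [Slot("Monday", "11:00"), Slot("Tuesday", "9:00"), Slot("Tuesday", "14:00")],
    preferred_day="Tuesday",
)
options = agent.offer()        # agent: "I have Tuesday at 9:00 or 14:00 open"
print(agent.book(options[0]))  # user: "9:00 works" -> agent confirms
```

The point of the sketch is the division of labor: the agent narrows the options using context it already holds, and the human makes the final call.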
Re-envisioning the front end of the web
Think of these agents as your new and improved front-end interface to the web. The browser-based web is giving way to something more dynamic: an interface that engages in conversation, understands context and completes tasks. These agents won’t replace the web, but they will change how we use it.
This shift has major implications. In the 1990s, a new industry was born around the design, building and maintenance of websites, and website user experience design became a coveted skill set. We are now entering a similar phase with AI agents. A new economy will form around designing, building and managing these agentic interfaces. Designing great conversational interfaces is a whole new discipline, and those who master it will be in great demand. Businesses will compete not just on content or products, but also on the quality of their agentic interfaces.
Big tech companies are already positioning themselves. The big foundation-model providers (e.g., OpenAI, Anthropic and Google) are developing general-purpose agents aimed at becoming a one-stop AI platform — the AI front end to the entire web. But creating great conversational interfaces isn’t easy, and domain expertise will be a critical differentiator.
In our examples, the healthcare provider and the entertainment company should be able to provide better conversational interfaces for their customers and their use cases than a generic do-anything agent. The businesses that embrace customer intimacy will have an edge.
To do this, businesses need to consider two things: the front-end user experience and the back-end infrastructure. On the front end, those building these AI agents need to approach design and user experience with the same thoughtfulness and attention to detail as the best websites and mobile applications today.
When it comes to the infrastructure underlying this new agent-centered web, we will need more than just GPUs. While the first wave of chatbots and GenAI applications called for GPU-centric architectures, AI agents are full applications with multiple components. AI will no longer be the app; rather, it will be part of the app. This means we will need a more balanced infrastructure stack: GPUs for model execution, CPUs for traditional compute, storage for context and for retrieval techniques such as retrieval-augmented generation (RAG), and robust networking to connect with remote APIs, MCP servers, users and devices. To operate and scale effectively, these systems also demand modern orchestration platforms such as Kubernetes.
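The claim that "AI will be a part of the app" can be illustrated with a minimal sketch of one agent request. Every class and method name below is a hypothetical stand-in: the point is that the model call is a single step inside a larger application that also does retrieval, orchestration and remote API calls.

```python
# Purely illustrative: the component mix behind a single agent request.
# Each class is a hypothetical stand-in for one tier of the stack.

class Retriever:           # storage tier: context lookup (e.g., RAG)
    def search(self, query):
        return ["patient prefers Tuesday mornings"]

class Model:               # GPU tier: language-model inference
    def generate(self, query, context):
        return {"action": "book", "slot": "Tuesday 9:00"}

class CalendarAPI:         # network tier: remote API or MCP-style tool call
    def execute(self, plan):
        return f"Booked {plan['slot']}"

def handle_request(user_msg, retriever, model, api):
    """CPU-side orchestration: retrieval, then inference, then action."""
    context = retriever.search(user_msg)       # fetch stored context
    plan = model.generate(user_msg, context)   # one model call among many steps
    return api.execute(plan)                   # act on the plan via a remote API

print(handle_request("Book me a checkup", Retriever(), Model(), CalendarAPI()))
# -> Booked Tuesday 9:00
```

Note that most of the application lives in the orchestration function, not the model call; in production that surrounding code is what runs on CPUs and scales under a platform like Kubernetes.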
In short, the technology is here and available to us, but it needs to be thoughtfully applied to this new web paradigm to allow us to achieve the promise of AI agents.
We are not moving into a science fiction world of sentient machines. We are moving into a world where helpful, context-aware software becomes the norm. The transition won’t happen overnight, but it’s already begun. The web we know isn’t going away, but it is being fundamentally transformed. Done right, this transformation could change all our lives for the better, enabling us all to interact more with the physical world and each other while agents roam the digital one.