The journey of so many users of generative AI begins with the blank message box. And too often it ends there! The entire world of options is open before the user, and it is too much. It's easier to go back to Word, Outlook, and Excel than it is to stare down infinite possibility.
Nearly as bad, the first thing a new user tries is almost guaranteed to fail. They probably won't be specific enough and will get a generic slop answer, or they will ask about something the AI can't access and get a hallucination instead of an answer.
Agents address these problems: they give users something to try that gives great results, and they help with useful tasks. Agents turn expensive software licenses into dramatic productivity gains for all information workers.
Agents in Microsoft 365 Copilot are simply collections of related functionality that are easy to access. When selected, an agent changes the behavior of Copilot Chat. This isn't always more capability; agents often have access to less data and fewer tools than the base mode of Copilot. That sounds pretty basic, so what good are they?
AI chat would be too difficult to use without agents, and it would be much less effective. Agents serve three major purposes within Microsoft 365 Copilot: wayfinding, prompt engineering, and accessing external systems.
Chatting with a generative AI is very powerful, but it is also difficult to learn. The first purpose that agents serve is a type of user education.
There’s little more humbling than watching a new user open up Copilot Chat in a user study. The empty chat box and submit button provide very little to “grab onto.” Beyond the suggested prompts, there’s little on the screen that guides the user to what they can do.
This is the first purpose of agents: a familiar handle that is already in the user’s navigation bar (because they were installed by their organization for them, or they clicked a share link from a teammate). When the agent interface is implemented well, there will be a familiar product or purpose visible on the screen, and the user is going to start there.
We call the screen that opens when you click on an agent the "focused view." It continues the user's education with conversation starters: prompts to try, suggested by the developer. Once a user succeeds with an agent, they will use it again. And by using an agent, they are also learning to use the rest of Copilot Chat better.
It's focused in another way too: on a specific domain or purpose, through knowledge selection and prompt engineering.
The number of tasks that Copilot can help with is astronomical. Due to the immensity of opportunity, you have to be quite specific about what exactly you want to do. Learning to work with AI through an iterative conversation is a good model, but it’s even better to succeed with a short prompt, immediately.
Prompt engineering is pretty hard though, and it takes time. Agents allow for someone with skill in prompt engineering to share their work with their colleagues or customers. Some of the agents I create only require one-word prompts, because I put in the time to make the instructions as effective as possible.
Engineering an agent to be excellent at a skill or domain fits naturally with selecting a subset of grounding knowledge. If you’re an information worker, you have access to millions of pieces of content. It can be hard to find exactly the information you are looking for. Scoping what documents and locations an agent can access makes it (and the user) more successful at the task.
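As a concrete sketch of how these pieces fit together, here is a hypothetical agent definition expressed as a Python dictionary. The field names are illustrative, loosely modeled on declarative agent manifests, and are not guaranteed to match the actual Microsoft 365 schema; the agent name, instructions, and SharePoint URL are all invented for this example.

```python
# Illustrative sketch of an agent definition. Field names are
# hypothetical; check your platform's actual manifest schema.
contract_review_agent = {
    "name": "Contract Reviewer",
    "description": "Reviews vendor contracts against our standard terms.",
    # Engineered instructions: the prompt-engineering work is done once,
    # so even a one-word user prompt produces a focused result.
    "instructions": (
        "You are a contract review specialist. When the user shares or "
        "names a contract, compare it against the standard terms in the "
        "grounding documents, list deviations as a table, and flag any "
        "clause with unlimited liability."
    ),
    # Conversation starters: the user-education surface in focused view.
    "conversation_starters": [
        {"title": "Review a contract", "text": "Review the attached contract."},
        {"title": "Explain a clause", "text": "Explain clause 7 in plain language."},
    ],
    # Scoped knowledge: only the relevant library, not everything the
    # user can access. (Placeholder URL.)
    "knowledge_sources": [
        {
            "type": "sharepoint",
            "url": "https://contoso.sharepoint.com/sites/Legal/StandardTerms",
        }
    ],
}

# A prompt as short as one word works because the heavy lifting lives
# in the instructions, not in what the user types.
user_prompt = "review"
effective_prompt = contract_review_agent["instructions"] + "\n\nUser: " + user_prompt
```

The point of the sketch: everything a skilled prompt engineer would type by hand is packaged once and shared, and the knowledge scope travels with it.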
When an agent has a role familiar to the user, engineered instructions, and only the relevant content, it becomes an amazing specialist at that role's tasks. A "specialist" is one of the two patterns we see for agents in our user research. The other is a "doer": an agent able to take actions beyond returning text.
As agents are already focused on a role or task and have a surface to teach users effective use patterns, they’re the ideal place to host tools. Tools to a language model are simply descriptions of programmatic interfaces to another computer system. These tools (also called functions or skills) are critical, as they allow the AI to do things other than send text to the user.
Tools expand what the chat interface can do. The buzzword RAG (Retrieval Augmented Generation) is just one type of tool. Like with the wide scope of available data, there are any number of computer systems and software applications that the user could need to work with.
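As a minimal sketch of what a tool looks like from the model's side (this is a generic function-calling shape, not Copilot's actual wire format; the tool, order IDs, and stand-in data are invented), a tool is a name, a natural-language description, and a typed parameter schema, and the runtime dispatches the model's call to real code:

```python
import json

# A hypothetical tool: look up an order's status in another system.
# The description and parameter schema are what the model actually "sees."
ORDER_STATUS_TOOL = {
    "name": "get_order_status",
    "description": "Look up the shipping status of a customer order by ID.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

# Stand-in for the external system (real code would call its API).
_FAKE_ORDERS = {"A-1001": "shipped", "A-1002": "processing"}

def get_order_status(order_id: str) -> str:
    return _FAKE_ORDERS.get(order_id, "unknown")

def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Route a model-issued tool call to the matching local function."""
    registry = {"get_order_status": get_order_status}
    args = json.loads(arguments_json)
    return registry[name](**args)

# Having read the schema, the model might emit a call like this:
result = dispatch_tool_call("get_order_status", '{"order_id": "A-1001"}')
```

The model never executes anything itself; it only emits a structured request, and the surrounding system decides whether and how to run it.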
Focusing an agent on a related set of tools doesn't only help the user; it helps the model. If you give an AI dozens or hundreds of tools, its reliability drops quickly. Separate agents direct the user to the right place and limit the AI to a set of tools it can use reliably.
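One way to picture this (a toy router, not how Copilot is implemented; the agent names, tools, and keyword routing are all invented): instead of one model that sees every tool, each agent exposes a small tool set, and a routing step picks the agent first, so the model never reasons over the full list.

```python
# Toy illustration: partition many tools into focused agents so the
# model only ever sees a small, reliable tool set.
AGENT_TOOLS = {
    "hr-agent": ["lookup_policy", "request_leave", "find_benefits"],
    "sales-agent": ["get_pipeline", "update_opportunity", "draft_quote"],
    "it-agent": ["reset_password", "open_ticket", "check_outage"],
}

ROUTING_KEYWORDS = {
    "hr-agent": ["leave", "vacation", "benefits", "policy"],
    "sales-agent": ["deal", "pipeline", "quote", "opportunity"],
    "it-agent": ["password", "laptop", "ticket", "outage"],
}

def route(prompt: str) -> str:
    """Pick the agent whose keywords best match the prompt."""
    words = prompt.lower().split()
    scores = {
        agent: sum(w in words for w in keywords)
        for agent, keywords in ROUTING_KEYWORDS.items()
    }
    return max(scores, key=scores.get)

def tools_for(prompt: str) -> list[str]:
    # The model sees only the chosen agent's three tools,
    # not all nine defined above.
    return AGENT_TOOLS[route(prompt)]
```

In production the routing is done by the user choosing an agent (or by a model), but the effect is the same: each inference step works against a short, coherent tool list.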
I'm very proud of how agents are helping employees all over the world. But I know that we have a lot of improvements to make in each of these purposes.
I’ll keep working to make agents in Microsoft 365 Copilot even better, but right now they remain the smartest decision a leader can make for their company. Don’t let your licenses gather dust. Forget the training videos. Find or build excellent agents, then pre-install them for your users.
If you need help building agents on any platform, check out my Agent Best Practices series. You'll get real results, company-wide. Agents are not magic, but they are just enough structure to get any user moving and succeeding.