An open-source toolkit for controlling out-of-control AI agents

May 28, 2026


A fundamental redesign of our APIs is necessary, but budgets, resourcing, and capacity make this hard to deliver overnight. What’s needed, then, is a way to manage agent interactions with APIs, treating agents as a new class of user, providing and enforcing the policies that are needed to manage agent life cycles. The use of Model Context Protocol (MCP) as a standard wrapper for agent access to APIs helps here, as it gives us a common environment where we can implement the governance layer needed to keep agents under control.

Microsoft recently launched a public preview of its open-source Agent Governance Toolkit (AGT), which is intended to wrap policy-based enforcement around agents, ensuring that calls are evaluated before they’re made. You can think of the toolkit as a way to manage agent actions, rather than controlling the inputs and outputs of the large language models (LLMs) your agents use. Figures from Microsoft suggest that this method of securing agents is far safer than relying on rules in prompts. However, in practice it’s a good idea to run a capability tool like Agent Governance Toolkit alongside traditional filters to trap user errors and prompt-based attacks.

AGT is a set of tools designed to cover OWASP’s list of agentic risks, building on Microsoft’s experience securing its own agents and AI platforms, with more than 13,000 tests built into the toolkit. It works by evaluating each action before it runs: It checks the action against your policies, allows or denies it, and logs the result. Microsoft expects policy evaluation to take less than 0.1 ms per operation, keeping overhead to a minimum.
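
To make the evaluate, decide, and log sequence concrete, here is a minimal Python sketch of a policy gate placed in front of an agent's tool calls. It is not AGT's API: The `Action`, `Policy`, `PolicyDenied`, and `governed_call` names, the allow-list rule, and the blocked-substring rule are all hypothetical, chosen only to illustrate the pattern of checking an action before it executes.

```python
from dataclasses import dataclass


class PolicyDenied(Exception):
    """Raised when a policy blocks an action before it runs (hypothetical)."""


@dataclass
class Action:
    agent_id: str  # which agent is asking
    tool: str      # name of the tool or API the agent wants to call
    args: dict     # arguments the agent supplied


@dataclass
class Policy:
    """Illustrative policy: a tool allow list plus blocked argument substrings."""
    allowed_tools: set
    blocked_substrings: tuple = ()

    def evaluate(self, action):
        """Return (allowed, reason) without executing anything."""
        if action.tool not in self.allowed_tools:
            return False, f"tool '{action.tool}' is not on the allow list"
        for value in action.args.values():
            for needle in self.blocked_substrings:
                if needle in str(value):
                    return False, f"argument contains blocked pattern '{needle}'"
        return True, "allowed by policy"


def governed_call(policy, action, tools, audit_log):
    """Evaluate the action, record the decision, then run it only if allowed."""
    allowed, reason = policy.evaluate(action)
    audit_log.append({
        "agent": action.agent_id,
        "tool": action.tool,
        "allowed": allowed,
        "reason": reason,
    })
    if not allowed:
        raise PolicyDenied(reason)
    return tools[action.tool](**action.args)


# Stand-in tools; a real deployment would wrap actual API or MCP calls here.
TOOLS = {
    "read_file": lambda path: f"contents of {path}",
    "delete_file": lambda path: f"deleted {path}",
}
policy = Policy(allowed_tools={"read_file"}, blocked_substrings=("/etc/",))
audit_log = []

requests = [
    ("read_file", "notes.txt"),
    ("delete_file", "notes.txt"),
    ("read_file", "/etc/passwd"),
]
for tool, path in requests:
    try:
        print(governed_call(policy, Action("agent-1", tool, {"path": path}), TOOLS, audit_log))
    except PolicyDenied as err:
        print(f"denied: {err}")
```

The point of the design is that the decision happens outside the model. The agent can ask for anything, but only actions that pass the policy reach the underlying tool, and every decision, allowed or denied, leaves an audit record.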

Policies for agents

OWASP’s top 10 agent risks lists the most significant issues, arising from user prompts and bad application design, that can disrupt agent operations. These risks include agent goal hijacking, uncontrolled code execution, insecure output handling, and agents going rogue. Features in the toolkit are designed to protect agentic applications from these and other issues, using isolation and sandboxing, as well as validating outputs using content policies.
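
As a rough illustration of what validating outputs against a content policy can look like, the Python sketch below scans an agent's output for patterns a policy has flagged and withholds the output if any match. The rule names, the regular expressions, and the `validate_output` and `release_output` helpers are assumptions made for this example. They are not taken from AGT or from OWASP, and production content policies would be far more thorough.

```python
import re

# Hypothetical content rules: each maps a rule name to a pattern that
# should never appear in output passed on to a user or another system.
BLOCKED_PATTERNS = {
    "credential": re.compile(
        r"\b(api[_-]?key|password|secret)\s*[:=]\s*\S+", re.IGNORECASE
    ),
    "shell_command": re.compile(
        r"\brm\s+-rf\b|\bcurl\b[^\n]*\|\s*(sh|bash)\b", re.IGNORECASE
    ),
}


def validate_output(text):
    """Return the names of every content rule the text violates."""
    return [name for name, pattern in BLOCKED_PATTERNS.items() if pattern.search(text)]


def release_output(text):
    """Pass clean output through unchanged; withhold anything that violates a rule."""
    violations = validate_output(text)
    if violations:
        return "[output withheld: " + ", ".join(violations) + "]"
    return text


print(release_output("The quarterly report is ready."))
print(release_output("Use password = hunter2 to log in."))
```

Pattern matching of this kind catches only what its rules anticipate, which is one reason to pair output checks with the isolation and sandboxing described above rather than rely on them alone.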
