Microsoft Incident Response researchers have uncovered a novel attack vector targeting AI agents that operate on a user's behalf. The method exploits the Model Context Protocol (MCP) by inserting poisoned tool descriptions, allowing attackers to hijack an agent's actions without breaking any explicit rules. The agent, acting in full compliance with its programming, can be made to quietly transfer sensitive company data to an external party.
The attack is particularly insidious because nothing appears anomalous to monitoring systems. Every step the agent takes looks routine within a default configuration, so no security alarm fires. This low-and-slow exfiltration technique could allow attackers to siphon data over an extended period without detection.
The technical mechanism involves manipulating the semantic descriptions that AI agents use to select appropriate tools. By subtly altering these descriptions, an attacker can steer the agent toward malicious tools or actions that the agent incorrectly believes serve the user's legitimate request. The agent never violates any rules, making detection extremely difficult in standard setups.
Microsoft has not released a specific patch for this vulnerability, as it stems from a fundamental design assumption in current AI agent architectures. The researchers recommend organizations review their MCP implementations, monitor agent behavior for unusual patterns, and consider adding validation layers for tool descriptions. For high-sensitivity environments, human-in-the-loop approval for certain agent actions remains a practical mitigation.
Counter-argument: Some security experts argue that the attack's practical risk is limited, as it requires an attacker to first gain the ability to modify tool descriptions—a privileged access that would already imply significant compromise. The research may overstate the likelihood of this scenario in well-secured environments.