Your AI Just Got Catphished

December 1, 2025

TL;DR

  • ActiveFence researchers found that AI email connectors do not properly validate an email sender's identity.
  • An AI-enabled connector misinterpreted the user-defined display name as the sender's true identity, bypassing the traditional enterprise security mechanisms around email and increasing the risk of impersonation and phishing attacks.
  • This gap highlights a key issue with LLMs, especially in enterprise use cases, where general-purpose models lack, by default, the context needed to perform tasks safely.

My parents told me I could be anything when I grew up. It turns out that with the help of AI, I can be anyone, including our CEO.

Looking forward to an awesome vacation – thanks, Noam!

We recently tested something simple: Could we trick AI email assistants into thinking we were someone else? Turns out you can… just by changing your display name.

If you're a CISO, this is where your eye starts twitching. You've been fighting the email security battle for years: training employees to hover over links, implementing SPF, DKIM, and DMARC, running tests, and conducting simulations. And, sure, you've had to pull some people aside for the stern, one-on-one conversations, but honestly? You've made real progress. Your team knows to verify sender addresses; they're vigilant and report most issues.

But now, AI connectors – like those provided by OpenAI and Anthropic – are integrating with our favorite apps and email providers, acting as digital employees that promise to boost productivity. Amazing! However, these new digital employees are not necessarily trained in security protocols, don't know the full context, and, in their well-intended focus on making the employee's life easier, can create misunderstandings that lead organizations down a rabbit hole of trouble.

How it Works: The Technical Reality

When you receive an email, you're probably used to seeing this at the top:

Message-ID: <[email protected]> 
Date: Fri, 21 Nov 2025 17:45:28 -0500 
From: John Doe <[email protected]> 
To: [email protected]

While this is not the full header, it is what you are exposed to. When using a connector, the LLM sees the same thing you do. That isn't a bad thing in itself; however, a display name like John Doe is an arbitrary, user-defined value. If years of injection attacks have taught us anything about user input, it's that we probably shouldn't trust it. And years of phishing training have taught users to flag anything that doesn't pass the sniff test. Case in point:

Message-ID: <[email protected]> 
Date: Fri, 21 Nov 2025 17:45:28 -0500 
From: YourCEO ([email protected]) <[email protected]> 
To: [email protected]

Traditionally, the industry has relied on email authentication protocols such as SPF, DKIM, and DMARC. While an LLM knows about these protocols, it does not apply that knowledge in practice when invoked through a connector. So, when it sees a display name formatted the way it expects a sender to look, the LLM ignores the actual origin domain during summarization.
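To make the gap concrete, here is a minimal sketch (Python, standard library only) of the kind of deterministic check a connector could run before handing a message to the model: separate the attacker-controlled display text from the authenticated address in the From header, and cross-check the gateway's SPF/DKIM/DMARC verdict. The sample message and domains are illustrative assumptions, not a description of any vendor's implementation.

```python
# Minimal sketch: validate the From header deterministically, before an LLM ever sees the message.
# Standard library only; the sample message and domains below are illustrative.
import re
from email import message_from_string
from email.utils import parseaddr

RAW_EMAIL = """\
From: YourCEO (ceo@yourcompany.example) <attacker@evil.example>
To: finance@yourcompany.example
Authentication-Results: mx.example; spf=fail; dkim=none; dmarc=fail
Subject: Urgent wire update

Please update the vendor wire details today.
"""

def sender_red_flags(raw: str) -> list[str]:
    """Return reasons not to trust the claimed sender of a raw RFC 5322 message."""
    msg = message_from_string(raw)
    from_header = msg.get("From", "")
    _, address = parseaddr(from_header)          # the address the mail actually came from
    real_domain = address.split("@")[-1].lower() if "@" in address else ""
    flags = []

    # 1. Everything outside the final <address> is attacker-controlled display text.
    #    Flag any email-looking token whose domain differs from the real sender's.
    for candidate in re.findall(r"[\w.+-]+@[\w.-]+", from_header):
        if candidate.lower() != address.lower() and candidate.split("@")[-1].lower() != real_domain:
            flags.append(f"display name shows {candidate}, but mail came from {address}")

    # 2. Trust the gateway's SPF/DKIM/DMARC verdict, not the display name.
    auth = msg.get("Authentication-Results", "").lower()
    for check in ("spf", "dkim", "dmarc"):
        if f"{check}=pass" not in auth:
            flags.append(f"{check} did not pass")

    return flags

if __name__ == "__main__":
    for flag in sender_red_flags(RAW_EMAIL):
        print("RED FLAG:", flag)
```

The specific heuristics matter less than the placement: identity checks belong in code that runs before the LLM sees the email, not in the model's judgment during summarization.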

This is an issue with LLM reasoning and with default behavior being applied across broad use cases. We tested this across multiple models and connectors and found the same result every time – LLMs take emails at face value and do not validate any of the headers.

We did have a few cases where the LLM noticed the domain embedded in the display name, but this can be bypassed by instructing the LLM to ignore the headers during summarization:

What the email actually looked like (attacker domains redacted)

Why It’s Important in the Real World

The Trust Model Failure

This attack represents a fundamental failure in the trust model, turning a trusted (and highly anticipated) productivity accelerator into an unwitting accomplice for phishing. The user no longer looks at email headers, sender names, and addresses – they’re trusting the AI’s “clean” summary. The AI becomes a sanitizer, accepting “dirty” input (spoofed display names), stripping suspicious metadata (the actual sender), and presenting “trusted” output.

As our CISO put it, this is AI-assisted impersonation, and it makes years of email security awareness training obsolete for this workflow.

The Nightmare Scenario: When AI Has Agency

In our current state, AI reads your emails and tells you what's there so that you, the user, can take action. The future is much more autonomous: very soon, AI agents will not only read the email but also act on it.

Imagine a scenario where a finance manager uses an agent to create a task list based on specific emails – say, from their CEO. An attacker sends spoofed emails from the "CEO", requesting that an account's wire information be updated to a new destination. The task gets added to the list and assigned for completion. In a world where agentic workflows are key productivity boosters, this action could realistically be completed by an agent that updates a database on the user's behalf.

This is just one example. There's no need for phishing links or obvious red flags when the AI is unintentionally scrubbing out the indicators of fraud.
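One way to blunt that scenario is to gate the action itself rather than the model: the orchestration layer enforces that high-risk tasks only proceed when the authenticated sender address, not the display name, is on an allow-list. A minimal sketch follows; every name in it (InboundEmail, approve_agent_task, the example domains) is a hypothetical illustration, not any vendor's API.

```python
# Sketch of a guardrail around an agent's high-impact actions. All names here
# (InboundEmail, approve_agent_task, the domains) are hypothetical, for illustration only.
from dataclasses import dataclass

TRUSTED_SENDERS = {"ceo@yourcompany.example"}       # allow-list for high-risk requests
HIGH_RISK_ACTIONS = {"update_wire_details", "change_payroll_account"}

@dataclass
class InboundEmail:
    display_name: str      # attacker-controlled, cosmetic only
    sender_address: str    # the address the gateway actually authenticated
    auth_passed: bool      # the gateway's SPF/DKIM/DMARC verdict for that address

def approve_agent_task(action: str, source: InboundEmail) -> bool:
    """High-risk actions proceed only for authenticated, allow-listed senders."""
    if action not in HIGH_RISK_ACTIONS:
        return True
    if not source.auth_passed:
        return False
    return source.sender_address.lower() in TRUSTED_SENDERS

# The spoofed email from the scenario: CEO in the display name, attacker in the envelope.
spoofed = InboundEmail("YourCEO (ceo@yourcompany.example)", "attacker@evil.example", auth_passed=False)
print(approve_agent_task("update_wire_details", spoofed))   # False: hold the task for human review
```

The design choice is that the allow-list and authentication check live outside the model, so no amount of persuasive email body text can talk the agent into the action.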

Why Old Defenses Don’t Apply

The security industry has invested significant resources in anti-phishing efforts: training, gateways, detection, reporting. It's necessary. Phishing remains the top attack method, and with AI impersonation and deepfakes making it easier than ever, incidence rates have risen 49% since 2022.

But AI-mediated attacks are different:

  • Traditional defense: Train users to inspect suspicious elements
  • AI-mediated: AI removes those elements before users see them
  • Traditional security: Analyze headers, reputation, and content
  • AI-mediated: The threat is in how the AI interprets legitimate headers (or doesn't)

As AI assistants become more common at work, security training needs to level up. It's no longer about spotting the fake; it's about understanding the signals and interpreting them contextually so that a fake is exposed for what it is.

So What's a CISO to Do?

Security solutions do not have the luxury of lagging behind AI-enabled attackers. Tools that rely on traditional methods, such as signatures or reputation checks, are obsolete. Organizations must employ modern solutions to address these modern problems.

Between you and me, I'm not a CISO (nor do I want to be), but we had a chance to ask our CISO, Guy Stern, what he would do:

  • How do we prepare a workforce for GenAI?

GenAI means that "traditional" awareness training must focus on business process integrity, such as always verifying high-stakes requests through a separate, trusted channel.

  • What about our current set of tools? How do they serve against these attack vectors?

Any tool still relying on "old-school" signatures or simple reputation is obsolete. This means investing in AI-native email security (ICES) and EDR/XDR platforms that can identify anomalies rather than just matching signatures.

  • Build or Buy, Guy?

Buy for the core AI model; build for the specific use case. A CISO's "build" effort in this area should focus on the integrations and automation playbooks that are unique to each environment.

  • Are native security controls from enterprise LLM providers sufficient?

No. They are a generic baseline, not a complete solution. "Good enough" for everyone means they aren't great for high-value assets, an organization's unique compliance needs, or the executive risk profile.

  • For a problem like AI phishing, is it worth building a layer between our users and the LLMs for more control?

Yes, a dedicated control layer is becoming a strategic necessity.

For AI phishing, you could put a "choke point" – or what the industry is now calling an "AI Firewall" – between users and LLMs to enforce policies. This does take time to build: from understanding the types of use cases and establishing baselines for default, expected, and anomalous behavior, to configuring logs for quality and incident response. This layer acts as a zone where an organization can validate the sender's identity before the LLM sees the prompt, or block the AI from processing high-risk requests based on rules.
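Here is a minimal sketch of what that choke point could look like, assuming a hypothetical summarize_with_llm stand-in for the real connector call and reusing the spoofed header from earlier. The policy layer blocks unauthenticated mail outright and strips attacker-controlled identity text from anything it does forward.

```python
# Sketch of a pre-LLM "choke point": a policy layer validates the sender and either blocks
# the message or strips attacker-controlled identity text before the model sees anything.
# summarize_with_llm is a hypothetical stand-in for the real connector/model call.

def verified_sender(headers: dict[str, str]) -> bool:
    """Treat a message as verified only if the gateway's DMARC check passed."""
    return "dmarc=pass" in headers.get("Authentication-Results", "").lower()

def summarize_with_llm(prompt: str) -> str:
    return f"[LLM summary of: {prompt[:60]}...]"    # placeholder, not a real model call

def firewall_summarize(headers: dict[str, str], body: str) -> str:
    if not verified_sender(headers):
        # Block or quarantine: the model never sees unauthenticated "CEO" mail.
        return "BLOCKED: sender failed authentication; routed to security review."
    # Forward only the authenticated address, never the display name, so the summary
    # cannot repeat attacker-chosen identity text.
    _, _, address = headers.get("From", "").rpartition("<")
    prompt = f"Email from {address.rstrip('>')}:\n{body}"
    return summarize_with_llm(prompt)

headers = {
    "From": "YourCEO (ceo@yourcompany.example) <attacker@evil.example>",
    "Authentication-Results": "mx.example; spf=fail; dkim=none; dmarc=fail",
}
print(firewall_summarize(headers, "Please update the vendor wire details today."))
```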

The Bigger Picture: Summary and Takeaways

This email vulnerability is a window into a broader category of risk: AI systems inheriting trust relationships they’re not equipped to validate. And the attack surface is only expanding:

  • Can AI coding assistants that pull from repositories distinguish legitimate internal libraries from malicious packages with similar names?
  • Can AI customer service agents that access records properly validate identity before disclosing sensitive data?
  • Can AI analytics tools querying databases recognize when a natural-language query is actually an injection attack?

We’re connecting powerful AI systems to critical infrastructure faster than we’re building the necessary security context. The email sender impersonation we discovered isn’t an isolated issue. It’s a pattern. LLMs optimize for convenience at the expense of security validation.

The industry has made this mistake before: prioritizing functionality over security, then scrambling to patch the security gaps later. With AI, the stakes are higher because the systems are more autonomous and the attack surface is less visible. As we continue to lean on integrations and connectors from our AI providers of choice, we must treat them with equal parts optimism and skepticism.

As someone's Uncle Ben used to say, "With great AI comes great responsibility."
