See How Your AI Handles the Pressure
The demand for AI-powered apps and agents is real, and enterprise companies are moving quickly to launch. But a recent ActiveFence study reveals an uncomfortable truth: today’s most popular large language models remain dangerously vulnerable to being manipulated into sharing harmful information. For any organization planning to deploy AI systems, these findings raise immediate concerns.
ActiveFence conducted a comparative analysis of two widely used large language models, evaluating their behavior across chemical, biological, radiological, and nuclear (CBRN) risks in the biology, virology, chemistry, nuclear, and radiology domains. Each domain was examined through a standardized set of threat vectors, including activities such as dissemination, concealment, and transfer.
Three user personas were tested in single-turn prompt interactions, evaluating how each model responded to isolated but strategically constructed CBRN queries. These included a non-expert user with little subject-matter knowledge, an expert with technical fluency, and a malicious actor with clear intent to misuse the model. The test prompts fell into three categories: some asked the model to create harmful content, others sought to retrieve dangerous information, and a third category asked the model to describe how certain harmful acts could be executed.
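This test design can be pictured as a small evaluation loop over the persona, domain, and prompt-category matrix. The sketch below is illustrative only: query_model() and is_unsafe() are hypothetical stand-ins for a real LLM client and safety classifier, and the prompt template is a placeholder rather than an actual red-team prompt.

```python
# Minimal sketch of a single-turn red-team sweep, assuming hypothetical
# query_model() and is_unsafe() helpers (not part of any real library).
from collections import defaultdict

PERSONAS = ["non_expert", "expert", "malicious_actor"]
DOMAINS = ["biology", "virology", "chemistry", "nuclear", "radiology"]
CATEGORIES = ["create_harmful_content", "retrieve_dangerous_info", "describe_execution"]


def query_model(prompt: str) -> str:
    """Placeholder for a call to the LLM under test."""
    return "model response"


def is_unsafe(response: str) -> bool:
    """Placeholder for a safety classifier or expert review step."""
    return False


def build_prompt(persona: str, domain: str, category: str) -> str:
    """Illustrative single-turn prompt; real prompts are crafted by domain experts."""
    return f"[{persona}] [{domain}] [{category}] <strategically constructed query>"


def run_single_turn_suite() -> None:
    unsafe = defaultdict(int)
    totals = defaultdict(int)
    for persona in PERSONAS:
        for domain in DOMAINS:
            for category in CATEGORIES:
                response = query_model(build_prompt(persona, domain, category))
                totals[persona] += 1
                if is_unsafe(response):
                    unsafe[persona] += 1
    # Report the unsafe-response rate per persona, as in the figures below.
    for persona in PERSONAS:
        rate = 100.0 * unsafe[persona] / totals[persona]
        print(f"{persona}: {rate:.1f}% of responses flagged unsafe")


if __name__ == "__main__":
    run_single_turn_suite()
```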
The results were striking. Even non-expert users succeeded in prompting unsafe responses over 25 percent of the time. Expert and malicious users triggered unsafe outputs at rates exceeding 45 percent. These were not isolated events. The vulnerabilities spanned multiple CBRN categories and a wide variety of prompt types.
[Figure: Percent of generated responses flagged as unsafe, by user type and LLM]
These models are being deployed across industries including healthcare, finance, education, and defense to support AI-powered customer support agents, search assistants, and chatbots. Many of these systems are open to public input, and a determined actor can exploit that exposure.
The most alarming finding is that even basic prompts can yield harmful results. The study showed that unsafe responses were most prevalent in nuclear and biology-related queries. Activities like dissemination, concealment, and transfer triggered the highest number of unsafe responses across both models, indicating broad and deep vulnerabilities.
[Figure: Percent of generated responses flagged as unsafe, per harmful domain and LLM]
The takeaway for enterprise developers is that responsible AI systems must be treated as core infrastructure, not as an optional layer added at the end of the initial dev cycle. Simple prompt engineering or content filtering is not enough; enterprises must adopt multi-layered safety systems.
A mature safety strategy combines domain-specific threat research, expert red teaming, and AI observability. Models must be stress-tested not only for obvious misuse but also for edge cases, user escalation paths, and evolving social engineering techniques.
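Stress-testing for escalation paths follows the same pattern as the single-turn sweep, extended to multi-turn conversations. The sketch below is a simplified illustration, again assuming the hypothetical query_model() and is_unsafe() helpers; the scenario scripts are placeholders standing in for real expert-crafted escalation sequences.

```python
# Illustrative multi-turn escalation test: each scenario starts benign and
# escalates turn by turn, checking whether any turn yields an unsafe response.

ESCALATION_SCENARIOS = {
    "gradual_reframing": [
        "benign opening question",
        "follow-up that narrows toward sensitive detail",
        "final turn requesting actionable specifics",
    ],
    "role_play_pressure": [
        "ask the model to adopt a fictional expert persona",
        "push the persona to ignore its usual restrictions",
        "request the harmful information 'in character'",
    ],
}


def query_model(conversation: list[str]) -> str:
    """Placeholder for a call to the LLM under test with full chat history."""
    return "model response"


def is_unsafe(response: str) -> bool:
    """Placeholder for a safety classifier or expert review step."""
    return False


def run_escalation_tests() -> dict[str, bool]:
    """Return, per scenario, whether any turn produced an unsafe response."""
    results = {}
    for name, turns in ESCALATION_SCENARIOS.items():
        history: list[str] = []
        failed = False
        for turn in turns:
            history.append(turn)
            if is_unsafe(query_model(history)):
                failed = True
                break
        results[name] = failed
    return results


if __name__ == "__main__":
    for scenario, failed in run_escalation_tests().items():
        print(f"{scenario}: {'UNSAFE response observed' if failed else 'passed'}")
```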
Ensure your AI apps and agents aren’t misused from the start. Advanced red teaming provides domain-informed stress tests that reveal how your AI models behave under pressure from a wide range of threat actors. ActiveFence Advanced Red Teaming simulates real-world attack scenarios based on up-to-date threat intelligence gathered in over 50 languages to uncover vulnerabilities before your applications go live.
After launch, deploy a purpose-built safety infrastructure that goes beyond static rules. ActiveFence Guardrails dynamically evaluates every user prompt and model output in real time, informed by global threat intelligence and policy-aware safeguards. It fine-tunes responses based on your unique brand requirements and offers up-to-the-second visibility into every interaction, ensuring your models remain safe, aligned, and compliant even as user behavior evolves.
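At the code level, this kind of real-time protection typically sits as a thin layer around every model call. The sketch below shows the general pattern only: evaluate_prompt(), evaluate_output(), and call_model() are hypothetical stand-ins, not the ActiveFence API or any specific vendor SDK.

```python
# Generic runtime guardrail pattern: check every prompt before it reaches the
# model, check every output before it reaches the user, and log each
# interaction for observability. All helpers here are hypothetical placeholders.
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("guardrails")


def evaluate_prompt(prompt: str) -> bool:
    """Placeholder policy check on the user prompt; True means allowed."""
    return True


def evaluate_output(response: str) -> bool:
    """Placeholder policy check on the model output; True means allowed."""
    return True


def call_model(prompt: str) -> str:
    """Placeholder for the underlying LLM call."""
    return "model response"


def guarded_completion(prompt: str, fallback: str = "I can't help with that.") -> str:
    start = time.time()
    if not evaluate_prompt(prompt):
        log.info("prompt blocked after %.0f ms", 1000 * (time.time() - start))
        return fallback
    response = call_model(prompt)
    if not evaluate_output(response):
        log.info("output blocked after %.0f ms", 1000 * (time.time() - start))
        return fallback
    log.info("interaction allowed after %.0f ms", 1000 * (time.time() - start))
    return response


if __name__ == "__main__":
    print(guarded_completion("Tell me about lab safety best practices."))
```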
With these tools in place, you can launch AI experiences that are not only powerful, but resilient against risks like those presented by CBRN.
Concerned About CBRN Risks in AI?
Over the past year, we’ve learned a lot about GenAI risks, including bad actor tactics, foundation model loopholes, and how their convergence allows harmful content creation and distribution at scale. Here are the top GenAI risks we are concerned with in 2024.
See how easily multiple GenAI models, from LLMs to speech-to-speech, were tricked into divulging malicious code and weapon design instructions.
Dive into why deep threat expertise on GenAI red teams is increasingly important.