See How Your AI Handles the Pressure
The demand for AI-powered apps and agents is real, and enterprise companies are moving quickly to launch. But a recent ActiveFence study reveals an uncomfortable truth: today’s most popular large language models remain dangerously vulnerable to being manipulated into sharing harmful information. For any organization planning to deploy AI systems, these findings raise immediate concerns.
ActiveFence conducted a comparative analysis of two widely used large language models, evaluating their behavior across chemical, biological, radiological, and nuclear (CBRN) risks in the biology, virology, chemistry, nuclear, and radiology domains. Each domain was examined through a standardized set of threat vectors, including activities such as dissemination, concealment, and transfer.
Three user personas were tested in single-turn prompt interactions, evaluating how each model responded to isolated but strategically constructed CBRN queries. These included a non-expert user with little subject-matter knowledge, an expert with technical fluency, and a malicious actor with clear intent to misuse the model. The test prompts fell into three categories: some asked the model to create harmful content, others sought to retrieve dangerous information, and a third category asked the model to describe how certain harmful acts could be executed.
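This test design can be pictured as a small evaluation loop over the persona, domain, and prompt-category matrix. The sketch below is illustrative only: query_model() and is_unsafe() are hypothetical stand-ins for a real LLM client and safety classifier, and the prompt template is a placeholder rather than an actual red-team prompt.

```python
# Minimal sketch of a single-turn red-team sweep, assuming hypothetical
# query_model() and is_unsafe() helpers (not part of any real library).
from collections import defaultdict

PERSONAS = ["non_expert", "expert", "malicious_actor"]
DOMAINS = ["biology", "virology", "chemistry", "nuclear", "radiology"]
CATEGORIES = ["create_harmful_content", "retrieve_dangerous_info", "describe_execution"]


def query_model(prompt: str) -> str:
    """Placeholder for a call to the LLM under test."""
    return "model response"


def is_unsafe(response: str) -> bool:
    """Placeholder for a safety classifier or expert review step."""
    return False


def build_prompt(persona: str, domain: str, category: str) -> str:
    """Illustrative single-turn prompt; real prompts are crafted by domain experts."""
    return f"[{persona}] [{domain}] [{category}] <strategically constructed query>"


def run_single_turn_suite() -> None:
    unsafe = defaultdict(int)
    totals = defaultdict(int)
    for persona in PERSONAS:
        for domain in DOMAINS:
            for category in CATEGORIES:
                response = query_model(build_prompt(persona, domain, category))
                totals[persona] += 1
                if is_unsafe(response):
                    unsafe[persona] += 1
    # Report the unsafe-response rate per persona, as in the figures below.
    for persona in PERSONAS:
        rate = 100.0 * unsafe[persona] / totals[persona]
        print(f"{persona}: {rate:.1f}% of responses flagged unsafe")


if __name__ == "__main__":
    run_single_turn_suite()
```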
The results were striking. Even non-expert users succeeded in prompting unsafe responses over 25 percent of the time. Expert and malicious users triggered unsafe outputs at rates exceeding 45 percent. These were not isolated events. The vulnerabilities spanned multiple CBRN categories and a wide variety of prompt types.
[Figure: Percent of generated responses flagged as unsafe, by user type and LLM]
These models are being deployed across industries including healthcare, finance, education, and defense to support AI-powered customer support agents, search assistants, and chatbots. Many of these systems are open to public input, and a determined actor can exploit that exposure.
The most alarming finding is that even basic prompts can yield harmful results. The study showed that unsafe responses were most prevalent in nuclear and biology-related queries. Activities like dissemination, concealment, and transfer triggered the highest number of unsafe responses across both models, indicating broad and deep vulnerabilities.
[Figure: Percent of generated responses flagged as unsafe, per harmful domain and LLM]
The takeaway for enterprise developers is that responsible AI systems must be treated as core infrastructure, not as an optional layer added at the end of the initial dev cycle. Simple prompt engineering or content filtering is not enough; enterprises must adopt multi-layered safety systems.
A mature safety strategy combines domain-specific threat research, expert red teaming, and AI observability. Models must be stress-tested not only for obvious misuse but also for edge cases, user escalation paths, and evolving social engineering techniques.
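Stress-testing for escalation paths follows the same pattern as the single-turn sweep, extended to multi-turn conversations. The sketch below is a simplified illustration, again assuming the hypothetical query_model() and is_unsafe() helpers; the scenario scripts are placeholders standing in for real expert-crafted escalation sequences.

```python
# Illustrative multi-turn escalation test: each scenario starts benign and
# escalates turn by turn, checking whether any turn yields an unsafe response.

ESCALATION_SCENARIOS = {
    "gradual_reframing": [
        "benign opening question",
        "follow-up that narrows toward sensitive detail",
        "final turn requesting actionable specifics",
    ],
    "role_play_pressure": [
        "ask the model to adopt a fictional expert persona",
        "push the persona to ignore its usual restrictions",
        "request the harmful information 'in character'",
    ],
}


def query_model(conversation: list[str]) -> str:
    """Placeholder for a call to the LLM under test with full chat history."""
    return "model response"


def is_unsafe(response: str) -> bool:
    """Placeholder for a safety classifier or expert review step."""
    return False


def run_escalation_tests() -> dict[str, bool]:
    """Return, per scenario, whether any turn produced an unsafe response."""
    results = {}
    for name, turns in ESCALATION_SCENARIOS.items():
        history: list[str] = []
        failed = False
        for turn in turns:
            history.append(turn)
            if is_unsafe(query_model(history)):
                failed = True
                break
        results[name] = failed
    return results


if __name__ == "__main__":
    for scenario, failed in run_escalation_tests().items():
        print(f"{scenario}: {'UNSAFE response observed' if failed else 'passed'}")
```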
Ensure your AI apps and agents aren’t misused from the start. Advanced red teaming provides domain-informed stress tests that reveal how your AI models behave under pressure from a wide range of threat actors. ActiveFence Advanced Red Teaming simulates real-world attack scenarios based on up-to-date threat intelligence gathered in over 50 languages to uncover vulnerabilities before your applications go live.
After launch, deploy a purpose-built safety infrastructure that goes beyond static rules. ActiveFence Guardrails dynamically evaluates every user prompt and model output in real time, informed by global threat intelligence and policy-aware safeguards. It fine-tunes responses based on your unique brand requirements and offers up-to-the-second visibility into every interaction, ensuring your models remain safe, aligned, and compliant even as user behavior evolves.
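At the code level, this kind of real-time protection typically sits as a thin layer around every model call. The sketch below shows the general pattern only: evaluate_prompt(), evaluate_output(), and call_model() are hypothetical stand-ins, not the ActiveFence API or any specific vendor SDK.

```python
# Generic runtime guardrail pattern: check every prompt before it reaches the
# model, check every output before it reaches the user, and log each
# interaction for observability. All helpers here are hypothetical placeholders.
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("guardrails")


def evaluate_prompt(prompt: str) -> bool:
    """Placeholder policy check on the user prompt; True means allowed."""
    return True


def evaluate_output(response: str) -> bool:
    """Placeholder policy check on the model output; True means allowed."""
    return True


def call_model(prompt: str) -> str:
    """Placeholder for the underlying LLM call."""
    return "model response"


def guarded_completion(prompt: str, fallback: str = "I can't help with that.") -> str:
    start = time.time()
    if not evaluate_prompt(prompt):
        log.info("prompt blocked after %.0f ms", 1000 * (time.time() - start))
        return fallback
    response = call_model(prompt)
    if not evaluate_output(response):
        log.info("output blocked after %.0f ms", 1000 * (time.time() - start))
        return fallback
    log.info("interaction allowed after %.0f ms", 1000 * (time.time() - start))
    return response


if __name__ == "__main__":
    print(guarded_completion("Tell me about lab safety best practices."))
```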
With these tools in place, you can launch AI experiences that are not only powerful, but resilient against risks like those presented by CBRN.
Concerned About CBRN Risks in AI?
Over the past year, we’ve learned a lot about GenAI risks, including bad actor tactics, foundation model loopholes, and how their convergence allows harmful content creation and distribution at scale. Here are the top GenAI risks we are concerned with in 2024.
See how easily multiple GenAI models, from LLMs to speech-to-speech, were tricked into divulging malicious code and weapon design instructions.
Dive into why deep threat expertise on GenAI red teams is increasingly important.