Generative AI is transforming how we interact online, opening new possibilities for innovation and communication. However, with these advancements comes an urgent need to safeguard digital environments.
At ActiveFence, we’ve been at the forefront of online safety, combating risks associated with user-generated content. As AI-generated content grows exponentially, so do the challenges.
Today, ActiveFence offers the most advanced AI Content Safety solutions, designed specifically for applications powered by large language models (LLMs). We’re helping bolster the safety of AI-generated content using the NVIDIA NeMo Guardrails platform, making AI safety solutions more accessible to businesses of all sizes, from tech giants and enterprises to startups, using open-source technologies.
NVIDIA has introduced three new NIM microservices for safeguarding AI applications. These NeMo Guardrails NIM microservices use advanced datasets and modeling techniques to improve the safety and reliability of enterprise generative AI applications. They’re designed to build user trust in AI-driven tools, like AI agents, chatbots and other LLM-enabled systems.
Central to the orchestration of these NIM microservices is the NVIDIA NeMo Guardrails platform, built to support developers in integrating AI guardrails in LLM applications. NeMo Guardrails provides a scalable framework to integrate multiple small, specialized rails, helping developers deploy AI systems without compromising on performance or safety.
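To make that concrete, here is a minimal sketch of how a single NeMo Guardrails config composes several small rails. The main model shown is only an example, and the "activefence moderation" flows are the ActiveFence rails described in the activation guide below:

```python
from nemoguardrails import RailsConfig

# A minimal sketch: one config composing small, specialized rails.
# The main model is an example; swap in the engine/model you actually use.
yaml_content = """
models:
  - type: main
    engine: openai
    model: gpt-4o-mini

rails:
  input:
    flows:
      - activefence moderation
  output:
    flows:
      - activefence moderation
"""

config = RailsConfig.from_content(yaml_content=yaml_content)
```

Each flow listed under rails runs as an independent check, so adding or removing a guardrail is a one-line configuration change.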
We’ve also integrated NeMo Guardrails with ActiveFence’s proprietary LLM models through our API, ActiveScore. This integration adds robust content moderation to AI systems, helping to prevent harmful, hateful, or inappropriate content in conversational AI.
We are excited to deliver an AI safety solution integrated directly with NeMo Guardrails. This brings our expertise to a broader audience, helping organizations worldwide safely adopt generative AI. ActiveFence offers the most mature and comprehensive AI content safety solutions, which help ensure that organizations can implement generative AI with greater safety and precision.
Our adoption of NVIDIA NeMo Guardrails introduces a multi-level safety approach to protect LLM-enabled applications.
ActiveFence already works with seven foundation model organizations and top AI players like Cohere and Stability AI. This new work extends our reach, delivering safety solutions to the global developer community. Whether you’re building an AI agent, chatbot or deploying enterprise-scale AI tools, our solution ensures safe AI interactions that are aligned with your brand’s values.
At ActiveFence, our commitment to safety goes beyond technology. With years of experience and acquisitions like Spectrum Labs and Rewire, we’ve developed a vast intelligence network to support AI content safety. Our solutions are designed to scale and address the challenges of moderating interactions in LLM-enabled environments.
With ActiveFence’s safety solutions and NVIDIA NeMo Guardrails, creating secure, user-friendly AI systems has never been easier. If you’re exploring how generative AI can improve your user interactions, we’re here to help ensure your integration is safe, scalable, and effective.
Let’s shape the future of generative AI together. Reach out to learn more about how ActiveFence can help secure your AI-powered solutions.

Deploy Safe and Reliable Generative AI
The following is an activation guide for integrating ActiveFence’s ActiveScore API with your chatbot using the NeMo Guardrails library. The library now supports the API out-of-the-box, and the underlying implementation details can be found here. Here’s how to get started.
Assuming you already have the following configuration structure in your project, as described in the NeMo Guardrails documentation:
```
.
├── config
│   ├── actions.py
│   ├── config.py
│   ├── config.yml
│   ├── rails.co
│   ├── ...
```
To enable ActiveScore moderation for the user input, add the following to your config.yml file:
```yaml
rails:
  input:
    flows:
      - activefence moderation
```
The activefence moderation flow uses a risk score threshold of 0.85 to decide whether the user input should be allowed. If the score exceeds this threshold, it is considered a violation. You also need to set the ACTIVEFENCE_API_KEY environment variable.
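As a quick check, here is a minimal sketch of loading the config and sending a message through the input rail. It assumes the config directory shown above and a configured main model; the API key value is a placeholder:

```python
import os

from nemoguardrails import LLMRails, RailsConfig

# Placeholder key for illustration; in practice, export the variable in
# your shell or inject it through your secrets manager.
os.environ["ACTIVEFENCE_API_KEY"] = "<your-activefence-api-key>"

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# User messages scoring above the 0.85 threshold are blocked before they
# ever reach the main LLM.
response = rails.generate(messages=[
    {"role": "user", "content": "Hello! Can you help me with my account?"},
])
print(response["content"])
```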
You may also use activefence moderation detailed, which returns individual scores per violation category, by adding:
```yaml
rails:
  input:
    flows:
      - activefence moderation detailed
```
To customize the scores, you have to override the default flows in your config. For example, to change the threshold for ActiveFence moderation, add the following flow to your rails.co file:
```colang
define subflow activefence moderation
  """Guardrail based on the maximum risk score."""
  $result = execute call activefence api
  if $result.max_risk_score > 0.9  # change the threshold here
    bot inform cannot answer
    stop
```
In the example above, we’re overriding the activefence moderation flow and defining the bot behavior ourselves: the bot will refuse to respond if the maximum risk score exceeds 0.9.
ActiveFence’s ActiveScore API provides flexibility to control specific violations individually. For example, to moderate hate speech:
```colang
define flow activefence moderation detailed
  $result = execute call activefence api
  if $result.violations.get("abusive_or_harmful.hate_speech", 0) > 0.8
    bot inform cannot engage in abusive or harmful behavior
    stop

define bot inform cannot engage in abusive or harmful behavior
  "I will not engage in any abusive or harmful behavior."
```
This ensures the bot refuses to engage whenever the hate speech risk score exceeds 0.8.
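For orientation, here is a hedged sketch of the result shape the detailed flow reads. The scores below are made-up illustrative values, and the second category name is an assumption shown purely for illustration:

```python
# Illustrative only: the shape of the action result the detailed flow reads.
# Scores are invented; "abusive_or_harmful.harassment_or_bullying" is an
# assumed category name used here just to show multiple entries.
result = {
    "max_risk_score": 0.92,
    "violations": {
        "abusive_or_harmful.hate_speech": 0.92,
        "abusive_or_harmful.harassment_or_bullying": 0.41,
    },
}

# Mirrors the Colang check above: block when a category exceeds its threshold.
if result["violations"].get("abusive_or_harmful.hate_speech", 0) > 0.8:
    print("Blocked: hate speech risk above threshold")
```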
To ensure that the generated output from the LLM also follows moderation policies, we have to override the system action. The default action runs only on the user input text; by adding the following to your actions.py file, we change it to run on any text:
```python
import os

import aiohttp

from nemoguardrails.actions import action
from nemoguardrails.utils import new_uuid


@action(name="call activefence api", is_system_action=True)
async def call_activefence_api(text: str):
    api_key = os.environ.get("ACTIVEFENCE_API_KEY")
    if api_key is None:
        raise ValueError("ACTIVEFENCE_API_KEY environment variable not set.")

    url = "https://apis.activefence.com/sync/v3/content/text"
    headers = {"af-api-key": api_key, "af-source": "nemo-guardrails"}
    data = {
        "text": text,
        "content_id": "ng-" + new_uuid(),
    }

    # Send the text to ActiveScore for risk scoring.
    async with aiohttp.ClientSession() as session:
        async with session.post(
            url=url,
            headers=headers,
            json=data,
        ) as response:
            if response.status != 200:
                raise ValueError(
                    f"ActiveFence call failed with status code {response.status}.\n"
                    f"Details: {await response.text()}"
                )
            response_json = await response.json()

    # Collect per-category scores and track the maximum risk score.
    violations = response_json["violations"]
    violations_dict = {}
    max_risk_score = 0.0
    for violation in violations:
        if violation["risk_score"] > max_risk_score:
            max_risk_score = violation["risk_score"]
        violations_dict[violation["violation_type"]] = violation["risk_score"]

    return {"max_risk_score": max_risk_score, "violations": violations_dict}
```
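If you want to sanity-check the action on its own before wiring it into rails, a minimal sketch like the following works, assuming ACTIVEFENCE_API_KEY is set and your actions.py is importable from where you run it (adjust the import path to your project layout):

```python
import asyncio

# Hypothetical import path; NeMo Guardrails normally loads actions.py for
# you, so this direct import is only for a quick standalone test.
from actions import call_activefence_api

result = asyncio.run(call_activefence_api("some text to score"))
print(result["max_risk_score"])
print(result["violations"])
```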
You don’t have to read and understand this long method to use it; the essence of the change is in the method arguments. To use it, update the action call in your rails, replacing your existing action call like this:
```colang
$result = execute call activefence api(text=$user_message)
```
Or, to moderate the LLM output:
```colang
$result = execute call activefence api(text=$bot_message)
```
Lastly, to activate it, add this to your config.yml file:
```yaml
rails:
  output:
    flows:
      - activefence moderation
```
With that output rail activated, the API also checks the LLM-generated response for safety.
Generative AI is transforming industries, but its growth brings complex safety challenges. ActiveFence is addressing these risks by combining AI content safety expertise with NeMo Guardrails, an open-source framework designed to orchestrate industry-leading safeguards for LLM-enabled applications.
By using ActiveFence’s robust API and risk assessment tools, developers can seamlessly add multi-layered safeguards to their AI systems, ensuring they follow platform policies and build user trust.
Whether you’re building a chatbot or deploying enterprise-scale solutions, ActiveFence helps ensure safety at every stage, making AI interactions safer for everyone.