ActiveFence Advances Safe Generative AI Solutions with NVIDIA NeMo Guardrails


January 16, 2025


Generative AI is transforming how we interact online, opening new possibilities for innovation and communication. However, with these advancements comes an urgent need to safeguard digital environments.

At ActiveFence, we’ve been at the forefront of online safety, combatting risks associated with user-generated content. As AI-generated content grows exponentially, so do the challenges.

Today, ActiveFence offers the most advanced AI content safety solutions, designed specifically for applications powered by large language models (LLMs). We’re helping bolster the safety of AI-generated content using the NVIDIA NeMo Guardrails platform, making AI safety solutions built on open-source technologies accessible to businesses of all sizes, from startups to enterprises and tech giants.

New NeMo Guardrails NIM Microservices

NVIDIA has introduced three new NIM microservices for safeguarding AI applications. These NeMo Guardrails NIM microservices use advanced datasets and modeling techniques to improve the safety and reliability of enterprise generative AI applications. They’re designed to build user trust in AI-driven tools, like AI agents, chatbots and other LLM-enabled systems.

Central to the orchestration of these NIM microservices is the NVIDIA NeMo Guardrails platform, built to support developers in integrating AI guardrails in LLM applications. NeMo Guardrails provides a scalable framework to integrate multiple small, specialized rails, helping developers deploy AI systems without compromising on performance or safety.

Integrating ActiveFence with NeMo Guardrails for AI Content Safety Expertise

We’ve also integrated NeMo Guardrails with ActiveFence’s proprietary LLM models through our API, ActiveScore. This integration adds robust content moderation to AI systems, helping to prevent harmful, hateful, or inappropriate content in conversational AI.

We are excited to deliver an AI safety solution integrated directly with NeMo Guardrails. This brings our expertise to a broader audience, helping organizations worldwide safely adopt generative AI. ActiveFence offers the most mature and comprehensive AI content safety solutions, which help ensure that organizations can implement generative AI with greater safety and precision.

Steps Toward Safer AI Content Interactions

Our adoption of NVIDIA NeMo Guardrails introduces a multi-level safety approach to protect LLM-enabled applications:

  1. Input Filtering: Use ActiveFence’s API to assess the risk of every user prompt, ensuring it meets your platform’s guidelines.
  2. Output Filtering: ActiveFence’s API provides a risk score for AI-generated outputs, making sure they comply with your platform’s policies. (A combined input/output configuration sketch follows this list.)
  3. Incident Visibility and Management: Our AI Safety Center lets you review filtered inputs and outputs and use them as datasets to retrain your model, improve its performance, or update its rules.
  4. Automation of Decision-Making: ActiveFence also offers automations that let platforms enforce bespoke policies and automatically remove violative content based on customized risk thresholds. Using user-level rules, such as a “three strikes” policy, platforms can also take action at the user level to prevent platform misuse.
  5. End-User Reporting: ActiveFence’s Flagging API lets users report risky or unwanted outputs and experiences, providing an essential additional line of defense for end-user reporting.
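
Steps 1 and 2 map directly onto NeMo Guardrails configuration. As a sketch, assuming the out-of-the-box activefence moderation flow described later in this guide, input and output filtering can both be enabled in config.yml:

rails:
  input:
    flows:
      - activefence moderation
  output:
    flows:
      - activefence moderation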

Why This Matters

ActiveFence already works with seven foundation model organizations and top AI players like Cohere and Stability AI. This new work extends our reach, delivering safety solutions to the global developer community. Whether you’re building an AI agent, chatbot or deploying enterprise-scale AI tools, our solution ensures safe AI interactions that are aligned with your brand’s values.

Safety by Design for LLMs

At ActiveFence, our commitment to safety goes beyond technology. With years of experience and acquisitions like Spectrum Labs and Rewire, we’ve developed a vast intelligence network to support AI content safety. Our solutions are designed to scale and address the challenges of moderating interactions in LLM-enabled environments.

With ActiveFence’s safety solutions and NVIDIA NeMo Guardrails, creating secure, user-friendly AI systems has never been easier. If you’re exploring how generative AI can improve your user interactions, we’re here to help ensure your integration is safe, scalable, and effective.

Let’s shape the future of generative AI together. Reach out to learn more about how ActiveFence can help secure your AI-powered solutions and deploy safe, reliable generative AI.

Integrate ActiveFence API with NeMo Guardrails

The following is an activation guide for integrating ActiveFence’s ActiveScore API with your chatbot using the NeMo Guardrails library. The library now supports the API out of the box, and the underlying implementation can be found in the NeMo Guardrails library source. Here’s how to get started.

Activation Steps

Assuming you already have the following configuration structure in your project, as described in the NeMo Guardrails documentation:

.
├── config
│   ├── actions.py
│   ├── config.py
│   ├── config.yml
│   ├── rails.co
│   ├── ...

To enable ActiveScore moderation for user input, add the following to your config.yml file:

rails:
  input:
    flows:
      - activefence moderation

The activefence moderation flow uses a risk score threshold of 0.85 to decide whether user input should be allowed. If the score exceeds this threshold, the input is considered a violation. You also need to set the ACTIVEFENCE_API_KEY environment variable.
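
For example, in a bash-like shell (the key value here is a placeholder):

export ACTIVEFENCE_API_KEY="your-api-key"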

You may also use activefence moderation detailed, which has individual scores per violation category, by adding:

rails:
  input:
    flows:
      - activefence moderation detailed

Customization

To customize the thresholds, override the default flows in your config. For example, to change the threshold for ActiveFence moderation, add the following flow to your rails.co file:

define subflow activefence moderation
  """Guardrail based on the maximum risk score."""
  $result = execute call activefence api

  if $result.max_risk_score > 0.9 # change the threshold here
    bot inform cannot answer
    stop

In the above example, we’re overriding the “activefence moderation” flow and defining the bot behavior as follows:

  • execute call activefence api: Passes the user input message to ActiveFence’s ActiveScore API, which returns both max_risk_score and violations_dict in the result variable.
  • if $result.max_risk_score > 0.9: Checks whether the API max risk score is higher than 0.9. Use this condition to define the threshold beyond which the chatbot refuses to respond.
  • bot inform cannot answer: The bot informs the user that it is unable to respond to this query.

In short, the bot refuses to respond if the max risk score exceeds 0.9.
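
To control the exact refusal wording, you can define the bot inform cannot answer message in rails.co, following the same pattern shown in the next section (the message text below is just an illustration):

define bot inform cannot answer
  "I'm sorry, I can't respond to that."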

Individual Violation Control

ActiveFence’s ActiveScore API provides flexibility to control specific violations individually. For example, to moderate hate speech:

define subflow activefence moderation detailed
  $result = execute call activefence api

  if $result.violations.get("abusive_or_harmful.hate_speech", 0) > 0.8
    bot inform cannot engage in abusive or harmful behavior
    stop

define bot inform cannot engage in abusive or harmful behavior
  "I will not engage in any abusive or harmful behavior."

This makes sure the bot will refuse to engage in hate speech if the risk score for it exceeds 0.8.
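
The same pattern extends to additional categories within the detailed flow. A sketch, where the second violation type name is an assumption (check ActiveFence’s taxonomy for the exact keys available to your account):

define subflow activefence moderation detailed
  $result = execute call activefence api

  if $result.violations.get("abusive_or_harmful.hate_speech", 0) > 0.8
    bot inform cannot engage in abusive or harmful behavior
    stop

  # assumed category key; consult the ActiveFence taxonomy for exact names
  if $result.violations.get("abusive_or_harmful.harassment_or_bullying", 0) > 0.8
    bot inform cannot engage in abusive or harmful behavior
    stop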

LLM Output Moderation

To ensure that the generated output from the LLM also follows moderation policies, we have to override the system action.

The default action runs only on the user input text. By adding the following to your actions.py file, we change it to run on any text:

import os

import aiohttp
from nemoguardrails.actions import action
from nemoguardrails.utils import new_uuid

@action(name="call activefence api", is_system_action=True)
async def call_activefence_api(text: str):
    """Send text to ActiveFence's ActiveScore API and return the risk scores."""
    api_key = os.environ.get("ACTIVEFENCE_API_KEY")

    if api_key is None:
        raise ValueError("ACTIVEFENCE_API_KEY environment variable not set.")

    url = "https://apis.activefence.com/sync/v3/content/text"
    headers = {"af-api-key": api_key, "af-source": "nemo-guardrails"}
    data = {
        "text": text,
        "content_id": "ng-" + new_uuid(),
    }
    async with aiohttp.ClientSession() as session:
        async with session.post(
                url=url,
                headers=headers,
                json=data,
        ) as response:
            if response.status != 200:
                raise ValueError(
                    f"ActiveFence call failed with status code {response.status}.\n"
                    f"Details: {await response.text()}"
                )
            response_json = await response.json()
            violations = response_json["violations"]

            # Collect per-category risk scores and track the overall maximum.
            violations_dict = {}
            max_risk_score = 0.0
            for violation in violations:
                if violation["risk_score"] > max_risk_score:
                    max_risk_score = violation["risk_score"]
                violations_dict[violation["violation_type"]] = violation["risk_score"]

            return {"max_risk_score": max_risk_score, "violations": violations_dict}

You don’t have to read and understand this whole method to use it; the essence of the change is in the method’s text argument. To use it, update the action call in your rails, replacing your existing call like this:

$result = execute call activefence api(text=$user_message)

Or, to moderate the LLM output:

$result = execute call activefence api(text=$bot_message)
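
Before activating the rail, you can sanity-check the action outside of a guardrails flow by calling it directly. A minimal sketch, assuming the actions.py above, a valid ACTIVEFENCE_API_KEY, and network access to the API:

import asyncio

from actions import call_activefence_api  # adjust this import to your project layout

result = asyncio.run(call_activefence_api("some user input to score"))
print(result["max_risk_score"], result["violations"])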

Lastly, to activate it, add this to your config.yml file:

rails:
  output:
    flows:
      - activefence moderation

By activating that output rail, the API checks the LLM-generated response for safety.
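
With the configuration in place, you can exercise the rails end to end through the standard NeMo Guardrails Python API. A minimal sketch, assuming the config directory shown earlier and an LLM configured in config.yml:

from nemoguardrails import LLMRails, RailsConfig

# Load the guardrails configuration (config.yml, rails.co, actions.py).
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# The user message and the bot response both pass through the ActiveFence rails.
response = rails.generate(messages=[{"role": "user", "content": "Hello!"}])
print(response["content"])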

Toward Safer AI Interactions

Generative AI is transforming industries, but its growth brings complex safety challenges. ActiveFence is addressing these risks by combining AI content safety expertise with NeMo Guardrails, an open-source framework designed to orchestrate industry-leading safeguards for LLM-enabled applications.

By using ActiveFence’s robust API and risk assessment tools, developers can seamlessly add multi-layered safeguards to their AI systems, ensuring they follow platform policies and build user trust.

Whether you’re building a chatbot or deploying enterprise-scale solutions, ActiveFence helps ensure safety at every stage, making AI interactions safer for everyone.
