ActiveOS Updates: Prosocial, Updated CSAM Model, and More

By Emma Datny
April 15, 2024


Join our ongoing demo series - Demo Tuesdays

Our product team has been working hard on new ActiveOS and ActiveScore features and enhancements to help boost moderation efficiency and safeguard your community.

This month, we’re excited to share the following releases:

Improved CSAM detection model – Enable enhanced protection and detection accuracy when combating Child Sexual Abuse Material (CSAM).
New Prosocial model – Identify and amplify the most positive users to proactively encourage positive behavior and boost retention.
Self-serve keywords tool – Gain full control with our self-serve UI and new filtering methods for greater detection accuracy.
Upgraded moderation queue UI – Improve focus and productivity during manual review. Our new UI offers enhanced data visibility, improved chat views, and even more customization options for swift decision making.

Check out more details below about each feature.

Stronger child safety protection with novel CSAM models

We have significantly improved the performance of our CSAM detection model to enhance protection and detection accuracy. Our novel CSAM detectors extend beyond hash matching and can identify multilingual terminology, GenAI text and image prompt manipulation techniques, NCMEC media matches, and more.

You can access the model, like any of our others, with one API integration. When analyzing content, ActiveScore considers all surrounding metadata, including usernames, bios, chat messages, prompts, logos, and more, to provide greater context and optimal accuracy.

Each analyzed item returns a risk score from 0 to 100, indicating the likelihood that it contains CSAM. The results also include associated indicators and descriptions.
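To make the score concrete, here is a minimal sketch of how a client might route an item based on its risk score. The `AnalysisResult` class, its field names, the thresholds, and the action names are all assumptions for illustration; they are not the actual ActiveScore response schema or recommended cut-offs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnalysisResult:
    """Hypothetical container for one analyzed item (not the real API schema)."""
    risk_score: int                      # 0-100, higher means more likely violative
    indicators: List[str] = field(default_factory=list)


def route(result: AnalysisResult, review_at: int = 50, escalate_at: int = 90) -> str:
    """Map a risk score to an action using illustrative thresholds."""
    if not 0 <= result.risk_score <= 100:
        raise ValueError("risk_score must be between 0 and 100")
    if result.risk_score >= escalate_at:
        return "escalate"
    if result.risk_score >= review_at:
        return "manual_review"
    return "allow"


if __name__ == "__main__":
    item = AnalysisResult(risk_score=72, indicators=["example-indicator"])
    print(route(item))
```

In practice, the thresholds would be tuned to your own policies and review capacity.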

Note that we continuously improve our models through an automatic feedback loop from moderator decisions, retraining according to each customer’s unique policies, real-world drift, and up-to-date findings from our intelligence team. This ongoing refinement increases accuracy over time, as our benchmarks show.

Proactively encourage positive behavior with our new Prosocial model

Creating a safe and inclusive community is critical to drive engagement and retain users. To support this goal, we are excited to introduce our new ActiveScore Prosocial model.

This model helps deter negative behaviors and encourages positive ones. It automatically analyzes conversations to identify key indicators of positivity and highlights the most positive users.

With this information, you can automate actions to recognize and reward these users, leading to improved engagement within your community.

How it works:

You can easily assess the impact of users on the overall health of your community by leveraging a user reputation score. This score helps you rank users based on their positive and toxic behaviors. Then, you can take the appropriate actions to reward respectful users or take action against bad actors with low scores.
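As a rough illustration of that workflow, the sketch below ranks users by a reputation score and splits out those to reward and those to review. The 0 to 100 score range, the thresholds, and the function names are assumptions made for this example, not part of the Prosocial model's documented output.

```python
from typing import Dict, List, Tuple


def rank_users(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (user, score) pairs sorted from highest to lowest reputation."""
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


def split_by_reputation(
    scores: Dict[str, float],
    reward_at: float = 80.0,   # assumed threshold for rewarding users
    action_at: float = 20.0,   # assumed threshold for reviewing bad actors
) -> Tuple[List[str], List[str]]:
    """Return (users to reward, users to review), each in ranked order."""
    ranked = rank_users(scores)
    to_reward = [user for user, score in ranked if score >= reward_at]
    to_review = [user for user, score in ranked if score <= action_at]
    return to_reward, to_review


if __name__ == "__main__":
    sample = {"ana": 92.0, "ben": 55.0, "cy": 12.0, "dee": 85.0}
    print(split_by_reputation(sample))
```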

More accurate self-serve keyword filtering and matching

We have upgraded our keywords tool to improve matching accuracy. By choosing among different filtering methods, you can control the level of specificity in your detection. The tool also detects variations in language usage, so you won’t miss deliberate misspellings, abbreviations, typos, or alternative phrasings.

You can now easily set your keyword filters to exact, fuzzy, or partial match. This allows for adaptations such as case insensitivity, leet translations, duplication removal, typos, character separation, and more.

Here’s an overview of each new filtering option:

Exact match: This option detects content that exactly matches a specified keyword or phrase. It also includes enhanced case-insensitive matching to catch broader detection avoidance techniques.

Fuzzy match: Previously an embeddable match, this option matches content through sensitivity-based comparison. It allows for some degree of variation or imprecision, such as content that contains typos or duplicated characters.

Partial match: This is a new detector that identifies content that matches characters, even if they are part of a larger token or contain prefixes and suffixes.

Plus, this feature gives you the flexibility to fine-tune the matching process based on your specific needs. If you want to capture only very close matches, you can set a lower threshold. If you want to allow for more variation and include looser variants of the keyword, you can set a higher threshold.

We have also enhanced explainability and transparency by providing more context on the match.

Here are some examples using the word “dog”:

Fuzzy match:

High sensitivity: “d3org”, due to 2 invalid characters
Medium sensitivity: “doqg”, due to 1 invalid character
Low sensitivity: “dog”, due to 0 invalid characters

Partial match:

High sensitivity: “el12aq2dr2og”, due to 2 invalid characters & prefix
Medium sensitivity: “twew2zxsdqogprf”, due to 1 invalid character, prefix & suffix
Low sensitivity: “twew2zxsdogprf”, due to 0 invalid characters, prefix & suffix
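The “dog” examples can be reproduced with a small sketch. It rests on one reading of “invalid characters”: extra characters inserted between the keyword's letters, counted over the whole token for fuzzy match and over the tightest window inside the text for partial match. The function names and the mapping of low, medium, and high sensitivity to 0, 1, and 2 allowed invalid characters are assumptions inferred from the examples, not the actual ActiveOS implementation.

```python
from typing import Optional

# Assumed mapping from sensitivity level to allowed "invalid" characters,
# inferred from the "dog" examples; not a documented ActiveOS setting.
SENSITIVITY = {"low": 0, "medium": 1, "high": 2}


def fuzzy_invalid_chars(keyword: str, token: str) -> Optional[int]:
    """Count extra characters in the whole token when the keyword's letters
    appear in order inside it; return None if they do not."""
    kw, tok = keyword.lower(), token.lower()
    if not kw:
        return None
    remaining = iter(tok)
    # `ch in remaining` consumes the iterator, so letter order is enforced.
    if all(ch in remaining for ch in kw):
        return len(tok) - len(kw)
    return None


def partial_invalid_chars(keyword: str, text: str) -> Optional[int]:
    """Count extra characters in the tightest window of `text` that contains
    the keyword's letters in order, ignoring any prefix and suffix."""
    kw, txt = keyword.lower(), text.lower()
    if not kw:
        return None
    best: Optional[int] = None
    for start in range(len(txt)):
        if txt[start] != kw[0]:
            continue
        matched = 0
        for end in range(start, len(txt)):
            if txt[end] == kw[matched]:
                matched += 1
                if matched == len(kw):
                    extra = (end - start + 1) - len(kw)
                    best = extra if best is None else min(best, extra)
                    break
    return best


def matches(keyword: str, text: str, mode: str, sensitivity: str) -> bool:
    """Return True if `text` matches `keyword` under the given mode and level."""
    counter = fuzzy_invalid_chars if mode == "fuzzy" else partial_invalid_chars
    invalid = counter(keyword, text)
    return invalid is not None and invalid <= SENSITIVITY[sensitivity]


if __name__ == "__main__":
    print(matches("dog", "d3org", "fuzzy", "high"))
    print(matches("dog", "twew2zxsdqogprf", "partial", "medium"))
```

Under these assumptions, “d3org” matches only at high sensitivity in fuzzy mode, while “twew2zxsdogprf” matches at every level in partial mode because the keyword appears intact between the prefix and suffix.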

We have also upgraded the ActiveOS user interface to make these features more accessible and user-friendly. The upgraded UI lets you view and filter information more easily based on method, language, similarities, and other fields.

Boost manual review efficiency with new UI

We have made further improvements to moderator efficiency with our new moderation queue UI. The upgraded UI now includes better data visibility, improved chat views, and more customization options. Teams can show only the data that is relevant to each queue, so moderators can focus on the information that matters and make decisions more quickly.

Our new chat view makes it easier to pinpoint the context of a conversation and see where it may have taken a negative turn, so you can take action against the content or at the user level. These new functionalities are available to all our ActiveOS users without the need for any additional configuration.

Stay tuned, as we are continuing to work on many more exciting features and enhancements for ActiveOS.

If you’re interested in learning more or seeing these features in action, we invite you to our ongoing demo series – Demo Tuesdays. It’s a great opportunity to see the product in action, meet with our team, and ask any questions you may have! Alternatively, you can also schedule a 1-1 demo session with us.
