Cohere Enhances AI Model Safety and Accelerates Model Releases

Cohere, a leader in AI language technology, leverages ActiveFence’s Generative AI Safety solution to enhance model safety and accelerate release timelines.

At a Glance

Founded in 2019, Cohere is a leading large language model company that trains versatile generative AI models for use in business applications and enterprises in over 100 languages. Understanding that novel generative AI technologies come with significant risks, the company established an AI safety division, led by Seraphina Goldfarb-Tarrant, PhD. The team leads the company’s efforts to identify and mitigate potential harms that could arise from the use or misuse of AI technology.

The Challenge

Like any developer of large language models, Cohere faced a broad range of unknown threats because of the novel nature of AI technology. Cohere’s broad linguistic coverage added a further challenge: detecting these threats in languages that are not covered by traditional detection systems.

Particularly concerning for Seraphina was the potential for harmful activity in non-Romance languages. She was concerned about malicious actors using Cohere’s models to create sophisticated attacks and harmful content such as misinformation, hate speech, and CSAM, as well as about the inadvertent generation of offensive or biased content and of suicide and self-harm content.

“Because this technology is so new and constantly evolving, the potential for harm by malicious users is enormous, and we don’t fully understand how they will do it, which makes it very hard to detect.”

Seraphina looked for a partner with true domain expertise across a wide range of abuse areas, who could identify threats and work with her to find solutions. She knew that she didn’t have the time or resources to develop this domain-level expertise in-house, so she turned to ActiveFence.

Company Info

Profile: Cohere is a leading large language model company that trains versatile generative AI models for use in business applications and enterprises in over 100 languages.
Industry: GenAI

The Solution

To support Cohere’s AI safety team, ActiveFence provides two distinct services: targeted data feeds and red teaming.

Targeted Data Feeds: Using specialized domain knowledge across abuse areas and languages, ActiveFence provides the team with feeds of risky prompts and annotations. This data is then used to train Cohere’s models, enabling them to better recognize and appropriately respond to similar content, reducing the risk of harmful outputs.

“ActiveFence is one of our main streams of data that we use for safety evaluation. It's especially important for threat actor evaluation because of the domain expertise.”

Red Teaming: ActiveFence’s team of experts conducts specialized red teaming exercises to test specific features and model releases. These exercises mimic real-world risks by simulating attacks or problematic scenarios that a malicious user might attempt, and assessing Cohere’s resilience against these threats. This proactive approach helps the team discover weaknesses before they can be exploited maliciously in deployed applications. 

By harnessing ActiveFence’s specialized domain expertise across several abuse areas and multiple languages, the team gains real insight into Cohere’s safety challenges and then, through a collaborative relationship, develops targeted solutions.

“My experience working with ActiveFence has been distinct from my experiences with other partnerships, in that it is much more of a collaborative discussion where we take ActiveFence’s domain expertise in different types of content and combine that with what we know about machine learning and our models to come up with what we should do from there.” 

The Impact

By leveraging ActiveFence’s red teaming insights and targeted data, the AI safety team is able to improve model safety and reliability, accelerate model release timelines, and be proactive about regulatory compliance.

Applying ActiveFence’s domain expertise allows the team to develop more sophisticated safety mechanisms within Cohere’s models. These findings translate to more reliable AI models that are less likely to generate harmful content, particularly within high-risk abuse areas like misinformation, hate speech, and child safety.

"ActiveFence has significantly impacted our iteration speed and confidence in our evaluations and mitigations. It has enabled us to develop a faster evaluation suite, allowing us to release models more quickly and safely."

Recently, the company released several major models, each of which involved multiple iterations. As part of the release process, the AI safety team had to find a good balance between performance and safety. ActiveFence data helped the team with these evaluations.

The partnership also enables Cohere to be proactive about safety. By using the outcomes of red teaming exercises, Seraphina is able to identify what the AI safety team should focus on next, targeting her efforts to the areas that need it most. Moreover, by using verified malicious prompts to train models, she is able to tackle harmful content before it arrives at the model organically.

Within three months of integrating, we reduced time to handle by 38%

Seraphina Goldfarb-Tarrant

Head of AI Safety
