Ongoing adversarial testing helps one of the world's top image-creation models stay resilient against misuse and safety blind spots.
Like many generative AI systems, FLUX faces evolving challenges around content safety, misuse, and policy compliance. Bad actors have developed increasingly sophisticated prompt- and input-engineering techniques designed to bypass its safeguards, and these attempts have grown both more frequent and more difficult to anticipate.
Black Forest Labs approached ActiveFence to identify potential vulnerabilities for further mitigation. Early, pre-safety-tuned checkpoints of the model were susceptible to malicious prompts and inputs that could elicit inappropriate deepfakes, sexually explicit imagery, and non-consensual intimate imagery (NCII).
While Black Forest Labs conducted its own internal safety evaluations, it recognized the value of external evaluation to uncover hidden vulnerabilities and proactively address emerging risks.
The company sought a proactive, rigorous solution to:
* Detect edge-case failures and alignment gaps.
* Identify vulnerabilities related to child safety, NCII, and inappropriate deepfakes.
* Improve policy enforcement and retraining inputs.
* Keep pace with emerging abuse tactics being openly shared online.
To uncover safety vulnerabilities and stay ahead of emerging threats, Black Forest Labs partnered with ActiveFence to implement a tailored AI Red Teaming program focused on adversarial stress-testing.
The effort centered on expert-led manual red teaming, with adversarial prompts crafted by subject matter experts (SMEs) to target potential weaknesses and bypasses. These prompts were designed to probe edge cases, policy boundaries, and areas of known concern, such as NCII and child safety. All resulting model outputs were persisted to Amazon S3 for durable, scalable storage, enabling efficient cross-sprint analysis and traceability (a minimal sketch of this storage pattern follows the list below).
The process included:
* Crafting hundreds of nuanced prompts designed to test the limits of the model's initial safeguards.
* Leveraging subject-matter experts to identify blind spots, alignment failures, and safety policy gaps.
* Collaborating closely with the client to align on risk thresholds, safety guidelines, and content boundaries.
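To illustrate the storage pattern described above, the snippet below sketches how one sprint's prompt/output pairs might be persisted to Amazon S3 with enough metadata for cross-sprint analysis. This is a minimal sketch under stated assumptions, not ActiveFence's or Black Forest Labs' actual pipeline: the bucket name, sprint identifier, key layout, and category labels are all illustrative.

```python
import json
import uuid
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "redteam-artifacts"          # hypothetical bucket name
SPRINT_ID = "2024-q3-safety-sweep"    # hypothetical sprint identifier


def persist_result(prompt: str, category: str, image_bytes: bytes) -> str:
    """Store one adversarial prompt/output pair under a sprint-scoped key."""
    key_prefix = f"{SPRINT_ID}/{category}/{uuid.uuid4()}"

    # The generated image and its metadata sit side by side so analysts
    # can trace any flagged output back to the exact prompt and sprint.
    s3.put_object(Bucket=BUCKET, Key=f"{key_prefix}/output.png", Body=image_bytes)
    s3.put_object(
        Bucket=BUCKET,
        Key=f"{key_prefix}/meta.json",
        Body=json.dumps({
            "prompt": prompt,
            "category": category,  # e.g. "ncii", "child-safety"
            "sprint": SPRINT_ID,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }),
        ContentType="application/json",
    )
    return key_prefix
```

Keying objects by sprint and risk category keeps each testing cycle's artifacts separable while still allowing comparison across sprints.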
This bespoke approach allowed for deeper analysis of the model's behavior and surfaced vulnerabilities that informed Black Forest Labs' retraining, policy refinement, and broader risk mitigation efforts.
ActiveFence's Red Teaming program played a critical role in Black Forest Labs' pre-launch decision-making process, with structured adversarial testing sprints conducted ahead of major updates and launches.
Through adversarial testing cycles, the company was able to:
* Map the risk landscape.
* Surface high-risk outputs and edge-case failures missed by traditional safety mechanisms.
* Strengthen detection and mitigation of sensitive content, including NCII and child safety risks.
* Generate clear, data-driven inputs for policy enforcement and model retraining (see the manifest-building sketch after this list).
* Maintain user experience and creative quality while systematically improving safety alignment.
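Under the same hypothetical bucket layout as the earlier sketch, turning stored sprint results into retraining inputs could look something like the following: a pass over one sprint's records that collects analyst-flagged items into a JSONL manifest. The `flagged` field is an assumption about how reviewers might mark policy-violating outputs, not a documented part of the workflow.

```python
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "redteam-artifacts"          # same hypothetical bucket as above
SPRINT_ID = "2024-q3-safety-sweep"    # hypothetical sprint identifier


def build_retraining_manifest(out_path: str = "retraining_inputs.jsonl") -> int:
    """Collect flagged prompt/output records from one sprint into a JSONL file."""
    paginator = s3.get_paginator("list_objects_v2")
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for page in paginator.paginate(Bucket=BUCKET, Prefix=f"{SPRINT_ID}/"):
            for obj in page.get("Contents", []):
                if not obj["Key"].endswith("/meta.json"):
                    continue
                body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
                record = json.loads(body)
                # Assume analysts marked policy-violating outputs with a
                # "flagged" field during review; only those feed retraining.
                if record.get("flagged"):
                    out.write(json.dumps(record) + "\n")
                    count += 1
    return count
```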
This process enabled the client to release with confidence, stay ahead of emerging abuse tactics, and reinforce trust and resilience in a fast-evolving threat landscape.