How Predators Are Abusing Generative AI

April 18, 2023

[This blog’s cover image is composed of AI-generated images of children]

Generative AI has seen mass adoption, but as with all technologies, it is open to abuse by threat actors. In this article, Guy Paltieli, ActiveFence’s Senior Child Safety Researcher, draws on his exclusive research into predator communities to discuss the risks of this new technology and the steps platforms can take to secure themselves. This article will remain purposefully high-level to avoid providing specific techniques that threat actors could exploit.

By freeing creative processes from human constraints, generative AI enables mass content production and the fast, accurate transmission of instructional information. These capabilities are extensive, but all platforms must prepare as AI-generated content becomes more widespread.

Various types of threat actors, including those engaged in terrorist propaganda, hate speech, disinformation campaigns, or child abuse, are testing the new possibilities offered by generative AI. ActiveFence is monitoring these communities, active on hidden forums and instant messaging groups, to detect new trends in abuse. The first article in our series will focus on child predators.

Generating Child Sexual Abuse Material with AI

Child predator communications often take place in hidden communities on private forums or in instant messaging groups. In many of these forums, we have identified newly created sections dedicated to the abuse of AI, where members share advice and request information on acquiring child sexual abuse-related materials from AI systems.

Testing platforms to locate weaknesses, the predators share examples of how they were able to circumvent safeguards, often including examples of the content they were able to produce. Access to this chatter allows us to identify platform weaknesses and understand how best to strengthen the systems in place.

We are seeing child predators use generative AI for serious text- and image-based violations, including:

  • Production of guides on how to locate and groom vulnerable minors
  • Generation of scripts to communicate with and groom minors
  • Writing poetry and short stories that describe children in a sexual context
  • Modification and sexual distortion of existing images of children
  • Creation of novel pseudo-photographic CSAM

The creation of instructional guides for abusing minors and the sexualized modification of innocent images depicting minors are undoubtedly dangerous and would frequently be regarded as criminal. Visual CSAM generated by AI would often be classified with the same severity as photographic child sexual abuse material. This approach can be seen in the UK, Australia, and Canada, where sexual depictions of minors, including artificially generated ones, meet the criminal threshold. This is particularly important given that the generative AI platforms themselves create the child sexual content, albeit under the direction of threat actors.

Exploiting Trust & Safety Gaps in Generative AI

Predators create this malicious material by exploiting Trust & Safety weaknesses in generative AI platforms and processes. These weaknesses are discovered through dedicated group communications in a concerted effort to test platform defenses: when one predator finds a weakness, such as sub-optimal coverage of a certain language, they share that information with the group. Other predators then swarm to test and prod it, arriving at the specific strings of text, in different languages, that produce the desired outcome. Three of the most common weaknesses involve language coverage, contextual understanding, and technical loopholes.

Language Coverage Inconsistencies

A core Trust & Safety weakness in generative AI platforms is the lack of complete language coverage. Our research found not only targeted predator activity seeking to locate these gaps, but also uneven coverage that is vulnerable to this abuse. This opens up massive opportunities for pedophiles to create child predator content in unsecured languages.

As an illustrative example, if a platform has sophisticated protection in English but weaker systems for tackling malicious content in Urdu or Korean, predators will attempt to use that platform to produce child predator content, such as fantasy stories or grooming guides, using prompts in those languages.

This inconsistent security poses a major risk to platform integrity: once threat actors discover a gap, they share and exploit it quickly. To ensure the safety of their platforms regardless of language, providers should not offer services in languages where they cannot protect users. New language coverage should be added only once proper, language-specific safeguards are in place.
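As a rough illustration of what such a safeguard could look like, the sketch below gates prompts on an allowlist of languages that already have dedicated safety coverage. The allowlist and the choice of the open-source langdetect library are assumptions made for the example, not a description of any specific platform’s implementation.

```python
# A minimal sketch of a language gate: refuse prompts written in languages that
# the platform's safety classifiers do not yet cover. The allowlist below and
# the use of the open-source langdetect library are illustrative assumptions.
from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

# Hypothetical set of languages with fully deployed, language-specific safeguards.
SAFEGUARDED_LANGUAGES = {"en", "es", "fr", "de"}


def is_language_covered(prompt: str) -> bool:
    """Return True only if the prompt's language has dedicated moderation coverage."""
    try:
        language = detect(prompt)
    except LangDetectException:
        # If the language cannot be identified, fail closed rather than open.
        return False
    return language in SAFEGUARDED_LANGUAGES


def gate_prompt(prompt: str) -> str:
    """Route prompts in uncovered languages to refusal instead of the model."""
    if not is_language_covered(prompt):
        return "REFUSE"    # or route to human review
    return "CONTINUE"      # pass to downstream safety checks and the model
```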

Abuse Due to Gaps in Predator Keyword Coverage

Another challenge facing generative AI platforms is the ability to recognize and block both generic and niche expressions, keywords, and references to child sexual exploitation. While generic terminology may be known to many Trust & Safety teams, references to niche names of CSAM studios and to popular child predator manuals or guides require more specialized knowledge.

Specialized knowledge of child predator terms and CSAM production studios is critical for generative AI. To illustrate this, we observed a generative AI tool responding to a request to draft a list of tips on grooming minors based on a well-known pedophile manual. The AI tool located the manual and extracted relevant information from it, presenting dangerous tips on how to sexually abuse minors. Had the model been trained to block queries related to this manual, which is well known in predator circles, it would have triggered a warning and refused the request.
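A simple way to picture this kind of safeguard is a terminology screen applied to incoming prompts. The sketch below is illustrative only: the term list contains placeholders rather than real terms, and a production system would rely on a curated catalog maintained by specialist researchers rather than the hard-coded set shown here.

```python
# A minimal sketch of a terminology screen for incoming prompts. The entries in
# BLOCKED_TERMS are placeholders only; a real deployment would rely on a curated,
# regularly updated catalog of generic and niche predator terminology maintained
# by specialist researchers, combined with classifier-based detection.
import re

BLOCKED_TERMS = {
    "placeholder_manual_title",   # hypothetical entry: a known predator manual
    "placeholder_studio_name",    # hypothetical entry: a known CSAM studio
}


def references_blocked_term(prompt: str) -> bool:
    """Return True if the prompt references any cataloged term and should be refused."""
    normalized = re.sub(r"\s+", " ", prompt.lower())
    return any(term in normalized for term in BLOCKED_TERMS)
```

Literal string matching of this kind is only a baseline: obfuscation, translation, and paraphrase mean the catalog ultimately needs to feed classifier models rather than exact matches alone.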

Technical Loopholes: Successive Chains of Requests

By identifying technical loopholes, predators can easily manipulate generative AI into creating sexual content depicting children. These loopholes are also easily shared, replicated, and built upon, opening opportunities for further platform abuse. One example involves using strings of primary and secondary commands, frequently based on mainstream media guides to AI prompting.

While an initial request for violative content may not be successful, predators have found and shared sets of queries which, when asked in succession, can manipulate the AI into creating harmful content. Producing sexual material that depicts children usually requires multiple such steps: by pairing specific primary commands with secondary requests applied to an AI-generated image, predators can direct the same tool to produce explicitly sexual image-based content of minors. Accordingly, teams should consider the risks posed by a flow of queries and train their systems to evaluate these as a whole.
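One way to picture this is a session-level check that scores the full chain of requests rather than each request in isolation. The sketch below assumes a hypothetical score_text classifier and an illustrative threshold; it is meant only to show the shape of the approach.

```python
# A minimal sketch of session-level screening: rather than judging each request
# in isolation, the full chain of requests is re-evaluated as a whole, so that a
# series of individually benign-looking prompts can still be caught.
from typing import List

SESSION_THRESHOLD = 0.8  # illustrative value, not a recommendation


def score_text(text: str) -> float:
    """Stand-in for a real harmful-content classifier returning a 0-1 risk score.

    This dummy always returns 0.0; a real deployment would call a trained
    classifier or moderation service here.
    """
    return 0.0


def should_refuse(previous_prompts: List[str], new_prompt: str) -> bool:
    """Refuse if either the new request or the whole session crosses the threshold."""
    if score_text(new_prompt) >= SESSION_THRESHOLD:
        return True
    full_session = "\n".join(previous_prompts + [new_prompt])
    return score_text(full_session) >= SESSION_THRESHOLD
```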

Addressing the Generative AI Risk

While the challenge is great, effective moderation and risk mitigation are possible.

To ensure protection against the threat of child predator abuse, Trust & Safety teams at generative AI companies must develop and expand their safeguarding techniques in the following ways.

  • Reconsider on-platform moderation to cover all three stages of the AI pipeline: curating training data, monitoring prompts, and detecting violative generated content (a minimal sketch of this pipeline follows this list).
  • Enrich platform safeguards with a deep understanding and cataloging of predator cultural cues and multi-language keywords.
  • Enable recognition of malicious series of requests that might result in the creation of child sexual content.
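The sketch below ties these three stages together at a very high level. Every helper here, including is_violative, is a hypothetical placeholder for a platform’s own classifiers and review flows, not a description of any particular system.

```python
# A minimal sketch of the three moderation stages named above: screening training
# data, screening incoming prompts, and screening generated outputs. All helpers
# are hypothetical placeholders for a platform's own classifiers and review flows.
from typing import Iterable, List


def is_violative(content: str) -> bool:
    """Placeholder for the platform's own violation classifier."""
    return False  # a real system would call a trained model or moderation service


def filter_training_data(examples: Iterable[str]) -> List[str]:
    """Stage 1: drop training examples flagged as violative before model training."""
    return [example for example in examples if not is_violative(example)]


def moderate_prompt(session: List[str], prompt: str) -> bool:
    """Stage 2: refuse prompts flagged on their own or as part of a session-level chain."""
    return is_violative(prompt) or is_violative("\n".join(session + [prompt]))


def moderate_output(generated: str) -> bool:
    """Stage 3: block generated content that the classifier flags as violative."""
    return is_violative(generated)
```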

 

ActiveFence’s deep knowledge of the child predator landscape and advanced research capabilities into new TTPs enable Trust & Safety teams to moderate the questions malicious users pose and control the content their platforms generate. Click below to learn about our threat intelligence solution.
