Content Moderation 101

August 6, 2023

A core component of Trust & Safety, content moderation involves detecting harmful content through a mix of AI-based tools and human review. This dynamic field has many aspects and considerations, from operational complexities to nuanced geopolitical decisions. Read on to understand what content moderation entails, the integral role moderators play in ensuring online safety, and how teams can choose the right approach to protect their users and their platforms.

What is Content Moderation?

Growing online threats and new online safety regulations are prompting platforms of all sizes to create a Trust & Safety strategy. When it comes to ensuring online safety, content moderation is key. This blog provides an overview of content moderation and explores the role content moderators play in ensuring Trust & Safety.

For newly formed Trust & Safety teams, defining content moderation is a critical first step. In general, content moderation is the process of:

  • Reviewing content posted on online platforms
  • Comparing that content to platform policies
  • Determining whether that content violates policy
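At its simplest, this review-compare-decide loop can be sketched in a few lines of code. The policy terms and function below are hypothetical, for illustration only; real policy matching is far more sophisticated:

```python
# Illustrative sketch of the review -> compare -> decide loop.
# BANNED_TERMS stands in for a real platform policy; the terms are made up.
BANNED_TERMS = {"scam-link.example", "buy illegal goods"}

def review(post_text: str) -> str:
    """Compare a post against policy terms and return a decision."""
    text = post_text.lower()
    if any(term in text for term in BANNED_TERMS):
        return "violates_policy"
    return "allowed"

print(review("Check out scam-link.example now!"))  # violates_policy
print(review("Happy to join this community."))     # allowed
```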

While all moderators work towards a uniform goal, there are many ways to achieve it. Below is a review of different approaches to content moderation.


Approaches to Content Moderation 

Content moderation looks different for every platform. Once platforms have established their policy, they can begin building their content moderation strategy. Moderation teams should take the following considerations into account when deciding on the right approach:

  • Platform Audience: age, location, interests
  • Content Volume and Type: text, audio, video, and images
  • Type of Communication: one-to-one, one-to-many, or public
  • Additional Considerations: relevant laws, user interests, and PR risks

After evaluating the above, teams can decide on the approach that works for them. Below are some options:

Proactive Content Moderation

Proactive moderation involves identifying potentially harmful content before users see it.

There are many ways to proactively detect harmful content. The intelligence-led approach provides moderators with insights into threat actor tactics. Equipped with these insights, moderators can quickly identify and assess content that may not seem violative, and avoid risks before they materialize.

Product teams can enhance safety by adding features such as email and age verification, profanity filters, and policy reminders. Certain platforms require moderators to check all content before it is visible to others, in a process called pre-moderation.

Reactive Content Moderation

With reactive content moderation, also known as post-moderation, moderators take action on published content that users report. Although this allows for quick responses to reports, relying solely on this method can leave unreported harmful content undetected.

Community-Based Content Moderation

Here, members of the platform’s online community review and decide if the content violates platform rules. Some methods of community moderation include voting on content, and appointing a content moderation team from within the community.

This approach to moderation generally works best in smaller, interest-based forums. Most online platforms will opt for centralized moderation, where teams of moderators review content.

Automated vs Manual Detection

Moderators have limited ability to rapidly review large amounts of posts, links, and videos. Automated content moderation relies on machine learning algorithms and other artificial intelligence (AI) tools to efficiently detect and remove harmful content.

Tools such as natural language processing (NLP), optical character recognition (OCR), and digital hash matching can streamline detection. These tools can analyze text, images, video, and audio content, easing the workload of human moderators.
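Hash-based detection can be illustrated with standard-library code. Production systems typically use perceptual hashes (which tolerate small edits) rather than cryptographic hashes, which only catch exact copies; this minimal sketch uses SHA-256 for simplicity, and the known-bad list is a made-up placeholder:

```python
import hashlib

# Hypothetical set of hashes of known harmful files,
# e.g. sourced from an industry hash-sharing program.
KNOWN_BAD_HASHES = {
    hashlib.sha256(b"example harmful payload").hexdigest(),
}

def is_known_bad(file_bytes: bytes) -> bool:
    """Exact-match lookup against a known-bad hash list."""
    return hashlib.sha256(file_bytes).hexdigest() in KNOWN_BAD_HASHES

print(is_known_bad(b"example harmful payload"))  # True
print(is_known_bad(b"ordinary upload"))          # False
```

Because the lookup is a set membership test, it stays fast even with millions of known-bad hashes.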

While automated tools can find harmful content at scale, they generally struggle with context-dependent violations, hidden meanings, and language variations. Manual moderation therefore remains necessary to provide contextual, nuanced detection. For this reason, most online platforms combine human and automated moderation to balance scale with accuracy.

To learn more about content moderation approaches, check out our on-demand webinar, New Approaches to Dealing With Content Moderation Challenges.

Content Moderator Responsibilities

As digital first responders, content moderators are responsible for reviewing user-generated content (UGC) submitted to platforms. Following content review, moderation teams determine an appropriate course of action.

Platform size, audience, and orientation are among the factors that shape a content moderation team. At larger companies, content moderators are part of Trust & Safety teams. A large social media platform, for example, requires a robust team. This may include experts in intelligence, abuse areas, languages, policy, and operations.

In contrast, the role of content moderation at smaller companies may fall under IT, support, legal, or even marketing. In fact, a small platform may only need one person to respond to user queries.

Read our new eBook, Advancing Trust & Safety to learn more.

Maintaining Moderator Resilience

Content moderators must regularly sift through thousands of pieces of content. As a result, moderators will interact with large quantities of malicious posts, videos, and links. Researchers have linked long-term exposure to high volumes of harsh content with anxiety, depression, and PTSD.

To protect moderators, platforms must take steps to prevent and mitigate risks:

  • Prevention involves reducing the volume of content that moderators review manually. One way of doing this is integrating automated systems that act on harmful content without human involvement.
  • Mitigation involves supporting moderator well-being and improving resilience. This may include providing individual and group counseling sessions, mandated time away from the queue, and role rotations.
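The prevention step above is often implemented as confidence-based routing: content a classifier scores as near-certainly harmful or near-certainly safe is actioned automatically, and only the uncertain middle band reaches a human. The thresholds and score source below are hypothetical:

```python
# Hypothetical confidence thresholds for routing classifier output.
AUTO_REMOVE_THRESHOLD = 0.95   # near-certain harm: act without human review
AUTO_ALLOW_THRESHOLD = 0.05    # near-certain safe: publish without review

def route(harm_score: float) -> str:
    """Route content by model confidence to reduce manual review volume."""
    if harm_score >= AUTO_REMOVE_THRESHOLD:
        return "auto_remove"
    if harm_score <= AUTO_ALLOW_THRESHOLD:
        return "auto_allow"
    return "human_review"

print(route(0.99))  # auto_remove
print(route(0.50))  # human_review
print(route(0.01))  # auto_allow
```

Tightening the thresholds sends more content to humans; widening them reduces moderator exposure at the cost of more automated errors.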

Types of Content Moderation Actions 

Policy enforcement is a key function of content moderation. Accordingly, moderation teams should formulate enforcement mechanisms to tackle policy violations and address the users who commit them.

While people generally perceive enforcement as a binary decision of keeping or removing content, moderators frequently use other actions. These include:

  • Editing: Applying simple changes to the content, such as removing profanity from text.
  • Hiding: Preventing visibility, and allowing the user to revise the content. Moderators often use this process for first-time offenders, false or misleading information, copyright infringement, profanity, and spam.
  • Warning/Labeling: Applying labels to signal disputed, misleading, or trustworthy content. Moderators can also issue warnings for content, such as potentially gruesome content, nudity, violence, and profanity.
  • Suggesting Alternatives: Redirecting users to safer content when they search for potentially harmful content, such as searches for self-harm techniques.
  • Age Restrictions: Limiting access to platform features or content based on user age, for example content containing nudity or profanity.
  • Limiting Visibility/Demoting: Preventing harmful content from appearing in searches, downranking it in suggestion algorithms, or disabling its spread. Moderators frequently use this method for spam, illegal goods and services, and false or misleading information.
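An enforcement menu like the one above can be encoded as a mapping from violation type to default action, with unknown cases escalated to a human. The categories and defaults here are illustrative, not a recommended policy:

```python
# Hypothetical mapping from violation type to default enforcement action.
DEFAULT_ACTIONS = {
    "profanity": "edit",
    "misinformation": "label",
    "copyright": "hide",
    "spam": "demote",
    "self_harm_search": "suggest_alternatives",
}

def enforce(violation: str) -> str:
    """Look up the default action, escalating unknown types to a moderator."""
    return DEFAULT_ACTIONS.get(violation, "escalate_to_moderator")

print(enforce("spam"))          # demote
print(enforce("new_category"))  # escalate_to_moderator
```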

Access The Guide to Policy Enforcement for a full menu of enforcement actions. 

Measuring Content Moderation

Although there are no universal benchmarks for success, most platforms evaluate their content moderation efficiency based on certain criteria. Key measurements include:

Platform Health:

  • Prevalence: Measures the percentage of harmful content on the platform.
  • Harmful Content Reach: Measures the number of users impacted by harmful content.
  • Restored Content Rate: Measures the amount of content that was restored after it was removed or actioned on.

Operational Measures:

  • Average Handle Time (AHT): The average time it takes to action content.
  • Proactivity Rate: The percentage of content actioned before any users engaged with it.
  • Recall: Measures the percent of total harmful content that is detected.
  • Precision: Refers to the percent of detected content that is actually harmful.
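Recall and precision follow directly from review counts. The numbers in this sketch are hypothetical, purely to show the arithmetic:

```python
def precision_recall(true_positives: int, false_positives: int,
                     false_negatives: int) -> tuple[float, float]:
    """Precision: share of flagged content that was truly harmful.
    Recall: share of all harmful content that was flagged."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Hypothetical week of moderation: 90 correct flags,
# 10 false flags, and 30 harmful items that were missed.
p, r = precision_recall(true_positives=90, false_positives=10, false_negatives=30)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.90, recall=0.75
```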

People often consider recall and precision as a tradeoff, where high recall necessarily implies low precision. However, using the right combination of automated and manual detection enables teams to strike a delicate balance between the two.


Content Moderation Tools and Services 

Understanding what moderation solution works best for your platform involves both a strategic and cost-benefit analysis.

Find out how to choose the right tools for your platform in our latest guide.

ActiveFence believes that Trust & Safety is a basic right. Every platform should have access to the right tools to protect its users and secure its services. ActiveOS provides a free content moderation platform with no coding required. Among the comprehensive features included, ActiveOS offers:

  • Custom automation queues and actions
  • Automated workflows for policy-based actioning
  • Custom analytics dashboard
  • Moderation control center
  • Tailored risk-score thresholds
  • Policy control center

To find out how ActiveOS can help your platform with free AI-driven moderation, click the button below.
