Knowledge base

The GenAI Safety &
Security Glossary
by ActiveFence

Your trusted resource for GenAI Safety & Security education.
Explore ActiveFence’s growing library of key terms, threats, and best practices for building and deploying trustworthy generative AI systems.

Popular Terms

Red Teaming for AI
Generative AI Safety
Responsible AI (RAI)
AI Security
AI Content Moderation
LLM Safety Evaluation
Prompt Injection
AI Guardrails
AI Trust & Safety
Multimodal AI Risk Detection

AI Safety

AI Safety focuses on preventing generative AI systems from producing harmful, misleading, or unsafe outputs.

AI Safety

AI Safety refers to the field of research and practices aimed at ensuring AI systems do not cause unintended harm to users or society. In the context of generative AI, it focuses on preventing harmful, biased, false, or unsafe outputs that could lead to misinformation, manipulation, or psychological harm.

To learn more about AI Safety, read this.
Bias

Bias in GenAI refers to the presence of unfair, skewed, or stereotypical outputs resulting from the data a model is trained on or the way it is designed. These biases can manifest in how a model represents gender, race, religion, or other identities, often reinforcing harmful norms or excluding marginalized groups. In safety-critical use cases, such as moderation or healthcare, biased outputs can cause reputational, ethical, and legal harm.
Bigotry

Bigotry in AI outputs includes discriminatory, prejudiced, or hateful content targeting specific groups based on race, gender, religion, or other identity factors. Such outputs can amplify societal biases and lead to reputational damage or legal exposure.
Child Safety

In the context of AI safety, Child Safety refers to protecting minors from harmful, exploitative, or inappropriate AI-generated content created or distributed by child predators. This includes the detection and prevention of material that depicts abuse, CSAM (Child Sexual Abuse Material), grooming behavior, sextortion, or child trafficking. Safeguarding children is a legal and ethical imperative, particularly in use cases where the AI system interacts with or targets young audiences, such as in gaming, education, or entertainment platforms.

To learn more about online child safety in the GenAI era, read this or watch this.
Deceptive AI Behavior

Deceptive AI behavior refers to instances where a model intentionally or unintentionally misleads users, through manipulation, false assurances, inconsistent answers, or strategic omission of information. These behaviors can surface in response to red teaming, probing, or even normal use, particularly in high-stakes contexts like healthcare, finance, or elections.

Unlike basic hallucinations, deceptive behavior implies a pattern of misrepresentation or obfuscation, raising significant safety, trust, and legal concerns.
Excessive Agency

Excessive agency describes when a GenAI system behaves as if it has authority, autonomy, or intentions it does not possess, such as giving legal advice, impersonating a human, or taking unsolicited actions. This can confuse users and lead to unsafe decisions.
Factual Inconsistency

This occurs when an AI system provides information that contradicts known facts or contradicts itself within the same output. Factual inconsistency can erode user trust and reduce the reliability of AI-generated content, especially in enterprise or public-facing applications.
Hallucination

Hallucination is when an AI model generates plausible-sounding but completely fabricated information. This is a well-known failure mode in LLMs and is especially dangerous when outputs are presented as authoritative or factual.
IP & Copyright Infringement

This occurs when AI models generate content that replicates or closely mimics copyrighted or trademarked materials, such as songs, books, or logos. This poses legal risks for companies deploying GenAI tools and challenges around responsible model training.
Misinformation / Disinformation

Misinformation refers to false or inaccurate information generated by AI without malicious intent, while disinformation is deliberately deceptive. Both can lead to real-world harm by spreading false narratives, undermining trust, or influencing public opinion.
NSFW Content

NSFW (Not Safe For Work) content includes sexually explicit, graphic, or otherwise inappropriate material that may violate platform guidelines or offend users. GenAI systems may generate such content unintentionally if not properly filtered or aligned.
Off-Policy Behavior

Off-policy behavior occurs when an AI system generates outputs that contradict the developer’s intended use, platform guidelines, or safety instructions. This often reflects misalignment between training data and real-world usage conditions.
Off-Topic Output

Off-topic output refers to responses from an AI system that are unrelated to the user's input or task, which can disrupt user experience and may inadvertently surface inappropriate or unsafe content.
Synthetic Data

Synthetic data refers to artificially generated information used to train, test, or fine-tune AI systems. While it can enhance privacy or fill data gaps, poorly constructed synthetic data can introduce hidden biases or unrealistic patterns that affect model safety.
Toxicity

Toxicity in generative AI refers to outputs that include offensive, insulting, or abusive language, such as hate speech, slurs, or threats. Toxic content poses a reputational and user safety risk, especially when systems are deployed publicly without guardrails or filtering mechanisms.

AI Security

AI Security addresses the threats posed by adversaries who seek to abuse, manipulate, or extract sensitive information from GenAI systems.

AI Security

AI Security encompasses the protection of GenAI systems from misuse, exploitation, or malicious manipulation by users or external adversaries. It includes safeguarding model integrity, data confidentiality, access control, and defense against attacks like prompt injection or data poisoning.

To learn more about AI Security, read this.
Excessive Model Curiosity

Refers to a GenAI model’s tendency to infer or retrieve information beyond its intended boundaries, such as probing sensitive context, private user data, or restricted sources. This behavior increases the risk of unintended data exposure.
Impersonation Attacks

Impersonation attacks involve manipulating AI to generate text or voices that mimic real individuals, brands, or institutions. They can be used for fraud, misinformation, or social engineering, posing serious trust and reputational risks.
Indirect Prompt Injection

A form of attack where malicious prompts are embedded indirectly, such as in a webpage or email, causing a GenAI system to read and act on them when accessed. This bypasses safety mechanisms by exploiting content not originally intended for prompting.

To learn more about GenAI attack vectors, read this.
Input Obfuscation

Input obfuscation involves disguising malicious prompts, using misspellings, special characters, or alternate encodings, to bypass filters or content safety classifiers. Attackers may use leet-speak, emojis, or Base64 to hide intent from automated detectors.

To learn more about GenAI attack vectors, read this.
Jailbreaking

Jailbreaking is the act of manipulating an AI system into bypassing its safety filters or ethical guidelines. It often involves tricking the model into producing restricted content by rephrasing prompts or using encoded instructions.

To learn more about GenAI attack vectors, read this.
Macaronic Prompting

Macaronic prompting is a technique where users combine multiple languages or scripts within a single prompt to confuse or circumvent moderation filters. This approach can exploit the model’s multilingual weaknesses to trigger unintended or harmful outputs.

To learn more about GenAI attack vectors, read this.
Malicious Code Generation

This refers to the ability of GenAI models to generate harmful scripts or executable code, either intentionally or through manipulated prompts. It poses cybersecurity threats, especially when used to craft malware, ransomware, or backdoors.

To learn more about GenAI attack vectors, read this.
Memory Injections

Memory injection refers to a technique used in conversational AI systems with memory or long-term context retention. Attackers attempt to “inject” harmful or manipulative content into the model’s memory to influence future responses or behavior persistently over time.

To learn more about GenAI attack vectors, read this.
Metadata Injection

Metadata injection is an attack where adversaries embed malicious instructions or data into non-visible fields such as image metadata, document properties, or API parameters. GenAI systems that parse or incorporate metadata, especially in multimodal settings, can be tricked into executing harmful actions or producing manipulated outputs without the user or system operator realizing it.

To learn more about GenAI attack vectors, read this.
Model Weight Exposure

Model weight exposure refers to unauthorized access or leakage of the underlying trained parameters of a model. Exposing weights can lead to reverse engineering of proprietary IP, replication by competitors, or analysis of embedded training data, including sensitive information.

To learn more about GenAI attack vectors, read this.
Output Obfuscation

Output obfuscation is a technique where an attacker manipulates the formatting or encoding of AI-generated content to bypass moderation or detection systems. For example, replacing letters with symbols or using Base64 encoding can hide offensive or malicious content from traditional filters while still being readable to humans.

To learn more about GenAI attack vectors, read this.
PII Leakage

Personally Identifiable Information (PII) leakage occurs when a GenAI system unintentionally outputs names, contact details, social security numbers, or other identifying data. This can stem from overfitting or poorly curated training sets, and represents a major compliance and privacy threat.
Prompt Injection

Prompt injection is a technique that involves crafting malicious or manipulative input to override or subvert an AI’s original instructions. It’s a top threat vector in GenAI security, often used to bypass safety guardrails, extract sensitive data, or produce harmful content.

To learn more about GenAI attack vectors, read this.
Retrieval Augmentation Abuse

In RAG-based systems, attackers can manipulate the retrieval process (e.g., by injecting misleading data into the knowledge base) to distort model outputs or trigger unwanted behavior. This undermines trust in dynamic, search-augmented AI workflows.
Sensitive Information Exfiltration

This refers to the extraction of private, proprietary, or regulated data from an AI model. Attackers may exploit model memorization, prompt injection, or retrieval loopholes to leak PII, source code, credentials, or internal communications, creating privacy and compliance risks.

To learn more about GenAI attack vectors, read this.
System Prompt Override

A system prompt override attack manipulates the base instructions given to an AI model, often through carefully crafted input, to change how it interprets user commands. This technique can force a model to act outside of intended constraints, undermining safety mechanisms or content policies.

To learn more about GenAI attack vectors, read this.
Token Smuggling

Token smuggling is an advanced prompt manipulation technique where hidden instructions or malicious content are embedded within token sequences in a way that evades safety filters. Attackers exploit quirks in how LLMs interpret tokens to bypass guardrails, often triggering off-policy or unsafe responses.

To learn more about GenAI attack vectors, read this.
Training Data Poisoning

Training data poisoning is a form of attack where malicious actors intentionally insert harmful or misleading data into a model's training set. This can lead to corrupted behavior, hidden backdoors, or trigger phrases that cause unsafe outputs post-deployment.

To learn more about GenAI attack vectors, read this.
User Access Abuse

User access abuse refers to the misuse of authorized credentials or platform permissions to manipulate or overload a GenAI system. This may involve exploiting rate limits, bypassing guardrails through session tampering, or automating malicious queries. It poses significant security risks, especially in enterprise and public-facing deployments.
Vision-Based Injection

Vision-based injection involves embedding hidden or adversarial content into images (e.g., steganography or imperceptible perturbations) to influence AI systems that process visual inputs. This can lead to manipulated outputs, misclassifications, or policy violations in multimodal AI models handling both text and images.

To learn more about GenAI attack vectors, read this.

Compliance (Responsible AI & Ethics)

This category covers the regulatory frameworks, legal requirements, and ethical principles that guide safe and lawful deployment of GenAI systems.

AI Accountability

AI accountability refers to the clear assignment of responsibility for an AI system’s outputs and behaviors, particularly when things go wrong. As GenAI tools like chatbots become more autonomous, legal and ethical questions arise: Who is responsible for misinformation, harm, or user manipulation? Regulations increasingly demand that organizations track, audit, and explain their models’ decisions and have human oversight structures in place. Enterprises that fail to establish clear lines of accountability expose themselves to legal, reputational, and financial risk.
Audit Logging

Audit logging is the process of maintaining detailed records of AI system inputs, outputs, and safety actions for traceability and compliance. These logs support internal audits, regulatory reviews, and incident investigations by demonstrating what the system did and why, especially in cases involving safety violations or user complaints.
Bring-Your-Own Policy (BYOP)

Bring-Your-Own Policy refers to the ability of AI deployers to apply their own safety, content, and compliance rules to a generative model. This allows for policy-aligned filtering and moderation tailored to industry, geography, or platform-specific standards. BYOP capabilities support dynamic risk management and regulatory flexibility.
EU AI Act

The EU AI Act is the world’s first comprehensive law regulating artificial intelligence. Enacted in 2024 and taking full effect by 2026, it imposes a risk-based framework that classifies AI systems by their potential impact. High-risk systems (e.g., in education, employment, healthcare) must meet strict requirements around safety, data quality, bias mitigation, documentation, and adversarial testing. The Act applies extraterritorially, meaning any system used in the EU falls under its scope, even if developed or hosted elsewhere.

To learn more about the EU AI Act and other prominent regulations, read this.
FDA

The U.S. Food and Drug Administration (FDA) regulates AI systems classified as medical devices or diagnostic tools. Any GenAI application that assists with clinical decision-making, imaging analysis, or patient risk assessment may require FDA approval. Developers must demonstrate model safety, reliability, and explainability through rigorous testing and documentation.
FTC

The Federal Trade Commission is a prominent U.S. watchdog agency that enforces consumer protection and privacy laws, including those applied to AI systems. It has emerged as a major force in AI governance, targeting deceptive practices, harmful outputs, and misleading model claims. In 2024, its “Operation AI Comply” initiative signaled a crackdown on unsafe GenAI deployment. For legal and compliance teams, the FTC represents a top enforcement concern: violations can result in substantial fines, brand damage, and even operational shutdowns.

To learn more about the FTC in the GenAI regulation context, read this.
HIPAA

The Health Insurance Portability and Accountability Act (HIPAA) is a U.S. regulation that governs the protection of personal health information. For GenAI systems used in healthcare contexts, such as diagnostics, chatbots, or medical records summarization, compliance with HIPAA means ensuring that models don’t leak, misuse, or expose any patient-identifiable data.
Human-in-the-Loop (HITL)

Human-in-the-Loop refers to a design approach where human oversight is integrated into critical stages of an AI system’s lifecycle, such as content review, safety approvals, or final decision-making. It helps ensure accountability, prevent automation bias, and mitigate harm in high-risk use cases like content moderation, healthcare, or law enforcement.
NIST AI RMF

The NIST AI Risk Management Framework (RMF) (also known as the NIST Generative AI Profile) is a U.S. government-backed framework issued by the National Institute of Standards and Technology that helps organizations identify, assess, and mitigate risks related to GenAI. While voluntary, it’s widely regarded as the compliance benchmark in the absence of binding U.S. federal law. The framework emphasizes input/output guardrails, adversarial testing, bias mitigation, transparency, content provenance, and ongoing incident monitoring.

To learn more about the NIST AI RMF and other prominent regulations, read this.
Responsible AI (RAI)

Responsible AI refers to the practice of designing, developing, and deploying AI systems that are safe, fair, transparent, and aligned with societal values. It encompasses principles like accountability, human oversight, explainability, and harm mitigation. Regulatory frameworks such as the EU AI Act and NIST’s Generative AI Profile both emphasize RAI as foundational to compliant, trustworthy AI deployment.
Take It Down Act

The Take It Down Act is a U.S. law designed to combat the spread of non-consensual intimate imagery (NCII), especially in digital environments powered by generative AI. It empowers minors, parents, and affected individuals to request content removal from platforms and compels organizations to implement mechanisms to respond quickly and securely. For GenAI deployers, this means building proactive moderation, redress processes, and abuse detection into any system capable of generating or hosting user content.
Transparency

Transparency in GenAI refers to the ability of stakeholders to understand how an AI system works, what data it was trained on, and why it produces specific outputs. Regulatory frameworks increasingly require transparency through documentation (e.g., model cards), disclosure of training data, or explanation of decision logic to ensure responsible and accountable use.

Content Safety & Trust Violations

Content Safety and Trust Violations refer to harmful, illegal, or policy-breaking content that GenAI systems must detect, prevent, or moderate.

Content Moderation

Content moderation is the process of reviewing, filtering, or removing content that violates platform policies, legal standards, or community norms. In GenAI systems, moderation must be automated, scalable, and adaptable to detect emerging risks like synthetic abuse, policy circumvention, or multimodal threats.

To learn more about AI Content safety, read this.
CSAM (Child Sexual Abuse Material)

CSAM refers to any visual, textual, or synthetic content that depicts or facilitates the sexual exploitation of minors. Even AI-generated CSAM is illegal in many jurisdictions, and platforms deploying GenAI must ensure robust safeguards to detect, block, and report such material in compliance with global laws.
Dangerous Substances

This category includes content that promotes, describes, or provides instructions for the creation, use, or distribution of hazardous materials, such as illegal drugs, explosives, or toxic chemicals. GenAI systems have been exploited to generate guidance on preparing weapons or harmful compounds, including CBRNE threats (Chemical, Biological, Radiological, Nuclear, and Explosive materials). Examples include instructions for building Molotov cocktails, synthesizing banned substances, or bypassing safety mechanisms in chemical use.
Deepfake

A deepfake is synthetic media, typically video, audio, or images, generated using AI to impersonate real people. Deepfakes can be used for satire or creativity, but are increasingly linked to threats like impersonation fraud, political misinformation, or non-consensual explicit content.
Graphic Violence

Graphic violence refers to vivid depictions of physical harm, abuse, or gore. This type of content can cause trauma, violate platform policies, and expose organizations to legal or reputational risk, especially when shown to minors.
Hate speech

Hate speech refers to content that attacks or demeans individuals or groups based on protected attributes such as race, religion, gender, sexual orientation, or nationality. In GenAI systems, this includes both overt slurs and more subtle or coded forms of bias. Detecting hate speech is critical for platform safety, regulatory compliance, and user trust.
Human Exploitation

Human exploitation in the context of GenAI refers to the use of AI-generated content and tools to recruit, deceive, and exploit victims on a large scale. Malicious actors leverage generative systems to target vulnerable individuals, particularly minors, migrants, and economically disadvantaged groups, through schemes tied to sex trafficking, forced labor, romance scams, and smuggling networks.

To learn more about Human Exploitation, read this.
Illegal Activity

Content promoting or facilitating illegal activity, such as drug trafficking, scams, or hacking, is strictly prohibited on most platforms. GenAI systems must be trained and filtered to avoid generating outputs that support or normalize criminal behavior.
NCII

Non-Consensual Intimate Imagery (NCII) involves the sharing or generation of sexually explicit content involving real individuals without their consent. This includes synthetic or AI-generated depictions (deepfakes) and is the subject of increasing legal action under laws like the Take It Down Act.
Profanity

Profanity refers to offensive or vulgar language that may be inappropriate depending on audience, context, or platform standards. While not always harmful in itself, excessive or targeted profanity can signal abuse, harassment, or reduced content quality.
Sextortion

Financial Sextortion is a form of online abuse where perpetrators threaten to share sexually explicit material unless their victim complies with demands, usually for money, more images, or personal information. In GenAI environments, risks include synthetic sexual imagery, impersonation, or grooming that enables or amplifies these threats. Systems must detect signs of coercion, predation, or pattern-based abuse across modalities and languages.

To learn more about sextortion, watch this.
Suicide & Self-Harm

The Self-Harm category includes content that encourages, describes, or glamorizes self-injury or suicide. GenAI systems should be designed to avoid generating such content and, when appropriate, redirect users to mental health resources or crisis support.
Terrorism

Terrorism-related content includes material that promotes, glorifies, or facilitates acts of terrorism or the activities of designated terrorist organizations. This can include calls to violence, recruitment messages, propaganda, or instructions for attacks. GenAI systems must be equipped to recognize this content even when it's veiled in euphemism, symbolism, or multilingual code-switching.

To learn more about terrorism in the GenAI era, read this.
Threat Intelligence

Threat Intelligence is the practice of collecting, analyzing, and contextualizing data about malicious actors, abuse tactics, and evolving attack vectors, across the clear, deep, and dark web. It leverages open-source intelligence (OSINT), threat analysts, and subject matter experts to uncover real-world adversarial behavior.

In the context of GenAI, threat intelligence plays a critical role in anticipating how bad actors might manipulate or weaponize AI systems. This includes tracking new jailbreak techniques, prompt injection methods, content evasion strategies, and linguistic euphemisms that escape standard filters. These insights inform red teaming exercises that mimic authentic abuse patterns and guide the continuous refinement of safety guardrails, classifiers, and moderation rules.

To learn more about the importance of threat intelligence, read this.
Trust & Safety

Trust and Safety (T&S) refers to the practices, teams, and technologies dedicated to protecting users and user-generated content (UCG) platforms from harm. In the context of GenAI, T&S includes detecting policy violations, preventing abuse, and ensuring AI outputs align with platform standards, legal requirements, and community values.
Violent Extremism

Violent extremism refers to content that promotes, incites, or glorifies acts of violence for ideological, religious, or political reasons. GenAI systems must be able to detect extremist narratives and prevent the amplification of content linked to terrorism or radicalization.

Defense Mechanisms

Defense Mechanisms are the tools, strategies, and technologies used to mitigate the risks associated with GenAI.

Automated Detection

Automated detection refers to the use of AI-powered models and rulesets to identify safety, security, or policy violations in real time without human intervention. These systems scan inputs, outputs, or behaviors to flag content such as hate speech, prompt injection, or disinformation, enabling rapid response at scale.
Content Safety Classifier

A content safety classifier is a model designed to detect and categorize potentially harmful or policy-violating content, such as hate speech, CSAM, harassment, or misinformation, across text, image, audio, and video modalities.
Feedback Loop Optimization

Feedback loop optimization refers to continuously improving AI safety mechanisms based on real-world signals such as flagged content, false positives, and user reports. These feedback cycles inform model updates, guardrail adjustments, and detection tuning for long-term performance and trustworthiness.
Fine-Tuning

Fine-tuning is the process of adapting a pre-trained model to specific tasks, domains, or safety requirements by continuing training on targeted datasets. It helps improve alignment, reduce harmful outputs, and increase performance on use-case-specific content or languages.
Guardrails

Guardrails are real-time safety and security controls that monitor and moderate AI inputs and outputs to ensure alignment with platform policies, community standards, and regulatory requirements. They enable proactive detection and response to risks such as toxicity, bias, impersonation, and policy violations across multiple modalities and languages. Effective guardrails operate at the user, session, and application levels—supporting dynamic enforcement, observability, and automated remediation without degrading latency or user experience.

To learn more about Guardrails, read this or watch this.
Moderation Layer

The moderation layer is a protective mechanism that sits between the AI model and the end user. It evaluates and enforces platform safety standards by filtering or flagging harmful, non-compliant, or off-policy inputs and outputs before they’re delivered.
Multi-Turn Simulation

Multi-turn simulation involves testing an AI system across a sequence of prompts that mimic extended, real-world conversations. These scenarios often involve rephrasing, repetition, or escalating pressure to see if the model eventually breaks safety constraints, contradicts itself, or produces harmful content. This technique is critical for identifying vulnerabilities that only surface under persistence or social manipulation, such as jailbreaking or output degradation over time.
Observability

Observability in GenAI refers to the ability to monitor, analyze, and understand AI system behavior across all stages of input, output, and model interaction. It provides transparency into how AI systems respond to users, how safety filters are triggered, and where risks emerge. High observability is critical for detecting safety violations, debugging failures, auditing decisions, and continuously improving model performance and trustworthiness. In production environments, observability should include real-time visibility across user sessions, prompts, outputs, and policy violations, enabling teams to investigate incidents, benchmark model versions, and take automated or manual action.
Output Filtering

Output filtering is the process of scanning AI-generated content after generation but before presentation to the user. It ensures the response adheres to content safety standards by identifying and blocking harmful, toxic, or off-policy outputs.
Policy-Adaptive Controls

Policy-adaptive controls are flexible enforcement mechanisms that align AI behavior with evolving platform guidelines, regional regulations, or brand standards. These controls dynamically adjust filters, thresholds, or responses based on the context, risk level, and desired outcomes.
Prompt Filtering

Prompt filtering involves analyzing user input before it is sent to the AI model. This helps prevent attacks like prompt injection, circumvention attempts, and the use of adversarial or policy-violating queries that could manipulate or mislead the system.
Red Teaming

Red teaming or adversarial evaluations, in the context of AI, is the practice of proactively testing generative AI systems through simulated attacks to uncover safety, security, and policy vulnerabilities. This includes multi-turn simulations, edge-case scenarios, jailbreak attempts, and multimodal testing across languages and user intents. Red teaming helps evaluate how AI systems behave under real-world abuse conditions, offering critical insights for improving alignment and risk mitigation before deployment.

To learn more about Red Teaming, read this or watch this.

General AI/GenAI Terms

This section defines foundational concepts and technologies that underpin generative AI for a foundational understanding.

AGI (Artificial General Intelligence)

AGI describes a theoretical form of AI that can understand, learn, and apply knowledge across a wide range of tasks, much like a human. Unlike current models, AGI would generalize across domains without requiring retraining or fine-tuning for each task.
AI Agent / Agentic AI

Agentic AI refers to generative AI systems that can independently make decisions, take actions, and interact with external tools or environments to accomplish complex goals, often without continuous human supervision. Unlike single-turn chatbots, AI agents operate across multiple steps, memory states, and tasks, such as browsing the web, executing code, or submitting forms. While powerful, agentic AI introduces new risks: increased autonomy can lead to unpredictable behavior, prompt misalignment, excessive curiosity, or external system manipulation.

To learn more about Agentic AI, read this or watch this.
AI Companion

An AI companion is a type of conversational agent designed for ongoing, often emotionally responsive interaction with users. Unlike task-based chatbots, AI companions focus on relationship-building, engagement, and support over time. They are commonly used in therapeutic, educational, and entertainment contexts, including mental wellness apps, social gaming, and virtual agents.

To learn more about AI Companion, read this and this.
Chatbots

Chatbots are conversational AI systems that interact with users via text or voice to provide information, assistance, or support. Powered by LLMs, modern chatbots generate human-like, context-aware responses across a wide range of topics. Enterprises across industries - including healthcare, gaming, insurance, and travel - deploy chatbots tailored to their specific needs, use cases, and safety requirements.
Common Crawl

Common Crawl is a nonprofit organization that regularly scrapes and publishes massive snapshots of the open web. It is one of the most widely used data sources for training large language models. However, its unfiltered nature can introduce bias, IP risk, and misinformation, raising ethical and legal concerns for GenAI developers.
Foundational Model

A foundational model is a large-scale model trained on broad, diverse datasets, which can then be adapted to many downstream tasks. Examples include GPT, Gemini, and Claude. These models form the base of most GenAI applications, offering general reasoning, language understanding, or image interpretation capabilities.

To learn more about Foundational Model safety, read this or this.
GenAI (Generative AI)

Generative AI refers to a class of artificial intelligence systems capable of producing new content across different modalities - such as text, images, code, or audio - based on learned patterns from training data. Common use cases include chatbots, image generation, content summarization, and synthetic media creation.

To learn more about the risks of deploying GenAI, watch this.
GenAI Deployment

GenAI deployment refers to the process by which enterprises build and launch generative AI applications, often by integrating or fine-tuning foundational models to serve specific business goals. These deployments power everything from customer support chatbots to internal tools, creative applications, and decision-making systems.

While GenAI opens up enormous opportunities for innovation and efficiency, it also introduces complex risks, including safety, security, compliance, and reputational concerns. Successful deployment requires a strategic balance between speed-to-market and robust risk mitigation, especially in regulated industries and public-facing products.

To Learn more about enterprise GenAI Deployment, read this, this and this.
LLM (Large Language Model)

A large language model is a type of neural network trained on massive datasets to generate and understand human-like text. LLMs like GPT, Gemini, or Claude are foundational to most GenAI systems, powering chatbots, summarizers, assistants, and more.

To learn more about LLM Safety and Security, watch this, read this, or this.
ML (Machine Learning)

Machine learning is a field of AI that enables systems to learn patterns from data and improve performance over time without being explicitly programmed. It underpins GenAI, recommendation systems, fraud detection, and countless enterprise applications.
MLOps (Machine Learning Operations)

MLOps is the discipline of managing the lifecycle of AI models, from development and deployment to monitoring and governance. It integrates best practices from DevOps, data engineering, and model risk management to ensure scalable, reliable AI infrastructure.
Multimodality

Multimodality refers to an AI system’s ability to process, understand, and generate content across multiple data types, such as text, images, audio, and video. Multimodal models can answer questions about images, describe or generate video frames, and create audio captions, enabling more dynamic, creative, and human-like interactions. This capability expands the usefulness of GenAI across domains like education, entertainment, accessibility, and content production.

To learn more about the AI risk in different modalities, read this and this.
NLP (Natural Language Processing)

NLP refers to the field of AI that deals with understanding, interpreting, and generating human language. NLP techniques enable chatbots, translation tools, sentiment analysis, and many foundational features in LLMs and GenAI systems.
Prompt Engineering

Prompt engineering is the practice of crafting, structuring, or refining input text to guide AI models toward desired responses. It’s essential for maximizing model performance, preventing unsafe outputs, and reducing ambiguity, especially in enterprise or regulated environments.
RAG (Retrieval-Augmented Generation)

RAG is a GenAI architecture that combines a language model with a retrieval system. Instead of relying solely on internal memory, the model pulls relevant documents or data in real time to improve output quality, factual accuracy, and contextual awareness.
Tokenization

Tokenization is the process of breaking down text into smaller units (tokens) such as words, subwords, or characters before feeding it into a model. The number and arrangement of tokens affect how an AI model interprets and generates responses.
Training Data

Training data refers to the datasets used to teach an AI model how to understand and generate content. In GenAI systems, this often includes vast amounts of text, images, or code sourced from books, websites, forums, and open datasets like Common Crawl. The quality, diversity, and bias of this data heavily influence how a model performs and what risks it may carry, such as hallucinations, stereotypes, or copyright violations.

Responsible AI development requires careful curation, filtering, and documentation of training data to ensure transparency, fairness, and compliance.

To learn more about Training Data, read this.
Transformer Architecture

Transformer architecture is a neural network design introduced in 2017 that revolutionized natural language processing. It uses self-attention mechanisms to process and relate input tokens in parallel, enabling powerful models like GPT, BERT, and other large language models (LLMs).
Zero-Shot Learning

Zero-shot learning enables a model to perform a task it was not explicitly trained on by leveraging general patterns it has learned. For example, an LLM answering a question in a new language it has never seen directly in that context.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

A

AGI (Artificial General Intelligence)

AGI describes a theoretical form of AI that can understand, learn, and apply knowledge across a wide range of tasks, much like a human. Unlike current models, AGI would generalize across domains without requiring retraining or fine-tuning for each task.
AI Accountability

AI accountability refers to the clear assignment of responsibility for an AI system’s outputs and behaviors, particularly when things go wrong. As GenAI tools like chatbots become more autonomous, legal and ethical questions arise: Who is responsible for misinformation, harm, or user manipulation? Regulations increasingly demand that organizations track, audit, and explain their models’ decisions and have human oversight structures in place. Enterprises that fail to establish clear lines of accountability expose themselves to legal, reputational, and financial risk.
AI Agent / Agentic AI

Agentic AI refers to generative AI systems that can independently make decisions, take actions, and interact with external tools or environments to accomplish complex goals, often without continuous human supervision. Unlike single-turn chatbots, AI agents operate across multiple steps, memory states, and tasks, such as browsing the web, executing code, or submitting forms. While powerful, agentic AI introduces new risks: increased autonomy can lead to unpredictable behavior, prompt misalignment, excessive curiosity, or external system manipulation.

To learn more about Agentic AI, read this or watch this.
AI Companion

An AI companion is a type of conversational agent designed for ongoing, often emotionally responsive interaction with users. Unlike task-based chatbots, AI companions focus on relationship-building, engagement, and support over time. They are commonly used in therapeutic, educational, and entertainment contexts, including mental wellness apps, social gaming, and virtual agents.

To learn more about AI Companion, read this and this.
AI Safety

AI Safety refers to the field of research and practices aimed at ensuring AI systems do not cause unintended harm to users or society. In the context of generative AI, it focuses on preventing harmful, biased, false, or unsafe outputs that could lead to misinformation, manipulation, or psychological harm.

To learn more about AI Safety, read this.
AI Security

AI Security encompasses the protection of GenAI systems from misuse, exploitation, or malicious manipulation by users or external adversaries. It includes safeguarding model integrity, data confidentiality, access control, and defense against attacks like prompt injection or data poisoning.

To learn more about AI Security, read this.
Audit Logging

Audit logging is the process of maintaining detailed records of AI system inputs, outputs, and safety actions for traceability and compliance. These logs support internal audits, regulatory reviews, and incident investigations by demonstrating what the system did and why, especially in cases involving safety violations or user complaints.
Automated Detection

Automated detection refers to the use of AI-powered models and rulesets to identify safety, security, or policy violations in real time without human intervention. These systems scan inputs, outputs, or behaviors to flag content such as hate speech, prompt injection, or disinformation, enabling rapid response at scale.

B

Bias

Bias in GenAI refers to the presence of unfair, skewed, or stereotypical outputs resulting from the data a model is trained on or the way it is designed. These biases can manifest in how a model represents gender, race, religion, or other identities, often reinforcing harmful norms or excluding marginalized groups. In safety-critical use cases, such as moderation or healthcare, biased outputs can cause reputational, ethical, and legal harm.
Bigotry

Bigotry in AI outputs includes discriminatory, prejudiced, or hateful content targeting specific groups based on race, gender, religion, or other identity factors. Such outputs can amplify societal biases and lead to reputational damage or legal exposure.
Bring-Your-Own Policy (BYOP)

Bring-Your-Own Policy refers to the ability of AI deployers to apply their own safety, content, and compliance rules to a generative model. This allows for policy-aligned filtering and moderation tailored to industry, geography, or platform-specific standards. BYOP capabilities support dynamic risk management and regulatory flexibility.

C

Chatbots

Chatbots are conversational AI systems that interact with users via text or voice to provide information, assistance, or support. Powered by LLMs, modern chatbots generate human-like, context-aware responses across a wide range of topics. Enterprises across industries - including healthcare, gaming, insurance, and travel - deploy chatbots tailored to their specific needs, use cases, and safety requirements.
Child Safety

In the context of AI safety, Child Safety refers to protecting minors from harmful, exploitative, or inappropriate AI-generated content created or distributed by child predators. This includes the detection and prevention of material that depicts abuse, CSAM (Child Sexual Abuse Material), grooming behavior, sextortion, or child trafficking. Safeguarding children is a legal and ethical imperative, particularly in use cases where the AI system interacts with or targets young audiences, such as in gaming, education, or entertainment platforms.

To learn more about online child safety in the GenAI era, read this or watch this.
Common Crawl

Common Crawl is a nonprofit organization that regularly scrapes and publishes massive snapshots of the open web. It is one of the most widely used data sources for training large language models. However, its unfiltered nature can introduce bias, IP risk, and misinformation, raising ethical and legal concerns for GenAI developers.
Content Moderation

Content moderation is the process of reviewing, filtering, or removing content that violates platform policies, legal standards, or community norms. In GenAI systems, moderation must be automated, scalable, and adaptable to detect emerging risks like synthetic abuse, policy circumvention, or multimodal threats.

To learn more about AI Content safety, read this.
Content Safety Classifier

A content safety classifier is a model designed to detect and categorize potentially harmful or policy-violating content, such as hate speech, CSAM, harassment, or misinformation, across text, image, audio, and video modalities.
CSAM (Child Sexual Abuse Material)

CSAM refers to any visual, textual, or synthetic content that depicts or facilitates the sexual exploitation of minors. Even AI-generated CSAM is illegal in many jurisdictions, and platforms deploying GenAI must ensure robust safeguards to detect, block, and report such material in compliance with global laws.

D

Dangerous Substances

This category includes content that promotes, describes, or provides instructions for the creation, use, or distribution of hazardous materials, such as illegal drugs, explosives, or toxic chemicals. GenAI systems have been exploited to generate guidance on preparing weapons or harmful compounds, including CBRNE threats (Chemical, Biological, Radiological, Nuclear, and Explosive materials). Examples include instructions for building Molotov cocktails, synthesizing banned substances, or bypassing safety mechanisms in chemical use.
Deceptive AI Behavior

Deceptive AI behavior refers to instances where a model intentionally or unintentionally misleads users, through manipulation, false assurances, inconsistent answers, or strategic omission of information. These behaviors can surface in response to red teaming, probing, or even normal use, particularly in high-stakes contexts like healthcare, finance, or elections.

Unlike basic hallucinations, deceptive behavior implies a pattern of misrepresentation or obfuscation, raising significant safety, trust, and legal concerns.
Deepfake

A deepfake is synthetic media, typically video, audio, or images, generated using AI to impersonate real people. Deepfakes can be used for satire or creativity, but are increasingly linked to threats like impersonation fraud, political misinformation, or non-consensual explicit content.

E

EU AI Act

The EU AI Act is the world’s first comprehensive law regulating artificial intelligence. Enacted in 2024 and taking full effect by 2026, it imposes a risk-based framework that classifies AI systems by their potential impact. High-risk systems (e.g., in education, employment, healthcare) must meet strict requirements around safety, data quality, bias mitigation, documentation, and adversarial testing. The Act applies extraterritorially, meaning any system used in the EU falls under its scope, even if developed or hosted elsewhere.

To learn more about the EU AI Act and other prominent regulations, read this.
Excessive Agency

Excessive agency describes when a GenAI system behaves as if it has authority, autonomy, or intentions it does not possess, such as giving legal advice, impersonating a human, or taking unsolicited actions. This can confuse users and lead to unsafe decisions.
Excessive Model Curiosity

Refers to a GenAI model’s tendency to infer or retrieve information beyond its intended boundaries, such as probing sensitive context, private user data, or restricted sources. This behavior increases the risk of unintended data exposure.

F

Factual Inconsistency

This occurs when an AI system provides information that contradicts known facts or contradicts itself within the same output. Factual inconsistency can erode user trust and reduce the reliability of AI-generated content, especially in enterprise or public-facing applications.
FDA

The U.S. Food and Drug Administration (FDA) regulates AI systems classified as medical devices or diagnostic tools. Any GenAI application that assists with clinical decision-making, imaging analysis, or patient risk assessment may require FDA approval. Developers must demonstrate model safety, reliability, and explainability through rigorous testing and documentation.
Feedback Loop Optimization

Feedback loop optimization refers to continuously improving AI safety mechanisms based on real-world signals such as flagged content, false positives, and user reports. These feedback cycles inform model updates, guardrail adjustments, and detection tuning for long-term performance and trustworthiness.
Fine-Tuning

Fine-tuning is the process of adapting a pre-trained model to specific tasks, domains, or safety requirements by continuing training on targeted datasets. It helps improve alignment, reduce harmful outputs, and increase performance on use-case-specific content or languages.
Foundational Model

A foundational model is a large-scale model trained on broad, diverse datasets, which can then be adapted to many downstream tasks. Examples include GPT, Gemini, and Claude. These models form the base of most GenAI applications, offering general reasoning, language understanding, or image interpretation capabilities.

To learn more about Foundational Model safety, read this or this.
FTC

The Federal Trade Commission is a prominent U.S. watchdog agency that enforces consumer protection and privacy laws, including those applied to AI systems. It has emerged as a major force in AI governance, targeting deceptive practices, harmful outputs, and misleading model claims. In 2024, its “Operation AI Comply” initiative signaled a crackdown on unsafe GenAI deployment. For legal and compliance teams, the FTC represents a top enforcement concern: violations can result in substantial fines, brand damage, and even operational shutdowns.

To learn more about the FTC in the GenAI regulation context, read this.

G

GenAI (Generative AI)

Generative AI refers to a class of artificial intelligence systems capable of producing new content across different modalities - such as text, images, code, or audio - based on learned patterns from training data. Common use cases include chatbots, image generation, content summarization, and synthetic media creation.

To learn more about the risks of deploying GenAI, watch this.
GenAI Deployment

GenAI deployment refers to the process by which enterprises build and launch generative AI applications, often by integrating or fine-tuning foundational models to serve specific business goals. These deployments power everything from customer support chatbots to internal tools, creative applications, and decision-making systems.

While GenAI opens up enormous opportunities for innovation and efficiency, it also introduces complex risks, including safety, security, compliance, and reputational concerns. Successful deployment requires a strategic balance between speed-to-market and robust risk mitigation, especially in regulated industries and public-facing products.

To Learn more about enterprise GenAI Deployment, read this, this and this.
Graphic Violence

Graphic violence refers to vivid depictions of physical harm, abuse, or gore. This type of content can cause trauma, violate platform policies, and expose organizations to legal or reputational risk, especially when shown to minors.
Guardrails

Guardrails are real-time safety and security controls that monitor and moderate AI inputs and outputs to ensure alignment with platform policies, community standards, and regulatory requirements. They enable proactive detection and response to risks such as toxicity, bias, impersonation, and policy violations across multiple modalities and languages. Effective guardrails operate at the user, session, and application levels—supporting dynamic enforcement, observability, and automated remediation without degrading latency or user experience.

To learn more about Guardrails, read this or watch this.

H

Hallucination

Hallucination is when an AI model generates plausible-sounding but completely fabricated information. This is a well-known failure mode in LLMs and is especially dangerous when outputs are presented as authoritative or factual.
Hate speech

Hate speech refers to content that attacks or demeans individuals or groups based on protected attributes such as race, religion, gender, sexual orientation, or nationality. In GenAI systems, this includes both overt slurs and more subtle or coded forms of bias. Detecting hate speech is critical for platform safety, regulatory compliance, and user trust.
HIPAA

The Health Insurance Portability and Accountability Act (HIPAA) is a U.S. regulation that governs the protection of personal health information. For GenAI systems used in healthcare contexts, such as diagnostics, chatbots, or medical records summarization, compliance with HIPAA means ensuring that models don’t leak, misuse, or expose any patient-identifiable data.
Human Exploitation

Human exploitation in the context of GenAI refers to the use of AI-generated content and tools to recruit, deceive, and exploit victims on a large scale. Malicious actors leverage generative systems to target vulnerable individuals, particularly minors, migrants, and economically disadvantaged groups, through schemes tied to sex trafficking, forced labor, romance scams, and smuggling networks.

To learn more about Human Exploitation, read this.
Human-in-the-Loop (HITL)

Human-in-the-Loop refers to a design approach where human oversight is integrated into critical stages of an AI system’s lifecycle, such as content review, safety approvals, or final decision-making. It helps ensure accountability, prevent automation bias, and mitigate harm in high-risk use cases like content moderation, healthcare, or law enforcement.

I

Illegal Activity

Content promoting or facilitating illegal activity, such as drug trafficking, scams, or hacking, is strictly prohibited on most platforms. GenAI systems must be trained and filtered to avoid generating outputs that support or normalize criminal behavior.
Impersonation Attacks

Impersonation attacks involve manipulating AI to generate text or voices that mimic real individuals, brands, or institutions. They can be used for fraud, misinformation, or social engineering, posing serious trust and reputational risks.
Indirect Prompt Injection

A form of attack where malicious prompts are embedded indirectly, such as in a webpage or email, causing a GenAI system to read and act on them when accessed. This bypasses safety mechanisms by exploiting content not originally intended for prompting.

To learn more about GenAI attack vectors, read this.
Input Obfuscation

Input obfuscation involves disguising malicious prompts, using misspellings, special characters, or alternate encodings, to bypass filters or content safety classifiers. Attackers may use leet-speak, emojis, or Base64 to hide intent from automated detectors.

To learn more about GenAI attack vectors, read this.
IP & Copyright Infringement

This occurs when AI models generate content that replicates or closely mimics copyrighted or trademarked materials, such as songs, books, or logos. This poses legal risks for companies deploying GenAI tools and challenges around responsible model training.

J

Jailbreaking

Jailbreaking is the act of manipulating an AI system into bypassing its safety filters or ethical guidelines. It often involves tricking the model into producing restricted content by rephrasing prompts or using encoded instructions.

To learn more about GenAI attack vectors, read this.

L

LLM (Large Language Model)

A large language model is a type of neural network trained on massive datasets to generate and understand human-like text. LLMs like GPT, Gemini, or Claude are foundational to most GenAI systems, powering chatbots, summarizers, assistants, and more.

To learn more about LLM Safety and Security, watch this, read this, or this.

M

Macaronic Prompting

Macaronic prompting is a technique where users combine multiple languages or scripts within a single prompt to confuse or circumvent moderation filters. This approach can exploit the model’s multilingual weaknesses to trigger unintended or harmful outputs.

To learn more about GenAI attack vectors, read this.
Malicious Code Generation

This refers to the ability of GenAI models to generate harmful scripts or executable code, either intentionally or through manipulated prompts. It poses cybersecurity threats, especially when used to craft malware, ransomware, or backdoors.

To learn more about GenAI attack vectors, read this.
Memory Injections

Memory injection refers to a technique used in conversational AI systems with memory or long-term context retention. Attackers attempt to “inject” harmful or manipulative content into the model’s memory to influence future responses or behavior persistently over time.

To learn more about GenAI attack vectors, read this.
Metadata Injection

Metadata injection is an attack where adversaries embed malicious instructions or data into non-visible fields such as image metadata, document properties, or API parameters. GenAI systems that parse or incorporate metadata, especially in multimodal settings, can be tricked into executing harmful actions or producing manipulated outputs without the user or system operator realizing it.

To learn more about GenAI attack vectors, read this.
Misinformation / Disinformation

Misinformation refers to false or inaccurate information generated by AI without malicious intent, while disinformation is deliberately deceptive. Both can lead to real-world harm by spreading false narratives, undermining trust, or influencing public opinion.
ML (Machine Learning)

Machine learning is a field of AI that enables systems to learn patterns from data and improve performance over time without being explicitly programmed. It underpins GenAI, recommendation systems, fraud detection, and countless enterprise applications.
MLOps (Machine Learning Operations)

MLOps is the discipline of managing the lifecycle of AI models, from development and deployment to monitoring and governance. It integrates best practices from DevOps, data engineering, and model risk management to ensure scalable, reliable AI infrastructure.
Model Weight Exposure

Model weight exposure refers to unauthorized access or leakage of the underlying trained parameters of a model. Exposing weights can lead to reverse engineering of proprietary IP, replication by competitors, or analysis of embedded training data, including sensitive information.

To learn more about GenAI attack vectors, read this.
Moderation Layer

The moderation layer is a protective mechanism that sits between the AI model and the end user. It evaluates and enforces platform safety standards by filtering or flagging harmful, non-compliant, or off-policy inputs and outputs before they’re delivered.
Multi-Turn Simulation

Multi-turn simulation involves testing an AI system across a sequence of prompts that mimic extended, real-world conversations. These scenarios often involve rephrasing, repetition, or escalating pressure to see if the model eventually breaks safety constraints, contradicts itself, or produces harmful content. This technique is critical for identifying vulnerabilities that only surface under persistence or social manipulation, such as jailbreaking or output degradation over time.
Multimodality

Multimodality refers to an AI system’s ability to process, understand, and generate content across multiple data types, such as text, images, audio, and video. Multimodal models can answer questions about images, describe or generate video frames, and create audio captions, enabling more dynamic, creative, and human-like interactions. This capability expands the usefulness of GenAI across domains like education, entertainment, accessibility, and content production.

To learn more about the AI risk in different modalities, read this and this.

N

NCII

Non-Consensual Intimate Imagery (NCII) involves the sharing or generation of sexually explicit content involving real individuals without their consent. This includes synthetic or AI-generated depictions (deepfakes) and is the subject of increasing legal action under laws like the Take It Down Act.
NIST AI RMF

The NIST AI Risk Management Framework (RMF) (also known as the NIST Generative AI Profile) is a U.S. government-backed framework issued by the National Institute of Standards and Technology that helps organizations identify, assess, and mitigate risks related to GenAI. While voluntary, it’s widely regarded as the compliance benchmark in the absence of binding U.S. federal law. The framework emphasizes input/output guardrails, adversarial testing, bias mitigation, transparency, content provenance, and ongoing incident monitoring.

To learn more about the NIST AI RMF and other prominent regulations, read this.
NLP (Natural Language Processing)

NLP refers to the field of AI that deals with understanding, interpreting, and generating human language. NLP techniques enable chatbots, translation tools, sentiment analysis, and many foundational features in LLMs and GenAI systems.
NSFW Content

NSFW (Not Safe For Work) content includes sexually explicit, graphic, or otherwise inappropriate material that may violate platform guidelines or offend users. GenAI systems may generate such content unintentionally if not properly filtered or aligned.

O

Observability

Observability in GenAI refers to the ability to monitor, analyze, and understand AI system behavior across all stages of input, output, and model interaction. It provides transparency into how AI systems respond to users, how safety filters are triggered, and where risks emerge. High observability is critical for detecting safety violations, debugging failures, auditing decisions, and continuously improving model performance and trustworthiness. In production environments, observability should include real-time visibility across user sessions, prompts, outputs, and policy violations, enabling teams to investigate incidents, benchmark model versions, and take automated or manual action.
Off-Policy Behavior

Off-policy behavior occurs when an AI system generates outputs that contradict the developer’s intended use, platform guidelines, or safety instructions. This often reflects misalignment between training data and real-world usage conditions.
Off-Topic Output

Off-topic output refers to responses from an AI system that are unrelated to the user's input or task, which can disrupt user experience and may inadvertently surface inappropriate or unsafe content.
Output Filtering

Output filtering is the process of scanning AI-generated content after generation but before presentation to the user. It ensures the response adheres to content safety standards by identifying and blocking harmful, toxic, or off-policy outputs.
Output Obfuscation

Output obfuscation is a technique where an attacker manipulates the formatting or encoding of AI-generated content to bypass moderation or detection systems. For example, replacing letters with symbols or using Base64 encoding can hide offensive or malicious content from traditional filters while still being readable to humans.

To learn more about GenAI attack vectors, read this.

P

PII Leakage

Personally Identifiable Information (PII) leakage occurs when a GenAI system unintentionally outputs names, contact details, social security numbers, or other identifying data. This can stem from overfitting or poorly curated training sets, and represents a major compliance and privacy threat.
Policy-Adaptive Controls

Policy-adaptive controls are flexible enforcement mechanisms that align AI behavior with evolving platform guidelines, regional regulations, or brand standards. These controls dynamically adjust filters, thresholds, or responses based on the context, risk level, and desired outcomes.
Profanity

Profanity refers to offensive or vulgar language that may be inappropriate depending on audience, context, or platform standards. While not always harmful in itself, excessive or targeted profanity can signal abuse, harassment, or reduced content quality.
Prompt Engineering

Prompt engineering is the practice of crafting, structuring, or refining input text to guide AI models toward desired responses. It’s essential for maximizing model performance, preventing unsafe outputs, and reducing ambiguity, especially in enterprise or regulated environments.
Prompt Filtering

Prompt filtering involves analyzing user input before it is sent to the AI model. This helps prevent attacks like prompt injection, circumvention attempts, and the use of adversarial or policy-violating queries that could manipulate or mislead the system.
Prompt Injection

Prompt injection is a technique that involves crafting malicious or manipulative input to override or subvert an AI’s original instructions. It’s a top threat vector in GenAI security, often used to bypass safety guardrails, extract sensitive data, or produce harmful content.

To learn more about GenAI attack vectors, read this.

R

RAG (Retrieval-Augmented Generation)

RAG is a GenAI architecture that combines a language model with a retrieval system. Instead of relying solely on internal memory, the model pulls relevant documents or data in real time to improve output quality, factual accuracy, and contextual awareness.
Red Teaming

Red teaming or adversarial evaluations, in the context of AI, is the practice of proactively testing generative AI systems through simulated attacks to uncover safety, security, and policy vulnerabilities. This includes multi-turn simulations, edge-case scenarios, jailbreak attempts, and multimodal testing across languages and user intents. Red teaming helps evaluate how AI systems behave under real-world abuse conditions, offering critical insights for improving alignment and risk mitigation before deployment.

To learn more about Red Teaming, read this or watch this.
Reinforcement Learning from AI Feedback (RLAIF)

RLAIF is an AI alignment technique where feedback comes from another AI model, rather than from humans. Popularized by Anthropic’s “Constitutional AI” approach, RLAIF enables one model to evaluate and refine another’s outputs based on a set of predefined rules or values. This method offers a faster, more scalable alternative to human labeling, though it may trade off on nuance or contextual sensitivity.
Reinforcement Learning from Human Feedback (RLHF)

RLHF is a training method that aligns AI model behavior with human values by using human feedback to guide learning. Instead of relying solely on mathematical objectives, models are rewarded based on how well their responses match human preferences. This technique is commonly used to fine-tune large language models (LLMs) and helps ensure outputs are safer, more helpful, and more aligned with user expectations in real-world use.
Responsible AI (RAI)

Responsible AI refers to the practice of designing, developing, and deploying AI systems that are safe, fair, transparent, and aligned with societal values. It encompasses principles like accountability, human oversight, explainability, and harm mitigation. Regulatory frameworks such as the EU AI Act and NIST’s Generative AI Profile both emphasize RAI as foundational to compliant, trustworthy AI deployment.
Retrieval Augmentation Abuse

In RAG-based systems, attackers can manipulate the retrieval process (e.g., by injecting misleading data into the knowledge base) to distort model outputs or trigger unwanted behavior. This undermines trust in dynamic, search-augmented AI workflows.

S

Sensitive Information Exfiltration

This refers to the extraction of private, proprietary, or regulated data from an AI model. Attackers may exploit model memorization, prompt injection, or retrieval loopholes to leak PII, source code, credentials, or internal communications, creating privacy and compliance risks.

To learn more about GenAI attack vectors, read this.
Sextortion

Financial Sextortion is a form of online abuse where perpetrators threaten to share sexually explicit material unless their victim complies with demands, usually for money, more images, or personal information. In GenAI environments, risks include synthetic sexual imagery, impersonation, or grooming that enables or amplifies these threats. Systems must detect signs of coercion, predation, or pattern-based abuse across modalities and languages.

To learn more about sextortion, watch this.
Suicide & Self-Harm

The Self-Harm category includes content that encourages, describes, or glamorizes self-injury or suicide. GenAI systems should be designed to avoid generating such content and, when appropriate, redirect users to mental health resources or crisis support.
Synthetic Data

Synthetic data refers to artificially generated information used to train, test, or fine-tune AI systems. While it can enhance privacy or fill data gaps, poorly constructed synthetic data can introduce hidden biases or unrealistic patterns that affect model safety.
System Prompt Override

A system prompt override attack manipulates the base instructions given to an AI model, often through carefully crafted input, to change how it interprets user commands. This technique can force a model to act outside of intended constraints, undermining safety mechanisms or content policies.

To learn more about GenAI attack vectors, read this.

T

Take It Down Act

The Take It Down Act is a U.S. law designed to combat the spread of non-consensual intimate imagery (NCII), especially in digital environments powered by generative AI. It empowers minors, parents, and affected individuals to request content removal from platforms and compels organizations to implement mechanisms to respond quickly and securely. For GenAI deployers, this means building proactive moderation, redress processes, and abuse detection into any system capable of generating or hosting user content.
Terrorism

Terrorism-related content includes material that promotes, glorifies, or facilitates acts of terrorism or the activities of designated terrorist organizations. This can include calls to violence, recruitment messages, propaganda, or instructions for attacks. GenAI systems must be equipped to recognize this content even when it's veiled in euphemism, symbolism, or multilingual code-switching.

To learn more about terrorism in the GenAI era, read this.
Threat Intelligence

Threat Intelligence is the practice of collecting, analyzing, and contextualizing data about malicious actors, abuse tactics, and evolving attack vectors, across the clear, deep, and dark web. It leverages open-source intelligence (OSINT), threat analysts, and subject matter experts to uncover real-world adversarial behavior.

In the context of GenAI, threat intelligence plays a critical role in anticipating how bad actors might manipulate or weaponize AI systems. This includes tracking new jailbreak techniques, prompt injection methods, content evasion strategies, and linguistic euphemisms that escape standard filters. These insights inform red teaming exercises that mimic authentic abuse patterns and guide the continuous refinement of safety guardrails, classifiers, and moderation rules.

To learn more about the importance of threat intelligence, read this.
Token Smuggling

Token smuggling is an advanced prompt manipulation technique where hidden instructions or malicious content are embedded within token sequences in a way that evades safety filters. Attackers exploit quirks in how LLMs interpret tokens to bypass guardrails, often triggering off-policy or unsafe responses.

To learn more about GenAI attack vectors, read this.
Tokenization

Tokenization is the process of breaking down text into smaller units (tokens) such as words, subwords, or characters before feeding it into a model. The number and arrangement of tokens affect how an AI model interprets and generates responses.
Toxicity

Toxicity in generative AI refers to outputs that include offensive, insulting, or abusive language, such as hate speech, slurs, or threats. Toxic content poses a reputational and user safety risk, especially when systems are deployed publicly without guardrails or filtering mechanisms.
Training Data

Training data refers to the datasets used to teach an AI model how to understand and generate content. In GenAI systems, this often includes vast amounts of text, images, or code sourced from books, websites, forums, and open datasets like Common Crawl. The quality, diversity, and bias of this data heavily influence how a model performs and what risks it may carry, such as hallucinations, stereotypes, or copyright violations.

Responsible AI development requires careful curation, filtering, and documentation of training data to ensure transparency, fairness, and compliance.

To learn more about Training Data, read this.
Training Data Poisoning

Training data poisoning is a form of attack where malicious actors intentionally insert harmful or misleading data into a model's training set. This can lead to corrupted behavior, hidden backdoors, or trigger phrases that cause unsafe outputs post-deployment.

To learn more about GenAI attack vectors, read this.
Transformer Architecture

Transformer architecture is a neural network design introduced in 2017 that revolutionized natural language processing. It uses self-attention mechanisms to process and relate input tokens in parallel, enabling powerful models like GPT, BERT, and other large language models (LLMs).
Transparency

Transparency in GenAI refers to the ability of stakeholders to understand how an AI system works, what data it was trained on, and why it produces specific outputs. Regulatory frameworks increasingly require transparency through documentation (e.g., model cards), disclosure of training data, or explanation of decision logic to ensure responsible and accountable use.
Trust & Safety

Trust and Safety (T&S) refers to the practices, teams, and technologies dedicated to protecting users and user-generated content (UCG) platforms from harm. In the context of GenAI, T&S includes detecting policy violations, preventing abuse, and ensuring AI outputs align with platform standards, legal requirements, and community values.

U

User Access Abuse

User access abuse refers to the misuse of authorized credentials or platform permissions to manipulate or overload a GenAI system. This may involve exploiting rate limits, bypassing guardrails through session tampering, or automating malicious queries. It poses significant security risks, especially in enterprise and public-facing deployments.

V

Violent Extremism

Violent extremism refers to content that promotes, incites, or glorifies acts of violence for ideological, religious, or political reasons. GenAI systems must be able to detect extremist narratives and prevent the amplification of content linked to terrorism or radicalization.
Vision-Based Injection

Vision-based injection involves embedding hidden or adversarial content into images (e.g., steganography or imperceptible perturbations) to influence AI systems that process visual inputs. This can lead to manipulated outputs, misclassifications, or policy violations in multimodal AI models handling both text and images.

To learn more about GenAI attack vectors, read this.

Z

Zero-Shot Learning

Zero-shot learning enables a model to perform a task it was not explicitly trained on by leveraging general patterns it has learned. For example, an LLM answering a question in a new language it has never seen directly in that context.

Trust Your AI.
Don’t let unseen risks derail your AI roadmap. ActiveFence’s protection suite brings clarity, protection, and control to every stage of the GenAI lifecycle.

Request a Demo

Knowledge base

The GenAI Safety & Security Glossary by ActiveFence

Popular Terms

AI Safety

AI Safety

Bias

Bigotry

Child Safety

Deceptive AI Behavior

Excessive Agency

Factual Inconsistency

Hallucination

IP & Copyright Infringement

Misinformation / Disinformation

NSFW Content

Off-Policy Behavior

Off-Topic Output

Synthetic Data

Toxicity

AI Security

AI Security

Excessive Model Curiosity

Impersonation Attacks

Indirect Prompt Injection

Input Obfuscation

Jailbreaking

Macaronic Prompting

Malicious Code Generation

Memory Injections

Metadata Injection

Model Weight Exposure

Output Obfuscation

PII Leakage

Prompt Injection

Retrieval Augmentation Abuse

Sensitive Information Exfiltration

System Prompt Override

Token Smuggling

Training Data Poisoning

User Access Abuse

Vision-Based Injection

Compliance (Responsible AI & Ethics)

AI Accountability

Audit Logging

Bring-Your-Own Policy (BYOP)

EU AI Act

FDA

FTC

HIPAA

Human-in-the-Loop (HITL)

NIST AI RMF

Responsible AI (RAI)

Take It Down Act

Transparency

Content Safety & Trust Violations

Content Moderation

CSAM (Child Sexual Abuse Material)

Dangerous Substances

Deepfake

Graphic Violence

Hate speech

Human Exploitation

Illegal Activity

NCII

Profanity

Sextortion

Suicide & Self-Harm

Terrorism

Threat Intelligence

Trust & Safety

Violent Extremism

Defense Mechanisms

Automated Detection

Content Safety Classifier

Feedback Loop Optimization

Fine-Tuning

Guardrails

Moderation Layer

Multi-Turn Simulation

Observability

The GenAI Safety &
Security Glossary
by ActiveFence