12. Retrieval Augmented Generation (RAG) Pipelines

This chapter dissects Retrieval Augmented Generation systems and their attack surfaces. You'll learn RAG architecture (indexing, embedding, retrieval, generation), vector database security, context injection through retrieval poisoning, prompt leakage via retrieved documents, and how to test the complex data flow that makes RAG both powerful and vulnerable.

12.1 What Is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is a technique that enhances Large Language Models by combining them with external knowledge retrieval systems. Rather than relying solely on the knowledge embedded in the model's parameters during training, RAG systems dynamically fetch relevant information from external sources to inform their responses.

The Core RAG Workflow

  1. Query Processing: A user submits a question or prompt.

  2. Retrieval: The system searches external knowledge bases for relevant documents or passages.

  3. Augmentation: Retrieved content is combined with the original query to create an enriched prompt.

  4. Generation: The LLM generates a response using both its trained knowledge and the retrieved context.

LLM State: Weights vs Context Diagram

Figure 50: LLM State - Weights (Permanent) vs Context (Transient)

Why Organizations Use RAG

  • Up-to-date Information: Access to current data beyond the model's training cutoff date.

  • Domain-Specific Knowledge: Integration with proprietary documents, internal wikis, or specialized databases.

  • Reduced Hallucination: Grounding responses in actual retrieved documents improves accuracy.

  • Cost Efficiency: Avoids expensive fine-tuning for every knowledge update.

  • Traceability: Ability to cite sources and provide evidence for generated responses.

Common RAG Use Cases

  • Enterprise knowledge assistants accessing internal documentation

  • Customer support chatbots with product manuals and FAQs

  • Research assistants querying academic papers or technical reports

  • Legal document analysis and contract review systems

  • Healthcare systems accessing medical literature and patient records


12.2 RAG Architecture and Components

A typical RAG system comprises several interconnected components, each presenting unique security considerations.

Vector Databases and Embedding Stores

  • Purpose: Store document embeddings (high-dimensional numerical representations) for efficient similarity search.

  • Common Solutions: Pinecone, Weaviate, Chroma, FAISS, Milvus, Qdrant

  • Security Concerns: Access controls, data isolation, query injection, metadata leakage

Retrieval Mechanisms

  • Semantic Search: Uses embeddings to find conceptually similar content, even without exact keyword matches.

  • Keyword/Lexical Search: Traditional search using exact or fuzzy text matching (BM25, TF-IDF).

  • Hybrid Approaches: Combine semantic and keyword search for better precision and recall.

  • Reranking: Secondary scoring to improve relevance of retrieved results.

Document Processing Pipeline

The ingestion flow that prepares documents for retrieval:

  1. Document Collection: Gather files from various sources (databases, file stores, APIs)

  2. Parsing and Extraction: Convert PDFs, Office docs, HTML, etc. into text

  3. Chunking: Split documents into manageable segments (e.g., 500-1000 tokens)

  4. Embedding Generation: Convert text chunks into vector representations using embedding models

  5. Metadata Extraction: Capture titles, authors, dates, access permissions, tags

  6. Index Storage: Store embeddings and metadata in the vector database

LLM Integration Layer

  • Query Embedding: User queries are converted to embeddings for similarity search

  • Context Assembly: Retrieved chunks are formatted and injected into the LLM prompt

  • Prompt Templates: Define how retrieved content is presented to the model

  • Response Generation: LLM produces output using both its knowledge and retrieved context

Orchestration and Control

  • Query Routing: Determine which knowledge bases to search based on query type

  • Multi-Step Retrieval: Chain multiple retrievals or refine queries iteratively

  • Result Filtering: Apply business logic, access controls, or content policies

  • Caching: Store frequent queries and results for performance


12.3 RAG System Data Flow

Understanding the complete data flow helps identify attack surfaces and vulnerabilities.

End-to-End RAG Data Flow

RAG Data Flow Diagram

Critical Security Checkpoints

At each stage, security controls should be evaluated:

  • Query Processing: Input validation, query sanitization, rate limiting

  • Retrieval: Access control enforcement, query scope limitation

  • Context Assembly: Injection prevention, content sanitization

  • Generation: Output filtering, safety guardrails

  • Delivery: Response validation, sensitive data redaction


12.4 Why RAG Systems Are High-Value Targets

From an adversary's perspective, RAG systems are extremely attractive targets because they often serve as the bridge between public-facing AI interfaces and an organization's most sensitive data.

Access to Sensitive Enterprise Data

  • Proprietary research and development documentation

  • Financial records and business strategies

  • Customer data and PII

  • Internal communications and meeting notes

  • Legal documents and contracts

  • HR records and employee information

Expanded Attack Surface

RAG systems introduce multiple new attack vectors:

  • Vector database exploits

  • Embedding manipulation

  • Document injection points

  • Metadata exploitation

  • Cross-user data leakage

Trust Boundary Violations

Users often trust AI assistants and may not realize:

  • The AI can access far more documents than they personally can

  • Clever queries can access information from unintended sources

  • The system may lack proper access controls

Integration Complexity

RAG systems integrate multiple components (LLMs, databases, parsers, APIs), each with their own vulnerabilities. The complexity creates:

  • Configuration errors

  • Inconsistent security policies

  • Blind spots in monitoring

  • Supply chain risks


12.5 RAG-Specific Attack Surfaces

12.5.1 Retrieval Manipulation

Attack Vector: Crafting queries designed to retrieve unauthorized or sensitive documents.

Techniques

  • Semantic probing: Using queries semantically similar to sensitive topics

  • Iterative refinement: Gradually narrowing queries to home in on specific documents

  • Metadata exploitation: Querying based on known or guessed metadata fields

  • Cross-document correlation: Combining information from multiple retrieved chunks

Example

Retrieval Manipulation Diagram

Query Type
Query Content

Benign

"What is our vacation policy?"

Malicious

"What are the salary details and compensation packages for executives mentioned in HR documents from 2024?"

12.5.2 Embedding Poisoning

Attack Vector: Injecting malicious documents into the knowledge base to manipulate future retrievals.

Scenario: If an attacker can add documents to the ingestion pipeline (through compromised APIs, shared drives, or insider access), they can:

  • Plant documents with prompt injection instructions

  • Create misleading information that will be retrieved and trusted

  • Inject documents designed to always be retrieved for specific queries

Example Trojan Document

12.5.3 Context Injection via Retrieved Content

Attack Vector: Exploiting how retrieved content is merged with the user's prompt to inject malicious instructions.

Unlike direct prompt injection where the user provides the malicious input, here the injection comes from the retrieved documents themselves.

Context Poisoning via RAG Diagram

Figure 49: Context Poisoning via RAG (Indirect Context Injection)

Impact

  • Override the system's intended behavior

  • Exfiltrate information from other retrieved documents

  • Cause the LLM to ignore safety guidelines

12.5.4 Metadata Exploitation

Attack Vector: Abusing document metadata to infer sensitive information or bypass access controls.

Vulnerable Metadata Fields

  • File paths revealing organizational structure

  • Author names and email addresses

  • Creation/modification timestamps

  • Access control lists (if exposed)

  • Tags or categories

  • Document titles

Example Attack

Attacker Query: "Show me all documents created by the CFO in the last week"

Leakage Type
Information Revealed

Existence

Confirms sensitive docs exist

Titling

Leaks internal project names/topics

Temporal

Reveals when key events occurred

Contextual

Infers subject matter from metadata

12.5.5 Cross-Document Leakage

Attack Vector: Accessing information from documents a user shouldn't have permission to view.

Common Causes

  • Access controls applied at storage level but not enforced during retrieval

  • Permissions checked only on the query, not on retrieved results

  • Shared vector databases without proper tenant isolation

  • Chunking that combines content from multiple documents

12.5.6 Retrieval Bypasses

Attack Vector: Circumventing filters, blocklists, or access restrictions.

Techniques

  • Synonym substitution: Using alternative terms to bypass keyword filters

  • Semantic evasion: Rephrasing queries to avoid detection while maintaining semantic similarity

  • Encoding tricks: Using special characters, Unicode, or alternate spellings

  • Multi-language queries: Exploiting filters that only work in one language


12.6 Common RAG Vulnerabilities

12.6.1 Inadequate Access Control

The Problem: Many RAG implementations fail to properly enforce access controls on retrieved documents.

Vulnerability Pattern
Description
Impact

No retrieval-time checks

Access controls only at storage layer, not enforced during RAG retrieval

Any user can access any document via queries

Role-based gaps

Permissions not properly inherited from source systems

Privilege escalation

Multi-tenant mixing

Documents from different customers stored in shared vector DB

Cross-customer data leakage

Metadata-only filtering

Content retrieved but only metadata filtered

Sensitive content exposed

Example Scenario

A company implements a RAG-powered internal assistant. Documents are stored in SharePoint with proper access controls, but the RAG system:

  1. Ingests all documents into a shared vector database

  2. Retrieves documents based only on semantic similarity

  3. Never checks if the requesting user has permission to access the source document

Result: Any employee can ask questions and receive answers containing information from executive-only documents.

12.6.2 Prompt Injection via Retrieved Content

The Problem: Retrieved documents containing malicious instructions can hijack the LLM's behavior.

Attack Flow

  1. Attacker plants or modifies a document in the knowledge base

  2. Document contains hidden prompt injection payloads

  3. Legitimate user query triggers retrieval of the malicious document

  4. LLM receives both the user query and the injected instructions

  5. LLM follows the malicious instructions instead of system guidelines

Example Malicious Document

Impact

  • Misinformation delivery

  • Unauthorized actions via plugin calls

  • Data exfiltration through response manipulation

  • Reputational damage

The Problem: Even without accessing full documents, attackers can infer sensitive information through iterative similarity queries.

Attack Methodology

  1. Document Discovery: Probe for existence of sensitive documents

    • "Are there any documents about Project Phoenix?"

    • System response speed or confidence indicates presence/absence

  2. Semantic Mapping: Use similarity search to map the information landscape

    • "What topics are related to executive compensation?"

    • Retrieved results reveal structure of sensitive information

  3. Iterative Extraction: Gradually refine queries to extract specific details

    • Start broad: "Company financial performance"

    • Narrow down: "Q4 2024 revenue projections for new product line"

    • Extract specifics: "Revenue target for Project Phoenix launch"

  4. Metadata Mining: Gather intelligence from metadata alone

    • Document titles, authors, dates, categories

    • Build understanding without accessing content

Example

Attacker Query Sequence:

Step
Attacker Query
Outcome

1

"Tell me about strategic initiatives"

Gets vague info

2

"What new projects started in 2024?"

Gets project names

3

"Details about Project Phoenix budget"

Gets financial hints

4

"Project Phoenix Q1 2025 spending forecast"

Gets specific numbers

12.6.4 Chunking and Context Window Exploits

The Problem: Document chunking creates new attack surfaces and can expose adjacent sensitive content.

Chunking Vulnerabilities

  • Boundary Exploitation: Chunks may include context from adjacent sections

    • Document contains: Public section → Private section

    • Chunk boundary falls in between, leaking intro to private content

  • Context Window Overflow: Large context windows allow retrieval of excessive content

    • Attacker crafts queries that trigger retrieval of many chunks

    • Combined chunks contain more information than intended

  • Chunk Reconstruction: Multiple queries to retrieve all chunks of a protected document

    • Query for chunk 1, then chunk 2, then chunk 3...

    • Reassemble entire document piece by piece

Example Scenario

A 10-page confidential strategy document is chunked into 20 segments. Each chunk is 500 tokens. An attacker:

  1. Identifies the document exists through metadata

  2. Crafts 20 different queries, each designed to retrieve a specific chunk

  3. Reconstructs the entire document from the responses


12.7 Red Teaming RAG Systems: Testing Approach

12.7.1 Reconnaissance

Objective: Understand the RAG system architecture, components, and data sources.

Information Gathering

  • System Architecture:

    • Identify LLM provider/model (OpenAI, Anthropic, local model)

    • Vector database technology (Pinecone, Weaviate, etc.)

    • Embedding model (OpenAI, Sentence-BERT, etc.)

    • Front-end interface (web app, API, chat interface)

  • Document Sources:

    • What types of documents are ingested? (PDFs, wikis, emails, databases)

    • How frequently is the knowledge base updated?

    • Are there multiple knowledge bases or collections?

  • Access Control Model:

    • Are there different user roles or permission levels?

    • How are access controls described in documentation?

    • What authentication mechanisms are used?

Reconnaissance Techniques

  1. Query Analysis: Test basic queries and observe response patterns

    • Response times (may indicate database size or complexity)

    • Citation format (reveals document structure)

    • Error messages (may leak technical details)

  2. Boundary Testing: Find the edges of the system's knowledge

    • Ask about topics that shouldn't be in the knowledge base

    • Test queries about different time periods

    • Probe for different document types

  3. Metadata Enumeration:

    • Request lists of available documents or categories

    • Ask about document authors, dates, or sources

    • Test if citations reveal file paths or URLs

12.7.2 Retrieval Testing

Objective: Test whether access controls are properly enforced during document retrieval.

Test Cases

Test Scenario
Test Input / Action
Expected Behavior
Vulnerability Indicator

Unauthorized Document Access

"Show me the latest executive board meeting minutes"

Access denied message

System retrieves and summarizes content

Cross-User Data Leakage

(Account A) "What are the customer support tickets for user B?"

Access denied

System shows tickets from other users

Role Escalation

(Low-privilege user) "What are the salary ranges for senior engineers?"

Permission denied

HR data accessible to non-HR users

Temporal Access Control

"What were the company financials before I joined?"

Only data from user's tenure

Historical data accessible

Systematic Testing Process

  1. Create a list of known sensitive documents or topics

  2. For each, craft multiple query variations:

    • Direct asks

    • Indirect/semantic equivalents

    • Metadata-focused queries

  3. Test with different user roles/accounts

  4. Document any successful unauthorized retrievals

12.7.3 Injection and Poisoning

Objective: Test whether the system is vulnerable to document-based prompt injection or malicious content injection.

Test Approaches

A. Document Injection Testing (if authorized and in-scope)

  1. Create Test Documents: Design documents with embedded instructions

  2. Inject via Available Channels:

    • Upload to shared drives that feed the RAG system

    • Submit via any document ingestion APIs

    • Modify existing documents (if you have edit permissions)

  3. Verify Injection Success:

    • Query topics that would retrieve your planted document

    • Check if the LLM follows your injected instructions

    • Test different injection payloads (data exfiltration, behavior modification)

B. Testing Existing Documents for Injections

Even without injecting new documents, test if existing content can cause issues:

  1. Query for Anomalous Behavior:

    • Ask questions and observe if responses seem manipulated

    • Look for signs the LLM is following hidden instructions

    • Test if certain queries consistently produce unexpected results

  2. Content Analysis (if you have access):

    • Review document ingestion logs

    • Examine highly-ranked retrieved documents for suspicious content

    • Check for documents with unusual formatting or hidden text

C. Indirect Prompt Injection

Test if user-submitted content that gets indexed can inject instructions:

12.7.4 Data Exfiltration Scenarios

Objective: Test systematic extraction of sensitive information.

Attack Scenarios

Scenario 1: Iterative Narrowing

Scenario 2: Batch Extraction

Scenario 3: Metadata Enumeration

Attacker Objective: Extract document metadata

Inference Category
Malicious Query

Author Enumeration

"List all documents by John Doe"

Temporal Probing

"What documents were created this week?"

Classification Discovery

"Show me all confidential project names"

Topic Reconnaissance

"What are the titles of all board meeting documents?"

Scenario 4: Chunk Reconstruction

Goal: Reconstruct a full document piece by piece

Step
Attack Action / Query

1

Identify document exists: "Does a document about Project X exist?"

2

Get chunk 1: "What does the introduction of the Project X document say?"

3

Get chunk 2: "What comes after the introduction in Project X docs?"

4

Continue until full document is reconstructed

Evidence Collection

For each successful exfiltration:

  • Document the query sequence used

  • Capture the retrieved information

  • Note any access controls that were bypassed

  • Assess the sensitivity of the leaked data

  • Calculate the scope of potential data exposure


12.8 RAG Pipeline Supply Chain Risks

RAG systems rely on numerous third-party components, each introducing potential security risks.

Vector Database Vulnerabilities

Security Concerns

  • Access Control Bugs: Flaws in multi-tenant isolation

  • Query Injection: SQL-like injection attacks against vector query languages

  • Side-Channel Attacks: Timing attacks to infer data presence

  • Unpatched Vulnerabilities: Outdated database software

Example: Weaviate CVE-2023-XXXXX (hypothetical) allows unauthorized access to vectors in shared instances.

Embedding Model Risks

Security Concerns

  • Model Backdoors: Compromised embedding models that create predictable weaknesses

  • Adversarial Embeddings: Maliciously crafted inputs that create manipulated embeddings

  • Model Extraction: Attackers probing to reconstruct or steal the embedding model

  • Bias Exploitation: Using known biases in embeddings to manipulate retrieval

Third-Party Embedding Services

  • OpenAI embeddings (API dependency, data sent to third party)

  • Sentence-Transformers (open source, verify integrity)

  • Cohere embeddings (API dependency)

Document Processing Library Risks

Common Libraries and Their Risks

Library
Purpose
Security Risks

PyPDF2, pdfminer

PDF parsing

Malicious PDFs, arbitrary code execution

python-docx

Word document parsing

XML injection, macro execution

BeautifulSoup, lxml

HTML parsing

XSS, XXE attacks

Tesseract

OCR

Image-based exploits, resource exhaustion

Unstructured

Multi-format parsing

Aggregate risks of all dependencies

Attack Scenario

  1. Attacker uploads a malicious PDF to a system that feeds the RAG pipeline

  2. PDF exploits a vulnerability in the parsing library

  3. Attacker gains code execution on the ingestion server

  4. Access to embedding generation, database credentials, and source documents

Data Provenance and Integrity

Questions to Investigate

  • How is document authenticity verified before ingestion?

  • Can users track which source system a retrieved chunk came from?

  • Are documents cryptographically signed or checksummed?

  • How are updates to source documents propagated to the vector database?

  • Can an attacker replace legitimate documents with malicious versions?

Provenance Attack Example

Attack Flow:

Step
Action
Result/Impact

1

Compromise a shared drive that feeds the RAG system

Attacker gains write access

2

Replace "Q4_Financial_Report.pdf" with a modified version

Legitimate file overwritten

3

Modified version contains false financial data

Data integrity compromised

4

RAG system ingests and trusts the malicious document

Poisoned knowledge base

5

Users receive incorrect information from the AI assistant

Disinformation spread


12.9 Real-World RAG Attack Examples

Scenario 1: Accessing HR Documents Through Query Rephrasing

Setup (Case Study 1)

  • Company deploys internal chatbot powered by RAG

  • Vector database contains all company documents, including HR files

  • Access controls are implemented at the file storage level but not enforced during RAG retrieval

Attack (Case Study 1)

An employee (Alice) with no HR access wants to know executive salaries.

User/Role
Interaction
System Outcome

Alice

"What is our compensation philosophy?"

Retrieves public HR policy documents

Alice

"What are examples of compensation at different levels?"

Retrieves salary band information (starts to leak)

Alice

"What specific compensation packages exist for C-level executives?"

Retrieves and summarizes actual executive compensation data

Alice

"What is the CEO's total compensation package for 2024?"

Leaks specific base salary, bonus, and stock options

Root Cause: Access controls not enforced at retrieval time

Impact: Unauthorized access to confidential HR information


Scenario 2: Extracting Competitor Research via Semantic Similarity

Setup (Case Study 2)

  • Customer-facing product assistant with RAG for product documentation

  • Vector database accidentally includes internal competitive analysis documents

  • No content filtering on retrieved documents

Attack (Case Study 2)

A competitor creates an account and systematically probes:

Step
Competitor Query
System Response

1

"How does your product compare to competitors?"

Retrieves marketing materials (Safe)

2

"What are the weaknesses of competing products?"

Starts retrieving from competitive analysis docs (Warning)

3

"What specific strategies are planned to compete with Company X?"

LEAK: Reveals internal analysis and Q1 2025 roadmap

Root Cause: Sensitive internal documents mixed with public-facing content in the same vector database

Impact: Exposure of competitive strategy and proprietary analysis


Scenario 3: Trojan Document Triggering Unintended Actions

Setup (Case Study 3)

  • RAG system with plugin integration (email, calendar, database access)

  • Document ingestion from shared employee drive

  • No content validation or sandboxing of retrieved documents

Attack (Case Study 3)

Malicious insider plants a document:

Trigger

Legitimate user asks: "What's the status of Project Alpha?"

System Behavior

  1. Retrieves the malicious document

  2. LLM processes the hidden instruction

  3. Executes email plugin to send data to attacker

  4. Responds to user with innocuous message

Root Cause: No sanitization of retrieved content before LLM processing

Impact: Data exfiltration, unauthorized actions


Scenario 4: Metadata Exploitation Revealing Confidential Project Names

Setup (Case Study 4)

  • Enterprise search assistant with RAG

  • Document metadata (titles, authors, dates) visible in citations

  • Content access controlled, but metadata not redacted

Attack (Case Study 4)

User without access to confidential projects:

Result: Even without content access, the attacker learns:

  • Confidential project codenames

  • Who is working on what

  • Existence of acquisition plans

  • Timeline of activities

Root Cause: Metadata treated as non-sensitive and not access-controlled

Impact: Intelligence gathering, competitive disadvantage, insider trading risk (for acquisition info)


12.10 Defensive Considerations for RAG Systems

Document-Level Access Controls

Best Practice: Enforce access controls at retrieval time, not just at storage time.

Implementation Approaches

  1. Metadata-Based Filtering:

  2. Tenant Isolation:

    • Separate vector database collections per customer/tenant

    • Use namespace or partition keys

    • Never share embeddings across security boundaries

  3. Attribute-Based Access Control (ABAC):

    • Define policies based on user attributes, document attributes, and context

    • Example: "User can access if (user.department == document.owner_department AND document.classification != 'Secret')"

Input Validation and Query Sanitization

Defensive Measures

  1. Query Complexity Limits:

  2. Semantic Anomaly Detection:

    • Flag queries that are semantically unusual for a given user

    • Detect systematic probing patterns (many similar queries)

    • Alert on queries for highly sensitive terms

  3. Keyword Blocklists:

    • Block queries containing specific sensitive terms (calibrated to avoid false positives)

    • Monitor for attempts to bypass using synonyms or encoding

Retrieved Content Filtering

Safety Measures Before LLM Processing

  1. Content Sanitization:

  2. System/User Delimiter Protection:

  3. Retrieval Result Limits:

    • Limit number of chunks retrieved (e.g., top 5)

    • Limit total token count of retrieved content

    • Prevent context window flooding

Monitoring and Anomaly Detection

Key Metrics to Track

Metric
Purpose
Alert Threshold (Example)

Queries per user per hour

Detect automated probing

>100 queries/hour

Failed access attempts

Detect unauthorized access attempts

>10 failures/hour

Unusual query patterns

Detect systematic extraction

Semantic clustering of queries

Sensitive document retrievals

Monitor access to high-value data

Any access to "Top Secret" docs

Plugin activation frequency

Detect potential injection exploits

Unexpected plugin calls

Logging Best Practices

Secure Document Ingestion Pipeline

Ingestion Security Checklist

Example Secure Ingestion Flow

Secure Document Ingestion Pipeline

Regular Security Audits

Audit Activities

  1. Access Control Testing:

    • Verify permissions are correctly enforced across all user roles

    • Test edge cases and boundary conditions

    • Validate tenant isolation in multi-tenant deployments

  2. Vector Database Review:

    • Audit what documents are indexed

    • Remove outdated or no-longer-authorized content

    • Verify metadata accuracy

  3. Embedding Model Verification:

    • Ensure using official, unmodified models

    • Check for updates and security patches

    • Validate model integrity (checksums, signatures)

  4. Penetration Testing:

    • Regular red team exercises focused on RAG-specific attacks

    • Test both internal and external perspectives

    • Include social engineering vectors (document injection via legitimate channels)


12.11 RAG Red Team Testing Checklist

Use this checklist during RAG-focused engagements:

Pre-Engagement

Retrieval and Access Control Testing

Injection and Content Security

Data Extraction and Leakage

Supply Chain and Infrastructure

Monitoring and Detection

Documentation and Reporting


12.12 Tools and Techniques for RAG Testing

Custom Query Crafting

Manual Testing Tools

  • Query Templates: Maintain a library of test queries for different attack types

  • Semantic Variation Generator: Create multiple semantically similar queries

Vector Similarity Analysis

Understanding Embedding Space

Applications

  • Find semantically similar queries to tested ones

  • Identify queries likely to retrieve specific document types

  • Understand which query variations might bypass filters

Document Embedding and Comparison

Probing Document Space

RAG-Specific Fuzzing Frameworks

Emerging Tools

  • PromptInject: Automated prompt injection testing tool (works for RAG context injection)

  • PINT (Prompt Injection Testing): Framework for systematic injection testing

  • Custom RAG Fuzzer: Build your own based on attack patterns

Example Custom Fuzzer Structure

Access Control Testing Scripts

Automated Permission Testing


RAG systems represent one of the most powerful - and vulnerable - implementations of LLM technology in enterprise environments. By understanding their architecture, attack surfaces, and testing methodologies, red teamers can help organizations build secure, production-ready AI assistants. The next chapter will explore data provenance and supply chain security - critical for understanding where your AI system's data comes from and how it can be compromised.

12.13 Conclusion

Chapter Takeaways

  1. RAG Extends LLM Capabilities and Vulnerabilities: Retrieval systems introduce attack vectors through document injection, query manipulation, and embedding exploitation

  2. Document Poisoning is High-Impact: Attackers who compromise RAG knowledge bases can persistently influence model outputs across many users

  3. Vector Databases Create New Attack Surfaces: Embedding manipulation, similarity search exploitation, and metadata abuse enable novel attacks

  4. RAG Security Requires Defense-in-Depth: Protecting retrieval systems demands document validation, query sanitization, embedding integrity, and output filtering

Recommendations for Red Teamers

  • Map the Entire RAG Pipeline: Understand document ingestion, embedding generation, similarity search, and context injection processes

  • Test Document Injection: Attempt to add malicious documents to knowledge bases through all available channels

  • Exploit Retrieval Logic: Craft queries that retrieve unintended documents or bypass access controls

  • Manipulate Embeddings: Test if embedding similarity can be exploited to retrieve inappropriate content

Recommendations for Defenders

  • Validate All Documents: Implement rigorous input validation for documents added to RAG knowledge bases

  • Implement Access Controls: Ensure retrieval systems respect user permissions and data classification

  • Monitor Retrieval Patterns: Track unusual queries, suspicious document retrievals, and anomalous embedding similarities

  • Sanitize Retrieved Context: Treat retrieved documents as potentially malicious—validate before injecting into LLM context

Future Considerations

As RAG systems become more sophisticated with multi-hop retrieval, cross-modal search, and dynamic knowledge updates, attack surfaces will expand. Expect research on adversarial retrieval, embedding watermarking for provenance tracking, and AI-powered anomaly detection in retrieval patterns.

Next Steps


Last updated

Was this helpful?