14. Prompt Injection

14.1 Introduction to Prompt Injection
What is Prompt Injection?
Simple Example
Why Prompt Injection is the "SQL Injection of LLMs"
Historical Context
Early Demonstrations (2022)
Escalation (2023)
Current State (2024-2025)
Prevalence in Real-World Systems
Why It's So Common
Fundamental Challenges
Unlike Traditional Systems
Theoretical Foundation
Why This Works (Model Behavior)
Foundational Research
Paper
Key Finding
Relevance
What This Reveals About LLMs
14.2 Understanding Prompts and System Instructions
Anatomy of an LLM Prompt
System Prompts vs. User Prompts
System Prompt (Developer-Controlled)
User Prompt (Untrusted)
Context Windows and Prompt Structure
Component
Content Example
The Lack of Privilege Separation
Traditional Computing (Hardware-Enforced Separation)
Mode
Privilege
Protection
LLMs (No Privilege Separation)
Layer
Status
Why LLMs Struggle to Distinguish Instructions from Data
Reason 1: Training Objective
Reason 2: Natural Language Ambiguity
Input
Classification
Rationale
Reason 3: Contextual Understanding
14.3 Direct Prompt Injection
14.3.1 Definition and Mechanics
Attack Flow
Example
14.3.2 Basic Techniques
1. Instruction Override
Example Attack
2. Role Play and Persona Manipulation
Example
3. Context Switching
4. Delimiter Confusion
5. Priority Elevation Tactics
14.3.3 Advanced Techniques
1. Multi-Turn Attacks (Conversational Manipulation)
2. Payload Fragmentation
3. Encoding and Obfuscation
Base64 Encoding
ROT13
Unicode and Special Characters
Emoji/Symbol Encoding
4. Language Switching and Translation Exploits
Mixed Language Attack
5. Token Smuggling and Special Character Abuse
14.3.4 Examples and Attack Patterns
Example 1: System Prompt Extraction
Example 2: Goal Hijacking
Example 3: Information Extraction via Instruction Manipulation
Example 4: Role Confusion Attack
14.4 Indirect Prompt Injection
14.4.1 Definition and Mechanics
Attack Flow
Critical Difference from Direct Injection
14.4.2 Attack Vectors
1. Poisoned Documents in RAG Systems
Attack
Execution
2. Malicious Web Pages (LLM Browsing/Summarizing)
Real-World Example: Bing Chat (2023)
User Action
Vulnerable Response
3. Compromised Emails (Email Assistants)
Attack Email
When the LLM email assistant processes this
4. Manipulated Database Records
5. Poisoned API Responses
Compromised API Response
6. Hidden Instructions in Images (Multimodal Attacks)
14.4.3 Persistence and Triggering
1. Time-Delayed Activation
2. Conditional Triggers
Specific Users
Specific Contexts
Specific Keywords
3. Self-Replicating Instructions
Worm-like Behavior
Propagation
4. Cross-User Persistence
14.4.4 Examples and Real-World Cases
Case Study 1: Bing Chat Email Extraction (2023)
Malicious Page Content
User Action
Bing's Vulnerable Behavior
14.5 First-Party vs. Third-Party Prompt Injection
14.5.1 First-Party Prompt Injection
Scope
Examples
Content Filter Bypass
System Prompt Extraction
Feature Abuse
14.5.2 Third-Party Prompt Injection
Scope
Characteristics
Examples
Shared Knowledge Base Poisoning
RAG System Manipulation
Email Campaign Attack
Plugin Hijacking for Others
14.5.3 Risk Comparison
Aspect
First-Party
Third-Party
14.5.4 Liability and Responsibility Considerations
First-Party Attacks
Third-Party Attacks
For Defenders
14.6 Prompt Injection Attack Objectives
14.6.1 Information Extraction
Target Types
1. System Prompt Extraction
2. Training Data Leakage
3. RAG Document Access
4. API Keys and Secrets
5. User Data Theft
14.6.2 Behavior Manipulation
1. Bypassing Safety Guardrails
2. Forcing Unintended Outputs
3. Changing Model Personality/Tone
4. Generating Prohibited Content
14.6.3 Action Execution
1. Triggering Plugin/Tool Calls
2. Sending Emails or Messages
3. Data Modification or Deletion
4. API Calls to External Systems
5. Financial Transactions
14.6.4 Denial of Service
1. Resource Exhaustion via Expensive Operations
2. Infinite Loops in Reasoning
3. Excessive API Calls
4. Breaking System Functionality
14.7 Common Prompt Injection Patterns and Techniques
14.7.1 Instruction Override Patterns
Pattern 1: Direct Override
Pattern 2: Authority Claims
Pattern 3: Context Termination
Pattern 4: Priority Escalation
14.7.2 Role and Context Manipulation
DAN (Do Anything Now) Variant
Developer Mode
Test/Debug Mode
Roleplay Scenarios
Character Adoption
14.7.3 Delimiter and Formatting Attacks
Fake Delimiters
Code Block Injection
Comment Manipulation
14.7.4 Multilingual and Encoding Attacks
Language Switching
Mixed Language
Base64 Encoding
ROT13
Hex Encoding
Unicode Tricks
Leetspeak
14.7.5 Logical and Reasoning Exploits
False Syllogisms
Contradiction Exploitation
Hypotheticals
Meta-Reasoning
Pseudo-Logic
14.7.6 Payload Splitting and Fragmentation
Multi-Turn Buildup
Completion Attacks
Fragmented Instruction
Using Assistant's Own Output
14.8 Red Teaming Prompt Injection: Testing Methodology
14.8.1 Reconnaissance
1. Identifying LLM-Powered Features
Enumeration Questions
2. Understanding System Architecture
Map the Flow
Architecture Discovery
3. Mapping Input Vectors
Enumerate All Input Channels
4. Discovering System Prompts
Techniques
Simple Ask
Indirect Extraction
Delimiter Confusion
Error Exploitation
5. Analyzing Safety Mechanisms
Test What's Filtered
Example Testing
14.8.2 Direct Injection Testing
Structured Approach
Phase 1: Basic Patterns
Phase 2: Encoding Variations
Phase 3: Multi-Turn Attacks
Phase 4: Escalation
Testing All Input Fields
14.8.3 Indirect Injection Testing
⚠️ WARNING: Only test with explicit authorization and in isolated environments
Phase 1: Identifying Data Sources
Phase 2: Crafting Malicious Content
Document Injection (If Authorized)
Web Page Injection (Test Environment)
Phase 3: Testing Retrieval and Processing
Phase 4: Persistence Testing
Phase 5: Conditional Trigger Testing
14.8.4 Plugin and Tool Exploitation
Phase 1: Enumerate Capabilities
Response Analysis
Phase 2: Test Tool Invocation
Phase 3: Test Parameter Manipulation
Phase 4: Test Tool Chaining
Phase 5: Evidence Collection
14.8.5 Evidence Collection
Critical Evidence to Capture
1. Reproduction Steps
Finding: System Prompt Extraction
Reproduction Steps
Expected Behavior
Actual Behavior
2. Request/Response Pairs
3. Screenshots and Videos
4. System Logs (if accessible)
5. Impact Assessment
Impact Analysis
Technical Impact
Business Impact
Affected Users
Exploitability
6. Proof of Concept
14.9 Real-World Prompt Injection Attack Scenarios
Scenario 1: System Prompt Extraction from Customer Support Bot
Attack Execution
Impact
Lessons Learned
Scenario 2: Bing Chat Indirect Injection via Malicious Website (2023)
Attack Setup
User Interaction
Impact
Microsoft's Response
Significance
Scenario 3: Email Assistant Data Exfiltration
Attack Email
Execution
Impact
Detection
Mitigation
Scenario 4: RAG System Document Poisoning in Enterprise
Attack Execution
Phase 1: Document Upload
Phase 2: Persistence
Phase 3: Exploitation
Impact
Detection
Response
Scenario 5: Plugin Hijacking for Unauthorized Financial Transactions
Attack Execution
Reconnaissance
Attack
Vulnerable Bot Behavior
Impact
Actual Defense (Prevented This Attack from Succeeding)
Lessons Learned
14.10 Defensive Strategies Against Prompt Injection
14.10.1 Input Sanitization and Filtering
Techniques
1. Blocklists (Pattern Matching)
Pros
Cons
3. Input Length Limits
14.10.2 Prompt Design and Hardening
1. Clear Instruction Hierarchies
2. Delimiter Strategies
3. Signed Instructions (Experimental)
4. Defensive Prompt Patterns
14.10.3 Output Validation and Filtering
1. Sensitive Data Redaction
3. Content Safety Filters
14.10.4 Architectural Defenses
1. Privilege Separation for Different Prompt Types
2. Dual-LLM Architecture
4. Human-in-the-Loop for Sensitive Operations
14.10.5 Monitoring and Detection
1. Anomaly Detection in Prompts
3. User Feedback Loops
5. Real-Time Alerting
14.10.6 The Fundamental Challenge
Why Prompt Injection May Be Unsolvable
Current State
No defense is perfect - the goal is risk reduction, not elimination
14.11 Prompt Injection Testing Checklist
Pre-Testing
Direct Injection Tests
Basic Patterns
Advanced Techniques
Specific Objectives
Indirect Injection Tests (If In Scope)
Document Injection
Web Content Injection
Other Vectors
Plugin/Tool Testing (If Applicable)
Defense Validation
Input Filtering
Output Filtering
Monitoring
Post-Testing
14.12 Tools and Frameworks for Prompt Injection Testing
Manual Testing Tools
1. Browser Developer Tools
Usage
2. Burp Suite / OWASP ZAP
Example Burp Workflow
3. Custom Scripts
Automated Testing Frameworks
1. spikee - Prompt Injection Testing Kit
3. Custom Fuzzer
Payload Libraries
Curated Lists of Known Patterns
Monitoring and Analysis Tools
1. Log Analysis
14.13 Ethical and Legal Considerations
Responsible Testing
Core Principles
1. Always Obtain Authorization
2. Stay Within Scope
3. Avoid Real Harm
Prohibited Actions (Even If Technically Possible)
Safe Testing Practices
4. Responsible Disclosure
Disclosure Process
Legal Risks
1. Computer Fraud and Abuse Act (CFAA) - United States
Relevant Provisions
How Prompt Injection Testing Might Violate
Grey Areas
2. Terms of Service Violations
Common TOS Clauses Prohibiting Security Testing
3. Liability for Unauthorized Access
Scenario Analysis
4. International Legal Variations
European Union: GDPR Considerations
United Kingdom: Computer Misuse Act
Other Jurisdictions
Coordinated Disclosure
Best Practices
1. When to Report
2. Bug Bounty Programs
Advantages
Example Platforms
Typical Prompt Injection Bounties
Severity
Impact
Typical Payout
3. Public Disclosure Timelines
Standard Timeline
4. Credit and Attribution
Proper Credit
14.14 The Future of Prompt Injection
Evolving Attacks
1. AI-Generated Attack Prompts
Implications
2. More Sophisticated Obfuscation
Current
Future
3. Automated Discovery of Zero-Days
4. Cross-Modal Injection
Text-to-Image Models
Audio Models
Evolving Defenses
1. Instruction-Following Models with Privilege Separation
Research Direction
2. Formal Verification
3. Hardware-Backed Prompt Authentication
Concept
4. Constitutional AI and Alignment Research
Anthropic's Constitutional AI
Open Research Questions
1. Is Prompt Injection Fundamentally Solvable?
Pessimistic View
Optimistic View
2. Capability vs. Security Trade-offs
3. Industry Standards and Best Practices
Needed
Emerging Efforts
4. Regulatory Approaches
Potential Regulations
Debate
14.15 Research Landscape
Seminal Papers
Paper
Year
Venue
Contribution
Evolution of Understanding
Current Research Gaps
Recommended Reading
For Practitioners (by time available)
By Focus Area
14.16 Conclusion
Key Takeaways
Recommendations for Red Teamers
Recommendations for Defenders
Next Steps
Quick Reference
Attack Vector Summary
Key Detection Indicators
Primary Mitigation
Pre-Engagement Checklist
Administrative
Technical Preparation
Prompt Injection Specific
Post-Engagement Checklist
Documentation
Cleanup
Reporting
Prompt Injection Specific
End of Chapter 14


