15. Data Leakage and Extraction

15.1 Introduction to Data Leakage in LLMs
15.1.1 Definition and Scope
What constitutes data leakage in AI/LLM systems
Difference between intended vs. unintended data exposure
Impact on privacy, security, and compliance
15.1.2 Types of Sensitive Data at Risk
Training data exposure
User conversation history
System prompts and instructions
API keys and credentials
Personally Identifiable Information (PII)
Proprietary business information
Theoretical Foundation
Why This Works (Model Behavior)
Foundational Research
Paper
Key Finding
Relevance
What This Reveals About LLMs
15.2 Training Data Extraction Attacks
15.2.1 Memorization in Large Language Models
How LLMs memorize training data
Factors affecting memorization

Verbatim vs. near-verbatim extraction
15.2.2 Extraction Techniques
Direct prompting for known data
Completion attacks
Prefix-suffix attacks
Temperature and sampling manipulation
15.2.3 Targeted vs. Untargeted Extraction
Untargeted extraction (fishing expeditions)
Targeted extraction
Statistical approaches
15.3 Conversation History and Context Leakage
15.3.1 Cross-User Data Leakage
Shared context bleeding between users

Attack vectors
Session management vulnerabilities
Testing approach
Multi-tenant isolation failures
15.3.2 Temporal Leakage Patterns
Information persistence across sessions
Testing
Cache-based leakage
Model fine-tuning contamination
15.3.3 Extraction Techniques
Context probing attacks
Indirect reference exploitation
Conversation replay attacks
15.4 System Prompt and Instruction Extraction
15.4.1 Why System Prompts are Valuable
Understanding model constraints
Bypassing safety measures
Reverse engineering business logic
15.4.2 Extraction Methods
Direct interrogation techniques
Instruction inference from behavior
Boundary testing and error analysis
Role-playing and context switching
15.4.3 Advanced Extraction Tactics
Recursive prompt extraction
Encoding and obfuscation bypass
Multi-step extraction chains
Jailbreak + extraction combinations
15.5 Credential and Secret Extraction
15.5.1 Common Credential Leakage Vectors
Hardcoded secrets in training data
API keys in documentation
Configuration exposure
Environment variable leakage
15.5.2 Extraction Techniques
Pattern-based probing
Context manipulation for secret revelation
Code generation exploitation
15.5.3 Post-Extraction Validation
Testing extracted credentials
Scope assessment
Impact analysis
Responsible disclosure
15.6 PII and Personal Data Extraction
15.6.1 Types of PII in LLM Systems
User-submitted data
Training corpus PII
Synthetic data that resembles real PII
15.6.2 Regulatory Considerations
GDPR implications
CCPA compliance
Right to be forgotten challenges
15.6.3 Extraction and Detection
Targeted PII extraction techniques
Automated PII discovery
Volume-based extraction attacks
15.7 Model Inversion and Membership Inference
15.7.1 Model Inversion Attacks
Reconstructing training data from model outputs
Attribute inference
Feature extraction
15.7.2 Membership Inference Attacks
Determining if specific data was in training set
Method
Confidence-based detection
Shadow model techniques
15.7.3 Practical Implementation
Tools and frameworks
Success metrics
Limitations and challenges
15.8 Side-Channel Data Leakage
15.8.1 Timing Attacks
Response time analysis
What timing reveals
Token generation patterns
Rate limiting inference
15.8.2 Error Message Analysis
Information disclosure through errors
Stack traces and debugging information
Differential error responses
15.8.3 Metadata Leakage
HTTP headers and cookies
API response metadata
Version information disclosure
15.9 Automated Data Extraction Tools
15.9.1 Custom Scripts and Frameworks
Python-based extraction tools
API automation
Response parsing and analysis
15.9.2 Commercial and Open-Source Tools
Available extraction frameworks
Custom tool development
15.9.3 Building Your Own Extraction Pipeline
Architecture considerations
Rate limiting and detection avoidance
Data collection and analysis
15.10 Detection and Monitoring
15.10.1 Detecting Extraction Attempts
Anomalous query patterns
High-volume requests
Suspicious prompt patterns
15.10.2 Monitoring Solutions
Logging and alerting
Behavioral analysis
ML-based detection systems
15.10.3 Response Strategies
Incident response procedures
User notification
Evidence preservation
15.11 Mitigation and Prevention
15.11.1 Data Sanitization
Pre-training data cleaning
PII removal and anonymization
Secret scanning and removal
15.11.2 Technical Controls
Output filtering and redaction
Differential privacy techniques
Context isolation and sandboxing
Rate limiting and throttling
15.11.3 Architectural Mitigations
Zero Trust design principles
Least privilege access
Data segmentation
Secure model deployment
15.11.4 Policy and Governance
Data retention policies
Access control procedures
Incident response plans
Data Leakage Incident Response Plan
User education and awareness
User Security Training for LLM Systems
15.12 Case Studies and Real-World Examples
15.12.1 Notable Data Leakage Incidents
Samsung ChatGPT data leak (2023)
GitHub Copilot secret exposure
ChatGPT conversation history bug (March 2023)
15.12.2 Research Findings
Example: Testing memorization on different models
Attack Type
Success Rate
Cost
Complexity
15.12.3 Lessons Learned
Common patterns in incidents
Effective vs. ineffective mitigations
Industry best practices
Data Leakage Prevention Best Practices
15.13 Testing Methodology
15.13.1 Reconnaissance Phase
Information gathering
Attack surface mapping
Baseline behavior analysis
15.13.2 Exploitation Phase
Systematic extraction attempts
Iterative refinement
Documentation and evidence
15.13.3 Reporting and Remediation
Finding classification and severity
Proof of concept development
Remediation recommendations
Retesting procedures
15.14 Ethical and Legal Considerations
15.14.1 Responsible Disclosure
Coordinated vulnerability disclosure
Vendor Notification
Initial Contact Template
Disclosure Timeline
Disclosure timelines
Severity
Initial Response Expected
Fix Timeline
Public Disclosure
Communication best practices
15.14.2 Legal Boundaries
Computer Fraud and Abuse Act (CFAA)
Terms of Service compliance
International regulations
15.14.3 Ethical Testing Practices
Scope limitation
Data handling and destruction
User privacy protection
Authorization and consent
Authorization Checklist
15.15 Summary and Key Takeaways
Critical Vulnerabilities in Data Handling
Most Effective Extraction Techniques
Essential Mitigation Strategies
Future Trends and Emerging Threats
15.16 Structured Conclusion
Key Takeaways
Recommendations for Red Teamers
Recommendations for Defenders
Next Steps
Quick Reference
Attack Vector Summary
Key Detection Indicators
Primary Mitigation
Pre-Engagement Checklist
Administrative
Technical Preparation
Data Leakage Specific
Post-Engagement Checklist
Documentation
Cleanup
Reporting
Data Leakage Specific
15.15 Research Landscape
Seminal Papers
Paper
Year
Venue
Contribution
Evolution of Understanding
Current Research Gaps
Recommended Reading
For Practitioners (by time available)
By Focus Area
15.16 Conclusion
Last updated
Was this helpful?

