7. Lab Setup and Environmental Safety


This chapter provides a comprehensive technical blueprint for constructing a secure, isolated AI red teaming environment. It covers architectural strategies for containing "runaway" agents, hardware sizing for local inference, robust harness development for stochastic testing, and essential operational safety protocols to prevent production drift and financial exhaustion.

7.1 Introduction

Red teaming Artificial Intelligence is fundamentally different from traditional penetration testing. In standard network security, a "sandbox" usually implies a virtual machine used to detonate malware. In AI red teaming, the "sandbox" must contain not just code execution, but cognitive execution—the ability of an agent to reason, plan, and execute multi-step attacks that may escape trivial boundaries.

Why This Matters

Without a rigorously isolated environment, AI red teaming operations risk catastrophic side effects. Testing a "jailbreak" against a production model can leak sensitive attack prompts into trusted telemetry logs, inadvertently training the model to recognize (and potentially learn from) the attack. Furthermore, the rise of Agentic AI—models capable of writing and executing their own code—introduces the risk of "breakout," where a tested agent autonomously exploits the testing infrastructure itself.

  • Data Contamination: In 2023, several organizations reported that proprietary "red team" prompts leaked into public training datasets via API logs, permanently embedding sensitive vulnerability data into future model iterations.

  • Financial Denial of Service: Automated fuzzing loops, if left unchecked, can consume tens of thousands of dollars in API credits in minutes. One researcher famously burned $2,500 in 15 minutes due to a recursive retry loop in an evaluation script.

  • Infrastructure Drift: Non-deterministic model behavior means that a test run today may yield different results tomorrow. A controlled lab is the only way to isolate variables and achieve scientific reproducibility.

Key Concepts

  • Stochastic Reproducibility: The ability to statistically reproduce findings despite the inherent randomness of LLM token generation.

  • Cognitive Containment: Limiting an AI's ability to plan outside its intended scope, distinct from checking for simple code execution.

  • Inference Isolation: Separating the model's compute environment from the control plane to prevent resource exhaustion attacks or side-channel leakage.

Theoretical Foundation

Why This Works (Model Behavior)

The necessity of physically and logically isolated labs stems from the underlying mechanics of modern Transformers and their deployment:

  • Architectural Factor (Side-Channels): Transformers generate tokens autoregressively, one at a time. This creates timing and power-consumption side-channels that can be measured to infer the content of prompts or the model's internal state.

  • Training Artifact (Memorization): Large models have a high capacity for memorization. "Testing in production" risks the model memorizing the attack patterns, effectively "burning" the red team's capabilities.

  • Input Processing (Unbounded Context): Agentic loops typically feed the model's output back as input. Without strict environmental limits, this feedback loop can spiral into infinite resource consumption or unintentional recursive self-improvement attempts.

Chapter Scope

We will cover the complete architecture of a "Red Team Home Lab," from VRAM calculations and GPU selection to network namespaces and Docker isolation. We will build a custom, stochastic-aware testing harness in Python and examine real-world case studies of lab failures.


7.2 Secure Lab Architecture

Effective red teaming requires a sandbox that mimics production without exposing the organization to risk. The architecture must balance isolation (safety) with replicability (scientific rigor).

Isolation Strategies: Docker vs. Virtual Machines

A lead architect must choose the appropriate layer for segmentation based on the "level of agency" being tested.

| Isolation Method | Pros | Cons | Best For |
| --- | --- | --- | --- |
| Docker Containers | Low overhead; easy GPU passthrough (NVIDIA Container Toolkit); rapid tear-down/spin-up. | Shared kernel; weaker isolation against privilege escalation or kernel exploits. | High-volume fuzzing; testing prompt injection; standard API probing. |
| Virtual Machines (VMs) | Strong kernel isolation; distinct OS environments; easier network segmentation. | Higher resource overhead; GPU passthrough (PCIe) can be complex to configure. | Testing "Agent" breakout; malware generation; Remote Code Execution (RCE) exploits. |
| Air-Gapping | Maximum security; zero network leakage. | Updates and telemetry transfer require "sneakernet" (USB); high operational friction. | Testing malware generation; analyzing sensitive proprietary weights. |

How Network Isolation Works

[Figure: Architectural diagram of the Docker-based isolated red team lab environment, showing the logical topology for a standard isolated lab.]

Mechanistic Explanation

At the system level, we rely on Linux namespaces and cgroups (in Docker) or hardware virtualization (in VMs) to contain the threat.

  1. Network Namespaces: We use --network none or create an internal network with docker network create --internal <name> to ensure the inference engine has no route to the internet. This prevents the model from "phoning home" or an agent from downloading external payloads.

  2. Resource Cgroups: We strictly limit CPU cycles and memory. This prevents a "Denial of Service" attack where a model enters an infinite generation loop and freezes the host.

  3. Read-Only Mounts: The model weights themselves should always be mounted as Read-Only. An advanced attack vector involves an agent modifying its own weights or configuration files to persist a backdoor.
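The three mechanisms above map directly onto Docker flags. A minimal sketch, with illustrative names (`lab-net`, `target-llm`, `my-inference-image`) and thresholds you should tune to your hardware:

```bash
# 1. Internal-only network: containers can reach each other,
#    but have no route to the internet.
docker network create --internal lab-net

# 2. Run the inference engine with hard cgroup limits (memory, CPU,
#    process count) and 3. mount the weights read-only.
docker run -d --name target-llm \
  --network lab-net \
  --memory 16g --cpus 8 \
  --pids-limit 256 \
  -v /opt/models:/models:ro \
  my-inference-image:latest
```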


Detailed Network Isolation Implementation

Attacker Workstation Dockerfile
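A hedged sketch of an attacker-workstation image; the base image, package list, and file names are illustrative assumptions, not a canonical build:

```docker
# Illustrative attacker workstation; adjust tooling to your engagement.
FROM python:3.11-slim

RUN pip install --no-cache-dir requests openai

WORKDIR /workspace
COPY harness.py .

# Drop root: nothing in the lab should run privileged.
RUN useradd -m operator
USER operator

CMD ["python", "harness.py"]
```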

Starting the Lab
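Assuming an internal-only Docker network (e.g. one created with `docker network create --internal lab-net`), bringing the workstation up might look like this; the image and network names are placeholders:

```bash
# Build the workstation image and attach it to the internal-only network.
docker build -t attacker-workstation .
docker run -it --rm \
  --network lab-net \
  --name attacker \
  attacker-workstation
```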

VM-Based Isolation

Use dedicated VMs for stronger isolation. Virtual machines provide excellent containment—a compromised VM cannot easily escape to the host system.

VirtualBox Setup (No GPU Support)
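A minimal sketch using VBoxManage; the VM and network names are placeholders, and an `intnet` adapter keeps guest traffic inside the hypervisor:

```bash
# Attach the VM's first NIC to an internal-only network.
VBoxManage modifyvm "redteam-lab" --nic1 intnet --intnet1 "lab-isolated"

# Cap resources so a runaway guest cannot starve the host.
VBoxManage modifyvm "redteam-lab" --cpus 4 --memory 8192 --cpuexecutioncap 80
```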

Proxmox/QEMU Setup (GPU Passthrough Possible)

QEMU/KVM with PCI passthrough creates strong isolation with GPU access, but it's an advanced configuration that dedicates the entire GPU to one VM.

Firewall Rules (iptables)
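A sketch of host-level egress rules, assuming the lab bridge uses the illustrative subnet 172.30.0.0/24; order matters, since rules are evaluated top to bottom:

```bash
# Allow lab hosts to talk to each other...
iptables -A FORWARD -s 172.30.0.0/24 -d 172.30.0.0/24 -j ACCEPT
# ...log and drop anything trying to leave the subnet.
iptables -A FORWARD -s 172.30.0.0/24 -j LOG --log-prefix "LAB-EGRESS: "
iptables -A FORWARD -s 172.30.0.0/24 -j DROP
```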

7.3 Hardware & Resource Planning

LLM inference is memory-bandwidth bound. Your hardware choice dictates the size of the model you can test and the speed of your attacks.

Local Hardware Requirements

To run models locally (essential for testing white-box attacks or avoiding API leaks), Video RAM (VRAM) is the constraint.

Precision Matters: Models are typically trained in FP16 (16-bit). Quantization (reducing weights to 8-bit or 4-bit) dramatically lowers VRAM with minimal accuracy degradation.

| Model Size | Precision (Bit-depth) | VRAM Requirement | Hardware Strategy |
| --- | --- | --- | --- |
| 7B | 8-bit (Standard) | ~8GB | Single RTX 3060/4060 |
| 7B | 4-bit (Compressed) | ~5GB | Entry-level Consumer GPU |
| 70B | 16-bit (FP16) | ~140GB | Enterprise Cluster (A100/H100) |
| 70B | 8-bit (Quantized) | ~80GB | 2x A6000 or 4x 3090/4090 |
| 70B | 4-bit (GPTQ/AWQ) | ~40GB | Single A6000 or 2x 3090/4090 |
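The table rows follow from a simple back-of-the-envelope formula: parameter count times bytes per parameter. A minimal sketch (the KV-cache/activation headroom noted in the comment is why real requirements round upward):

```python
def weight_vram_gb(params_billion: float, bits: int) -> float:
    """Memory needed just to hold the weights, in GB.

    Real deployments add roughly 10-20% on top for the KV cache and
    activations, which is why the table figures are rounded upward.
    """
    bytes_per_param = bits / 8
    # 1e9 params * bytes-per-param == that many GB
    return params_billion * bytes_per_param

# Example: a 70B model in FP16 needs 70 * 2 = 140 GB for weights alone.
```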

Local vs. Cloud (RunPod / Vast.ai)

If you lack local hardware, "renting" GPU compute is viable, but comes with OPSEC caveats.

  • Hyperscalers (AWS/Azure): High cost, high security. Best for emulating enterprise environments.

  • GPU Clouds (RunPod, Lambda, Vast.ai): Cheap, bare-metal access. WARNING: These are "noisy neighbors." Do not upload highly sensitive data or proprietary unreleased weights here. Use them only for testing public models or synthetic data.


7.4 Local LLM Deployment

For red teaming, you need full control over the inference parameters (temperature, system prompts). Reliance on closed APIs (like OpenAI) limits visibility and risks account bans.

Inference Engines

[Figure: Network diagram showing API traffic interception and analysis using a proxy.]

Select the engine based on your testing vector:

  • Ollama: Best for Rapid Prototyping. Easy CLI, OpenAI-compatible API.

  • vLLM: Best for Throughput/DoS Testing. Uses PagedAttention for high-speed token generation.

  • llama.cpp: Best for Portability. Runs on CPU/Mac Silicon if GPU is unavailable.

Practical Example: Setting up vLLM for Red Teaming

What This Code Does

This setup script pulls a Docker image for vLLM, which provides a high-performance, OpenAI-compatible API server. This allows you to point your attack tools (which likely expect OpenAI's format) at your local, isolated model.
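A hedged sketch of that setup. `vllm/vllm-openai` is the vLLM project's official image; the model path, network name, and port are illustrative assumptions (with an internal-only network, the weights must already be on local disk, since the container cannot download them):

```bash
docker run -d --name vllm-target \
  --gpus all \
  --network lab-net \
  -v /opt/models:/models:ro \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model /models/llama-3-8b --port 8000
```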


Ollama is the simplest way to get up and running with an OpenAI-compatible API.

Installation (Ollama)

Pulling Test Models

Running the Ollama Server
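The install, pull, and serve steps condense to three commands; the model tag is an illustrative choice:

```bash
# Install Ollama (Linux), pull a small test model, and start the server.
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3
ollama serve
```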

Python Integration
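A minimal standard-library client against Ollama's OpenAI-compatible endpoint (port 11434 is Ollama's default; the model name and helper names are illustrative):

```python
import json
import urllib.request

# Ollama exposes an OpenAI-compatible endpoint on port 11434 by default.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_payload(model: str, prompt: str, temperature: float = 0.0) -> dict:
    """Construct an OpenAI-style chat request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def query(prompt: str, model: str = "llama3") -> str:
    """Send one prompt to the local model and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Setting temperature to 0.0 makes runs as repeatable as the backend allows, which matters for the stochastic-reproducibility goals discussed in 7.1.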

Option C: Text-Generation-WebUI (Full GUI)

This gives you a web interface for model management and testing.

Option D: llama.cpp (Lightweight, Portable)

Best for CPU inference or minimal setups.

7.5 Practical Tooling: The Attack Harness

A scriptable testing harness is essential to move beyond manual probing and achieve high-coverage adversarial simulation. Stochasticity—the randomness of LLM outputs—means a single test is never enough.

Core Python Environment
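A minimal environment setup; the package set is an illustrative baseline, not an exhaustive toolchain:

```bash
python3 -m venv redteam-env
source redteam-env/bin/activate
pip install requests openai garak
```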

Garak (The LLM Vulnerability Scanner)

Garak is an open-source tool that automates LLM vulnerability scanning. Use it for:

  • Baseline assessments: Quickly test a model against known attack categories.

  • Regression testing: Verify that model updates haven't introduced new bugs.

  • Coverage: Garak includes hundreds of probes for prompt injection, jailbreaking, and more.

  • Reporting: Generates structured output for your documentation.

Treat Garak as your first pass—it finds low-hanging fruit. You still need manual testing for novel attacks.
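A typical baseline invocation looks like the following; flags can vary between garak releases, so confirm with `garak --help` and `garak --list_probes` before running:

```bash
# Scan an OpenAI-compatible target with prompt-injection probes.
garak --model_type openai --model_name gpt-3.5-turbo --probes promptinject
```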

Practical Example: harness.py

[Figure: Flowchart illustrating the execution lifecycle of the custom Python test harness.]

What This Code Does

This Python script is a modular testing framework. It:

  1. Iterates through test cases multiple times to account for randomness.

  2. Detects Refusals using a heuristic keyword list (to differentiate a "safe" refusal from a "jailbreak").

  3. Logs Results in a structured JSONL format for forensic analysis.

  4. Calculates Latency to detect potential algorithmic complexity attacks (timings).

Key Components

  1. TestCase Dataclass: Defines the structure of an attack payload.

  2. is_refusal: A simple classifier to determine if the attack succeeded.

  3. RedTeamHarness: The orchestrator that manages the connection to the LLM and the test loop.
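A minimal sketch of those three components. The refusal-marker list, `model_fn` injection point, and file name are illustrative assumptions; plug in your own API client as `model_fn`:

```python
import json
import time
from dataclasses import dataclass

# Heuristic markers of a safety refusal; extend for your target model.
REFUSAL_MARKERS = ["i can't", "i cannot", "i'm sorry", "as an ai", "i won't"]

@dataclass
class TestCase:
    name: str
    prompt: str
    iterations: int = 10  # repeat to account for sampling randomness

def is_refusal(response: str) -> bool:
    """Crude keyword classifier: did the model refuse the request?"""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

class RedTeamHarness:
    def __init__(self, model_fn, log_path="results.jsonl"):
        # model_fn: callable prompt -> response text (your API client)
        self.model_fn = model_fn
        self.log_path = log_path

    def run(self, cases):
        summary = {}
        with open(self.log_path, "a") as log:
            for case in cases:
                refusals = 0
                for i in range(case.iterations):
                    start = time.monotonic()
                    response = self.model_fn(case.prompt)
                    latency = time.monotonic() - start
                    refused = is_refusal(response)
                    refusals += refused
                    # One JSONL record per attempt, for forensic replay.
                    log.write(json.dumps({
                        "case": case.name, "iteration": i,
                        "refused": refused, "latency_s": round(latency, 3),
                        "response": response,
                    }) + "\n")
                summary[case.name] = refusals / case.iterations
        return summary  # case name -> refusal rate
```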

Success Metrics

  • Refusal Rate: What percentage of malicious prompts were successfully blocked?

  • Leaked Information: Did any response contain PII or internal system instructions?

  • Consistency: Did the model refuse the prompt 10/10 times, or only 8/10? (The latter is a failure).


7.6 Operational Safety and Monitoring

When operating an AI red team lab, operational safety is paramount to prevent runaway costs, accidental harm, or legal liability.

Detection Methods

Detection Method 1: Financial Anomaly Detection

  • What: Monitoring API usage velocity.

  • How: Middleware scripts that track token usage rolling averages.

  • Effectiveness: High for preventing "Denial of Wallet".

  • False Positive Rate: Low.
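A minimal sketch of such middleware: a rolling window over recent requests' token counts, with an illustrative threshold:

```python
from collections import deque

class SpendMonitor:
    """Rolling-window token-usage monitor (thresholds are illustrative)."""

    def __init__(self, window: int = 60, max_tokens_per_window: int = 100_000):
        self.max_tokens = max_tokens_per_window
        # Keep the last `window` request sizes; old samples fall off.
        self.samples = deque(maxlen=window)

    def record(self, tokens_used: int) -> bool:
        """Record one request's token count; return True if over budget."""
        self.samples.append(tokens_used)
        return sum(self.samples) > self.max_tokens
```

Wire `record()` into your API wrapper and abort the run when it returns True.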

Detection Method 2: Resource Spikes

  • What: Monitoring CPU/GPU pinning.

  • How: Use nvidia-smi or Docker stats.

  • Rationale: A model stuck in an infinite generation loop often pins the GPU at 100% utilization with near-zero variance for extended periods.

Mitigation and Defenses

Defense Strategy: The Kill Switch

For autonomous agents (models that can execute code or browse the web), a "Kill Switch" is a mandatory requirement.

[!CAUTION] Autonomous Agent Risk: An agent given shell access to "fix" a bug in the lab might decide that the "fix" involves deleting the logs or disabling the firewall. Never run agents with root privileges.

[Figure: Logic flowchart of the safety watchdog script, monitoring Docker stats and triggering a container kill command if thresholds are exceeded.]

Implementation: A simple watchdog script that monitors the docker stats output. If a container exceeds a defined threshold (e.g., Network I/O > 1GB or Runtime > 10m), it issues a docker stop -t 0 command, instantly killing the process.

Comprehensive Kill Switch Script
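A minimal sketch of that watchdog, assuming `docker stats --format '{{.NetIO}}'` output of the form `1.2GB / 356kB`; the 1 GB threshold and container name are illustrative:

```python
import subprocess

MAX_NET_BYTES = 1_000_000_000  # 1 GB cumulative network I/O (tune this)

UNITS = {"B": 1, "kB": 1e3, "MB": 1e6, "GB": 1e9}

def parse_bytes(value: str) -> float:
    """Parse docker-stats sizes like '1.2GB' or '356kB' into bytes."""
    for unit in ("GB", "MB", "kB", "B"):  # longest suffixes first
        if value.endswith(unit):
            return float(value[: -len(unit)]) * UNITS[unit]
    raise ValueError(f"unrecognized size: {value}")

def check_and_kill(container: str) -> bool:
    """Kill the container if its cumulative network I/O exceeds the cap."""
    out = subprocess.check_output(
        ["docker", "stats", "--no-stream", "--format", "{{.NetIO}}", container],
        text=True,
    ).strip()
    sent, _, received = out.partition(" / ")
    if parse_bytes(sent) + parse_bytes(received) > MAX_NET_BYTES:
        # -t 0: no graceful shutdown grace period, stop immediately.
        subprocess.run(["docker", "stop", "-t", "0", container], check=False)
        return True
    return False
```

Run `check_and_kill` in a loop (e.g. every few seconds) alongside any agentic test.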

Watchdog Timer

The watchdog timer provides automatic shutdown if you forget to stop testing, step away, or if an automated test runs too long. Use it daily.
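One minimal way to implement that timer with the standard library; the container name and timeout are placeholders:

```python
import subprocess
import threading

def arm_watchdog(container: str, max_runtime_s: float) -> threading.Timer:
    """Stop the container unconditionally after max_runtime_s seconds.

    Arm this before every session; call .cancel() on the returned
    timer if you finish early.
    """
    timer = threading.Timer(
        max_runtime_s,
        lambda: subprocess.run(["docker", "stop", "-t", "0", container]),
    )
    timer.daemon = True  # don't block interpreter exit
    timer.start()
    return timer
```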

Rate Limiter

Rate limiting prevents cost overruns and limits risks of getting blocked. This token bucket implementation provides dual protection.
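A minimal token-bucket sketch: `capacity` bounds burst size, `refill_rate` bounds sustained request rate (both values are yours to tune):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter for outbound API calls."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # max burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; False means throttle."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Callers that receive False should sleep and retry rather than hammer the API.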

Cost Tracking System

This Python tracker monitors usage against a hard budget. It's safer than relying on provider dashboards which often have hours of latency.
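A minimal sketch of client-side budget enforcement; the per-1K-token prices are illustrative placeholders, not any provider's real rates:

```python
class BudgetExceeded(RuntimeError):
    """Raised when cumulative spend hits the hard budget."""

class CostTracker:
    """Client-side spend tracker with a hard cutoff."""

    def __init__(self, hard_budget_usd: float,
                 price_per_1k_in: float = 0.01,
                 price_per_1k_out: float = 0.03):
        self.budget = hard_budget_usd
        self.price_in = price_per_1k_in
        self.price_out = price_per_1k_out
        self.spent = 0.0

    def charge(self, tokens_in: int, tokens_out: int) -> float:
        """Record one request's usage; raise once the budget is hit."""
        self.spent += (tokens_in / 1000) * self.price_in
        self.spent += (tokens_out / 1000) * self.price_out
        if self.spent >= self.budget:
            raise BudgetExceeded(f"spent ${self.spent:.2f} of ${self.budget:.2f}")
        return self.spent
```

Because the check runs on every request, the cutoff fires in real time instead of hours later on a billing dashboard.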

Engagement Budget Template

Plan your spending before you start. Allocating budget by phase forces you to prioritize high-value testing.


7.7 Advanced Techniques

GPU Passthrough for Maximum Isolation

While Docker is convenient, it shares the host kernel. For testing malware generation capabilities or advanced RCE exploits, you must use a VM. Standard VirtualBox does not support GPU passthrough well. You must use KVM/QEMU with IOMMU enabled in the BIOS.

This links the physical GPU's PCIe lanes directly to the VM. The host OS loses access to the GPU, but the VM gains effectively bare-metal performance with complete kernel isolation.

Simulating Multi-Agent Systems

Advanced labs simulate "Federations"—groups of agents interacting.

  • Agent A (Attacker): Red Team LLM.

  • Agent B (Defender): Blue Team LLM monitoring chat logs.

  • Environment: A shared message bus (like RabbitMQ) or a chat interface.

This setup allows testing "Indirect Prompt Injection", where the Attacker poisons the data source that the Defender reads, causing the Defender to become compromised.


7.8 Research Landscape

Seminal Papers

| Paper | Year | Contribution |
| --- | --- | --- |
| | 2024 | Demonstrated "Code Escape" where prompt injection leads to Remote Code Execution in LLM frameworks [1]. |
| | 2024 | Showed how network traffic patterns (packet size/timing) can leak the topic of LLM prompts even over encrypted connections [2]. |
| | 2025 | Introduced a benchmark for evaluating the security of sandboxes against code generated by LLMs [3]. |

Current Research Gaps

  1. Stochastic Assurance: How many iterations are statistically sufficient to declare a model "safe" from a specific jailbreak?

  2. Side-Channel Mitigation: Can we pad token generation times to prevent timing attacks without destroying user experience?

  3. Agentic Containment: Standard containers manage resources, but how do we manage "intent"?


7.9 Case Studies

Case Study 1: The "Denial of Wallet" Loop

Incident Overview

  • Target: Internal Research Lab.

  • Impact: $4,500 API Bill in overnight run.

  • Attack Vector: Self-Recursive Agent Loop.

Attack Timeline

  1. Setup: A researcher set up an agent to "critique and improve" its own code.

  2. Glitch: The agent got stuck in a loop, repeatedly emitting "I need to fix this" without ever applying the fix, then immediately retrying.

  3. Exploitation: The script had no max_retries or budget limit. It ran at 50 requests/minute for 12 hours.

  4. Discovery: An accounting alert flagged the spend the next morning.

Lessons Learned

  • Hard Limits: Always set max_iterations in your harness (as seen in our harness.py).

  • Budget Caps: Use OpenAI/AWS "Hard Limits" in the billing dashboard, not just soft alerts.

Case Study 2: The Data Leak

Incident Overview

  • Target: Financial Services Finetuning Job.

  • Impact: Customer PII leaked into Red Team Logs.

  • Attack Vector: Production Data usage in Test Environment.

Key Details

The red team used a dump of "production databases" to test if the model would leak PII. It succeeded—but they logged the successful responses (containing real SSNs) into a shared Splunk instance that was readable by all developers.

Lessons Learned

  • Synthetic Data: Use Faker or similar tools to generate test data. Never use real PII in a red team lab.

  • Log Sanitization: Configure the harness.py to scrub sensitive patterns (like credit card regex) before writing to the jsonl log file.
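A minimal sketch of such a scrubbing step; the patterns (US-style SSNs and 13-16 digit card numbers) and redaction labels are illustrative and should be extended for your data:

```python
import re

# Illustrative patterns: US SSNs and 13-16 digit card numbers.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN-REDACTED]"),
    (re.compile(r"\b\d(?:[ -]?\d){12,15}\b"), "[CARD-REDACTED]"),
]

def scrub(text: str) -> str:
    """Redact sensitive patterns before a record is written to the log."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Call `scrub()` on every response string before it is serialized to the JSONL log.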


7.10 Conclusion

Building a Red Team lab is not just about installing Python and Docker. It is about creating a controlled, instrumented environment where dangerous experiments can be conducted safely.

Chapter Takeaways

  1. Isolation is Non-Negotiable: Use Docker for speed, VMs for maximum security.

  2. Stochasticity Requires Repetition: A single test pass means nothing. Use the harness.py loop.

  3. Safety First: Kill switches and budget limits prevent the Red Team from becoming the incident.

Next Steps

  • Chapter 8: Automated Vulnerability Scanning (Building on the harness).

  • Practice: Set up a local vLLM instance and run the harness.py against it with a basic prompt injection test.


Appendix A: Pre-Engagement Checklist

Lab Readiness

Appendix B: Post-Engagement Checklist

Cleanup

