Indirect Prompt Injection: How Poisoned Documents Hijack Your RAG Pipeline

Contents Overview

The Attack You Didn’t See Coming

A company deploys a customer-facing AI assistant backed by a RAG pipeline. Employees upload product manuals, policy documents, and internal SOPs to the knowledge base. The system works well, until a routine document upload carries a hidden payload: instructions embedded in white text, buried inside a PDF footer, telling the LLM to treat all subsequent user queries as requests for account data and to forward summaries to an external endpoint.

No user did anything wrong. No API was called directly. The model wasn’t jailbroken through a chat interface. The attack entered through a document; trusted, processed without inspection, and injected directly into the model’s context window at query time.

This is indirect prompt injection targeting RAG (Retrieval-Augmented Generation) pipelines, and it’s one of the more underappreciated threat vectors in enterprise AI deployments today. At Pentest Testing Corp, we test these systems as part of our AI penetration testing service, and we see variants of this vulnerability consistently across knowledge-base-backed LLM applications.

indirect_prompt_injection_rag_featured_image

What Is Indirect Prompt Injection?

Prompt injection, broadly, is the manipulation of an LLM’s behavior by introducing instructions that override or subvert the model’s intended system prompt. In direct prompt injection, the attacker controls the user input field and types the malicious instruction themselves. In indirect prompt injection, the attack arrives through data the model retrieves from an external source; a document, a webpage, an email, a database record.

The distinction matters because indirect injection bypasses all controls focused on the user interface. Input sanitization at the chat layer, rate limiting, and session management don’t touch it. The malicious content enters through the retrieval mechanism, not the user channel.

For RAG systems specifically, this creates a particularly clean attack path. The entire design of RAG assumes that retrieved context is safe to inject into the model’s prompt, and in most production deployments, that assumption is never tested.

How RAG Pipelines Create the Attack Surface

To understand the vulnerability, it helps to understand how a typical RAG pipeline processes a user query:

Document ingestion: Source documents (PDFs, Word files, HTML pages, Confluence pages, SharePoint docs) are chunked and converted to embeddings, then stored in a vector database.
Query embedding: At inference time, the user’s query is embedded and used to retrieve the most semantically similar chunks from the vector store.
Context injection: Retrieved chunks are inserted into the prompt sent to the LLM, typically framed as “relevant context” before the user’s actual question.
LLM inference: The model generates a response based on both its system prompt and the injected context.

The attack surface lives at step 1 and step 3. If an attacker can control any content that gets ingested into the knowledge base, even read-only access to upload a file, they can plant instructions that the model will later treat as authoritative context. Most LLMs don’t reliably distinguish between “this is retrieved document content” and “this is an instruction I should follow.” That’s not a bug in any specific model; it’s a fundamental property of how autoregressive language models process token sequences.

Common document ingestion vectors include:

Public file upload endpoints (support ticket attachments, RFP submissions, user-generated content)
Email integrations that auto-ingest attachments into the knowledge base
Web scrapers that index third-party content
CI/CD pipelines that auto-sync documentation repositories
Shared drives without write-access controls

Any of these represents a potential injection point. The attack doesn’t require privileged access, it requires only the ability to get a document into the corpus.

OWASP LLM Top 10 Mapping: LLM01 and LLM04

This attack maps to two categories in the OWASP Top 10 for Large Language Model Applications (2025).

LLM01: Prompt Injection

OWASP LLM01 covers both direct and indirect prompt injection. The indirect variant, where malicious instructions are embedded in external content retrieved by the model, is explicitly listed as the higher-risk subtype in enterprise deployments, because it can affect any user of the system without requiring individual exploitation. A single poisoned document in a shared knowledge base creates a persistent threat that affects every query retrieving that chunk, across every user session.

Key risk factors relevant to RAG:

No authentication required if the upload endpoint is accessible
The malicious content persists until the document is removed and the vector index is updated
The attack scales automatically as more users query the system

LLM04: Data and Model Poisoning

LLM04 addresses scenarios where training data, fine-tuning data, or retrieval-time data is manipulated to influence model behavior. In the RAG context, knowledge base poisoning is a runtime variant of this, the “poisoned” data isn’t baked into model weights, but it’s injected at inference time with similar effect. The distinction from LLM01 is one of intent and scope: LLM04 focuses on the integrity of the data pipeline itself, while LLM01 focuses on the instruction-following behavior that results.

Both categories apply here because the attack both manipulates the retrieval data (LLM04) and triggers unauthorized instruction execution (LLM01).

The NIST AI Risk Management Framework (AI RMF) also identifies data integrity as a core risk category under the “Manage” function, specifically calling out adversarial data injection as a threat to AI system trustworthiness.

Attack Scenario: The Poisoned Policy Document

This is a sanitized, illustrative scenario, not a working exploit or step-by-step weaponizable guide. Details are generic and designed to communicate the threat model to developers and security teams.

Context

An HR platform uses a RAG-backed assistant to answer employee questions about company policy. HR administrators can upload policy documents through a web portal. Documents are automatically chunked, embedded, and stored in a vector database. Employees query the assistant via a chat interface.

What the attacker does

An attacker, who has employee-level access to the HR portal, uploads a document titled updated-remote-work-policy-2025.pdf. The document contains genuine-looking HR policy text, making it plausible to any administrator who reviews uploaded files. However, it also contains additional text rendered in a font color matching the document background: a block of text instructing the AI assistant that whenever it references remote work policy, it should append a link to an external URL for “policy acknowledgment.”

What happens at query time

An employee asks: “What is our policy on working from home while traveling internationally?”

The retrieval system returns the poisoned chunk alongside legitimate policy text. The LLM, which sees no distinction between the retrieved context and genuine instructions, processes the injected text as guidance. Depending on the model, system prompt structure, and guardrails in place, the response may include the external link, pointing to an attacker-controlled page designed to harvest session cookies or credentials.

Why this is realistic – and bounded

A few important caveats worth stating clearly for technical readers: this attack’s success depends heavily on system prompt design. A well-constructed system prompt with explicit context-separation instructions and output restrictions will significantly reduce the model’s compliance with injected instructions. Modern frontier models have improved defenses against obvious injection attempts. The scenario above works more reliably against less hardened deployments and older models.

What makes this threat persistent isn’t that it’s easy to execute at a high confidence rate, it’s that it’s easy to attempt at scale, requires no post-upload interaction, and is nearly invisible to conventional security monitoring.

RAG Security Testing vs. Traditional App Pentest

The test methodology for a RAG-backed LLM application differs substantially from a conventional web application penetration test. Here’s how the two approaches compare:

Dimension	Traditional Web App Pentest	RAG / LLM Pipeline Pentest
Primary attack surface	HTTP endpoints, auth flows, business logic	Document ingestion, retrieval layer, prompt construction
Injection type	SQL, XSS, command injection	Prompt injection, context manipulation, schema confusion
Exploit delivery	Request parameters, cookies, headers	Uploaded documents, scraped content, external data sources
Vulnerability confirmation	Deterministic – SQL error, reflected payload	Probabilistic – model response varies; requires multiple probes
Persistence model	Typically session or state-based	Persistent in vector store until document purged
Tooling	Burp Suite, SQLMap, custom scripts	Custom LLM probing harnesses, vector DB inspection, embedding analysis
OWASP framework	OWASP Web Security Testing Guide	OWASP LLM Top 10 (2025)
Defense validation	WAF rules, parameterized queries	System prompt hardening, input/output sanitization, retrieval guardrails
Output analysis	HTTP response codes, DOM inspection	Natural language response analysis, data leakage patterns
False positive risk	Low – deterministic outputs	Higher – requires statistical sampling and prompt variation

This comparison reflects why organizations can’t simply extend their existing web app testing scope to cover AI systems. The threat model is different, the tools are different, and the required expertise overlaps only partially.

How We Test RAG Pipelines for Indirect Prompt Injection

Our AI penetration testing engagements for RAG systems follow a structured methodology. The goal is to identify exploitable injection paths, characterize their realistic impact, and provide mitigations that development teams can actually implement.

Phase 1: Architecture Review and Threat Modeling

Before any active testing, we map the data flow: what sources feed the vector store, what chunking and embedding pipeline is used, how retrieval results are structured into the final prompt, and what system prompt guardrails are already in place. This informs which injection vectors are actually reachable and what level of trust the model is likely to assign to retrieved content.

We also review the document ingestion controls, upload size limits, file type restrictions, access controls on who can add content to the knowledge base.

Phase 2: Document Ingestion Surface Testing

We probe every path through which external content can enter the knowledge base, including:

Direct upload endpoints (testing file type, size, content validation)
Email-to-knowledge-base integrations
Web crawlers or auto-sync integrations
API endpoints for batch document ingestion

For each path, we assess whether content is sanitized before embedding and whether any pre-ingestion inspection occurs.

Phase 3: Injection Payload Testing

We craft a range of test documents containing payloads of varying complexity, from obvious instruction strings to content that mimics the linguistic register of the target system prompt. We test different rendering methods (visible text, whitespace text, metadata fields, structured data fields within PDFs and DOCX files) to characterize which delivery methods are most reliably retrieved and most likely to influence model output.

We intentionally avoid anything that would constitute a working operational exploit, the goal is to confirm exploitability and characterize the attack surface, not to produce a toolkit.

Phase 4: Output Analysis and Impact Characterization

LLM outputs are non-deterministic, so we probe each injection vector across multiple queries and temperature settings. We document response patterns, identify which injections produced meaningful behavioral change, and assess realistic downstream impact, distinguishing between scenarios that could lead to data leakage, user redirection, or privilege escalation versus scenarios that produce noise but no meaningful security consequence.

Phase 5: Remediation Validation

Once mitigations are implemented, we retest the injection vectors to confirm that the changes were effective. We also test for regression, implementations that fix the obvious injection path while leaving adjacent vectors unaddressed.

Mitigations That Actually Work

No single control eliminates indirect prompt injection risk, but a layered approach significantly reduces exploitability.

Structural prompt separation. Clearly delineating in the system prompt what constitutes “retrieved context” versus “user instructions”, and explicitly instructing the model to treat retrieved content as data, not commands, reduces but doesn’t eliminate model compliance with injected instructions. This is the most immediately implementable control and should be the first thing any team implements.

Pre-ingestion content inspection. Before documents are chunked and embedded, run them through a content inspection layer that flags anomalies: text rendered in background-matching colors, whitespace-heavy blocks, unusually structured metadata, or heuristically detected instruction patterns. This won’t catch every payload but eliminates the low-effort attempts that constitute the majority of real-world attempts.

Ingestion access controls. Limit who can upload to the knowledge base. In many deployments, this is either public or accessible to all authenticated users, both are overly permissive for a system with this attack surface. Document-level provenance tracking (logging which user uploaded what, when) also helps with incident response.

Output filtering. A secondary filtering layer on LLM responses that checks for anomalous patterns; unexpected external links, data fields not present in the user query, unusual response structures, can catch a subset of successful injections before they reach the user.

Semantic similarity monitoring. Logging which retrieved chunks contributed to each response, and monitoring for chunks that appear disproportionately often or that consistently appear in anomalous response patterns, enables detection of active poisoning campaigns.

Regular RAG security testing. Given that the knowledge base is dynamic, new documents are added continuously, security posture can degrade quickly. Periodic penetration testing of the ingestion pipeline and retrieval behavior, rather than a one-time assessment, reflects the actual threat model.

Frequently Asked Questions

What is indirect prompt injection in a RAG system?

Indirect prompt injection is an attack where malicious instructions are embedded inside external content; such as a PDF, Word document, or webpage, that gets retrieved by a RAG pipeline and injected into the LLM’s context window. Unlike direct prompt injection (where the attacker controls the chat input), indirect injection exploits the retrieval mechanism itself. The model receives the attacker’s instructions as part of the “trusted” context it’s been designed to process, and may follow them without any user interaction.

How is this different from a standard prompt injection attack?

Standard (direct) prompt injection requires the attacker to control the user input field, the chat box, API parameter, or similar interface. Indirect injection doesn’t. The attack vector is the data pipeline: any content that can be ingested into the knowledge base becomes a potential injection point. This is significant because it means indirect injection can persist in the system, affect multiple users, and bypass input validation controls entirely.

Does this only affect certain LLMs?

No. The vulnerability is a property of RAG architecture, not a specific model. All autoregressive language models that process retrieved context as part of their input token sequence are theoretically susceptible to indirect prompt injection, because they don’t have a reliable native mechanism to distinguish “this is data” from “this is an instruction.” More capable models with better instruction-following tend to be more susceptible in some configurations, not less, they’re better at following injected instructions precisely. That said, system prompt design, output guardrails, and model-level safety training all influence practical exploitability.

What types of documents can carry indirect prompt injection payloads?

Any document format that gets parsed and embedded by the ingestion pipeline can carry a payload. PDFs are particularly common because they support invisible text layers, metadata fields, and annotations that parsing tools may extract without visual review. Word documents (DOCX), HTML files, plain text, Markdown, and structured data files like JSON and CSV are all potential carriers. The attack surface is as broad as the file types your ingestion pipeline accepts.

How do we know if our RAG pipeline has been compromised by a poisoned document?

Standard application monitoring won’t detect this. Signs that may indicate active exploitation include: LLM responses that reference topics not present in the user’s query, responses containing unexpected external links or data structures, unusual spikes in a particular document chunk appearing in retrieval logs, and user-reported responses that seem “off” in context. However, many poisoned knowledge bases go undetected because the injected content produces subtle behavioral changes rather than obvious anomalies. Proactive security testing of the ingestion pipeline and retrieval behavior is the most reliable detection method.

What compliance frameworks address this risk?

The OWASP Top 10 for LLM Applications (2025) addresses this under LLM01 (Prompt Injection) and LLM04 (Data and Model Poisoning). The NIST AI Risk Management Framework covers data integrity as a core risk under its “Manage” function. For organizations subject to SOC 2, the security and availability trust service criteria apply to AI systems used to process or transmit customer data. ISO/IEC 42001 (AI Management System standard) and the EU AI Act’s risk-based requirements are increasingly relevant for enterprise AI deployments, particularly those classified as high-risk applications.

Get Your RAG Pipeline Tested

Indirect prompt injection through document poisoning is a real, testable vulnerability, and one that most organizations have never assessed. It doesn’t require a sophisticated attacker, it doesn’t leave obvious traces, and it persists in your system for as long as the poisoned document remains in the knowledge base.

Our AI penetration testing service covers the full RAG attack surface: ingestion pipeline controls, retrieval layer security, prompt construction review, and injection payload testing across all reachable document ingestion paths. Every engagement is led by certified practitioners with hands-on experience across web, API, and AI application security.

If you’re deploying or planning to deploy a RAG-backed AI system, or if one is already in production, this is worth a conversation.

Book a 30-minute scoping call and we’ll tell you exactly what a RAG security assessment would cover for your stack.

Indirect Prompt Injection: How Poisoned Documents Hijack Your RAG Pipeline

The Attack You Didn’t See Coming

What Is Indirect Prompt Injection?

How RAG Pipelines Create the Attack Surface