Indirect Prompt Injection: How Poisoned Documents Hijack Your RAG Pipeline


The Attack You Didn’t See Coming

A company deploys a customer-facing AI assistant backed by a RAG pipeline. Employees upload product manuals, policy documents, and internal SOPs to the knowledge base. The system works well, until a routine document upload carries a hidden payload: instructions embedded in white text, buried inside a PDF footer, telling the LLM to treat all subsequent user queries as requests for account data and to forward summaries to an external endpoint.

No user did anything wrong. No API was called directly. The model wasn’t jailbroken through a chat interface. The attack entered through a document; trusted, processed without inspection, and injected directly into the model’s context window at query time.

This is indirect prompt injection targeting RAG (Retrieval-Augmented Generation) pipelines, and it’s one of the more underappreciated threat vectors in enterprise AI deployments today. At Pentest Testing Corp, we test these systems as part of our AI penetration testing service, and we see variants of this vulnerability consistently across knowledge-base-backed LLM applications.

indirect_prompt_injection_rag_featured_image

What Is Indirect Prompt Injection?

Prompt injection, broadly, is the manipulation of an LLM’s behavior by introducing instructions that override or subvert the model’s intended system prompt. In direct prompt injection, the attacker controls the user input field and types the malicious instruction themselves. In indirect prompt injection, the attack arrives through data the model retrieves from an external source; a document, a webpage, an email, a database record.

The distinction matters because indirect injection bypasses all controls focused on the user interface. Input sanitization at the chat layer, rate limiting, and session management don’t touch it. The malicious content enters through the retrieval mechanism, not the user channel.

For RAG systems specifically, this creates a particularly clean attack path. The entire design of RAG assumes that retrieved context is safe to inject into the model’s prompt, and in most production deployments, that assumption is never tested.


How RAG Pipelines Create the Attack Surface

To understand the vulnerability, it helps to understand how a typical RAG pipeline processes a user query:

  1. Document ingestion: Source documents (PDFs, Word files, HTML pages, Confluence pages, SharePoint docs) are chunked and converted to embeddings, then stored in a vector database.
  2. Query embedding: At inference time, the user’s query is embedded and used to retrieve the most semantically similar chunks from the vector store.
  3. Context injection: Retrieved chunks are inserted into the prompt sent to the LLM, typically framed as “relevant context” before the user’s actual question.
  4. LLM inference: The model generates a response based on both its system prompt and the injected context.

The attack surface lives at step 1 and step 3. If an attacker can control any content that gets ingested into the knowledge base, even read-only access to upload a file, they can plant instructions that the model will later treat as authoritative context. Most LLMs don’t reliably distinguish between “this is retrieved document content” and “this is an instruction I should follow.” That’s not a bug in any specific model; it’s a fundamental property of how autoregressive language models process token sequences.

Common document ingestion vectors include:

  • Public file upload endpoints (support ticket attachments, RFP submissions, user-generated content)
  • Email integrations that auto-ingest attachments into the knowledge base
  • Web scrapers that index third-party content
  • CI/CD pipelines that auto-sync documentation repositories
  • Shared drives without write-access controls

Any of these represents a potential injection point. The attack doesn’t require privileged access, it requires only the ability to get a document into the corpus.


OWASP LLM Top 10 Mapping: LLM01 and LLM04

This attack maps to two categories in the OWASP Top 10 for Large Language Model Applications (2025).

LLM01: Prompt Injection

OWASP LLM01 covers both direct and indirect prompt injection. The indirect variant, where malicious instructions are embedded in external content retrieved by the model, is explicitly listed as the higher-risk subtype in enterprise deployments, because it can affect any user of the system without requiring individual exploitation. A single poisoned document in a shared knowledge base creates a persistent threat that affects every query retrieving that chunk, across every user session.

Key risk factors relevant to RAG:

  • No authentication required if the upload endpoint is accessible
  • The malicious content persists until the document is removed and the vector index is updated
  • The attack scales automatically as more users query the system

LLM04: Data and Model Poisoning

LLM04 addresses scenarios where training data, fine-tuning data, or retrieval-time data is manipulated to influence model behavior. In the RAG context, knowledge base poisoning is a runtime variant of this, the “poisoned” data isn’t baked into model weights, but it’s injected at inference time with similar effect. The distinction from LLM01 is one of intent and scope: LLM04 focuses on the integrity of the data pipeline itself, while LLM01 focuses on the instruction-following behavior that results.

Both categories apply here because the attack both manipulates the retrieval data (LLM04) and triggers unauthorized instruction execution (LLM01).

The NIST AI Risk Management Framework (AI RMF) also identifies data integrity as a core risk category under the “Manage” function, specifically calling out adversarial data injection as a threat to AI system trustworthiness.


Attack Scenario: The Poisoned Policy Document

This is a sanitized, illustrative scenario, not a working exploit or step-by-step weaponizable guide. Details are generic and designed to communicate the threat model to developers and security teams.

Context

An HR platform uses a RAG-backed assistant to answer employee questions about company policy. HR administrators can upload policy documents through a web portal. Documents are automatically chunked, embedded, and stored in a vector database. Employees query the assistant via a chat interface.

What the attacker does

An attacker, who has employee-level access to the HR portal, uploads a document titled updated-remote-work-policy-2025.pdf. The document contains genuine-looking HR policy text, making it plausible to any administrator who reviews uploaded files. However, it also contains additional text rendered in a font color matching the document background: a block of text instructing the AI assistant that whenever it references remote work policy, it should append a link to an external URL for “policy acknowledgment.”

What happens at query time

An employee asks: “What is our policy on working from home while traveling internationally?”

The retrieval system returns the poisoned chunk alongside legitimate policy text. The LLM, which sees no distinction between the retrieved context and genuine instructions, processes the injected text as guidance. Depending on the model, system prompt structure, and guardrails in place, the response may include the external link, pointing to an attacker-controlled page designed to harvest session cookies or credentials.

Why this is realistic – and bounded

A few important caveats worth stating clearly for technical readers: this attack’s success depends heavily on system prompt design. A well-constructed system prompt with explicit context-separation instructions and output restrictions will significantly reduce the model’s compliance with injected instructions. Modern frontier models have improved defenses against obvious injection attempts. The scenario above works more reliably against less hardened deployments and older models.

What makes this threat persistent isn’t that it’s easy to execute at a high confidence rate, it’s that it’s easy to attempt at scale, requires no post-upload interaction, and is nearly invisible to conventional security monitoring.


RAG Security Testing vs. Traditional App Pentest

The test methodology for a RAG-backed LLM application differs substantially from a conventional web application penetration test. Here’s how the two approaches compare:

DimensionTraditional Web App PentestRAG / LLM Pipeline Pentest
Primary attack surfaceHTTP endpoints, auth flows, business logicDocument ingestion, retrieval layer, prompt construction
Injection typeSQL, XSS, command injectionPrompt injection, context manipulation, schema confusion
Exploit deliveryRequest parameters, cookies, headersUploaded documents, scraped content, external data sources
Vulnerability confirmationDeterministic – SQL error, reflected payloadProbabilistic – model response varies; requires multiple probes
Persistence modelTypically session or state-basedPersistent in vector store until document purged
ToolingBurp Suite, SQLMap, custom scriptsCustom LLM probing harnesses, vector DB inspection, embedding analysis
OWASP frameworkOWASP Web Security Testing GuideOWASP LLM Top 10 (2025)
Defense validationWAF rules, parameterized queriesSystem prompt hardening, input/output sanitization, retrieval guardrails
Output analysisHTTP response codes, DOM inspectionNatural language response analysis, data leakage patterns
False positive riskLow – deterministic outputsHigher – requires statistical sampling and prompt variation

This comparison reflects why organizations can’t simply extend their existing web app testing scope to cover AI systems. The threat model is different, the tools are different, and the required expertise overlaps only partially.


How We Test RAG Pipelines for Indirect Prompt Injection

Our AI penetration testing engagements for RAG systems follow a structured methodology. The goal is to identify exploitable injection paths, characterize their realistic impact, and provide mitigations that development teams can actually implement.

Phase 1: Architecture Review and Threat Modeling

Before any active testing, we map the data flow: what sources feed the vector store, what chunking and embedding pipeline is used, how retrieval results are structured into the final prompt, and what system prompt guardrails are already in place. This informs which injection vectors are actually reachable and what level of trust the model is likely to assign to retrieved content.

We also review the document ingestion controls, upload size limits, file type restrictions, access controls on who can add content to the knowledge base.

Phase 2: Document Ingestion Surface Testing

We probe every path through which external content can enter the knowledge base, including:

  • Direct upload endpoints (testing file type, size, content validation)
  • Email-to-knowledge-base integrations
  • Web crawlers or auto-sync integrations
  • API endpoints for batch document ingestion

For each path, we assess whether content is sanitized before embedding and whether any pre-ingestion inspection occurs.

Phase 3: Injection Payload Testing

We craft a range of test documents containing payloads of varying complexity, from obvious instruction strings to content that mimics the linguistic register of the target system prompt. We test different rendering methods (visible text, whitespace text, metadata fields, structured data fields within PDFs and DOCX files) to characterize which delivery methods are most reliably retrieved and most likely to influence model output.

We intentionally avoid anything that would constitute a working operational exploit, the goal is to confirm exploitability and characterize the attack surface, not to produce a toolkit.

Phase 4: Output Analysis and Impact Characterization

LLM outputs are non-deterministic, so we probe each injection vector across multiple queries and temperature settings. We document response patterns, identify which injections produced meaningful behavioral change, and assess realistic downstream impact, distinguishing between scenarios that could lead to data leakage, user redirection, or privilege escalation versus scenarios that produce noise but no meaningful security consequence.

Phase 5: Remediation Validation

Once mitigations are implemented, we retest the injection vectors to confirm that the changes were effective. We also test for regression, implementations that fix the obvious injection path while leaving adjacent vectors unaddressed.


Mitigations That Actually Work

No single control eliminates indirect prompt injection risk, but a layered approach significantly reduces exploitability.

Structural prompt separation. Clearly delineating in the system prompt what constitutes “retrieved context” versus “user instructions”, and explicitly instructing the model to treat retrieved content as data, not commands, reduces but doesn’t eliminate model compliance with injected instructions. This is the most immediately implementable control and should be the first thing any team implements.

Pre-ingestion content inspection. Before documents are chunked and embedded, run them through a content inspection layer that flags anomalies: text rendered in background-matching colors, whitespace-heavy blocks, unusually structured metadata, or heuristically detected instruction patterns. This won’t catch every payload but eliminates the low-effort attempts that constitute the majority of real-world attempts.

Ingestion access controls. Limit who can upload to the knowledge base. In many deployments, this is either public or accessible to all authenticated users, both are overly permissive for a system with this attack surface. Document-level provenance tracking (logging which user uploaded what, when) also helps with incident response.

Output filtering. A secondary filtering layer on LLM responses that checks for anomalous patterns; unexpected external links, data fields not present in the user query, unusual response structures, can catch a subset of successful injections before they reach the user.

Semantic similarity monitoring. Logging which retrieved chunks contributed to each response, and monitoring for chunks that appear disproportionately often or that consistently appear in anomalous response patterns, enables detection of active poisoning campaigns.

Regular RAG security testing. Given that the knowledge base is dynamic, new documents are added continuously, security posture can degrade quickly. Periodic penetration testing of the ingestion pipeline and retrieval behavior, rather than a one-time assessment, reflects the actual threat model.


Frequently Asked Questions

Get Your RAG Pipeline Tested

Indirect prompt injection through document poisoning is a real, testable vulnerability, and one that most organizations have never assessed. It doesn’t require a sophisticated attacker, it doesn’t leave obvious traces, and it persists in your system for as long as the poisoned document remains in the knowledge base.

Our AI penetration testing service covers the full RAG attack surface: ingestion pipeline controls, retrieval layer security, prompt construction review, and injection payload testing across all reachable document ingestion paths. Every engagement is led by certified practitioners with hands-on experience across web, API, and AI application security.

If you’re deploying or planning to deploy a RAG-backed AI system, or if one is already in production, this is worth a conversation.

Book a 30-minute scoping call and we’ll tell you exactly what a RAG security assessment would cover for your stack.

Leave a Comment

Scroll to Top