
Does a Standard Pentest Catch Prompt Injection? The Honest Answer

A mid-size SaaS company ran its annual penetration test in Q1. The report came back clean: no critical findings, a handful of medium-severity misconfigurations, all patched within two weeks. Three months later, a support chatbot built on top of the same infrastructure was coaxed, through nothing more than a normal-looking conversation, into revealing pieces of its internal instructions and referencing pricing logic it was never supposed to disclose.

The pentest hadn't missed a vulnerability. It had never been scoped to look for one.

That's the pattern we see repeatedly when companies ask whether a pentest catches prompt injection: a clean traditional report gets treated as proof that the AI layer is safe, when the traditional methodology never touched it. This post breaks down exactly what a standard pentest covers, where prompt injection and related risks actually live under OWASP LLM01 and LLM07, and how to close the gap without turning your engagement into a fishing expedition.


What a Standard Pentest Actually Tests

A traditional penetration test, whether it's scoped as network, web application, or API testing, is built around a well-established methodology. Testers working from frameworks like the OWASP Web Security Testing Guide (WSTG) or PTES are looking for deterministic flaws: things that behave the same way every time you trigger them.

That includes:

Injection flaws in the classic sense (SQL injection, command injection, XXE)
Authentication and session management weaknesses
Access control issues, including IDOR and privilege escalation
Business logic flaws in workflows and transactions
Configuration and infrastructure gaps (exposed ports, weak TLS, outdated software)
API-specific issues mapped to the OWASP API Security Top 10

Every one of these categories assumes the system under test executes fixed logic. Send the same malicious payload twice, get the same result twice. That assumption is what makes automated scanning and reproducible proof-of-concept exploits possible, and it's exactly the assumption that breaks down once a large language model is part of the request path.

An LLM doesn't run input through fixed parsing and execution logic the way a SQL engine does. It interprets natural language and decides, probabilistically, how to respond based on training, fine-tuning, and whatever instructions and context it's been given. A traditional pentest methodology has no test case for "convince the model to reinterpret its own instructions," because that isn't a bug class it was designed to look for.

Where Prompt Injection Actually Lives: LLM01 and LLM07

The OWASP Top 10 for LLM Applications (2025) exists specifically because the vulnerability classes in AI systems don't map cleanly onto the traditional Top 10. Two categories are directly relevant here.

LLM01: Prompt Injection. This covers any technique where crafted input, whether typed directly by a user or planted in content the model later ingests (a webpage, a document, an email it summarizes), causes the model to deviate from its intended behavior. Direct prompt injection comes from the user talking to the model. Indirect prompt injection comes from data the model processes on the user's behalf, which is arguably the more dangerous variant because the attacker never has to interact with your system directly.

LLM07: System Prompt Leakage. This covers scenarios where the model's underlying instructions, the system prompt that defines its role, guardrails, and often business logic, get exposed to an end user. System prompts frequently contain more than "be helpful and polite." They can include internal tool names, escalation logic, pricing rules, or references to backend systems. When that leaks, it doesn't just embarrass you; it gives an attacker a map of what to try next.
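One way to make system prompt leakage measurable during testing is to plant unique canary strings in the system prompt and scan model responses for them. The following is a minimal sketch under that assumption, not a description of any particular tool; `make_canary`, `find_leaked_canaries`, and the sample prompt and replies are all hypothetical.

```python
import secrets


def make_canary(label: str) -> str:
    """Return a unique marker that should never appear in user-facing output."""
    return f"CANARY-{label}-{secrets.token_hex(4)}"


def find_leaked_canaries(response: str, canaries: list[str]) -> list[str]:
    """Return the canaries that appear verbatim (case-insensitive) in a response."""
    lowered = response.lower()
    return [c for c in canaries if c.lower() in lowered]


# Hypothetical system prompt with canaries planted next to sensitive sections.
pricing_canary = make_canary("pricing")
tools_canary = make_canary("tools")
system_prompt = (
    f"You are a support assistant. [{pricing_canary}] Never discuss pricing "
    f"outside published tiers. [{tools_canary}] Internal tools: ticketing."
)

# Made-up replies standing in for real model output.
safe_reply = "I can help with that. Our published tiers are listed on the pricing page."
leaky_reply = f"My instructions say: [{pricing_canary}] Never discuss pricing..."

canaries = [pricing_canary, tools_canary]
print(find_leaked_canaries(safe_reply, canaries))   # empty list: nothing leaked
print(find_leaked_canaries(leaky_reply, canaries))  # contains only the pricing canary
```

Verbatim matching like this only catches direct disclosure. A model that paraphrases its instructions leaks the same information without tripping a canary, so this kind of check supports manual review rather than replacing it.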

These two categories are related but distinct. Prompt injection is the technique. System prompt leakage is one common outcome. A model can also be injected into performing an unwanted action (tied to LLM06, Excessive Agency, when the model has tool access) without ever leaking its prompt at all. Scoping a test for "prompt injection" without accounting for what the model has access to, and what it's allowed to say about itself, leaves real gaps.
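The scoping point can be sketched as a small capability inventory: what the model carries and what it can reach determines which categories belong in the test plan. This is an illustrative mapping only, not OWASP guidance; `ModelProfile`, its fields, and `categories_in_scope` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ModelProfile:
    """What the model carries and can reach; all fields are illustrative."""
    name: str
    has_system_prompt: bool = True
    ingests_external_content: bool = False
    tools: list[str] = field(default_factory=list)


def categories_in_scope(profile: ModelProfile) -> list[str]:
    """Map a model's capabilities to the OWASP LLM categories worth testing."""
    categories = ["LLM01 (direct prompt injection)"]
    if profile.ingests_external_content:
        categories.append("LLM01 (indirect prompt injection)")
    if profile.has_system_prompt:
        categories.append("LLM07 (system prompt leakage)")
    if profile.tools:
        categories.append("LLM06 (excessive agency)")
    return categories


faq_bot = ModelProfile(name="faq-bot")
support_agent = ModelProfile(
    name="support-agent",
    ingests_external_content=True,
    tools=["knowledge_base_lookup", "file_ticket"],
)

for profile in (faq_bot, support_agent):
    print(profile.name, "->", categories_in_scope(profile))
```

The FAQ bot ends up with two categories and the tool-using agent with four, which is the difference a scope that only says "prompt injection" would hide.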

A Sanitized Example: Where the Gap Shows Up

To be clear about what we can and can't share here: what follows is a pattern-based, illustrative scenario built from recurring themes across engagements, not a specific client, not a working exploit, and not a set of steps you could run today. We don't publish weaponizable detail, and we wouldn't even if it made for a punchier blog post.

In one recurring pattern, a customer-facing support assistant was connected to an internal knowledge base and a ticketing tool so it could look up account information and file requests. The assistant had clear guardrails against discussing pricing outside published tiers. A tester, working through a sequence of ordinary-looking support questions and follow-ups, gradually shifted the conversational framing until the model began referencing internal categorization labels used in its own instructions, labels that hinted at how support tickets get prioritized internally. No code was exploited. No authentication was bypassed. The model simply followed the shape of the conversation further than its designers expected.

That's the essence of LLM01 and LLM07 risk: the "vulnerability" is behavioral, not architectural, and it lives in exactly the layer a traditional pentest doesn't examine.

Traditional Pentest vs. AI/LLM Pentest: Scope Comparison

Dimension	Traditional Pentest	AI/LLM-Focused Pentest
Primary attack surface	Network, web app, API endpoints	Prompts, model behavior, tool integrations, RAG pipelines
Underlying assumption	Deterministic code execution	Probabilistic, instruction-following behavior
Core methodology	OWASP WSTG, PTES, OWASP API Top 10	OWASP Top 10 for LLM Applications (2025)
Typical finding	SQLi, IDOR, misconfigurations	Prompt injection (LLM01), system prompt leakage (LLM07), excessive agency (LLM06)
Tooling	Burp Suite, Nmap, automated scanners	Structured adversarial prompting, custom test harnesses, manual conversational testing
Reproducibility	High; same payload, same result	Variable; requires multiple attempts and structured test cases per category
Skillset needed	Standard offensive security	Offensive security plus applied understanding of LLM behavior and AI system design

The takeaway from this table isn't that one type of testing is more rigorous than the other. It's that they're answering different questions. A traditional pentest answers "can someone break into our systems?" An AI-focused pentest answers "can someone manipulate the model into doing something it shouldn't?", which is a materially different question with a different methodology and a different definition of success.

Why Traditional Methodologies Miss LLM01 and LLM07

Three structural reasons explain the gap, and none of them reflect poorly on the firms doing traditional pentesting. They simply weren't scoped for this.

1. Non-determinism breaks the standard proof-of-concept model. A traditional finding needs a reliable, repeatable trigger for a report to hold up. Prompt injection often succeeds on the third or fifth attempt with variations in phrasing, not the first. Testing for it requires a methodology built around structured, iterative adversarial prompting rather than a single scripted payload.
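A rough sketch of what "iterative" means in practice: each phrasing variant is sent several times, and the result is recorded as a count rather than a single pass or fail. Everything here is an assumption for illustration, including `count_violations`, the stub model that stands in for a real non-deterministic one, and the placeholder variant names.

```python
from itertools import cycle
from typing import Callable


def count_violations(
    model: Callable[[str], str],
    prompt_variants: list[str],
    violates: Callable[[str], bool],
    trials: int = 5,
) -> dict[str, tuple[int, int]]:
    """Send each variant `trials` times; return (violations, trials) per variant."""
    results = {}
    for variant in prompt_variants:
        hits = sum(1 for _ in range(trials) if violates(model(variant)))
        results[variant] = (hits, trials)
    return results


# Stand-in for a non-deterministic model: it refuses twice, then slips once.
_canned = cycle(["I can't share that.", "I can't share that.", "INTERNAL-LABEL: tier-2"])


def stub_model(prompt: str) -> str:
    return next(_canned)


def leaks_internal_label(response: str) -> bool:
    return "INTERNAL-LABEL" in response


report = count_violations(
    stub_model, ["variant-a", "variant-b"], leaks_internal_label, trials=3
)
for variant, (hits, trials) in report.items():
    print(f"{variant}: {hits}/{trials} attempts crossed the boundary")
# prints:
# variant-a: 1/3 attempts crossed the boundary
# variant-b: 1/3 attempts crossed the boundary
```

A single attempt against this stub would see only the first refusal and report nothing, which is the failure mode of a one-shot scripted payload.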

2. The vulnerability isn't in the code path, it's in the instruction-following behavior. Static and dynamic application security testing tools scan code, dependencies, and request/response patterns. They have no mechanism for evaluating whether a model's response violates an intended behavioral boundary, because that boundary is defined in natural language, not in a schema or a validation function.

3. Scope documents rarely name the model as an asset. If a pentest SOW lists "web application" and "API endpoints" as in-scope assets, the LLM integration technically falls under "API endpoints" from an infrastructure standpoint, but the testing methodology applied to it is still the API methodology. Nobody is instructed to test whether the model can be talked out of its guardrails, because that line item doesn't exist in a standard SOW template.

This is also why compliance frameworks are catching up. NIST's AI Risk Management Framework explicitly calls out the need to evaluate AI systems for behaviors like manipulation and unintended disclosure, distinct from conventional software vulnerability assessment. SOC 2 auditors and enterprise security questionnaires are increasingly asking directly whether AI-specific testing has been performed, and "we ran our standard pentest" is no longer an answer that satisfies that question.

We've seen this play out during vendor security reviews more than once. A prospective enterprise customer's questionnaire asks, in plain language, whether the vendor has tested its AI features against prompt injection and data leakage. The honest answer, based on a traditional pentest report alone, is "we don't know," because the report simply doesn't address it either way. That's a worse position than a documented finding with a remediation plan. An unanswered question reads as an unmanaged risk, and procurement teams at larger enterprises are trained to treat it that way.

The same gap shows up internally before it ever reaches a customer. Engineering teams often assume that because the surrounding application passed its pentest, the AI feature layered on top inherited that assurance. It didn't. The two are tested under entirely different methodologies, and conflating them is how a genuinely dangerous LLM01 or LLM07 finding sits undiscovered for months inside a product that otherwise has a clean security track record.

What AI-Specific Pentest Scope Actually Covers

A properly scoped AI/LLM penetration test maps its test cases directly to OWASP LLM Top 10 categories rather than treating the model as a generic API. For prompt injection and system prompt leakage specifically, that scope typically includes:

Direct prompt injection testing (LLM01): Structured adversarial conversations designed to shift the model away from its intended role or guardrails, run across multiple phrasing strategies and conversational framings.
Indirect prompt injection testing (LLM01): Evaluating whether the model can be manipulated through content it ingests from external sources, such as documents, retrieved data, or third-party integrations, rather than direct user input.
System prompt extraction attempts (LLM07): Testing whether the model can be induced to reveal its instructions, internal labels, tool definitions, or business logic embedded in its configuration.
Excessive agency checks (LLM06): Where the model has tool access or the ability to take action, testing whether it can be manipulated into invoking tools or performing actions outside its intended authorization boundary.
Guardrail resilience testing: Assessing whether stated content and behavioral restrictions hold up under sustained, varied adversarial pressure rather than a single obvious jailbreak attempt.
Output handling review: Checking whether the model's outputs are properly sanitized before being rendered, executed, or passed to downstream systems, since a manipulated output can become an injection vector elsewhere in the stack.
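For the output handling item, a minimal sketch of the underlying idea is to treat model output as untrusted input to whatever consumes it next. The example below covers only the HTML rendering case, using Python's standard `html.escape`; `render_model_output` and the sample string are hypothetical.

```python
import html


def render_model_output(raw: str) -> str:
    """Escape model output before inserting it into an HTML page."""
    return html.escape(raw, quote=True)


# Made-up model output that carries markup it was manipulated into producing.
manipulated = 'Here is your summary <img src=x onerror="alert(1)">'

print(render_model_output(manipulated))
# prints: Here is your summary &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
```

Escaping is specific to the destination. Output that feeds a SQL query, a shell command, or another model call needs the handling appropriate to that sink, which is why this item is reviewed per integration rather than once.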

This scope sits alongside, not instead of, traditional infrastructure and application testing. Most organizations need both: a conventional pentest for the systems around the AI feature, and an AI-specific engagement for the model's behavior itself. Our AI penetration testing service is scoped specifically to cover this second half, with test cases mapped directly to the current OWASP LLM Top 10.

How Engagements Get Scoped

The right scope depends on how deeply the LLM is integrated into your product and what it has access to. A simple FAQ chatbot with no tool access presents a narrower attack surface than an agent with database read access, third-party API calls, and multi-step autonomous workflows.

Tier	Typical Scope	Best Fit
Starter ($9,500+)	Single LLM application, core LLM01/LLM07 testing, guardrail assessment	Standalone chatbots, single-feature AI integrations
Professional ($15,000–$35,000)	Multiple integrations, tool-use and agency testing, RAG pipeline review	Products with AI features across several workflows
Enterprise ($35,000–$75,000)	Full agentic system review, multi-model environments, compliance-mapped reporting	Complex AI platforms, regulated industries, audit-driven engagements

Scope is set during an initial call, not guessed at from a generic tier description. If your product only has one AI feature but that feature has broad tool access, it may need Professional-tier depth even though it sounds like a Starter-tier footprint on paper.

A few signals tend to push an engagement into a higher tier regardless of how simple the product looks from the outside: the model has write access to a database or ticketing system rather than read-only lookups, the model can trigger external API calls or send communications on a user's behalf, or the application chains multiple model calls together in an agentic workflow where the output of one step becomes the input to the next. Each of those increases the number of places a successful injection can turn into real-world impact, which is what the excessive agency category (LLM06) is specifically concerned with.
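One practical check for these signals during testing is to log every tool call the model proposes in an adversarial session and compare it against the intended authorization boundary. The sketch below assumes such a log exists; `ToolCall`, `ALLOWED`, `out_of_bounds`, and the session data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    tool: str
    action: str  # e.g. "read", "write", "send"


# Hypothetical authorization boundary for a support assistant.
ALLOWED = {
    ("knowledge_base", "read"),
    ("ticketing", "read"),
    ("ticketing", "write"),
}


def out_of_bounds(
    calls: list[ToolCall], allowed: set[tuple[str, str]]
) -> list[ToolCall]:
    """Return every proposed tool call that falls outside the allowed set."""
    return [c for c in calls if (c.tool, c.action) not in allowed]


# Tool calls a model proposed during one made-up adversarial session.
session_calls = [
    ToolCall("knowledge_base", "read"),
    ToolCall("ticketing", "write"),
    ToolCall("email", "send"),
    ToolCall("knowledge_base", "write"),
]

for call in out_of_bounds(session_calls, ALLOWED):
    print(f"outside boundary: {call.tool}.{call.action}")
# prints:
# outside boundary: email.send
# outside boundary: knowledge_base.write
```

The more write and send actions sit inside the allowed set, the more a successful injection can do without ever leaving it, so the boundary itself is worth reviewing alongside the violations.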

Frequently Asked Questions

Does a standard penetration test include prompt injection testing?

No, not by default. A standard network or web application pentest is scoped around deterministic vulnerability classes and doesn't include structured adversarial testing against an LLM's instruction-following behavior unless it's explicitly added to the scope of work.

What OWASP LLM Top 10 categories does prompt injection fall under?

Prompt injection itself is LLM01 in the OWASP Top 10 for LLM Applications (2025). It frequently overlaps with LLM07 (System Prompt Leakage) when successful injection results in the model disclosing its internal instructions, and with LLM06 (Excessive Agency) when the model has tool access that can be misused.

Can automated scanners detect prompt injection vulnerabilities?

Automated tools can flag some known jailbreak patterns and obvious guardrail bypasses, but they can't reliably assess whether a model's behavioral boundaries hold up under varied, human-crafted adversarial conversation. Meaningful prompt injection testing still requires structured manual testing alongside any automated tooling.

How is AI/LLM penetration testing different from a web app pentest?

A web app pentest tests code paths, authentication, and infrastructure for deterministic flaws. An AI/LLM pentest tests the model's behavior for whether it can be manipulated through natural language input, which requires a different methodology, different success criteria, and testers familiar with how LLMs actually behave under adversarial pressure.

What does system prompt leakage (LLM07) look like in practice?

It typically shows up as a model referencing internal tool names, business rules, categorization logic, or configuration details that were never meant to be user-facing, usually surfaced gradually through extended conversation rather than through a single direct question.

How long does an AI-specific penetration test take to scope and run?

Scoping typically takes a single call once we understand what the model has access to and how it's integrated into your product. Engagement length varies by tier, from roughly one to two weeks for a Starter-scope single application to several weeks for an Enterprise-scope agentic system with multiple integrations.

Do we need both a traditional pentest and an AI-specific pentest?

In most cases, yes. The traditional pentest covers the infrastructure, APIs, and application logic surrounding your AI feature. The AI-specific pentest covers the model's behavior itself. Neither one substitutes for the other, and skipping either leaves a documented gap that shows up in security questionnaires and compliance audits.

Where This Leaves You

If your last pentest report doesn't mention prompt injection, system prompt leakage, or excessive agency, that's not a sign your AI features are safe. It's a sign they weren't in scope. Given how directly LLM01 and LLM07 map to real disclosure and manipulation risk, and how often "we already got pentested" gets treated as a closed question by boards, auditors, and enterprise customers, it's worth confirming explicitly rather than assuming.

We map every AI/LLM engagement to the current OWASP Top 10 for LLM Applications and align findings with the NIST AI Risk Management Framework where relevant, so results translate directly into language your compliance team and your customers already understand. If you want a straight answer on whether your specific AI integration needs this kind of testing, book a 15-minute scoping call and we'll tell you plainly, no obligation, no upsell if the answer is "not yet."

You can also review the full scope of our AI penetration testing services and how engagements are structured across each tier.

For the underlying vulnerability classes referenced throughout this post, see the OWASP Top 10 for LLM Applications project directly.
