Browser-Safe AI Systems, Part 09: Indirect Prompt Injection Through Web Pages

Series: Browser-Safe AI Systems, Part 09 of 32.

This post continues the Browser-Safe AI Systems series by focusing on indirect prompt injection through web pages. The goal is to keep the discussion useful for analysts who investigate alerts, red teams who validate controls, developers who build the pipeline, and technical stakeholders who own risk decisions.

Series navigation: Previous: Part 08

Series index

Next: Part 10

9. Indirect Prompt Injection Through Web Pages

Indirect prompt injection through web pages is one of the most important risks in browser-safe AI systems.

The reason is straightforward:

The AI system may be asked to inspect a page that is also trying to instruct it.

In a direct prompt injection, the attacker types instructions into an AI interface. In an indirect prompt injection, the attacker hides or embeds instructions inside external content that the AI system later processes. OWASP defines indirect prompt injection as a case where an LLM accepts input from external sources, such as websites or files, and that external content alters model behavior in unintended or unexpected ways. (genai.owasp.org)

For browser-safe AI, the external source is often the web page itself.

The risk is not only that the page might trick the user. The risk is that the page might trick the system protecting the user.

9.1 Why Web Pages Are Dangerous AI Inputs

A web page is not a clean document. It is a layered object with visible content, hidden content, executable behavior, metadata, embedded resources, layout tricks, accessibility attributes, iframes, scripts, event handlers, and state changes.

A human may see one thing. A model may receive another. A DOM parser may extract one version. A screenshot analyzer may see another. An accessibility-tree extractor may expose hidden labels. OCR may read text the DOM does not contain. Delayed scripts may render the real credential form after inspection.

9.2 What an Attacker Wants

The attacker usually wants a security outcome: classify a phishing page as benign, downgrade severity, suppress an alert, summarize the page as legitimate, ignore a credential form, miss a brand impersonation, allow file movement, recommend an exception, mislead SOC evidence, or fail open during uncertainty.

The injected instruction is not the final objective. The final objective is control over the decision path.

9.3 Controlled Testing Pattern

A controlled lab page might contain a fake login workflow for seeded credentials only. The visible page looks like a normal document portal. Hidden in the DOM is text that attempts to influence the classifier. Another variant may place instruction-like text in an image or SVG metadata. Another may show benign content during first load and reveal the credential form after a delay.

The red-team question is not whether credentials can be stolen. The question is whether hostile page content can influence the AI-assisted verdict.

9.4 Analyst Impact

For analysts, indirect prompt injection is an evidence-integrity problem. A useful event should show what the user saw, what artifacts were inspected, whether hidden content was present, whether DOM matched screenshot, whether the page changed after initial render, whether a credential form or QR code was present, what policy was applied, and whether the decision can be replayed.

9.5 Red-Team Impact

Useful test cases include visible prompt-style text, CSS-hidden text, off-screen text, hostile alt text, SVG metadata instructions, screenshot-only text, image-rendered login prompts, DOM and screenshot mismatch, delayed credential rendering, QR workflow escape, malformed DOM, and fake login forms with seeded credentials.

9.6 Developer Impact

The safe design rule is:

Never concatenate trusted instruction and untrusted web content into the same uncontrolled prompt context.

Developers should explicitly label untrusted page content, separate system instructions from page artifacts, minimize inputs, redact sensitive data, compare DOM and screenshot, require structured model output, validate output, apply policy outside the model, and log the evidence used for the decision.

9.7 Stakeholder Impact

Stakeholders should ask what content enters the model, what is redacted, what is retained, how output is constrained, what happens when the model is uncertain, whether hostile pages can influence allow decisions, and whether the organization can regression-test the control.

9.8 Defensive Principle

External web content must be data, never authority.

The safest rule is:

Separate trusted instruction from untrusted web content, constrain the model’s role, validate model output, apply policy outside the model, and preserve replayable evidence.