Series: Browser-Safe AI Systems, Part 12 of 32.

This post continues the Browser-Safe AI Systems series by focusing on dom versus rendered page mismatch. The goal is to keep the discussion useful for analysts who investigate alerts, red teams who validate controls, developers who build the pipeline, and technical stakeholders who own risk decisions.

Series navigation: Previous: Part 11 Series index Next: Part 13

12. DOM Versus Rendered Page Mismatch

DOM versus rendered page mismatch occurs when the page structure and the user-visible page do not tell the same story.

Browser-safe AI systems may inspect DOM structure, rendered screenshots, OCR text, accessibility-tree content, page metadata, form fields, JavaScript state, redirect behavior, and user interaction signals.

Each representation can be useful. Each representation can also be manipulated.

The safer assumption is simple:

No single representation of a web page should be treated as authoritative.

12.1 Why the Mismatch Matters

A browser page is a live execution environment. The DOM may contain one version. The screenshot may show another. The accessibility tree may expose different labels. JavaScript may change the page after inspection. An iframe may load a different trust context. Canvas rendering may display text that does not exist in the DOM. CSS overlays may hide one workflow and show another.

If the system does not detect conflict, it may produce the wrong verdict, and that verdict may influence allow, block, warn, isolate, credential-submission prevention, file handling, DLP, SOC severity, summaries, exceptions, and tuning.

12.2 Common Patterns

Common examples include benign DOM with malicious visual overlay, malicious DOM hidden behind benign visual content, login forms rendered as images, canvas-rendered text, hidden iframes, fake buttons layered over legitimate elements, form labels that differ from visible labels, accessibility labels that disagree with visible text, QR codes, delayed content, scanner-safe content, region-specific rendering, and metadata that contradicts visible intent.

12.3 Analyst Impact

Analysts need to know what the user saw, what the DOM contained, what OCR extracted, what the accessibility tree exposed, whether iframes or forms were present, whether visible form matched DOM form, whether the page changed after delay or interaction, whether QR was present, whether conflicting evidence existed, what policy applied, and whether the page state can be replayed.

12.4 Red-Team Impact

Controlled tests should include benign DOM with malicious screenshot content, malicious DOM with benign visual content, image-rendered fake login forms, canvas-rendered credential prompts, hidden fields, visible fields absent from DOM, iframe-based workflows, overlay button deception, accessibility label mismatch, delayed forms, QR workflow transitions, and conditional rendering.

Expected secure behavior is not always immediate blocking. Expected secure behavior is that conflict increases risk, preserves evidence, and avoids silent allow decisions.

12.5 Developer Impact

Developers should compare DOM text, rendered screenshot, OCR output, accessibility-tree content, form structure, link targets, iframe sources, redirect behavior, timing changes, and user interaction events.

Important conflicts should be elevated: screenshot shows login but DOM has no form, DOM contains credential fields but screenshot does not show them, OCR detects brand names absent from DOM, accessibility labels describe different actions than visible buttons, iframe source differs from trust context, or page becomes credential-harvesting after delay.

12.6 Stakeholder Impact

Stakeholders should ask whether the system inspects both DOM and rendered content, compares screenshots against extracted text, inspects iframe relationships, detects delayed page changes, handles canvas-rendered text, detects accessibility-tree mismatch, fails safely on conflict, and exposes conflicting evidence to analysts.

12.7 Defensive Principle

When page representations disagree, the security system should not assume the attacker made an honest mistake.

The safest rule is:

Do not trust the DOM alone. Do not trust the screenshot alone. Compare representations, treat conflict as risk, and preserve evidence for review.