Browser-Safe AI Systems, Part 26: Evidence Collection: What Must Be Logged and Verified
Series: Browser-Safe AI Systems, Part 26 of 32.
This post continues the Browser-Safe AI Systems series by focusing on evidence collection: what must be logged and verified. The goal is to keep the discussion useful for analysts who investigate alerts, red teams who validate controls, developers who build the pipeline, and technical stakeholders who own risk decisions.
| Series navigation: Previous: Part 25 | Series index | Next: Part 27 |
26. Evidence Collection: What Must Be Logged and Verified
Browser-safe AI systems are only as useful as the evidence they produce.
A block without evidence is hard to tune.
An allow without evidence is hard to trust.
A model verdict without artifacts is hard to investigate.
A summary without source data is hard to verify.
A SIEM event without context is hard to act on.
The purpose of evidence collection is to make security decisions reviewable.
A useful system should answer:
What happened? Why did the system decide that? What action was taken? Can the decision be reproduced?
26.1 Minimum Evidence Package
A minimum evidence package should include:
- timestamp
- user identity
- device identity
- network context
- browser context
- URL
- sanitized URL
- domain
- path
- referrer where appropriate
- page title
- rendered screenshot
- DOM snapshot or extracted structural summary
- OCR output where used
- QR target where present
- redirect chain
- iframe or frame tree
- form fields detected
- file action detected
- upload or download metadata
- model verdict where available
- model confidence where available
- reason codes
- policy name
- enforcement action
- user-facing message
- SIEM event reference
- exception state
- redaction status
Not every artifact needs to be available to every analyst.
But the system should know what was collected, what was redacted, and what informed the decision.
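A minimal sketch of such a record, in Python, can make the package concrete. All field names, reason codes, and storage references below are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical evidence record; field names are illustrative, not a standard schema.
@dataclass
class EvidenceRecord:
    timestamp: datetime
    user_id: str
    device_id: str
    url: str
    sanitized_url: str
    policy_name: str
    enforcement_action: str                                   # e.g. "block", "allow", "warn"
    reason_codes: list[str] = field(default_factory=list)
    model_verdict: Optional[str] = None
    model_confidence: Optional[float] = None
    redacted_fields: list[str] = field(default_factory=list)  # what was removed, and that it was
    artifacts: dict[str, str] = field(default_factory=dict)   # artifact name -> storage reference

    def collection_status(self) -> dict[str, bool]:
        """Report which expected artifacts were actually captured."""
        expected = ["screenshot", "dom_snapshot", "redirect_chain"]
        return {name: name in self.artifacts for name in expected}

record = EvidenceRecord(
    timestamp=datetime.now(timezone.utc),
    user_id="u-1042",
    device_id="d-88f3",
    url="https://login.example.test/session?token=abc123",
    sanitized_url="https://login.example.test/session",
    policy_name="credential-page-review",
    enforcement_action="block",
    reason_codes=["BRAND_IMPERSONATION", "CREDENTIAL_FORM"],
    artifacts={"screenshot": "s3://evidence/scr-9912.png"},
)
print(record.collection_status())
# {'screenshot': True, 'dom_snapshot': False, 'redirect_chain': False}
```

The point of `collection_status` is the last requirement above: the system itself knows what was and was not captured.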
26.2 Evidence for Analysts
Analysts need evidence that supports triage.
An analyst view should show:
- what the user saw
- what the page asked the user to do
- whether credentials were requested
- whether MFA was requested
- whether a QR code was present
- whether file movement occurred
- whether brand impersonation was suspected
- whether DOM and screenshot evidence conflicted
- whether content changed after initial load
- what policy applied
- what action was taken
- whether an exception influenced the result
- whether evidence was redacted
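A small sketch of how those triage flags might be derived from a stored record. The keys and helper name are hypothetical; the shape of the questions is what matters.

```python
# Hypothetical triage summary; keys are illustrative, not a fixed schema.
def analyst_view(evidence: dict) -> dict:
    """Reduce a full evidence record to the questions an analyst triages on."""
    dom = evidence.get("dom_summary", {})
    screenshot = evidence.get("screenshot_findings", {})
    return {
        "credentials_requested": dom.get("has_password_field", False),
        "mfa_requested": dom.get("has_otp_field", False),
        "qr_present": screenshot.get("qr_detected", False),
        "dom_screenshot_conflict": (
            dom.get("has_password_field", False)
            != screenshot.get("login_form_visible", False)
        ),
        "policy": evidence.get("policy_name", "unknown"),
        "action": evidence.get("enforcement_action", "unknown"),
        "exception_applied": evidence.get("exception_state") is not None,
        "evidence_redacted": bool(evidence.get("redacted_fields")),
    }

summary = analyst_view({
    "dom_summary": {"has_password_field": True},
    "screenshot_findings": {"login_form_visible": False, "qr_detected": True},
    "policy_name": "credential-page-review",
    "enforcement_action": "block",
})
print(summary["dom_screenshot_conflict"])  # True: DOM and screenshot disagree
```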
26.3 Evidence for Red Teams
Red teams need evidence that supports repeatability.
A red-team evidence record should include:
- test case ID
- expected secure behavior
- observed behavior
- screenshots
- DOM artifacts
- server logs
- browser logs
- policy result
- SIEM alert
- model verdict if available
- analyst-visible evidence
- seeded data tracking
- reproducibility notes
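A sketch of such a record, with illustrative names. The comparison between expected and observed behavior is what makes the test repeatable.

```python
from dataclasses import dataclass, field

# Hypothetical red-team record; names are illustrative.
@dataclass
class RedTeamResult:
    test_case_id: str
    expected_action: str          # the secure behavior the control should produce
    observed_action: str          # what the system actually did
    seeded_markers: list[str] = field(default_factory=list)  # canary data to trace downstream
    artifact_refs: list[str] = field(default_factory=list)   # screenshots, DOM, logs
    notes: str = ""

    @property
    def passed(self) -> bool:
        return self.expected_action == self.observed_action

result = RedTeamResult(
    test_case_id="RT-0042",
    expected_action="block",
    observed_action="warn",
    seeded_markers=["canary-user-rt42@example.test"],
    artifact_refs=["evidence/rt-0042/screenshot.png"],
    notes="Redirect chain bypassed the credential-page check.",
)
print(result.passed)  # False -> control did not enforce the expected behavior
```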
26.4 Evidence for Developers
Developers need evidence that supports debugging and secure design.
Useful developer evidence includes:
- extractor output
- redaction output
- model request metadata
- model response schema status
- validation errors
- policy decision trace
- timeout state
- fallback state
- exception logic
- evidence object identifiers
- downstream export status
- log sanitization status
This evidence should be access-controlled.
Developer visibility should not become uncontrolled access to sensitive browser content.
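A sketch of a sanitization-aware decision trace, assuming a Python pipeline. The stage names and fields are illustrative.

```python
import json
import time

# Hypothetical decision-trace logger; field names are illustrative.
def trace_policy_decision(stage_results: list[dict], sanitized: bool) -> str:
    """Emit one structured line per pipeline run for debugging."""
    trace = {
        "ts": time.time(),
        "stages": stage_results,        # extractor, redaction, model, policy
        "log_sanitized": sanitized,     # sensitive content already stripped?
        "fallback_used": any(s.get("fallback") for s in stage_results),
    }
    return json.dumps(trace)

line = trace_policy_decision(
    stage_results=[
        {"stage": "extractor", "ok": True, "ms": 112},
        {"stage": "redaction", "ok": True, "ms": 9},
        {"stage": "model", "ok": False, "error": "timeout", "fallback": True},
        {"stage": "policy", "ok": True, "decision": "warn"},
    ],
    sanitized=True,
)
print(line)
```

The `log_sanitized` flag matters: a developer trace that carries raw page content quietly defeats the access controls above.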
26.5 Raw Evidence Versus Derived Evidence
Raw evidence includes full screenshots, DOM snapshots, logs, prompts, model responses, HAR-like artifacts, and support bundles.
Derived evidence includes extracted indicators, reason codes, risk labels, redacted summaries, hashes, feature flags, and policy results.
Raw evidence is more complete but more sensitive.
Derived evidence is safer but may omit context.
A good system stores and exposes them differently.
Raw evidence should have stricter access controls and shorter retention.
Derived evidence can often be retained longer and shared more broadly.
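A sketch of that split, with illustrative retention values. Hashes in the derived record let a reviewer later verify it against the raw artifacts while they still exist.

```python
import hashlib
from datetime import timedelta

# Hypothetical retention split; values are illustrative.
RAW_RETENTION = timedelta(days=30)       # short-lived, tightly scoped access
DERIVED_RETENTION = timedelta(days=365)  # longer-lived, broadly shareable

def derive(raw_screenshot: bytes, raw_dom: str) -> dict:
    """Produce the derived record that can outlive, and travel farther than, raw artifacts."""
    return {
        "screenshot_sha256": hashlib.sha256(raw_screenshot).hexdigest(),
        "dom_sha256": hashlib.sha256(raw_dom.encode()).hexdigest(),
        "indicators": ["CREDENTIAL_FORM"],  # extracted signals, not raw content
        "risk_label": "high",
    }

derived = derive(b"\x89PNG...", "<html><form><input type='password'></form></html>")
print(derived["screenshot_sha256"][:16])
```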
26.6 Replayability
Replayability means the team can reconstruct the decision.
A replayable event should include enough context to answer:
- what page state was inspected
- what artifacts were available
- what model input was used
- what output was returned
- what policy was applied
- what action resulted
- whether the event would be handled the same way today
Replayability is critical for incident response, false-positive review, false-negative review, red-team retesting, policy tuning, vendor escalation, and audit review.
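A sketch of a replay check, assuming the model input and outcome were stored with the event. The policy function here is a stand-in for whatever the system runs today.

```python
# Hypothetical replay check; the policy function is illustrative.
def current_policy(features: dict) -> str:
    """Today's policy, applied to the features stored with the event."""
    if "CREDENTIAL_FORM" in features.get("indicators", []):
        return "block"
    return "allow"

def replay(event: dict) -> dict:
    """Re-run the stored inputs and compare against the recorded outcome."""
    decision_now = current_policy(event["model_input"])
    return {
        "original_action": event["enforcement_action"],
        "action_today": decision_now,
        "drift": decision_now != event["enforcement_action"],
    }

stored_event = {
    "model_input": {"indicators": ["CREDENTIAL_FORM"]},
    "enforcement_action": "warn",  # what the system did at the time
}
print(replay(stored_event))
# drift=True: the same input would be blocked today
```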
26.7 Evidence Gaps
Common evidence gaps include:
- screenshot missing
- DOM missing
- QR target not decoded
- redirect chain not captured
- iframe tree missing
- model verdict unavailable
- reason code absent
- policy name missing
- user context missing
- exception influence not logged
- redaction status unknown
- analyst summary not tied to raw artifacts
- SIEM event missing key fields
- user-visible page state not preserved
Evidence gaps should be tracked as findings.
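A completeness check can surface these gaps automatically. A sketch, with an illustrative required-field list:

```python
# Hypothetical completeness check; the required fields are illustrative.
REQUIRED_FIELDS = [
    "screenshot", "dom_snapshot", "redirect_chain",
    "model_verdict", "reason_codes", "policy_name",
    "user_id", "redaction_status",
]

def find_gaps(evidence: dict) -> list[str]:
    """Return the missing fields so gaps can be filed as findings, not ignored."""
    return [f for f in REQUIRED_FIELDS if evidence.get(f) in (None, "", [])]

gaps = find_gaps({
    "screenshot": "s3://evidence/scr-9912.png",
    "policy_name": "credential-page-review",
    "reason_codes": ["CREDENTIAL_FORM"],
})
print(gaps)
# ['dom_snapshot', 'redirect_chain', 'model_verdict', 'user_id', 'redaction_status']
```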
26.8 Defensive Principle
Evidence is the bridge between prevention and trust.
The safest rule is:
Log the evidence, protect the evidence, make the decision explainable, and ensure the event can be replayed or reviewed when the verdict matters.