OSMAP V3 Release-Gate Foundation: Building Evidence Before Trust
OSMAP V3 was not only a feature release.
It was a release-gate exercise.
The work was about forcing the project to prove that security controls were present, live, repeatable, and backed by evidence before treating the implementation as releasable.
That sounds straightforward until the evidence has to be collected from a real host, through real authentication, with real service boundaries, while still avoiding secret leakage into the repository.
The result was a V3 gate that now requires authenticated WSTG evidence, human password-plus-TOTP proof, supply-chain checks, live host readiness, TLS validation, helper-boundary validation, resource-control evidence, MIME and HTML rendering evidence, and a sanitized release archive.
The Release Gate Had To Fail Closed
The most important question was simple:
Does release mode fail when authenticated WSTG evidence is missing?
It was not enough for WSTG checks to exist.
Release mode had to reject:
- skipped authenticated tests
- missing authenticated coverage
- selected-test release runs
- missing human credential proof
- missing TOTP proof
- unauthenticated summaries
- warning or skip states in release-blocking tests
The gate now requires a full WSTG release run and validates that the command path included the human prompt-auth flow.
The successful authenticated WSTG run produced:
26passing WSTG checks0failures0warnings0skips- authenticated login proof
- TOTP proof
- protected route access proof
- session issuance proof
- logout proof
- session invalidation proof
That matters because unauthenticated security evidence is often a false comfort.
A login form can be secure while the authenticated application behind it is weak.
Human Proof Was Deliberately Awkward
Human credential and TOTP proof was intentionally not reduced to a permanent automation secret.
The operator had to run release WSTG with prompt-based authentication.
That created friction, but it also avoided baking a reusable password or TOTP seed into the release tooling.
The release gate then checked for evidence that the human path really happened.
This is a useful pattern:
- keep the secret out of automation
- capture the fact that human proof occurred
- redact the raw interaction
- commit only sanitized summaries
- make missing proof release-blocking
The raw proof directories were left untracked and ignored.
The committed artifact is the sanitized release summary, not the terminal session containing prompts.
OpenBSD Versus Linux Was Not A Footnote
Development happened from a Parrot OS workstation.
The target host was OpenBSD.
That split matters.
Shell utilities, OpenSSL behavior, service management, privilege boundaries, filesystem ownership, and socket behavior are not identical across the two environments.
One concrete example was helper-boundary proof.
A live validator generated an HMAC signature for a mailbox-helper request. The first implementation used an OpenSSL command form that worked poorly across the target environment:
openssl enc -A -hex
On the OpenBSD host, that produced OpenSSL usage output instead of the lower-case hexadecimal digest the helper expected.
The fix was to avoid that OpenSSL-specific conversion and use a simpler portable binary-to-hex path:
od -An -tx1 | tr -d ' \n'
The lesson is not that OpenSSL is bad.
The lesson is that live release evidence should run where the release will operate.
If the production host is OpenBSD, the proof must survive OpenBSD.
The Helper Boundary Became A Real Boundary
V3 hardened the mailbox-helper boundary.
Helper-backed mailbox operations now require a permission-restricted request grant key.
That exposed older live validators that started isolated helper-backed runtimes without providing grant keys.
The failure was good.
It meant the new boundary was actually enforced.
But it also meant the V2 carry-forward readiness evidence had to be updated so every live validator that starts an isolated helper either:
- creates paired web and helper grant keys, or
- provides a web-side grant key for negative backend-unavailable tests
This affected several validators:
- login-send
- inline image metadata
- all-mailbox search
- archive shortcut
- session surface
- send throttle
- move throttle
- request guardrails
- mailbox backend unavailable
- helper peer authentication
That was an important compatibility lesson.
Security hardening often breaks evidence tooling first.
The correct answer is not to weaken the control.
The correct answer is to upgrade the proof harness.
Supply-Chain Gates Needed Host-Local Tooling
The release gate required pinned supply-chain tooling.
The local workstation had the tools.
The OpenBSD host-side validation environment did not initially see them because the host validation wrapper intentionally used its own cargo home:
/home/foo/tmp-osmap-host-validation/cargo-home
That isolated cargo home needed the pinned versions of:
cargo-deny 0.18.3cargo-audit 0.22.1
This was another useful failure.
The validation wrapper was doing the right thing by isolating build state.
The missing part was remembering that isolated state also isolates installed cargo binaries.
Throttling Had To Be Observable
WSTG authentication throttling initially produced a warning.
The bounded probe did not observe throttling within the attempt budget.
That turned into two changes:
- use a local-domain throttle probe identity
- count backend-unavailable login attempts into throttle buckets
The important release property was not simply that throttling existed somewhere in code.
The release property was that the bounded WSTG probe could observe it without unsafe extra attempts.
That is the difference between a control and release evidence for a control.
Resource And Timeout Hardening Had To Be Live
V3 added release evidence for bounded resources and timeouts.
The gate required live proof for:
- mailbox worker budgets
- search worker budgets
- route-level expensive request timeouts
- helper-backed operation boundaries
- budget acquisition and release logging
- failure behavior under constrained budgets
The point was not just denial-of-service resistance.
The point was also operational explainability.
When the application refuses work because a budget is exhausted, the event should be bounded, observable, and safe to investigate.
MIME And HTML Correctness Had To Be Proven With Mail
MIME and HTML handling is easy to get wrong.
V3 evidence included live validation for rendering and attachment behavior, including:
- conservative HTML sanitization
- remote content suppression
- forced-download attachment behavior
- inline image metadata surfacing
- bounded MIME parsing
- selected source-attachment handling
- redaction of sensitive evidence
The release gate required live MIME and HTML proof because fixture-only parsing tests are not enough.
The browser route, mailbox helper, session handling, message fetching, and attachment path all have to agree.
The Release Evidence Was Sanitized
The final committed release evidence included sanitized artifacts under maint/live.
The raw human WSTG proof captures were not committed.
That distinction matters.
Raw proof can include terminal prompts, session output, timing, local paths, or other context that should not become permanent repository history.
The release archive and summaries are meant to answer:
- what was checked
- when it was checked
- what host was assessed
- what command generated the evidence
- which checks passed
- which evidence files were included
- whether any release errors remained
They are not meant to preserve secrets.
What Passed
The final V3 release gate passed:
- cargo build with locked dependencies
- cargo tests
- clippy
- formatting
- supply-chain checks
- dependency inventory
- documentation governance checks
- TLS policy checks
- authenticated WSTG release coverage
- TLS standard validation
- resource and timeout evidence
- helper-boundary evidence
- MIME and HTML proof evidence
- sanitized release evidence archive generation
The authenticated WSTG summary ended with:
pass: 26
fail: 0
warning: 0
skip: 0
release_errors: []
The host checkout was synced to the pushed main branch, and the OpenBSD services were verified healthy.
Lessons
The main lesson from V3 is that release gates should test the release process, not only the application.
A good gate should catch:
- missing evidence
- stale evidence
- partial test runs
- unauthenticated proof
- host-local tool drift
- cross-platform shell assumptions
- helper-boundary regressions
- warnings that would otherwise be rationalized away
It should also force a team to decide what evidence is safe to commit.
That last point is easy to overlook.
Security evidence can become a security problem if it captures too much.
The pattern that worked for V3 was:
- collect raw proof locally
- redact and summarize
- commit sanitized release evidence
- ignore raw human proof directories
- make missing proof release-blocking
- rerun on the real host
That is slower than a green checkmark.
It is also much more useful.
Closing Thought
V3 was less about adding one control and more about making controls accountable.
The useful question changed from:
Did we write the feature?
to:
Can the release gate prove the feature is present, live, bounded, authenticated, and safe to inspect?
That is the shape of release engineering I want for security-sensitive software.