The Calculator Discipline: A Taxonomy and Pre-Send Filter for AI-Assisted Vulnerability Disclosure Hallucinations

Thomas, Stuart

doi:10.5281/zenodo.20393083

Methodology · AI-assisted disclosure

The Calculator Discipline: catching AI-assisted disclosure hallucinations before they reach a maintainer’s inbox

A four-class taxonomy of failure modes, a pre-send filter that catches the mechanical two, and two real withdrawals from the practice’s own work — including the one Theo de Raadt asked the right question about.

26 May 2026 · Methodology paper · CC BY 4.0 · DOI: 10.5281/zenodo.20393083
Full paper: stuart-thomas.com/research/calculator-discipline/

This piece is the practice’s summary of The Calculator Discipline: A Taxonomy and Pre-Send Filter for AI-Assisted Vulnerability Disclosure Hallucinations, published 26 May 2026. The full paper, including the per-claim verification table for the OpenSMTPD case study and the source-level discussion of the four verifiers, sits on the lead researcher’s personal site. The summary below is intended for triage teams, bug-bounty programme managers, and other independent researchers who want the headline argument without the full apparatus.

The field-level problem

AI assistance has made source-code review cheap, and like every productivity multiplier in the history of engineering it has therefore made being wrong cheap. The most visible symptom is the open-source community’s response: Daniel Stenberg of the curl project coined “death by a thousand slops” in July 2025, and by January 2026 had ended curl’s HackerOne bug-bounty programme on the basis that AI-generated noise had made the triage workload unsustainable. Press coverage from BleepingComputer, Help Net Security, The New Stack and The Register has followed the same arc. Stenberg’s more recent posts note that the slop rate has fallen and the quality of AI-assisted reports has risen; the problem is not unsolvable, but the discipline-shaped hole in the methodology is real and worth naming.

The public conversation has been almost entirely from the receiving end. Maintainers complain about the volume; the researchers who produced it stay silent. That asymmetry is unhelpful. A failure mode that nobody owns publicly cannot be improved at the source. The practice’s methodology paper addresses that gap directly: the lead researcher is one of the people who shipped the slop, and the discipline described in the paper exists because the failure happened to him.

Two real withdrawals and one near-miss

The paper presents three cases from the practice’s 2026 OpenBSD work:

bgpd community_ext_add (withdrawn). A report sent to bugs@openbsd.org on 2026-05-24 claiming a fixed-buffer overflow in a function that does not have a fixed buffer; it grows its array via reallocarray(). The report also cited “22 unique SIGSEGV crashes” from an AFL run that did not exist (the AFL output on the researcher’s system was zero crashes, against an unrelated target), and referenced a bgp_poc.py file that had never been written. Withdrawn the following day.
OpenSMTPD six-claim chain (corrected). A disclosure sent to security@openbsd.org on 2026-05-23 listing six findings and framing them as a chain that bypassed ASLR and RETGUARD to achieve remote code execution. Theo de Raadt replied with a single pointed question: whether the researcher was actually claiming to have exploited the chain. The honest answer was no. Per-claim verification against current OpenBSD 7.8 source revealed two findings entirely fabricated, four real but with severity inflated by one to three steps, and zero RCE. The corrected reply went out 2026-05-26.
rpki-client queue_add_from_cert (caught pre-send). An AI-assisted triage candidate flagged as a one-byte heap out-of-bounds read. A third read traced the populator of cert->mft back through cert.c and showed the length-equal collision the candidate required was structurally impossible — the parse-time validator rejects any cert that would have produced it. Also: the prior-art agent had hallucinated two commit hashes attributed to a real OpenBSD developer who had no involvement. Both failures caught before the report went out.

A note about the third case. The hallucinated commit-hash attribution implicated Frank Denis, a real, well-regarded OpenBSD contributor. Denis had no involvement in any of this. Neither the fabricated commits nor the agent’s attribution of them reflect anything Denis did, wrote, or was aware of. He is named in the paper only to keep him explicitly out of the consequences of the misattribution.

The taxonomy

Across the three cases, four distinct failure modes appear. Each has a corresponding catch mechanism, and the mechanisms divide cleanly into mechanical (toolable, deterministic) and judgement-shaped (requires human eye on the threat model). The full table from the paper:

Class	Description	Catch mechanism
C1 Bug-shape fabrication	The bug pattern claimed does not exist in the code as it actually stands. Fixed buffer that is actually dynamically grown; OOB on a path the code guards; UAF on a pointer the code clears.	Mechanical. Grep for `realloc/reallocarray/recallocarray` in the cited function before drafting any “fixed-buffer overflow” claim.
C2 Evidence fabrication	The supporting evidence (AFL run, fuzz corpus, ASAN trace, PoC script, commit hash, prior-art reference) does not exist or does not match what is cited.	Mechanical. Resolve every cited artefact before citing it. Require `afl_banner` + exec count + timestamp for any “N crashes” claim. Require PoC paths to exist on disk.
C3 Severity inflation	The bug-shape and the evidence are real, but the chain-to-RCE, pre-auth-network, or otherwise headline-grade framing is fabricated.	Judgement-shaped. Per-claim CVSS with explicit threat-model fields. The phrase “chained to RCE” is reserved for chains demonstrated end-to-end, not merely posited.
C4 Trivial-as-critical	A real defect of negligible operational impact framed as critical, usually by appealing to a security boundary the trust model does not contain.	Judgement-shaped. Audience-aware severity calibration. Same-uid IPC is not a privsep boundary; trivial info-leaks within a single trust domain are hardening notes, not vulnerabilities.

The classes are not exhaustive. Others (composability errors, misread atomicity guarantees, fabricated CVSS metric vectors) certainly exist. The four are offered as a starting vocabulary, not a closed system. Triage teams already track signal versus noise implicitly; making the noise classes explicit lets researchers self-check before sending.
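The mechanical C1 check in the table (grep the cited function for the realloc family before claiming a fixed-buffer overflow) can be sketched in a few lines. This is a minimal illustration, not the code in hallucination_check.py: the names `function_body` and `bug_shape_verdict`, the `UNRESOLVED` verdict, and the naive brace matching are all assumptions made for the example.

```python
import re
from typing import Optional

# Calls that indicate a dynamically grown buffer (taken from the C1 row).
GROW_CALLS = re.compile(r"\b(realloc|reallocarray|recallocarray)\s*\(")
# Hypothetical pattern for the claim wording; the real tool's rules may differ.
FIXED_CLAIM = re.compile(r"fixed[- ](?:size[- ])?buffer", re.IGNORECASE)


def function_body(source: str, name: str) -> Optional[str]:
    """Return the brace-delimited body of the first definition of `name`.

    Naive matching: assumes no unbalanced braces or parentheses inside
    strings or comments, which is acceptable for a pre-send sanity check.
    """
    for match in re.finditer(rf"\b{re.escape(name)}\s*\(", source):
        i, depth = match.end(), 1
        while i < len(source) and depth:          # skip the parameter list
            depth += {"(": 1, ")": -1}.get(source[i], 0)
            i += 1
        rest = source[i:]
        stripped = rest.lstrip()
        if not stripped.startswith("{"):
            continue                              # a call or prototype, not a definition
        start = i + len(rest) - len(stripped)
        depth = 0
        for j in range(start, len(source)):       # walk to the matching close brace
            if source[j] == "{":
                depth += 1
            elif source[j] == "}":
                depth -= 1
                if depth == 0:
                    return source[start:j + 1]
    return None


def bug_shape_verdict(source: str, function: str, claim: str) -> str:
    """WRONG if a fixed-buffer claim targets a function that grows its buffer."""
    body = function_body(source, function)
    if body is None:
        return "UNRESOLVED"                       # cited function not found in the source
    if FIXED_CLAIM.search(claim) and GROW_CALLS.search(body):
        return "WRONG"
    return "OK"
```

An `OK` here means only that the claim is not contradicted by an obvious reallocation call; it says nothing about whether the overflow is real.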

The tool

The practice maintains hallucination_check.py (≈35 KB, BSD-2-Clause) as a pre-send filter against C1 and C2. Four verifiers were added on 2026-05-25 in direct response to the failures described above:

bug_shape — flags “fixed-buffer” claims against functions that contain realloc/reallocarray/recallocarray.
caller_bounds_gate — requires drafts citing size_t arithmetic and memcpy/memmove to contain a labelled caller-bounds analysis section. Structural, not semantic: it does not check whether the analysis is right, only that the author wrote one.
afl_evidence — requires “N crashes” phrasing to be accompanied by afl_banner, exec count, and timestamp in the same paragraph; flags banner/target mismatches.
poc_existence — resolves any cited PoC script path against the filesystem before allowing the draft to pass.
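Three of the verifiers listed above are structural enough to sketch briefly. The following is an illustration under stated assumptions rather than the shipped implementation: every regular expression, the `execs_done` label for the exec count, the ISO-style timestamp, the "poc" filename heuristic, and the heading text "Caller-bounds analysis" are hypothetical choices made for the example.

```python
import re
from pathlib import Path
from typing import List

# Hypothetical patterns; the real tool's phrasing rules are not given here.
CRASH_CLAIM = re.compile(r"\b\d+\s+(?:unique\s+)?(?:\w+\s+)?crash(?:es)?\b", re.I)
BANNER = re.compile(r"afl_banner\s*[:=]\s*([\w./-]+)")
EXECS = re.compile(r"execs_done\s*[:=]\s*\d+")
STAMP = re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}")
POC_PATH = re.compile(r"[\w./-]*poc[\w./-]*\.(?:py|sh|c)\b", re.I)
COPY_CALL = re.compile(r"\bmem(?:cpy|move)\b")
BOUNDS_HEADING = re.compile(r"^\s*caller[- ]bounds analysis\b", re.I | re.M)


def afl_evidence(draft: str, target: str) -> List[str]:
    """Each paragraph with an 'N crashes' claim must carry its own evidence."""
    findings = []
    for para in re.split(r"\n\s*\n", draft):
        if not CRASH_CLAIM.search(para):
            continue
        required = (("afl_banner", BANNER), ("exec count", EXECS), ("timestamp", STAMP))
        missing = [label for label, pattern in required if not pattern.search(para)]
        if missing:
            findings.append("WRONG: crash claim without " + ", ".join(missing))
            continue
        banner = BANNER.search(para).group(1)
        if banner != target:                      # evidence from an unrelated run
            findings.append(f"WRONG: afl_banner {banner!r} does not match {target!r}")
    return findings


def poc_existence(draft: str, root: Path) -> List[str]:
    """Every cited PoC path must resolve to a file under `root`."""
    return [
        f"WRONG: cited PoC {rel!r} not found under {root}"
        for rel in sorted(set(POC_PATH.findall(draft)))
        if not (root / rel).is_file()
    ]


def caller_bounds_gate(draft: str) -> List[str]:
    """Structural only: checks that the section exists, not that it is right."""
    risky = "size_t" in draft and COPY_CALL.search(draft)
    if risky and not BOUNDS_HEADING.search(draft):
        return ["WRONG: size_t arithmetic with memcpy/memmove but no caller-bounds analysis"]
    return []
```

Each function returns a list of findings, so an empty list means the verifier has nothing to object to; a caller would block the send if any list is non-empty.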

The four were validated against three synthesised drafts and one known-good template. The withdrawn bgpd report, when reconstructed and run through the tool, produces WRONG verdicts on three of the four verifiers — which is the test that mattered.

The paper also reports a self-check: the tool was run against the paper itself and returned a global DIRTY verdict with four WRONG findings. On inspection, every finding is a true positive by the tool’s rules but a false positive in context, because the paper is describing failed claims rather than making them. That distinction is the tool’s honest limitation, named explicitly: future work will add a quoted-context flag that suppresses the verdict while preserving the audit trail.
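One way the proposed quoted-context flag might behave is sketched below. This is speculative, since the feature is described only as future work: the `Finding` record, the `quoted` field, the `CLEAN` verdict name, and the audit-trail wording are all assumptions. The idea is simply that a quoted finding stays in the record but no longer contributes to the global verdict.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    verifier: str
    verdict: str            # "WRONG" or "OK"
    detail: str
    quoted: bool = False    # hypothetical flag: the draft is describing a claim, not making it


def global_verdict(findings: List[Finding]) -> str:
    """DIRTY only if at least one WRONG finding is not in quoted context."""
    blocking = [f for f in findings if f.verdict == "WRONG" and not f.quoted]
    return "DIRTY" if blocking else "CLEAN"


def audit_trail(findings: List[Finding]) -> List[str]:
    """Every finding is still reported, including the suppressed ones."""
    lines = []
    for f in findings:
        note = " (quoted context, suppressed)" if f.quoted and f.verdict == "WRONG" else ""
        lines.append(f"{f.verifier}: {f.verdict}{note}: {f.detail}")
    return lines
```

Who sets the flag is the hard part and is left open here; if the author can set it freely, it becomes a way to silence the tool, so the audit trail has to make every suppression visible to a reviewer.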

What the tool does not catch

The rpki-client case in §4 of the paper is the explicit counter-example: the draft would have contained a fully resolved file:line reference, no fixed-buffer claim, no AFL output, no PoC path, and a caller-bounds analysis that passed the structural gate but was wrong. It would therefore have cleared all four verifiers. The verifiers check structure, not correctness. The cross-function invariant trace that caught it was the work of a third human read with the explicit prompt “trace cert->mft back to its populator before claiming the OOB is reachable.”

The wider gate in the practice’s workflow has five steps:

1. AI-assisted source review produces a draft candidate.
2. A human independently checks the candidate against the actual current upstream source, not against the original AI’s summary.
3. hallucination_check.py runs against the draft. WRONG verdicts block send.
4. A separate Council-of-LLMs review (multiple LLMs, fresh context, with a brief to disagree) reviews tone and per-claim severity.
5. Only after all four preceding steps pass does the disclosure leave the outbox.

Step 3 is the only step the tool supplies. The other four are discipline. The slop rate will fall when the pre-send discipline becomes routine, not before.
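The gate described above amounts to an ordered list of checks in which any single failure blocks the send. A minimal sketch, assuming a hypothetical `Draft` record whose fields are filled in by the humans and tools at each step (none of these names come from the paper or from penfold):

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Draft:
    text: str
    human_source_check_done: bool = False       # recorded by the human reader
    tool_findings: List[str] = field(default_factory=list)  # tool output lines
    council_approved: bool = False              # recorded after the multi-LLM review


# One (label, check) pair per step that must pass before sending.
GATES: List[Tuple[str, Callable[[Draft], bool]]] = [
    ("1 candidate drafted", lambda d: bool(d.text.strip())),
    ("2 human read against upstream source", lambda d: d.human_source_check_done),
    ("3 no WRONG verdicts from the tool",
     lambda d: not any(f.startswith("WRONG") for f in d.tool_findings)),
    ("4 council review of tone and severity", lambda d: d.council_approved),
]


def may_send(draft: Draft) -> Tuple[bool, List[str]]:
    """Step 5: send only if every preceding gate passes; report the ones that fail."""
    failed = [label for label, check in GATES if not check(draft)]
    return (not failed, failed)
```

Only the third gate is automated in this sketch; the second and fourth are booleans precisely because the code cannot perform them, only record that a person says they were done.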

Why the practice is publishing this

Three reasons.

First, the conversation needs senders, not just receivers. Maintainers complaining publicly is part of the picture; researchers admitting publicly is the part that has been missing. A taxonomy from someone who has shipped each failure class personally is, the practice believes, more useful than a taxonomy assembled from the receiving end.

Second, the tool is useful and cheap to adopt. The four verifiers are BSD-2-Clause and ship with the wider penfold/ tooling. Triage teams, bug-bounty programme managers, and independent researchers are welcome to adopt them as-is, extend them, or take the four-class taxonomy and build their own.

Third, the calculator analogy is the right one. AI is not going away. The corrective is not to ban it but to apply the discipline that every other productivity multiplier in engineering has eventually demanded. Check the units. Check the order of magnitude. Check the answer matches the real world. That is the case the paper is making, and the case TriageForge is making by publishing it.

Implementation

The four verifiers described in §6 of the paper ship as open source in penfold — the practice’s broader vulnerability-research toolkit (github.com/jetnoir/penfold, BSD-2-Clause). The relevant module is penfold.disclose.hallucination_check.

Read the full paper

stuart-thomas.com/research/calculator-discipline/ — ~6,000 words, including the per-claim OpenSMTPD verification table, the full rpki-client cross-function invariant trace, the dogfood report against the paper itself, and the wider references list.

Adopt the tooling

The four verifiers and the wider workflow are released under BSD-2-Clause via the penfold/ directory of the project’s public artefacts. Pull requests welcome, particularly on the two judgement-shaped classes (C3 severity inflation, C4 trivial-as-critical) where the current tool offers only structural hooks.
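As a starting point for contributors, a structural hook for C3 might look like the sketch below. It is an assumption about what such a hook could be, not a description of the current tool: the `Claim` fields, the `REVIEW` label, and the phrase pattern are hypothetical. It cannot judge severity; it can only insist that the threat-model fields are filled in and that chain-to-RCE wording is backed by a recorded end-to-end demonstration.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical pattern for chain-to-RCE wording within a single sentence.
CHAIN_PHRASE = re.compile(
    r"chain(?:ed|s)?\b[^.]*\b(?:RCE|remote code execution)", re.IGNORECASE
)


@dataclass
class Claim:
    summary: str
    cvss_vector: str = ""
    attacker_position: str = ""         # e.g. network, local user, same-uid process
    trust_boundary_crossed: str = ""    # which boundary in the threat model, if any
    demonstrated_end_to_end: bool = False


def severity_hooks(claim: Claim) -> List[str]:
    """Flag claims for human review; an empty list is not an endorsement."""
    findings = []
    for name in ("cvss_vector", "attacker_position", "trust_boundary_crossed"):
        if not getattr(claim, name).strip():
            findings.append(f"REVIEW: {name} is empty")
    if CHAIN_PHRASE.search(claim.summary) and not claim.demonstrated_end_to_end:
        findings.append("REVIEW: chain-to-RCE wording without an end-to-end demonstration")
    return findings
```

The output is deliberately labelled REVIEW rather than WRONG, because for the judgement-shaped classes the hook can only route a claim to a human, not rule on it.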

Cite

S. Thomas, The Calculator Discipline: A Taxonomy and Pre-Send Filter for AI-Assisted Vulnerability Disclosure Hallucinations, TriageForge / Independent Security Research, May 2026. DOI: 10.5281/zenodo.20393083. CC BY 4.0.

Legal note

This article is published under the Defamation Act 2013 facts-and-opinion convention. Statements of fact — commit hashes, dates, vendor responses, source-code observations — are accurate to the best of the author’s knowledge and are evidenced by the cited primary artefacts in the full paper. Named individuals are referenced only in their public capacity as project maintainers and only in connection with their own public conduct. Frank Denis is named explicitly as not involved in the hallucinated commit batch attributed to him by the prior-art agent in §4. The naming is the corrective, not the harm.