Metis: a five-stage binary-triage toolchain
An open-source pipeline maintained by the practice for triaging compiled binaries against a library of vulnerability templates. SAT-hardness path pruning, Random-Matrix-Theory anomaly screening, SSA dataflow matching, symbolic taint analysis, and on-device validation — in that order, with each stage’s output sized to be the next stage’s input.
What it is and what it is for
Metis is the binary-vulnerability triage pipeline that the practice uses to systematise the early stages of a finding. It takes a compiled binary — macOS, Linux, or Windows — and applies five staged techniques in order of increasing cost, throwing the cheap statistical methods at the problem first and reserving symbolic execution and on-device validation for the candidates that survive the earlier filters.
The intended user is a researcher who already has a hypothesis about where a defect might live, and who needs to know quickly whether that hypothesis is worth pursuing further on this particular binary. It is not a turnkey vulnerability scanner. It is research infrastructure that compresses the steps a careful researcher would otherwise do by hand.
The five stages
| Stage | Technique | What it produces |
|---|---|---|
| C1 | Backbone-fraction path prioritisation by SAT constraint hardness | Ranked candidate paths through the binary, hardest constraints first |
| C2 | Spectral anomaly detection on the call graph using Random Matrix Theory | Functions whose connectivity pattern deviates from the surrounding code |
| C3 | SSA-level memory taint matched against a library of dataflow templates | Candidate defect sites tagged by template (missing length check, signed/unsigned confusion, unchecked allocator return, etc.) |
| C6 | Symbolic taint analysis through angr, with proof-of-concept synthesis via Z3 | Concrete attacker-controlled inputs that reach the candidate sink |
| C7 | On-device dynamic validation under subprocess, LLDB, or DTrace |
A yes/no on whether the source-level finding crashes the shipped binary |
The staging matters. Each step narrows the working set so that the next step’s budget — especially the symbolic engine’s — is spent on candidates that are worth it. The reported figure in the repository is a 60% reduction in peak active states from the C1 pruning alone.
Why the design looks like this
Three things, briefly.
One. Naive symbolic execution does not scale on production binaries. Every commercial tool that claims otherwise either ships a heavily-engineered pruning layer or relies on a researcher to set up the entry conditions correctly. Metis makes the pruning layer explicit and configurable.
Two. The cheap stages contain real signal. RMT spectral analysis of a call graph reliably surfaces the small set of functions that are statistically unusual. Most known-bad functions look unusual; this is not a coincidence. C2 was added when an audit of historical disclosures showed that the affected functions were, in retrospect, all over the spectral outlier list of the binaries they lived in.
Three. The gap between source-level defect and live exploitability is real and is the practice’s ongoing methodology concern. C7’s validation step exists specifically to close that gap — to convert a solver-blessed proof-of-concept into a verdict against the shipped binary on the shipped operating system, which is the only verdict that ultimately matters.
Pipeline invocation
A minimal end-to-end use of the pipeline, from Python:
from metis.c2_rmt import C2RMTAnalysis
from metis.c6_taint import C6Analysis
import angr
proj = angr.Project(BINARY, auto_load_libs=False)
c2_result = C2RMTAnalysis.from_project(proj).run()
# Candidates from C2 are passed through C3 and into C6 for symbolic
# analysis; survivors are handed to C7 for on-device validation.
Operational manuals, worked case studies, and per-stage documentation are in the repository. The Python requirement is 3.11 or 3.13; 3.12 is currently unsupported because of an upstream dependency issue.
Honest scope
Metis is research-grade software, not industrial-grade triage infrastructure. The components are individually robust and have been used in anger across several of the disclosures listed in the practice’s Publications section. The pipeline is not packaged as a turnkey tool that a non-author can drop onto an unfamiliar binary and trust the output of without judgement.
The release is published openly because the underlying methodology — stage cheap techniques in front of expensive ones, validate symbolic claims on real binaries — is worth being part of the public conversation about how small labs can credibly contribute to binary security research. The exact code is secondary to the layout.
Engagement
Patches and issues are welcome through the GitHub project. Methodology discussion is welcome by written email to the practice. Commercial support is not offered.