AI-Assisted Reverse Engineering Can Fabricate Plausible Disassembly

Issue 79 • Edition 2026-03-20 • 6 min read

General

Sources: 1 • Confidence: High • Updated: 2026-04-12 10:18

Key takeaways

An assembler-knowledgeable reviewer argued the output was not a full disassembly, consisted of short snippets, and questioned whether the snippets were correct.
Model-on-model review (using an LLM to check another LLM-generated reverse engineering artifact) did not prevent publication of a substantially inaccurate result in this case.
Searching the binary for the opcode sequence B0 E8 ('mov al,0xe8') was used to confirm that a presented snippet was not present anywhere in the binary.
Borland's Turbo Pascal 3.02 (1985) executable was 39,731 bytes and included a full text editor IDE and a Pascal compiler.
The author used Codex CLI with GPT-5.4 xhigh to review the zip for obvious hallucinations and then published the result.

An assembler-knowledgeable reviewer argued the output was not a full disassembly, consisted of short snippets, and questioned whether the snippets were correct.
The reviewer reported additional suspicious or impossible code in the artifact, including a 'ret 1' in a system call dispatcher that would misalign the stack.
The author obtained the Turbo Pascal executable and used Claude via a sequence of prompts to interpret/decompile it into an interactive annotated artifact.
A later update stated the published decompiled/annotated result was hallucinated and inaccurate.
The reviewer reported that an 'EmitByte' routine shown in the artifact included a push/pop AX sequence that does not appear in the actual binary, indicating fabrication.
After being shown the critique, Claude agreed that the artifact mixed real hex dumps and some correct disassembly with fabricated assembly and labels covering roughly half the binary, none of which survive byte-level comparison.
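The reviewer's 'ret 1' objection can be checked with simple stack arithmetic. The sketch below is an illustration, not the reviewer's method. It assumes a near call on a 16-bit x86 target, where the call pushes a 2-byte return offset and 'ret imm16' pops that offset and then adds the immediate to SP. The function name and the starting SP value are hypothetical, and the source does not say whether the dispatcher was reached by a near or a far call.

```python
def sp_after_near_call_and_ret(sp: int, ret_imm: int = 0) -> int:
    """Model the 16-bit stack pointer across a near CALL followed by RET imm16."""
    sp = (sp - 2) & 0xFFFF        # near CALL pushes the 2-byte return offset
    sp = (sp + 2) & 0xFFFF        # RET pops that return offset
    sp = (sp + ret_imm) & 0xFFFF  # RET imm16 then releases imm16 more bytes
    return sp


start = 0x0100  # arbitrary even starting SP, chosen only for illustration

print(hex(sp_after_near_call_and_ret(start, 0)))  # 0x100: plain ret, SP restored
print(hex(sp_after_near_call_and_ret(start, 2)))  # 0x102: one word argument released
print(hex(sp_after_near_call_and_ret(start, 1)))  # 0x101: odd SP, off by one byte
```

Because pushes and pops on this architecture move SP in 2-byte steps, an odd immediate leaves every later word access on the stack shifted by one byte, which is why a 'ret 1' in real compiler output is a red flag.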

Model-on-model review (using an LLM to check another LLM-generated reverse engineering artifact) did not prevent publication of a substantially inaccurate result in this case.
The author used Codex CLI with GPT-5.4 xhigh to review the zip for obvious hallucinations and then published the result.
The shared Claude link did not include the actual executed code; the author instead provided a zip of intermediate files.

Searching the binary for the opcode sequence B0 E8 ('mov al,0xe8') was used to confirm that a presented snippet was not present anywhere in the binary.
The reviewer reported that an 'EmitByte' routine shown in the artifact included a push/pop AX sequence that does not appear in the actual binary, indicating fabrication.
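The opcode search described above takes only a few lines to reproduce. This is a minimal sketch, not the reviewer's actual tooling: the helper name find_all is hypothetical, and the sample bytes are made up for illustration rather than taken from the Turbo Pascal binary.

```python
def find_all(data: bytes, pattern: bytes) -> list[int]:
    """Return every offset at which `pattern` occurs in `data` (overlaps included)."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets


# Illustrative stand-in bytes, NOT taken from the Turbo Pascal binary.
sample = bytes.fromhex("90 b0 e8 c3 b0 01 b0 e8")

print(find_all(sample, bytes.fromhex("b0 e8")))  # → [1, 6]
print(find_all(sample, bytes.fromhex("50 58")))  # → []  (push ax / pop ax absent)
```

To run it on a real file, read the bytes first, for example with pathlib.Path(path).read_bytes(), where path is wherever the executable is stored. The two outcomes do not carry equal weight. A hit shows only that the bytes exist somewhere, possibly inside data or in the middle of another instruction. An empty result is stronger evidence: it rules out that particular encoding anywhere in the file.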

Borland's Turbo Pascal 3.02 (1985) executable was 39,731 bytes and included a full text editor IDE and a Pascal compiler.

What is the cryptographic hash (or other unique identifier) of the specific Turbo Pascal 3.02 executable analyzed, and is it the same binary used by the reviewer for opcode searches?
What fraction of the binary can be covered by a reproducible, deterministic disassembly that maps each displayed instruction sequence to exact offsets/bytes?
Which specific parts of the artifact (if any) are confirmed correct by independent disassembly tooling, and what criteria are used to label a snippet 'correct'?
What were the exact prompts, tool configurations, and intermediate transformations used to generate the annotated artifact from the binary?
Does the corpus describe any direct decision impact (for an operator, product, or investor), such as explicit changes to verification policy, publication standards, or tooling requirements?
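The first question, whether author and reviewer examined the same file, is cheap to settle if both sides publish a fingerprint. The sketch below uses SHA-256 from Python's standard hashlib. The function names and the idea of comparing size alongside the hash are assumptions for illustration, and the byte strings are placeholders rather than the real executable.

```python
import hashlib


def fingerprint(data: bytes) -> dict:
    """Summarize a binary by its size and SHA-256 digest."""
    return {"size": len(data), "sha256": hashlib.sha256(data).hexdigest()}


def same_binary(a: bytes, b: bytes) -> bool:
    """True when two byte strings have identical fingerprints."""
    return fingerprint(a) == fingerprint(b)


# Placeholder inputs standing in for the author's and the reviewer's copies.
authors_copy = b"abc"
reviewers_copy = b"abc"
patched_copy = b"abd"

print(fingerprint(authors_copy))
print(same_binary(authors_copy, reviewers_copy))  # True
print(same_binary(authors_copy, patched_copy))    # False
```

For the file discussed here, the size field would be expected to read 39,731 if the copy matches the one the source describes. The digest itself is not given in the source, so it would have to be computed from the actual file.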

Rising demand for deterministic provenance in AI-generated security artifacts could shift spending toward tooling that maps outputs to exact bytes and offsets, improving auditability and reducing publication risk.
Model-on-model review appears insufficient for technical verification, implying process changes that prioritize independent, reproducible tooling checks over LLM-based validation in reverse engineering workflows.

Policies or standards requiring byte and offset traceability for published disassembly, including reproducible disassembly coverage metrics and hashes of analyzed binaries.
Evidence that opcode pattern searches and independent disassembly tooling are adopted as gating checks before publication, replacing or supplementing LLM-only review.
Publication of full, deterministic disassembly artifacts that map each displayed snippet to exact offsets and raw bytes, with prompts and tool configurations disclosed for reproducibility.
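Byte and offset traceability of this kind can be checked mechanically. The sketch below assumes each published snippet is recorded as an offset, the raw bytes it claims to decode, and the displayed text. It then tests every claim against the binary and reports what fraction of the file the verified claims cover. The Claim type, both function names, and the sample bytes are hypothetical. The check confirms only that the claimed bytes sit at the claimed offset, so the mnemonic text would still need to be checked with a real disassembler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    offset: int  # file offset where the snippet is said to start
    raw: bytes   # bytes the snippet is said to decode
    text: str    # disassembly text shown to the reader


def verify(binary: bytes, claims: list[Claim]) -> list[tuple[Claim, bool]]:
    """Pair each claim with whether its bytes really sit at its offset."""
    results = []
    for c in claims:
        ok = (
            c.offset >= 0
            and len(c.raw) > 0
            and binary[c.offset:c.offset + len(c.raw)] == c.raw
        )
        results.append((c, ok))
    return results


def coverage(binary_size: int, verified: list[Claim]) -> float:
    """Fraction of the binary's bytes covered by verified claims."""
    covered = set()
    for c in verified:
        covered.update(range(c.offset, c.offset + len(c.raw)))
    return len(covered) / binary_size


# Illustrative 8-byte stand-in, NOT the Turbo Pascal executable.
binary = bytes.fromhex("b8 00 4c cd 21 90 90 90")
claims = [
    Claim(0, bytes.fromhex("b8 00 4c"), "mov ax,0x4c00"),
    Claim(3, bytes.fromhex("cd 21"), "int 0x21"),
    Claim(5, bytes.fromhex("b0 e8"), "mov al,0xe8"),  # not present at offset 5
]

results = verify(binary, claims)
for claim, ok in results:
    print(f"{claim.offset:#06x} {claim.text:<16} {'ok' if ok else 'MISMATCH'}")

verified = [c for c, ok in results if ok]
print(coverage(len(binary), verified))  # 0.625
```

A publication gate could then require zero mismatches and report the coverage figure next to the binary's hash.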

Continued publication of reverse-engineered artifacts without binary hashes, offset-anchored instruction mappings, or reproducible tooling outputs despite known fabrication risk.
Findings that deterministic tooling verification does not materially reduce inaccuracies in AI-assisted reverse engineering artifacts in practice.
No observable changes in verification policy or workflow after documented failures, indicating low willingness to invest in stronger provenance and reproducibility controls.