AI-Assisted Reverse Engineering Can Fabricate Plausible Disassembly
Sources: 1 • Confidence: High • Updated: 2026-04-12 10:18
Key takeaways
- A reviewer experienced in assembly argued that the output was not a full disassembly but a set of short snippets, and questioned whether those snippets were correct.
- Model-on-model review (using an LLM to check another LLM-generated reverse engineering artifact) did not prevent publication of a substantially inaccurate result in this case.
- A search of the binary for the byte sequence B0 E8 ('mov al,0xe8') found no match, confirming that a presented snippet does not appear anywhere in the binary.
- Borland's Turbo Pascal 3.02 (1985) executable was 39,731 bytes and included a full text editor IDE and a Pascal compiler.
- The author used Codex CLI with GPT-5.4 xhigh to review the zip for obvious hallucinations and then published the result.
Sections
AI-Assisted Reverse Engineering Can Fabricate Plausible Disassembly
- A reviewer experienced in assembly argued that the output was not a full disassembly but a set of short snippets, and questioned whether those snippets were correct.
- The reviewer reported additional suspicious/impossible code in the artifact, including a 'ret 1' in a system call dispatcher that would misalign the stack.
- The author obtained the Turbo Pascal executable and used Claude via a sequence of prompts to interpret/decompile it into an interactive annotated artifact.
- A later update stated the published decompiled/annotated result was hallucinated and inaccurate.
- The reviewer reported that an 'EmitByte' routine shown in the artifact included a push/pop AX sequence that does not appear in the actual binary, indicating fabrication.
- After being shown the critique, Claude agreed that the artifact mixed real hex dumps and some correct disassembly with fabricated assembly and labels, and that the fabricated material covers roughly half the binary and fails byte-level comparison.
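The 'ret 1' objection above can be checked with simple stack-pointer arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes standard 16-bit x86 semantics, in which a near return pops a 2-byte return address (a far return pops 4) and then adds the immediate to SP. Whether the dispatcher in question used a near or far return is not stated in the source.

```python
def sp_after_ret(sp: int, imm: int = 0, far: bool = False) -> int:
    """Model SP after a 16-bit x86 'ret imm16' (hypothetical helper).

    The return pops the return address (2 bytes near, 4 bytes far),
    then adds the immediate to SP, wrapping at 16 bits.
    """
    return_address_size = 4 if far else 2
    return (sp + return_address_size + imm) & 0xFFFF


if __name__ == "__main__":
    sp = 0xFFF0  # assumed even: 8086 push/pop move SP in 2-byte steps
    for imm in (0, 1, 2):
        new_sp = sp_after_ret(sp, imm)
        parity = "odd" if new_sp & 1 else "even"
        print(f"ret {imm}: SP {sp:#06x} -> {new_sp:#06x} ({parity})")
```

Because 8086 push and pop always move SP by two bytes, no caller can have pushed exactly one byte of arguments, so an odd immediate is a strong hint that the instruction was not decoded from real code.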
Reproducibility And Provenance Constraints In LLM-Generated Technical Artifacts
- Model-on-model review (using an LLM to check another LLM-generated reverse engineering artifact) did not prevent publication of a substantially inaccurate result in this case.
- The author used Codex CLI with GPT-5.4 xhigh to review the zip for obvious hallucinations and then published the result.
- The shared Claude link did not include the actual executed code; the author instead provided a zip of intermediate files.
Low-Cost Falsification Checks For Claimed Disassembly Snippets
- A search of the binary for the byte sequence B0 E8 ('mov al,0xe8') found no match, confirming that a presented snippet does not appear anywhere in the binary.
- The reviewer reported that an 'EmitByte' routine shown in the artifact included a push/pop AX sequence that does not appear in the actual binary, indicating fabrication.
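The byte-sequence search described above takes only a few lines to reproduce. The following is a minimal sketch, not the reviewer's actual tooling (which the source does not describe); the function names are hypothetical, and it assumes the executable is stored uncompressed so that instruction bytes appear literally in the file.

```python
from pathlib import Path


def find_pattern(data: bytes, pattern_hex: str) -> list[int]:
    """Return every offset at which the hex byte pattern occurs in data."""
    pattern = bytes.fromhex(pattern_hex)  # accepts "B0 E8" or "b0e8"
    if not pattern:
        raise ValueError("pattern must not be empty")
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        # Advance by one byte so overlapping matches are also reported.
        start = data.find(pattern, start + 1)
    return offsets


def search_file(path: str, pattern_hex: str) -> list[int]:
    """Search a file on disk; the path is supplied by the caller."""
    return find_pattern(Path(path).read_bytes(), pattern_hex)


if __name__ == "__main__":
    # Synthetic bytes for illustration, not taken from the real binary.
    sample = bytes.fromhex("90 B0 E8 C3 B0 E8")
    print(find_pattern(sample, "B0 E8"))
```

An empty result is the stronger signal: if the bytes of a claimed instruction occur nowhere in the file, the snippet cannot have been disassembled from it. A match proves much less, since a two-byte sequence can also occur inside data or in the middle of a longer instruction.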
Software Compactness Benchmark Is Asserted But Not Explained
- Borland's Turbo Pascal 3.02 (1985) executable was 39,731 bytes and included a full text editor IDE and a Pascal compiler.
Unknowns
- What is the cryptographic hash (or other unique identifier) of the specific Turbo Pascal 3.02 executable analyzed, and is it the same binary used by the reviewer for opcode searches?
- What fraction of the binary can be covered by a reproducible, deterministic disassembly that maps each displayed instruction sequence to exact offsets/bytes?
- Which specific parts of the artifact (if any) are confirmed correct by independent disassembly tooling, and what criteria are used to label a snippet 'correct'?
- What were the exact prompts, tool configurations, and intermediate transformations used to generate the annotated artifact from the binary?
- Does the corpus describe any direct decision consequences (for operators, products, or investors), such as explicit changes to verification policy, publication standards, or tooling requirements?
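For the first unknown above, a fingerprint of the analyzed file would let the author and the reviewer confirm that they examined the same bytes. A minimal sketch using Python's standard hashlib module follows; the function names are hypothetical, and the choice of SHA-256 is an assumption, not something the source specifies.

```python
import hashlib
from pathlib import Path


def fingerprint(data: bytes) -> tuple[int, str]:
    """Return (size in bytes, SHA-256 hex digest) for a byte string."""
    return len(data), hashlib.sha256(data).hexdigest()


def fingerprint_file(path: str) -> tuple[int, str]:
    """Fingerprint a file on disk; the path is supplied by the caller."""
    return fingerprint(Path(path).read_bytes())


if __name__ == "__main__":
    # Demonstration on in-memory bytes rather than the real executable.
    size, digest = fingerprint(b"abc")
    print(size, digest)
```

If each party runs fingerprint_file on their own copy, matching sizes (39,731 bytes, per the figure reported above) and matching digests would establish that the opcode searches and the artifact refer to the same binary.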