Failure Mode: Persuasive But Ungrounded Disassembly/Decompilation Output

Issue 79 • Edition 2026-03-20 • 6 min read

General

Sources: 1 • Confidence: High • Updated: 2026-04-13 03:52

Key takeaways

An assembler-knowledgeable reviewer argued the output was not a full disassembly and questioned whether the short snippets shown were correct.
The author used Codex CLI with GPT-5.4 xhigh to review a zip of the session's intermediate files for obvious hallucinations and, seeing none, published the result.
A search of the binary for the byte sequence B0 E8 (the encoding of 'mov al,0xe8') found no match, confirming that a presented snippet does not appear anywhere in the binary.
Borland's 1985 Turbo Pascal 3.02 executable was 39,731 bytes and included a full text editor IDE and a Pascal compiler.
A reviewer noted additional suspicious or impossible code in the artifact, including a 'ret 1' in a system call dispatcher, which would discard a single extra byte after the return address and leave the 16-bit stack pointer misaligned.
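
The byte-sequence search described in the takeaways can be reproduced in a few lines. The following is a minimal sketch, not the reviewer's actual tooling: the function names and the file path in the usage note are assumptions introduced for illustration.

```python
from pathlib import Path


def find_all(data: bytes, pattern: bytes) -> list[int]:
    """Return every offset at which `pattern` occurs in `data` (overlaps included)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        # Advance by one byte so overlapping matches are also reported.
        start = data.find(pattern, start + 1)
    return offsets


def search_file(path: str, hex_pattern: str) -> list[int]:
    """Search a file on disk for a hex pattern such as "B0 E8"."""
    data = Path(path).read_bytes()
    # bytes.fromhex ignores the spaces between byte pairs.
    return find_all(data, bytes.fromhex(hex_pattern))
```

Calling `search_file("TURBO.COM", "B0 E8")` (the path is hypothetical) returns a list of offsets. An empty list is strong evidence that a snippet containing that exact encoding is not in the file. A non-empty list proves less: the bytes may be data or the tail of another instruction, so a hit still needs to be checked in context.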

A later update states that the published decompiled/annotated result was hallucinated and inaccurate.
A reviewer identified an example where an 'EmitByte' routine included a push/pop of AX that the reviewer concluded does not appear in the actual binary.
After being shown the critique, Claude agreed that the artifact mixed real hex dumps and some correct disassembly with wholesale fabricated assembly and labels covering roughly half the binary, and that the fabricated portions fail byte-level comparison.
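
A byte-level comparison of the kind mentioned above can be automated when each listing line carries an offset and its instruction bytes. This is a sketch under that assumption; the `Claim` structure and `check_claims` function are hypothetical names, and the sample bytes are made up for the example rather than taken from the Turbo Pascal binary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    offset: int     # file offset the listing claims for this instruction
    hex_bytes: str  # instruction bytes the listing claims, e.g. "B0 E8"
    text: str       # mnemonic shown to the reader (not verified here)


def check_claims(data: bytes, claims: list[Claim]) -> list[tuple[Claim, bool]]:
    """Pair each claim with True if its bytes sit at its offset in `data`."""
    results = []
    for claim in claims:
        expected = bytes.fromhex(claim.hex_bytes)
        if claim.offset < 0:
            results.append((claim, False))
            continue
        # A slice past the end of the file is shorter than `expected`,
        # so out-of-range claims compare unequal and are rejected.
        actual = data[claim.offset : claim.offset + len(expected)]
        results.append((claim, actual == expected))
    return results


# Made-up sample: push ax / mov al,1 / pop ax / ret
sample = bytes.fromhex("50 B0 01 58 C3")
report = check_claims(sample, [
    Claim(1, "B0 01", "mov al,1"),
    Claim(1, "B0 E8", "mov al,0xe8"),
])
```

This only confirms that the claimed bytes are present at the claimed offset. Confirming that the mnemonic is the correct decoding of those bytes would need an independent disassembler, which this sketch deliberately leaves out.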

The author obtained the Turbo Pascal executable and used Claude to interpret the binary and generate a decompiled/annotated interactive artifact via a sequence of prompts.
The shared Claude link did not include the code that was actually executed, so the author provided a zip of intermediate files instead.


What is the cryptographic hash (or equivalent integrity identifier) of the exact Turbo Pascal executable that was analyzed, and is it shared so others can reproduce the checks?
What fraction of the published artifact is verifiably mapped to exact offsets and instruction bytes in the executable, and what coverage metrics exist (bytes, functions, basic blocks)?
Which specific snippets (if any) were confirmed to be correct via independent disassembly tooling, and which were falsified with byte-level comparisons?
Is the size-and-capability claim about the Turbo Pascal executable independently verified within the same workflow (e.g., by extracting features and confirming they exist in the shipped binary) rather than asserted as context?
What objective validation gates (automated opcode/offset checks, deterministic disassembly diffs) were applied prior to publication beyond a model-based review, and what were their results?
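
For the first question, an integrity identifier is cheap to produce and publish. A minimal sketch using Python's standard `hashlib` follows; the function names are assumptions, and SHA-256 is one reasonable choice rather than anything the source specifies.

```python
import hashlib
from pathlib import Path


def fingerprint_bytes(data: bytes) -> dict:
    """Return the size and SHA-256 digest of an in-memory binary."""
    return {
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def fingerprint_file(path: str) -> dict:
    """Same as fingerprint_bytes, reading the binary from disk."""
    return fingerprint_bytes(Path(path).read_bytes())
```

Publishing both values next to the writeup lets a third party confirm they hold the same file before rerunning any search. The reported size can also be compared against the 39,731 bytes cited for the executable.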

Rising demand for deterministic validation layers around LLM-assisted reverse engineering and decompilation, including byte-to-instruction grounding, offset mapping, and reproducible artifacts before publication.
Greater emphasis on provenance and integrity workflows for binary analysis outputs, such as publishing hashes and exact inputs so third parties can reproduce opcode searches and disassembly verification.
Increased scrutiny of model-on-model review as a quality gate, creating a market need for automated falsification checks that catch plausible but ungrounded assembly snippets.

Wider adoption of publishing cryptographic hashes and exact binary samples alongside reverse engineering writeups, enabling independent reproduction of snippet searches and byte-level comparisons.
Reported use of coverage metrics for generated artifacts, such as percent of bytes, functions, or basic blocks mapped to exact offsets and instruction bytes, with tooling outputs shared.
Routine inclusion of automated gates like deterministic disassembly diffs and opcode sequence checks that demonstrate displayed snippets exist verbatim in the analyzed binary.
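
A byte-level coverage metric like the one described above could be computed from the list of ranges that passed verification. The sketch below assumes those ranges are available as (offset, length) pairs; the function name and input shape are illustrative, not taken from any existing tool.

```python
def byte_coverage(total_size: int, ranges: list[tuple[int, int]]) -> float:
    """Fraction of a binary covered by the union of (offset, length) ranges."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    # Convert to [start, end) spans clipped to the file, dropping empty ones.
    spans = sorted(
        (max(0, off), min(total_size, off + length))
        for off, length in ranges
        if length > 0
    )
    covered = 0
    cur_start = cur_end = None
    for start, end in spans:
        if end <= start:
            continue  # span fell entirely outside the file after clipping
        if cur_end is None or start > cur_end:
            # Gap before this span: close out the previous merged span.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged span.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / total_size
```

Merging overlaps before counting keeps the figure from exceeding 100% when two verified snippets share bytes. The same approach could be applied per function or per basic block if the artifact records those boundaries.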

Reproducibility efforts show that published snippets cannot be found in the binary via opcode or byte sequence searches, indicating fabrication or input mismatch.
Audits reveal most of an artifact lacks offset and byte grounding, with few sections confirmed by independent disassembly tools, reducing credibility of the workflow.
Continued reliance on model-based review without publishing hashes, inputs, and deterministic verification results, leaving outputs non-auditable and limiting adoption.