Range-Request Binary Search Over Large Static Files
Sources: 1 • Confidence: High • Updated: 2026-04-13 03:43
Key takeaways
- The demo accepts either a single character or a hexadecimal Unicode codepoint and displays the intermediate steps of the binary search through the large file.
- HTTP range request techniques are not compatible with HTTP compression because compression breaks byte-offset calculations.
- A prototype was built from a phone as an experiment in using HTTP range requests.
- The tool was deployed at tools.simonwillison.net and queries, via range requests, a CORS-enabled 76.6MB file hosted in S3 and fronted by Cloudflare.
- The prototype searches a large file by performing a binary search implemented via HTTP range requests.
Sections
Range-Request Binary Search Over Large Static Files
- The demo accepts either a single character or a hexadecimal Unicode codepoint and displays the intermediate steps of the binary search through the large file.
- The prototype searches a large file by performing a binary search implemented via HTTP range requests.
- A proposed use case for the approach is looking up Unicode codepoint metadata across many megabytes of data.
- This range-request binary-search approach requires the underlying data to be sorted, since binary search is only correct over ordered records.
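As an illustration of the mechanism the section describes, the lookup can be sketched as a binary search over byte offsets where each probe fetches only a small window of the file. The `key;value` record format, fixed-width lowercase-hex keys, and the `readRange` abstraction are assumptions made for this sketch, not details from the post:

```javascript
const decoder = new TextDecoder();

// Binary search over a sorted, newline-delimited file, reading only small
// byte windows. `readRange(start, end)` returns the bytes at [start, end]
// inclusive -- over HTTP this would be a request carrying a
// "Range: bytes=start-end" header, but any async byte source works.
// Assumed here: records are "key;value" lines, keys are fixed-width
// lowercase hex (so lexicographic order matches sort order), and every
// line is much shorter than `window`.
async function rangeBinarySearch(readRange, fileSize, key, window = 4096) {
  let lo = 0;        // invariant: lo always sits on the start of a line
  let hi = fileSize; // exclusive upper bound on where the target line starts
  while (hi - lo > window) {
    const mid = lo + Math.floor((hi - lo) / 2);
    const bytes = await readRange(mid, Math.min(mid + window, fileSize) - 1);
    const text = decoder.decode(bytes);
    const nl = text.indexOf("\n"); // first line boundary at or after mid
    if (nl === -1) { hi = mid; continue; } // no boundary: search earlier half
    const lineKey = text.slice(nl + 1).split("\n")[0].split(";")[0];
    if (lineKey <= key) lo = mid + nl + 1; // target is this line or later
    else hi = mid + nl + 1;                // target is strictly before it
  }
  // The remaining span fits in one window: fetch it and scan line by line.
  const tail = await readRange(lo, Math.min(hi + window, fileSize) - 1);
  for (const line of decoder.decode(tail).split("\n")) {
    const k = line.split(";")[0];
    if (k === key) return line;
    if (k > key) break; // passed where the key would sort: not present
  }
  return null;
}
```

Each iteration roughly halves the candidate byte span, so even a file of tens of megabytes resolves in on the order of a dozen small requests plus one final window read.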
Operational Constraints for Correctness (Compression, CORS, CDN/Object Storage)
- HTTP range request techniques are not compatible with HTTP compression, because ranges then address bytes of the compressed response, so offsets no longer correspond to positions in the original file.
- The tool was deployed at tools.simonwillison.net and queries, via range requests, a CORS-enabled 76.6MB file hosted in S3 and fronted by Cloudflare.
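A minimal client-side check for these constraints might look like the following sketch; the function, URL handling, and error messages are illustrative, not the deployed tool's code:

```javascript
// Fetch one byte range and verify the preconditions the technique relies on.
// Illustrative sketch only; nothing here is taken from the deployed tool.
async function fetchRange(url, start, end) {
  const res = await fetch(url, { headers: { Range: `bytes=${start}-${end}` } });
  // 206 Partial Content means the server honoured the Range header;
  // a plain 200 means it ignored the header and sent the whole file.
  if (res.status !== 206) throw new Error(`expected 206, got ${res.status}`);
  // Any content coding (gzip, br) would make the returned bytes -- and all
  // byte-offset arithmetic -- refer to the compressed stream, breaking the
  // binary search, so the response must arrive uncompressed.
  const enc = res.headers.get("content-encoding");
  if (enc && enc !== "identity") throw new Error(`compressed response: ${enc}`);
  return new Uint8Array(await res.arrayBuffer());
}
```

Serving the file uncompressed and with permissive CORS headers (e.g. `Access-Control-Allow-Origin`) is what makes cross-origin range reads from a browser possible at all.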
AI-Assisted Prototyping Workflow for Building Technical Demos
- A prototype was built from a phone as an experiment in using HTTP range requests.
- An AI-assisted workflow was used in which Claude generated a specification and Claude Code for web asynchronously converted it into working code.
Unknowns
- What are the observed latency and number of range requests per typical lookup for the deployed tool under realistic network conditions?
- What are the actual egress and CDN cost characteristics of this approach compared to downloading the full file or using an indexed API endpoint?
- How exactly is the compression constraint handled in the deployed setup, and is correctness verified across different clients and CDN paths?
- What file format and indexing strategy (if any) is used to ensure the dataset is sorted in a way that makes binary search correct and stable across updates?
- What is the error-handling behavior for partial-content failures (missing Range support, 416 responses, transient network errors) and how does the client recover?