Rosa Del Mar

Daily Brief

Issue 58 2026-02-27

Mechanism: Binary Search via HTTP Range Requests over Large Static Files

General
Sources: 1 • Confidence: High • Updated: 2026-03-02 19:33

Key takeaways

  • The demo accepts either a single character or a hexadecimal Unicode codepoint and displays the steps of the binary search through the large file.
  • HTTP range request techniques are not compatible with HTTP compression because compression breaks byte-offset calculations.
  • The tool was deployed at tools.simonwillison.net and issues range requests against a CORS-enabled 76.6MB file hosted in S3 and fronted by Cloudflare.
  • Claude was used to generate a specification and Claude Code for web was used to convert that specification into working code via an asynchronous research workflow.
  • The prototype was built from a phone as an experiment with HTTP range requests.

Sections

Mechanism: Binary Search via HTTP Range Requests over Large Static Files

  • The demo accepts either a single character or a hexadecimal Unicode codepoint and displays the steps of the binary search through the large file.
  • The prototype was built from a phone as an experiment with HTTP range requests.
  • The prototype performs binary search over a large file by issuing HTTP range requests.
  • A proposed use case for the approach is looking up Unicode codepoint metadata that spans many megabytes of data.
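
The mechanism can be sketched in a few lines. This is a minimal Node.js sketch, not the demo's actual code: it assumes fixed-width sorted records (the real file's record layout is not specified here), and a local `readRange` stands in for `fetch(url, {headers: {Range: "bytes=start-end"}})`.

```javascript
// Binary search over a sorted file exposed only through byte-range reads,
// mimicking HTTP Range requests against a large static file.
const RECORD = 16; // hypothetical fixed record width in bytes, newline included

// Simulated remote file: records sorted by hex codepoint, padded to RECORD bytes.
const records = ["0041 A", "0042 B", "03B1 ALPHA", "1F600 GRIN"]
  .map(s => s.padEnd(RECORD - 1).slice(0, RECORD - 1) + "\n");
const file = Buffer.from(records.join(""));

// Stand-in for one HTTP request: GET with header Range: bytes=start-end
function readRange(start, end) {
  return file.subarray(start, end + 1).toString();
}

function lookup(key) {
  let lo = 0, hi = file.length / RECORD - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    // Fetch exactly one record per probe; each probe is one range request.
    const line = readRange(mid * RECORD, (mid + 1) * RECORD - 1).trimEnd();
    const recKey = line.split(" ")[0];
    if (recKey === key) return line;
    if (recKey < key) lo = mid + 1; else hi = mid - 1;
  }
  return null; // codepoint not present
}
```

Each probe costs one request, so a lookup in an N-record file needs about log2(N) range requests, which is what the demo visualizes step by step.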

Conditions and Correctness Constraints: Sorted Data and No Compression

  • HTTP range request techniques are not compatible with HTTP compression because compression breaks byte-offset calculations.
  • This range-request binary-search approach requires data that is naturally sorted.

Deployment Architecture: CORS-Enabled Large Object in S3 Behind Cloudflare

  • The tool was deployed at tools.simonwillison.net and issues range requests against a CORS-enabled 76.6MB file hosted in S3 and fronted by Cloudflare.
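
The bucket's actual CORS rules are not published in this brief, but an S3 CORS configuration permitting cross-origin range requests of this kind would look roughly like the following (the origin and cache age are illustrative):

```json
[
  {
    "AllowedOrigins": ["https://tools.simonwillison.net"],
    "AllowedMethods": ["GET"],
    "AllowedHeaders": ["Range"],
    "ExposeHeaders": ["Content-Range", "Content-Length", "Accept-Ranges"],
    "MaxAgeSeconds": 3600
  }
]
```

`ExposeHeaders` matters here: without it, browser JavaScript can fetch partial content but cannot read `Content-Range` to confirm which bytes it received.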

Workflow Delta: AI-Assisted Prototyping from Spec to Code

  • Claude was used to generate a specification and Claude Code for web was used to convert that specification into working code via an asynchronous research workflow.

Unknowns

  • What are the measured performance characteristics (latency per lookup, number of range requests per query, total bytes transferred per query) under realistic network conditions?
  • What exact server/CDN configuration ensures responses are not compressed for range requests, and how is correctness validated across different clients and network paths?
  • How is the large file structured (record format, fixed vs variable-length records, indexing approach if any) to support binary search on byte ranges?
  • What are the operational cost implications (S3 egress, Cloudflare bandwidth, request volume) relative to alternative approaches like shipping a local index or precomputed compact data?
  • What failure modes are handled (range request unsupported, CORS misconfiguration, partial content errors, inconsistent byte serving) and what are the user-visible fallbacks?

Investor overlay

Read-throughs

  • Broader viability of browser-based data access patterns that avoid full downloads by using HTTP range requests against large static files, potentially affecting CDN and object storage usage patterns if adopted in similar demos and tools.
  • Operational importance of serving configurations that preserve byte offsets, implying demand for predictable uncompressed partial-content delivery across CDNs and storage front doors for data lookup workloads.
  • Increased emphasis on AI-assisted prototyping workflows where specification generation and code conversion accelerate small web tooling, potentially influencing developer tool adoption if reproducible beyond this one project.

What would confirm

  • Published performance measurements such as latency per lookup, range requests per query, and bytes transferred under realistic networks showing practical UX and cost profile.
  • Clear documented CDN and origin configuration that reliably disables compression for range requests and demonstrates consistent correctness across browsers and network paths.
  • Evidence of reuse or replication in other datasets or tools using the same pattern, indicating the technique generalizes beyond a single Unicode lookup demo.

What would kill

  • Measured performance or cost proves uncompetitive versus alternatives such as shipping a local index or using a precomputed compact dataset, limiting the technique to novelty demos.
  • Inability to guarantee uncompressed partial-content responses across common CDN setups or clients, causing incorrect byte-range reads and breaking binary search correctness.
  • Frequent failure modes in the field such as unsupported range requests, CORS issues, partial content errors, or inconsistent byte serving with no viable fallback.

Sources