Range-Request Binary Search Over Large Static Files
Sources: 1 • Confidence: High • Updated: 2026-04-13 03:43
Key takeaways
- The demo accepts either a single character or a hexadecimal Unicode codepoint and displays the intermediate steps of the binary search through the large file.
- HTTP range request techniques are not compatible with HTTP compression because compression breaks byte-offset calculations.
- A prototype was built from a phone as an experiment in using HTTP range requests.
- The tool was deployed at tools.simonwillison.net and queries, via range requests, a CORS-enabled 76.6MB file hosted in S3 and fronted by Cloudflare.
- The prototype searches a large file by performing a binary search implemented via HTTP range requests.
Sections
Range-Request Binary Search Over Large Static Files
- The demo accepts either a single character or a hexadecimal Unicode codepoint and displays the intermediate steps of the binary search through the large file.
- The prototype searches a large file by performing a binary search implemented via HTTP range requests.
- A proposed use case for the approach is looking up Unicode codepoint metadata across many megabytes of data.
- This range-request binary-search approach requires the underlying data to be sorted, since binary search is only correct over ordered records.
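As an illustration of the mechanism the section describes, the lookup can be sketched as a binary search over byte offsets where each probe fetches only a small window of the file. The `key;value` record format, fixed-width lowercase-hex keys, and the `readRange` abstraction are assumptions made for this sketch, not details from the post:

```javascript
const decoder = new TextDecoder();

// Binary search over a sorted, newline-delimited file, reading only small
// byte windows. `readRange(start, end)` returns the bytes at [start, end]
// inclusive -- over HTTP this would be a request carrying a
// "Range: bytes=start-end" header, but any async byte source works.
// Assumed here: records are "key;value" lines, keys are fixed-width
// lowercase hex (so lexicographic order matches sort order), and every
// line is much shorter than `window`.
async function rangeBinarySearch(readRange, fileSize, key, window = 4096) {
  let lo = 0;        // invariant: lo always sits on the start of a line
  let hi = fileSize; // exclusive upper bound on where the target line starts
  while (hi - lo > window) {
    const mid = lo + Math.floor((hi - lo) / 2);
    const bytes = await readRange(mid, Math.min(mid + window, fileSize) - 1);
    const text = decoder.decode(bytes);
    const nl = text.indexOf("\n"); // first line boundary at or after mid
    if (nl === -1) { hi = mid; continue; } // no boundary: search earlier half
    const lineKey = text.slice(nl + 1).split("\n")[0].split(";")[0];
    if (lineKey <= key) lo = mid + nl + 1; // target is this line or later
    else hi = mid + nl + 1;                // target is strictly before it
  }
  // The remaining span fits in one window: fetch it and scan line by line.
  const tail = await readRange(lo, Math.min(hi + window, fileSize) - 1);
  for (const line of decoder.decode(tail).split("\n")) {
    const k = line.split(";")[0];
    if (k === key) return line;
    if (k > key) break; // passed where the key would sort: not present
  }
  return null;
}
```

Each iteration roughly halves the candidate byte span, so even a file of tens of megabytes resolves in on the order of a dozen small requests plus one final window read.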
Operational Constraints for Correctness (Compression, CORS, CDN/Object Storage)
- HTTP range request techniques are not compatible with HTTP compression, because ranges then address bytes of the compressed response, so offsets no longer correspond to positions in the original file.
- The tool was deployed at tools.simonwillison.net and queries, via range requests, a CORS-enabled 76.6MB file hosted in S3 and fronted by Cloudflare.
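A minimal client-side check for these constraints might look like the following sketch; the function, URL handling, and error messages are illustrative, not the deployed tool's code:

```javascript
// Fetch one byte range and verify the preconditions the technique relies on.
// Illustrative sketch only; nothing here is taken from the deployed tool.
async function fetchRange(url, start, end) {
  const res = await fetch(url, { headers: { Range: `bytes=${start}-${end}` } });
  // 206 Partial Content means the server honoured the Range header;
  // a plain 200 means it ignored the header and sent the whole file.
  if (res.status !== 206) throw new Error(`expected 206, got ${res.status}`);
  // Any content coding (gzip, br) would make the returned bytes -- and all
  // byte-offset arithmetic -- refer to the compressed stream, breaking the
  // binary search, so the response must arrive uncompressed.
  const enc = res.headers.get("content-encoding");
  if (enc && enc !== "identity") throw new Error(`compressed response: ${enc}`);
  return new Uint8Array(await res.arrayBuffer());
}
```

Serving the file uncompressed and with permissive CORS headers (e.g. `Access-Control-Allow-Origin`) is what makes cross-origin range reads from a browser possible at all.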
AI-Assisted Prototyping Workflow for Building Technical Demos
- A prototype was built from a phone as an experiment in using HTTP range requests.
- An AI-assisted workflow was used in which Claude generated a specification and Claude Code for web asynchronously converted it into working code.
Unknowns
- What are the observed latency and number of range requests per typical lookup for the deployed tool under realistic network conditions?
- What are the actual egress and CDN cost characteristics of this approach compared to downloading the full file or using an indexed API endpoint?
- How exactly is the compression constraint handled in the deployed setup, and is correctness verified across different clients and CDN paths?
- What file format and indexing strategy (if any) is used to ensure the dataset is sorted in a way that makes binary search correct and stable across updates?
- What is the error-handling behavior for partial-content failures (missing Range support, 416 responses, transient network errors) and how does the client recover?