Rosa Del Mar

Daily Brief

Issue 72 2026-03-13

Long Context Availability

Sources: 1 • Confidence: Medium • Updated: 2026-04-13 03:49

Key takeaways

  • A 1M-context window is generally available for the Opus 4.6 and Sonnet 4.6 models.
  • Standard pricing applies across the full 1M-context window for Opus 4.6 and Sonnet 4.6, with no long-context premium.
  • OpenAI and Google Gemini charge higher prompt prices once token counts exceed thresholds, including 200,000 tokens for Gemini 3.1 Pro and 272,000 tokens for GPT-5.4.

Sections

Long Context Availability

  • A 1M-context window is generally available for the Opus 4.6 and Sonnet 4.6 models.

Long Context Pricing Mechanism

  • Standard pricing applies across the full 1M-context window for Opus 4.6 and Sonnet 4.6, with no long-context premium.

Competitor Threshold-Based Premiums

  • OpenAI and Google Gemini charge higher prompt prices once token counts exceed thresholds, including 200,000 tokens for Gemini 3.1 Pro and 272,000 tokens for GPT-5.4.
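The cost gap between flat and threshold-based prompt pricing can be made concrete with simple arithmetic. The sketch below compares the two mechanisms for a near-1M-token prompt; all per-token rates and the 200,000-token threshold used here are illustrative assumptions for the comparison, not published prices for any vendor.

```python
# Sketch: flat vs. threshold-based prompt pricing at long context.
# All rates and thresholds are ILLUSTRATIVE assumptions, not vendor prices.

def flat_cost(tokens: int, rate_per_mtok: float) -> float:
    """Flat pricing: one rate applies across the whole prompt."""
    return tokens / 1_000_000 * rate_per_mtok

def tiered_cost(tokens: int, base_rate: float,
                threshold: int, premium_rate: float) -> float:
    """Threshold pricing: tokens beyond `threshold` bill at a higher rate."""
    below = min(tokens, threshold)
    above = max(tokens - threshold, 0)
    return below / 1_000_000 * base_rate + above / 1_000_000 * premium_rate

prompt = 900_000  # a near-1M-token prompt

# Hypothetical $3/MTok base rate, doubling to $6/MTok past 200k tokens.
flat = flat_cost(prompt, rate_per_mtok=3.0)
tiered = tiered_cost(prompt, base_rate=3.0,
                     threshold=200_000, premium_rate=6.0)

print(f"flat:   ${flat:.2f}")    # 0.9 MTok * $3           = $2.70
print(f"tiered: ${tiered:.2f}")  # 0.2 * $3 + 0.7 * $6     = $4.80
```

Under these assumed rates, the tiered scheme nearly doubles the prompt cost at 900k tokens while the two are identical below the threshold, which is why all-in versus tiered pricing matters most for near-max prompts.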

Unknowns

  • What official documentation (API limits, product pages, or release notes) confirms 1M context general availability for the two named models, and are there any eligibility constraints?
  • What are the operational constraints at 1M context (rate limits, maximum request size in bytes, timeout limits, streaming behavior, and latency expectations)?
  • How does billing behave in practice for near-1M prompts (invoice line items, rounding, any minimums), and do any hidden multipliers apply at high token counts despite the “standard pricing” wording?
  • Are the competitor long-context pricing thresholds and step-up pricing mechanisms accurately represented, and what are the exact multipliers or tier prices?
  • What decision read-throughs (operator, product, investor) are warranted from this corpus beyond “verify availability and pricing in official docs”?

Investor overlay

Read-throughs

  • If 1M context is generally available for Opus 4.6 and Sonnet 4.6, some long-document and large-codebase use cases may shift from retrieval-heavy architectures toward single-call processing, potentially simplifying deployments and changing usage patterns.
  • If standard pricing truly applies up to 1M tokens with no premium, long-context workloads could become more economically predictable and potentially more attractive relative to competitors that add prompt-price step-ups beyond thresholds.
  • If competitors apply higher prompt prices beyond stated token thresholds, enterprise buyers evaluating long context may re-benchmark total cost of ownership and vendor selection, comparing all-in prompt pricing against tiered pricing.

What would confirm

  • Official API limits, product pages, or release notes stating 1M context general availability for Opus 4.6 and Sonnet 4.6, including any eligibility constraints and rollout status.
  • Pricing tables and invoices showing no long-context premium up to 1M tokens, with clarity on rounding, minimums, and any multipliers for near-max prompts.
  • Operational documentation or observed behavior confirming rate limits, timeout limits, request-size constraints, and latency at 1M context, plus competitor pricing pages that validate the stated thresholds and step-up mechanics.

What would kill

  • Documentation reveals that 1M context is limited-access, region-limited, or gated by account tier, or that the effective maximum context is materially below 1M for general users.
  • Billing artifacts show hidden premiums at high token counts, such as surcharges, different meter rates, or mandatory minimums that raise effective cost versus the standard-pricing claim.
  • Practical constraints, such as severe latency, low rate limits, request-size caps, or timeouts, make 1M context unusable for common production workloads, reducing the relevance of the nominal maximum context.

Sources