Rosa Del Mar

Daily Brief

Issue 64 2026-03-05

Pricing Structure And Long-Context Cost Inflection

Sources: 1 • Confidence: Medium • Updated: 2026-03-08 21:22

Key takeaways

  • GPT-5.4 is priced slightly higher than the GPT-5.2 family, and both GPT-5.4 models (GPT-5.4 and GPT-5.4 Pro) cost more when usage exceeds 272,000 tokens.
  • GPT-5.4 outperforms GPT-5.3-Codex on coding-related benchmarks.
  • On an internal benchmark of spreadsheet modeling tasks resembling junior investment banking analyst work, GPT-5.4 scored 87.3% versus 68.4% for GPT-5.2.
  • In one reported instance, generating an image with GPT-5.4 Pro took 4 minutes 45 seconds and cost $1.55.
  • It is unknown whether a GPT-5.4 Codex variant will be released or whether the Codex line has been merged into the main GPT-5.4 family.

Sections

Pricing Structure And Long-Context Cost Inflection

  • GPT-5.4 is priced slightly higher than the GPT-5.2 family, and both GPT-5.4 models (GPT-5.4 and GPT-5.4 Pro) cost more when usage exceeds 272,000 tokens.
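
How much the 272,000-token step matters depends on how the threshold is applied, which the source does not specify. The sketch below compares two possible readings. Only the 272,000 figure comes from the brief; the `Rates` values, the per-request scope, and both billing rules are hypothetical placeholders, not published pricing.

```python
from dataclasses import dataclass

THRESHOLD_TOKENS = 272_000  # the only figure taken from the brief


@dataclass(frozen=True)
class Rates:
    """Hypothetical USD prices per million tokens (not published figures)."""
    base: float
    long_context: float


def cost_whole_request(tokens: int, rates: Rates) -> float:
    """Assumption A: crossing the threshold reprices the entire request."""
    rate = rates.long_context if tokens > THRESHOLD_TOKENS else rates.base
    return tokens * rate / 1_000_000


def cost_marginal(tokens: int, rates: Rates) -> float:
    """Assumption B: only tokens beyond the threshold pay the higher rate."""
    below = min(tokens, THRESHOLD_TOKENS)
    above = max(tokens - THRESHOLD_TOKENS, 0)
    return (below * rates.base + above * rates.long_context) / 1_000_000


if __name__ == "__main__":
    rates = Rates(base=2.0, long_context=4.0)  # placeholder values only
    for tokens in (100_000, 272_000, 300_000):
        print(tokens, cost_whole_request(tokens, rates), cost_marginal(tokens, rates))
```

Under assumption A the cost jumps discontinuously just past the threshold, while under assumption B it rises gradually. This is why the way the threshold is applied determines whether the inflection is material for a given workload.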

Capability Crossover: General Model Vs Specialized Coding Model

  • GPT-5.4 outperforms GPT-5.3-Codex on coding-related benchmarks.

Business Spreadsheet Modeling Performance Delta

  • On an internal benchmark of spreadsheet modeling tasks resembling junior investment banking analyst work, GPT-5.4 scored 87.3% versus 68.4% for GPT-5.2.

Image Generation Operational Characteristics (Latency And Cost)

  • In one reported instance, generating an image with GPT-5.4 Pro took 4 minutes 45 seconds and cost $1.55.

Product-Line Uncertainty: Codex As Separate SKU Vs Merged

  • It is unknown whether a GPT-5.4 Codex variant will be released or whether the Codex line has been merged into the main GPT-5.4 family.

Watchlist

  • It is unknown whether a GPT-5.4 Codex variant will be released or whether the Codex line has been merged into the main GPT-5.4 family.

Unknowns

  • What are the exact per-token prices for GPT-5.4 versus GPT-5.2 at or below 272,000 tokens and above 272,000 tokens, and how is the threshold applied (per request, per day, per billing period)?
  • What specific benchmarks support the claim that GPT-5.4 outperforms GPT-5.3-Codex on coding, and what is the magnitude of the improvement across tasks?
  • Will there be a distinct GPT-5.4-Codex (or equivalent) model, or is the Codex line being discontinued/merged, and on what timeline?
  • What is the spreadsheet modeling benchmark definition (task types, allowed tools, grading rubric), and do third-party evaluations reproduce the reported GPT-5.4 vs GPT-5.2 gap?
  • What are typical (median and tail) latency and cost for GPT-5.4 Pro image generation across prompts and times, and how variable are these metrics?

Investor overlay

Read-throughs

  • A step up in costs above 272,000 tokens could shift long-context application economics, affecting customer usage patterns and vendor revenue mix between short and long-context workloads.
  • If GPT-5.4 beats a specialized coding model on coding benchmarks, model portfolios may consolidate toward general-purpose SKUs, potentially reducing differentiation of standalone coding offerings.
  • The reported 18.9-point gain on spreadsheet modeling tasks (87.3% versus 68.4%) suggests improved fit for spreadsheet-centric enterprise workflows, which could increase demand for automation in finance and operations use cases.

What would confirm

  • Published per-token pricing tables that specify GPT-5.4 versus GPT-5.2 rates below and above 272,000 tokens, and clear documentation of how the threshold is applied.
  • Third-party or detailed benchmark disclosures showing GPT-5.4 performance versus GPT-5.3-Codex on coding tasks, including task coverage and magnitude of improvements.
  • Independent replication or detailed methodology for the spreadsheet modeling benchmark, plus broader evidence of typical image generation latency and cost distributions beyond a single example.

What would kill

  • Clarification that the 272,000-token inflection does not materially change effective costs for most users, or that GPT-5.4 pricing is not meaningfully higher than GPT-5.2 in practice.
  • Benchmark evidence showing GPT-5.4 does not consistently outperform specialized coding models, or that gains are limited to narrow tasks not representative of real coding workloads.
  • External evaluations failing to reproduce the spreadsheet modeling performance gap, or showing that image generation cost and latency are significantly worse than implied for common usage.

Sources

  1. 2026-03-05 simonwillison.net