Pricing Structure And Long-Context Cost Inflection

Issue 64 • Edition 2026-03-05 • 5 min read

Sources: 1 • Confidence: Medium • Updated: 2026-03-08 21:22

Key takeaways

GPT-5.4 pricing is slightly higher than the GPT-5.2 family, and both GPT-5.4 models cost more when usage exceeds 272,000 tokens.
GPT-5.4 outperforms GPT-5.3-Codex on coding-related benchmarks.
On an internal benchmark of spreadsheet modeling tasks resembling junior investment banking analyst work, GPT-5.4 scored 87.3% versus 68.4% for GPT-5.2.
In one reported instance, generating an image with GPT-5.4 Pro took 4 minutes 45 seconds and cost $1.55.
It is unknown whether a GPT-5.4 Codex variant will be released or whether the Codex line has been merged into the main GPT-5.4 family.

What are the exact per-token prices for GPT-5.4 versus GPT-5.2 at or below 272,000 tokens and above 272,000 tokens, and how is the threshold applied (per request, per day, per billing period)?
What specific benchmarks support the claim that GPT-5.4 outperforms GPT-5.3-Codex on coding, and what is the magnitude of the improvement across tasks?
Will there be a distinct GPT-5.4-Codex (or equivalent) model, or is the Codex line being discontinued/merged, and on what timeline?
What is the spreadsheet modeling benchmark definition (task types, allowed tools, grading rubric), and do third-party evaluations reproduce the reported GPT-5.4 vs GPT-5.2 gap?
What are typical (median and tail) latency and cost for GPT-5.4 Pro image generation across prompts and times, and how variable are these metrics?

A step up in costs above 272,000 tokens could shift long-context application economics, affecting customer usage patterns and vendor revenue mix between short and long-context workloads.
If GPT-5.4 beats a specialized coding model on coding benchmarks, model portfolios may consolidate toward general-purpose SKUs, potentially reducing differentiation of standalone coding offerings.
The reported 18.9-percentage-point gain on spreadsheet modeling tasks (87.3% for GPT-5.4 versus 68.4% for GPT-5.2) suggests improved fit for spreadsheet-centric enterprise workflows, which could increase demand for automation in finance and operations use cases.
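
To make the cost step above 272,000 tokens concrete, here is a minimal Python sketch of how the bill could behave under two possible readings of the threshold. The source does not give per-token prices or say how the threshold is applied, so the rates, the `Rates` class, and both function names are hypothetical placeholders. Only the 272,000-token figure comes from the text.

```python
from dataclasses import dataclass

LONG_CONTEXT_THRESHOLD = 272_000  # tokens; the only figure taken from the source


@dataclass(frozen=True)
class Rates:
    """Hypothetical prices in USD per million tokens (placeholders, not published rates)."""
    base: float
    long_context: float


def cost_whole_request(tokens: int, rates: Rates) -> float:
    """Assumption A: once usage exceeds the threshold, every token is billed at the higher rate."""
    rate = rates.long_context if tokens > LONG_CONTEXT_THRESHOLD else rates.base
    return tokens * rate / 1_000_000


def cost_marginal(tokens: int, rates: Rates) -> float:
    """Assumption B: only the tokens beyond the threshold are billed at the higher rate."""
    below = min(tokens, LONG_CONTEXT_THRESHOLD)
    above = max(tokens - LONG_CONTEXT_THRESHOLD, 0)
    return (below * rates.base + above * rates.long_context) / 1_000_000


if __name__ == "__main__":
    placeholder = Rates(base=1.0, long_context=2.0)  # made-up numbers, for illustration only
    for tokens in (100_000, 272_000, 272_001, 400_000):
        a = cost_whole_request(tokens, placeholder)
        b = cost_marginal(tokens, placeholder)
        print(f"{tokens:>7} tokens  whole-request=${a:.4f}  marginal=${b:.4f}")
```

Under assumption A the cost jumps discontinuously at the threshold, so a workload sitting just above 272,000 tokens has a strong incentive to trim context. Under assumption B the cost only bends upward, and the effect on long-context economics is milder. Which reading applies, if either, is one of the open questions listed above.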

Published per-token pricing tables that specify GPT-5.4 versus GPT-5.2 rates below and above 272,000 tokens, and clear documentation of how the threshold is applied.
Third-party or detailed benchmark disclosures showing GPT-5.4 performance versus GPT-5.3-Codex on coding tasks, including task coverage and magnitude of improvements.
Independent replication or detailed methodology for the spreadsheet modeling benchmark, plus broader evidence of typical image generation latency and cost distributions beyond a single example.

Clarification that the 272,000-token inflection does not materially change effective costs for most users, or that GPT-5.4 pricing is not meaningfully higher than GPT-5.2 in practice.
Benchmark evidence showing GPT-5.4 does not consistently outperform specialized coding models, or that gains are limited to narrow tasks not representative of real coding workloads.
External evaluations failing to reproduce the spreadsheet modeling performance gap, or showing that image generation cost and latency are significantly worse than implied for common usage.