Pricing Structure And Long-Context Cost Inflection
Sources: 1 • Confidence: Medium • Updated: 2026-03-08 21:22
Key takeaways
- GPT-5.4 is priced slightly higher than the GPT-5.2 family, and both GPT-5.4 models are billed at a higher rate once usage exceeds 272,000 tokens.
- GPT-5.4 outperforms GPT-5.3-Codex on coding-related benchmarks.
- On an internal benchmark of spreadsheet modeling tasks resembling junior investment banking analyst work, GPT-5.4 scored 87.3% versus 68.4% for GPT-5.2.
- In one reported instance, generating an image with GPT-5.4 Pro took 4 minutes 45 seconds and cost $1.55.
- It is unknown whether a GPT-5.4 Codex variant will be released or whether the Codex line has been merged into the main GPT-5.4 family.
Sections
Pricing Structure And Long-Context Cost Inflection
- GPT-5.4 is priced slightly higher than the GPT-5.2 family, and both GPT-5.4 models are billed at a higher rate once usage exceeds 272,000 tokens.
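How the 272,000-token threshold is applied is not stated in the source, so the cost inflection can only be sketched under assumptions. The Python below compares two possible readings: the whole request is repriced once it exceeds the threshold, or only the tokens above the threshold are billed at the higher rate. The per-request scope, the `TieredRate` type, both function names, and the rates in `PLACEHOLDER_RATE` are hypothetical and are not real GPT-5.4 prices; only the 272,000 figure comes from the text above.

```python
from dataclasses import dataclass

# The only figure taken from the source: the long-context threshold in tokens.
THRESHOLD_TOKENS = 272_000


@dataclass(frozen=True)
class TieredRate:
    """Hypothetical two-tier rate in USD per one million tokens."""
    base_per_million: float  # applies at or below the threshold
    long_per_million: float  # applies above the threshold


# Placeholder values for illustration only. These are NOT real prices.
PLACEHOLDER_RATE = TieredRate(base_per_million=1.0, long_per_million=2.0)


def cost_whole_request(tokens: int, rate: TieredRate,
                       threshold: int = THRESHOLD_TOKENS) -> float:
    """Assumption A: every token in the request is billed at the higher
    rate as soon as the request exceeds the threshold."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    per_million = rate.long_per_million if tokens > threshold else rate.base_per_million
    return tokens * per_million / 1_000_000


def cost_marginal(tokens: int, rate: TieredRate,
                  threshold: int = THRESHOLD_TOKENS) -> float:
    """Assumption B: only the tokens above the threshold are billed at the
    higher rate; tokens up to the threshold keep the base rate."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    below = min(tokens, threshold)
    above = max(tokens - threshold, 0)
    return (below * rate.base_per_million + above * rate.long_per_million) / 1_000_000


if __name__ == "__main__":
    for tokens in (200_000, 272_000, 272_001, 400_000):
        whole = cost_whole_request(tokens, PLACEHOLDER_RATE)
        marginal = cost_marginal(tokens, PLACEHOLDER_RATE)
        print(f"{tokens:>7} tokens  whole-request: ${whole:.6f}  marginal: ${marginal:.6f}")
```

Under the whole-request reading, crossing the threshold by a single token produces a step jump in cost, while under the marginal reading the cost curve only changes slope. Which reading applies, if either, is one of the open questions listed under Unknowns.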
Capability Crossover: General Model Vs Specialized Coding Model
- GPT-5.4 outperforms GPT-5.3-Codex on coding-related benchmarks.
Business Spreadsheet Modeling Performance Delta
- On an internal benchmark of spreadsheet modeling tasks resembling junior investment banking analyst work, GPT-5.4 scored 87.3% versus 68.4% for GPT-5.2, a gap of 18.9 percentage points.
Image Generation Operational Characteristics (Latency And Cost)
- In one reported instance, generating an image with GPT-5.4 Pro took 4 minutes 45 seconds and cost $1.55.
Product-Line Uncertainty: Codex As Separate SKU Vs Merged
- It is unknown whether a GPT-5.4 Codex variant will be released or whether the Codex line has been merged into the main GPT-5.4 family.
Watchlist
- It is unknown whether a GPT-5.4 Codex variant will be released or whether the Codex line has been merged into the main GPT-5.4 family.
Unknowns
- What are the exact per-token prices for GPT-5.4 versus GPT-5.2 at or below 272,000 tokens and above 272,000 tokens, and how is the threshold applied (per request, per day, per billing period)?
- What specific benchmarks support the claim that GPT-5.4 outperforms GPT-5.3-Codex on coding, and what is the magnitude of the improvement across tasks?
- Will there be a distinct GPT-5.4-Codex (or equivalent) model, or is the Codex line being discontinued or merged, and on what timeline?
- What is the spreadsheet modeling benchmark definition (task types, allowed tools, grading rubric), and do third-party evaluations reproduce the reported GPT-5.4 vs GPT-5.2 gap?
- What are the typical (median and tail) latency and cost of GPT-5.4 Pro image generation across prompts and over time, and how variable are these metrics?