Model Capability Positioning For Coding
Sources: 1 • Confidence: Medium • Updated: 2026-04-12 10:22
Key takeaways
- It is uncertain whether a GPT-5.4 Codex variant will be released or whether the Codex line has been merged into the main model family.
- GPT-5.4 pricing is slightly higher than the GPT-5.2 family, and both GPT-5.4 models cost more when usage exceeds 272,000 tokens.
- On an internal benchmark of spreadsheet modeling tasks resembling junior investment banking analyst work, GPT-5.4 scored 87.3% versus 68.4% for GPT-5.2.
- In one reported run, generating an image with GPT-5.4 Pro took 4 minutes 45 seconds and cost $1.55.
- GPT-5.4 outperforms the coding-specialist GPT-5.3-Codex on relevant benchmarks.
Sections
Model Capability Positioning For Coding
- It is uncertain whether a GPT-5.4 Codex variant will be released or whether the Codex line has been merged into the main model family.
- GPT-5.4 outperforms the coding-specialist GPT-5.3-Codex on relevant benchmarks.
Pricing And Long Context Cost Structure
- GPT-5.4 pricing is slightly higher than the GPT-5.2 family, and both GPT-5.4 models cost more when usage exceeds 272,000 tokens.
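A minimal sketch of how the 272,000-token threshold could affect a cost estimate. The source states only that cost rises above the threshold. The function name, the per-million-token rates, and both billing interpretations below are hypothetical assumptions, not published prices or confirmed billing rules.

```python
LONG_CONTEXT_THRESHOLD = 272_000  # token threshold stated in the source


def estimate_cost(tokens, base_rate, long_rate, mode="whole_request"):
    """Estimate the dollar cost of one request under an assumed billing rule.

    base_rate and long_rate are hypothetical prices in dollars per 1M tokens.
    mode="whole_request": assumption A, every token is billed at long_rate
        once the request exceeds the threshold.
    mode="marginal": assumption B, only the tokens beyond the threshold
        are billed at long_rate.
    """
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    if mode not in ("whole_request", "marginal"):
        raise ValueError(f"unknown mode: {mode}")

    # At or below the threshold, both interpretations agree.
    if tokens <= LONG_CONTEXT_THRESHOLD:
        return tokens * base_rate / 1_000_000

    if mode == "whole_request":
        return tokens * long_rate / 1_000_000

    # Marginal: base rate up to the threshold, long rate for the remainder.
    extra = tokens - LONG_CONTEXT_THRESHOLD
    return (LONG_CONTEXT_THRESHOLD * base_rate + extra * long_rate) / 1_000_000


if __name__ == "__main__":
    # Placeholder rates chosen only to make the difference visible.
    for mode in ("whole_request", "marginal"):
        print(mode, estimate_cost(372_000, base_rate=1.0, long_rate=2.0, mode=mode))
```

The two modes diverge only above the threshold, so the size of that gap is what the open billing question would settle once actual rates are known.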
Spreadsheet Analytic Task Performance
- On an internal benchmark of spreadsheet modeling tasks resembling junior investment banking analyst work, GPT-5.4 scored 87.3% versus 68.4% for GPT-5.2.
Image Generation Latency And Task Cost
- In one reported run, generating an image with GPT-5.4 Pro took 4 minutes 45 seconds and cost $1.55.
Watchlist
- It is uncertain whether a GPT-5.4 Codex variant will be released or whether the Codex line has been merged into the main model family.
Unknowns
- What are the published per-token rates for GPT-5.4 at or below 272,000 tokens versus above 272,000 tokens, and how is the threshold applied in billing?
- Which specific coding benchmarks support the claim that GPT-5.4 outperforms GPT-5.3-Codex, and what are the exact results and evaluation conditions?
- Will there be a distinct GPT-5.4 Codex variant, and if not, what explicit product-line change (merge, rename, or deprecation) is announced?
- What is the design of the internal spreadsheet modeling benchmark (task set, grading rubric, error tolerance), and are there third-party replications comparing GPT-5.4 to GPT-5.2?
- What is the typical (median/p95) latency and cost distribution for GPT-5.4 Pro image generation across prompts and times, and what factors drive variance?