Pricing Mechanics For Long-Context Usage
Sources: 1 • Confidence: Medium • Updated: 2026-04-13 03:56
Key takeaways
- GPT-5.4 pricing is slightly higher than the GPT-5.2 family, and both GPT-5.4 models cost more when usage exceeds 272,000 tokens.
- GPT-5.4 outperforms GPT-5.3-Codex on relevant coding benchmarks.
- On an internal benchmark of spreadsheet modeling tasks resembling junior investment banking analyst work, GPT-5.4 scored 87.3% versus 68.4% for GPT-5.2.
- In one reported instance, generating an image with GPT-5.4 Pro took 4 minutes 45 seconds and cost $1.55.
- It is currently unclear whether a GPT-5.4-Codex variant will be released or whether the Codex line has been merged into the main model family.
Sections
Pricing Mechanics For Long-Context Usage
- GPT-5.4 pricing is slightly higher than the GPT-5.2 family, and both GPT-5.4 models cost more when usage exceeds 272,000 tokens.
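Since the published per-token rates and the unit over which the 272,000-token threshold applies are both unknown (see Unknowns below), the mechanics can only be sketched under assumptions. The snippet below assumes the threshold is applied per request and that only tokens above it are billed at a higher rate; the rates themselves are placeholders, not published figures.

```python
# Hypothetical tiered-pricing sketch. The per-request threshold application
# and the two rates are ASSUMPTIONS, not published GPT-5.4 pricing.
LONG_CONTEXT_THRESHOLD = 272_000  # token threshold reported for GPT-5.4


def request_cost(tokens: int, base_rate: float, long_rate: float) -> float:
    """Cost of one request, assuming the surcharge applies only to tokens
    above the threshold (one of several plausible interpretations)."""
    if tokens <= LONG_CONTEXT_THRESHOLD:
        return tokens * base_rate
    over = tokens - LONG_CONTEXT_THRESHOLD
    return LONG_CONTEXT_THRESHOLD * base_rate + over * long_rate


# Placeholder rates in $/token, chosen only to make the mechanics concrete.
short = request_cost(100_000, base_rate=2e-6, long_rate=4e-6)  # below threshold
long = request_cost(300_000, base_rate=2e-6, long_rate=4e-6)   # 28k tokens over
```

Note that an alternative interpretation, in which every token in an over-threshold request is billed at the higher rate, would change the comparison materially; which rule applies is exactly the open question listed under Unknowns.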
Capability Crossover: General Model Vs Coding Specialist
- GPT-5.4 outperforms GPT-5.3-Codex on relevant coding benchmarks.
Business Spreadsheet Modeling Performance Jump
- On an internal benchmark of spreadsheet modeling tasks resembling junior investment banking analyst work, GPT-5.4 scored 87.3% versus 68.4% for GPT-5.2.
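The headline scores can be reframed as residual error rates, which is often the more decision-relevant view for review workloads. This is pure arithmetic on the two figures quoted above; no additional benchmark data is assumed.

```python
# Reframe the reported benchmark scores as residual task-failure rates.
gpt54_score, gpt52_score = 0.873, 0.684

err54 = 1 - gpt54_score  # ~12.7% of tasks failed under GPT-5.4
err52 = 1 - gpt52_score  # ~31.6% of tasks failed under GPT-5.2

# Relative reduction in failed tasks moving from GPT-5.2 to GPT-5.4.
relative_reduction = (err52 - err54) / err52  # roughly a 60% cut
```

Whether a ~60% cut in failed tasks translates into a proportional cut in real-world correction burden depends on error severity distribution, which the internal benchmark result alone cannot establish.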
Image Generation Cost And Latency Constraints
- In one reported instance, generating an image with GPT-5.4 Pro took 4 minutes 45 seconds and cost $1.55.
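For capacity planning, the single reported data point (4 min 45 s, $1.55) can be extrapolated to sequential throughput and batch cost. These are rough planning numbers from one observation; actual latency and cost distributions are unknown (see Unknowns below).

```python
# Back-of-envelope extrapolation from the single reported data point.
# Real latency and cost vary; treat these as rough planning numbers only.
seconds_per_image = 4 * 60 + 45          # 285 s per image, as reported
cost_per_image = 1.55                    # $ per image, as reported

images_per_hour = 3600 / seconds_per_image  # ~12.6 images/hour, sequential
cost_per_100_images = 100 * cost_per_image  # ~$155 for a 100-image batch
```

At these figures, any pipeline generating images at scale would need parallel requests to reach useful throughput, and per-image cost dominates quickly.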
Product-Line Uncertainty: Codex SKU Vs Consolidation
- It is currently unclear whether a GPT-5.4-Codex variant will be released or whether the Codex line has been merged into the main model family.
Watchlist
- Watch for announcements clarifying whether a GPT-5.4-Codex variant will ship or whether the Codex line has been merged into the main model family.
Unknowns
- What are the published per-token rates for GPT-5.4 at or below the 272,000-token threshold versus above it, and how exactly is the threshold applied (per request, per day, per billing interval, or another unit)?
- Which specific coding benchmarks support the claim that GPT-5.4 outperforms GPT-5.3-Codex, and what are the evaluation details (task mix, constraints, scoring, variance)?
- Will there be a GPT-5.4-Codex (or equivalent coding-specialized) SKU, and if so, how will it differ in capability, price, and limits from GPT-5.4?
- Is the spreadsheet modeling benchmark result reproducible in third-party evaluations, and how does it translate into real-world spreadsheet/model error rates and correction burden?
- What is the typical latency and cost distribution for GPT-5.4 Pro image generation across prompts, times, and regions, and how much variability should systems expect?