Rosa Del Mar

Daily Brief

Issue 62 2026-03-03

Low-Cost Model Refresh And Explicit Token Pricing

General
Sources: 1 • Confidence: High • Updated: 2026-04-13 03:55

Key takeaways

  • Google released Gemini 3.1 Flash-Lite as an update to its inexpensive Flash-Lite model family.
  • Gemini 3.1 Flash-Lite supports four thinking levels: minimal, low, medium, and high.
  • Gemini 3.1 Flash-Lite pricing is stated as $0.25 per million input tokens and $1.50 per million output tokens.
  • Gemini 3.1 Flash-Lite is described as one-eighth the price of Gemini 3.1 Pro.

Sections

Low-Cost Model Refresh And Explicit Token Pricing

  • Google released Gemini 3.1 Flash-Lite as an update to its inexpensive Flash-Lite model family.
  • Gemini 3.1 Flash-Lite pricing is stated as $0.25 per million input tokens and $1.50 per million output tokens.
  • Gemini 3.1 Flash-Lite is described as one-eighth the price of Gemini 3.1 Pro.
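At the stated rates, per-workload cost is simple arithmetic. The sketch below encodes the two prices from the source; the workload token counts are illustrative, not figures from the brief.

```python
# Cost estimate at the stated Gemini 3.1 Flash-Lite prices:
# $0.25 per million input tokens, $1.50 per million output tokens.

INPUT_PRICE_PER_M = 0.25   # USD per 1M input tokens (stated)
OUTPUT_PRICE_PER_M = 1.50  # USD per 1M output tokens (stated)

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD for a workload at the stated rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Illustrative workload: 10M input tokens and 2M output tokens per day.
daily = cost_usd(10_000_000, 2_000_000)
print(f"${daily:.2f}")  # → $5.50
```

If the stated one-eighth ratio applied uniformly to both token types, Gemini 3.1 Pro would price at eight times these figures, though the source does not break that down.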

Inference-Time Controllability Via Discrete Thinking Levels

  • Gemini 3.1 Flash-Lite supports four thinking levels: minimal, low, medium, and high.
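The source does not show the API surface, and the exact parameterization is listed as an unknown below. Purely as an illustration of what discrete levels imply for callers, this sketch assumes a hypothetical `thinking_level` request field taking one of the four level names from the source; the field name, payload shape, and model string are assumptions, not documented behavior.

```python
# Illustrative only: the parameter name `thinking_level` and the payload
# layout are hypothetical. Only the four level names come from the source.

THINKING_LEVELS = ("minimal", "low", "medium", "high")

def build_request(prompt: str, thinking_level: str = "low") -> dict:
    """Build a hypothetical generate-content payload with a thinking level."""
    if thinking_level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {thinking_level!r}")
    return {
        "model": "gemini-3.1-flash-lite",  # model name as stated in the brief
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generation_config": {"thinking_level": thinking_level},  # assumed field
    }

req = build_request("Summarize this changelog.", thinking_level="minimal")
print(req["generation_config"])  # → {'thinking_level': 'minimal'}
```

Validating the level client-side, as above, guards against silent fallbacks if the provider ignores unrecognized values.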

Unknowns

  • What are the benchmarked quality differences between Gemini 3.1 Flash-Lite and the prior Flash-Lite version(s), and between Flash-Lite and Gemini 3.1 Pro?
  • What is the exact API parameterization for selecting thinking levels, and are the values stable and officially supported?
  • How do the thinking levels change latency, token usage, tool-calling behavior (if any), and/or billed output in practice?
  • Are there any rate limits, capacity constraints, or availability restrictions (regions, quotas, waitlists) specific to Gemini 3.1 Flash-Lite?
  • Does the stated token pricing differ by context length, batch/async modes, or other usage dimensions (e.g., caching) for Gemini 3.1 Flash-Lite?

Investor overlay

Read-throughs

  • Clear, low token pricing and the stated one-eighth-of-Pro price point may pressure competitors' pricing and improve customer budgeting, potentially accelerating experimentation with and adoption of lower-tier models.
  • Discrete thinking levels may enable cost/latency/quality tuning, which could broaden use cases for low-cost models if the controllability proves stable and useful.
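The tuning read-through amounts to a routing decision: spend more thinking on tasks that warrant it. The policy below is a hypothetical sketch of that idea; the task categories and the mapping are illustrative, and only the four level names come from the source.

```python
# Hypothetical cost/latency/quality routing policy. Task names and
# the level assignments are illustrative assumptions, not from the source.

def pick_thinking_level(task: str) -> str:
    """Map a coarse task category to one of the four stated levels."""
    routing = {
        "classification": "minimal",        # cheap, latency-sensitive
        "summarization": "low",
        "code_review": "medium",
        "multi_step_reasoning": "high",     # pay more output tokens for quality
    }
    return routing.get(task, "low")  # default to a cheap middle ground

print(pick_thinking_level("classification"))  # → minimal
```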

What would confirm

  • Published benchmarks or third-party evaluations showing quality and latency tradeoffs between Flash-Lite and Pro, and versus the prior Flash-Lite, at each thinking level.
  • Official API documentation confirming stable parameters for thinking levels and detailing how they affect token usage, latency, tool calling, and billing.
  • Disclosures on availability such as regions, quotas, rate limits, and any pricing dimensions like caching, context length, or batch modes.

What would kill

  • Evidence that Flash-Lite quality is materially worse than prior versions or that higher thinking levels do not improve outcomes, limiting substitutability and usefulness.
  • Unstable or unofficial thinking level controls, or minimal practical impact on cost and latency, reducing the value of controllability.
  • Meaningful restrictions such as tight quotas, limited regions, or hidden pricing adders that undermine the stated token prices.

Sources

  1. 2026-03-03 simonwillison.net