Rosa Del Mar

Daily Brief

Issue 62 2026-03-03

Low-Cost Model Refresh And Explicit Token Pricing

General
Sources: 1 • Confidence: High • Updated: 2026-04-13 03:55

Key takeaways

  • Google released Gemini 3.1 Flash-Lite as an update to its inexpensive Flash-Lite model family.
  • Gemini 3.1 Flash-Lite supports four thinking levels: minimal, low, medium, and high.
  • Gemini 3.1 Flash-Lite pricing is stated as $0.25 per million input tokens and $1.50 per million output tokens.
  • Gemini 3.1 Flash-Lite is described as one-eighth the price of Gemini 3.1 Pro.

Sections

Low-Cost Model Refresh And Explicit Token Pricing

  • Google released Gemini 3.1 Flash-Lite as an update to its inexpensive Flash-Lite model family.
  • Gemini 3.1 Flash-Lite pricing is stated as $0.25 per million input tokens and $1.50 per million output tokens.
  • Gemini 3.1 Flash-Lite is described as one-eighth the price of Gemini 3.1 Pro.
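At the stated rates, per-workload cost is simple arithmetic. The sketch below encodes the two prices from the source; the workload token counts are illustrative, not figures from the brief.

```python
# Cost estimate at the stated Gemini 3.1 Flash-Lite prices:
# $0.25 per million input tokens, $1.50 per million output tokens.

INPUT_PRICE_PER_M = 0.25   # USD per 1M input tokens (stated)
OUTPUT_PRICE_PER_M = 1.50  # USD per 1M output tokens (stated)

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD for a workload at the stated rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Illustrative workload: 10M input tokens and 2M output tokens per day.
daily = cost_usd(10_000_000, 2_000_000)
print(f"${daily:.2f}")  # → $5.50
```

If the stated one-eighth ratio applied uniformly to both token types, Gemini 3.1 Pro would price at eight times these figures, though the source does not break that down.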

Inference-Time Controllability Via Discrete Thinking Levels

  • Gemini 3.1 Flash-Lite supports four thinking levels: minimal, low, medium, and high.
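The source does not show the API surface, and the exact parameterization is listed as an unknown below. Purely as an illustration of what discrete levels imply for callers, this sketch assumes a hypothetical `thinking_level` request field taking one of the four level names from the source; the field name, payload shape, and model string are assumptions, not documented behavior.

```python
# Illustrative only: the parameter name `thinking_level` and the payload
# layout are hypothetical. Only the four level names come from the source.

THINKING_LEVELS = ("minimal", "low", "medium", "high")

def build_request(prompt: str, thinking_level: str = "low") -> dict:
    """Build a hypothetical generate-content payload with a thinking level."""
    if thinking_level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {thinking_level!r}")
    return {
        "model": "gemini-3.1-flash-lite",  # model name as stated in the brief
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generation_config": {"thinking_level": thinking_level},  # assumed field
    }

req = build_request("Summarize this changelog.", thinking_level="minimal")
print(req["generation_config"])  # → {'thinking_level': 'minimal'}
```

Validating the level client-side, as above, guards against silent fallbacks if the provider ignores unrecognized values.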

Unknowns

  • What are the benchmarked quality differences between Gemini 3.1 Flash-Lite and the prior Flash-Lite version(s), and between Flash-Lite and Gemini 3.1 Pro?
  • What is the exact API parameterization for selecting thinking levels, and are the values stable and officially supported?
  • How do the thinking levels change latency, token usage, tool-calling behavior (if any), and/or billed output in practice?
  • Are there any rate limits, capacity constraints, or availability restrictions (regions, quotas, waitlists) specific to Gemini 3.1 Flash-Lite?
  • Does the stated token pricing differ by context length, batch/async modes, or other usage dimensions (e.g., caching) for Gemini 3.1 Flash-Lite?

Investor overlay

Read-throughs

  • Clear, low token pricing and the stated one-eighth-of-Pro price point may pressure competitors' pricing and improve customer budgeting, potentially accelerating experimentation with and adoption of lower-tier models.
  • Discrete thinking levels may enable cost/latency/quality tuning, which could broaden use cases for low-cost models if the controllability proves stable and useful.
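The tuning read-through amounts to a routing decision: spend more thinking on tasks that warrant it. The policy below is a hypothetical sketch of that idea; the task categories and the mapping are illustrative, and only the four level names come from the source.

```python
# Hypothetical cost/latency/quality routing policy. Task names and
# the level assignments are illustrative assumptions, not from the source.

def pick_thinking_level(task: str) -> str:
    """Map a coarse task category to one of the four stated levels."""
    routing = {
        "classification": "minimal",        # cheap, latency-sensitive
        "summarization": "low",
        "code_review": "medium",
        "multi_step_reasoning": "high",     # pay more output tokens for quality
    }
    return routing.get(task, "low")  # default to a cheap middle ground

print(pick_thinking_level("classification"))  # → minimal
```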

What would confirm

  • Published benchmarks or third-party evaluations showing quality and latency tradeoffs between Flash-Lite and Pro, and versus the prior Flash-Lite, at each thinking level.
  • Official API documentation confirming stable parameters for thinking levels and detailing how they affect token usage, latency, tool calling, and billing.
  • Disclosures on availability such as regions, quotas, rate limits, and any pricing dimensions like caching, context length, or batch modes.

What would kill

  • Evidence that Flash-Lite quality is materially worse than prior versions or that higher thinking levels do not improve outcomes, limiting substitutability and usefulness.
  • Unstable or unofficial thinking level controls, or minimal practical impact on cost and latency, reducing the value of controllability.
  • Meaningful restrictions such as tight quotas, limited regions, or hidden pricing adders that undermine the stated token prices.

Sources

  1. 2026-03-03 simonwillison.net