Low-Cost Model Refresh And Explicit Token Pricing
Sources: 1 • Confidence: High • Updated: 2026-04-13 03:55
Key takeaways
- Google released Gemini 3.1 Flash-Lite as an update to its inexpensive Flash-Lite model family.
- Gemini 3.1 Flash-Lite supports four different thinking levels.
- Gemini 3.1 Flash-Lite pricing is stated as $0.25 per million input tokens and $1.50 per million output tokens.
- Gemini 3.1 Flash-Lite is described as one-eighth the price of Gemini 3.1 Pro.
- The four thinking levels shown for Gemini 3.1 Flash-Lite are minimal, low, medium, and high.
Sections
Low-Cost Model Refresh And Explicit Token Pricing
- Google released Gemini 3.1 Flash-Lite as an update to its inexpensive Flash-Lite model family.
- Gemini 3.1 Flash-Lite pricing is stated as $0.25 per million input tokens and $1.50 per million output tokens.
- Gemini 3.1 Flash-Lite is described as one-eighth the price of Gemini 3.1 Pro.
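The stated per-token rates can be turned into a rough per-request estimate. The following is a minimal sketch, assuming the two quoted rates apply uniformly to all input and output tokens; as noted under Unknowns, pricing may vary by context length, batch mode, or caching. The function name `estimate_cost_usd` and the constant names are illustrative, not part of any Google API.

```python
from decimal import Decimal

# Rates as stated in the source, in USD per one million tokens.
INPUT_USD_PER_MILLION = Decimal("0.25")
OUTPUT_USD_PER_MILLION = Decimal("1.50")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate request cost in USD from token counts.

    Assumption: flat pricing, with no tiering, caching discounts,
    or batch discounts applied.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    )
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # A request with 2,000 input tokens and 500 output tokens.
    print(estimate_cost_usd(2_000, 500))
```

Decimal arithmetic is used here so that very small per-request amounts are not distorted by binary floating-point rounding. Because output tokens cost six times as much as input tokens at these rates, output length dominates the estimate for most chat-style requests.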
Inference-Time Controllability Via Discrete Thinking Levels
- Gemini 3.1 Flash-Lite supports four different thinking levels.
- The four thinking levels shown for Gemini 3.1 Flash-Lite are minimal, low, medium, and high.
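Client code that exposes the four levels might model them as a closed set so that typos fail early. The following is a minimal sketch, assuming only the four level names reported above; the exact API parameter for selecting a level is not confirmed by the source (see Unknowns), so `ThinkingLevel` and `parse_thinking_level` are hypothetical helpers and no request format is shown.

```python
from enum import Enum


class ThinkingLevel(Enum):
    """The four level names reported for Gemini 3.1 Flash-Lite.

    Assumption: the string values mirror the names shown in the source;
    the real API may spell or encode them differently.
    """

    MINIMAL = "minimal"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def parse_thinking_level(value: str) -> ThinkingLevel:
    """Convert user input to a ThinkingLevel, ignoring case and padding."""
    normalized = value.strip().lower()
    try:
        return ThinkingLevel(normalized)
    except ValueError:
        allowed = ", ".join(level.value for level in ThinkingLevel)
        raise ValueError(
            f"unknown thinking level {value!r}; expected one of: {allowed}"
        ) from None


if __name__ == "__main__":
    print(parse_thinking_level(" High "))
```

Keeping the levels in one enum means that, once the official parameter name and values are known, only this mapping needs to change.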
Unknowns
- What are the benchmarked quality differences between Gemini 3.1 Flash-Lite and the prior Flash-Lite version(s), and between Flash-Lite and Gemini 3.1 Pro?
- What is the exact API parameterization for selecting thinking levels, and are the values stable and officially supported?
- How do the thinking levels change latency, token usage, tool-calling behavior (if any), and billed output in practice?
- Are there any rate limits, capacity constraints, or availability restrictions (regions, quotas, waitlists) specific to Gemini 3.1 Flash-Lite?
- Does the stated token pricing differ by context length, batch/async modes, or other usage dimensions (e.g., caching) for Gemini 3.1 Flash-Lite?