Model Sizing Semantics And Efficiency Mechanism
Sources: 1 • Confidence: High • Updated: 2026-04-13 03:35
Key takeaways
- Gemma 4 E2B and E4B use Per-Layer Embeddings: per-decoder-layer token embedding tables designed for cheap lookups, which increase the total number of embedding tables while keeping the effective parameter count lower for on-device efficiency.
- On the same SVG task, Gemma 4 26B-A4B produced an SVG with an 'Attribute x1 redefined' error that the author manually fixed to obtain an excellent result.
- At the time of writing, the author could not run Gemma 4 native audio input locally and suspected common local runtimes (LM Studio or Ollama) did not support it yet.
- Google DeepMind released four vision-capable reasoning LLMs under the Gemma 4 name, licensed Apache 2.0, in sizes 2B, 4B, 31B, and a 26B-A4B Mixture-of-Experts variant.
- Using GGUFs in LM Studio, the author successfully ran Gemma 4 2B (4.41GB), 4B (6.33GB), and 26B-A4B (17.99GB), but Gemma 4 31B (19.89GB) entered a loop, emitting '---\n' repeatedly for every prompt.
Sections
Model Sizing Semantics And Efficiency Mechanism
- Gemma 4 E2B and E4B use Per-Layer Embeddings: per-decoder-layer token embedding tables designed for cheap lookups, which increase the total number of embedding tables while keeping the effective parameter count lower for on-device efficiency.
- The 2B and 4B Gemma 4 models are labeled E2B and E4B, where the 'E' denotes an 'Effective' parameter size rather than total parameters.
- Google positioned Gemma 4 as offering unusually high intelligence per parameter, implying an emphasis on efficiency rather than only scaling parameters.
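The Per-Layer Embeddings idea can be made concrete with a toy sketch. The following is an illustrative model only, not Gemma 4's actual implementation: each decoder layer holds its own small token embedding table, consulted by a cheap index lookup and added into the hidden state, so the total number of stored tables grows while each per-layer step stays a lookup plus an add. All names and sizes below are assumptions.

```python
# Toy contrast between one shared embedding table and Per-Layer
# Embeddings (PLE): one small token table per decoder layer.
# Illustrative sketch only -- sizes and structure are assumed.

VOCAB, DIM, LAYERS = 8, 4, 3

# Conventional setup: a single [VOCAB, DIM] table.
shared_table = [[0.01 * (t + d) for d in range(DIM)] for t in range(VOCAB)]

# PLE setup: LAYERS separate [VOCAB, DIM] tables. Total stored rows grow
# to LAYERS * VOCAB, but each use is an O(1) index lookup into a small
# table rather than extra matrix multiplies in the core transformer.
ple_tables = [
    [[0.001 * (layer + t + d) for d in range(DIM)] for t in range(VOCAB)]
    for layer in range(LAYERS)
]

def forward_with_ple(token_id: int) -> list[float]:
    """Run one token through LAYERS toy layers, adding that token's
    per-layer embedding at each layer (lookup + add, no matmul)."""
    hidden = list(shared_table[token_id])
    for layer in range(LAYERS):
        per_layer = ple_tables[layer][token_id]  # cheap table lookup
        hidden = [h + p for h, p in zip(hidden, per_layer)]
    return hidden

print(len(forward_with_ple(3)))  # → 4: hidden width stays DIM
```

The sketch shows why the 'effective' count can be lower than the total: the per-layer tables add stored parameters but only trivial lookup work per token, which is one plausible reading of the efficiency claim.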
Capability Scaling And Output Validity In SVG Generation
- On the same SVG task, Gemma 4 26B-A4B produced an SVG with an 'Attribute x1 redefined' error that the author manually fixed to obtain an excellent result.
- In an API run of the pelican-riding-a-bicycle SVG prompt, Gemma 4 31B output was good but omitted the front part of the bicycle frame.
- On a pelican-riding-a-bicycle SVG task, the author observed improved output quality when moving from Gemma 4 2B to 4B to 26B-A4B.
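A duplicated attribute like the 'Attribute x1 redefined' case can be detected, and sometimes patched, automatically rather than by hand. The sketch below is a hedged illustration: the keep-first-occurrence repair heuristic is an assumption on my part, not the fix the author actually applied.

```python
# Detect and naively repair duplicated attributes (the kind of defect
# behind an 'Attribute x1 redefined' parse error) in generated SVG.
# The keep-first-occurrence heuristic is an illustrative assumption.
import re
import xml.etree.ElementTree as ET

def is_valid_xml(svg: str) -> bool:
    """True if the SVG parses as well-formed XML."""
    try:
        ET.fromstring(svg)
        return True
    except ET.ParseError:
        return False

TAG = re.compile(r"<[A-Za-z][^>]*>")
ATTR = re.compile(r'\s([A-Za-z_:][-\w:.]*)\s*=\s*("[^"]*"|\'[^\']*\')')

def drop_duplicate_attrs(svg: str) -> str:
    """Within each opening tag, keep only the first occurrence of
    each attribute name and drop later redefinitions."""
    def fix_tag(tag_match):
        seen = set()
        def keep(attr_match):
            name = attr_match.group(1)
            if name in seen:
                return ""  # drop the redefined attribute
            seen.add(name)
            return attr_match.group(0)
        return ATTR.sub(keep, tag_match.group(0))
    return TAG.sub(fix_tag, svg)

bad = ('<svg xmlns="http://www.w3.org/2000/svg">'
       '<line x1="0" y1="0" x1="5" x2="9" y2="9"/></svg>')
assert not is_valid_xml(bad)          # duplicate x1 breaks parsing
assert is_valid_xml(drop_duplicate_attrs(bad))
```

A guard like this speaks to the Unknowns question about automated lint/repair: structural duplicates are mechanically fixable, though semantic omissions (like a missing bicycle-frame segment) are not.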
Multimodality Audio And Ecosystem Gap
- At the time of writing, the author could not run Gemma 4 native audio input locally and suspected common local runtimes (LM Studio or Ollama) did not support it yet.
- Gemma 4 E2B and E4B include native audio input for speech recognition and understanding.
Release Scope And Licensing
- Google DeepMind released four vision-capable reasoning LLMs under the Gemma 4 name, licensed Apache 2.0, in sizes 2B, 4B, 31B, and a 26B-A4B Mixture-of-Experts variant.
Local Inference Reliability: GGUF And Runtime Issues
- Using GGUFs in LM Studio, the author successfully ran Gemma 4 2B (4.41GB), 4B (6.33GB), and 26B-A4B (17.99GB), but Gemma 4 31B (19.89GB) entered a loop, emitting '---\n' repeatedly for every prompt.
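A degenerate loop like the 31B '---\n' failure can at least be cut short on the client side while the underlying GGUF/runtime fault is investigated. The guard below is an illustrative sketch, not LM Studio or Ollama behavior; the repeat threshold and per-chunk granularity are assumed.

```python
# Client-side guard for a streaming generation that degenerates into
# repeating one chunk (e.g. '---\n' forever). Threshold and chunk
# granularity are illustrative assumptions, not runtime behavior.
from collections import deque

def collect_until_loop(chunks, max_repeats: int = 5) -> str:
    """Consume an iterator of text chunks, aborting early once the
    same chunk has arrived max_repeats times in a row."""
    out = []
    recent = deque(maxlen=max_repeats)
    for chunk in chunks:
        recent.append(chunk)
        if len(recent) == max_repeats and len(set(recent)) == 1:
            out.append("[aborted: degenerate repetition]")
            break
        out.append(chunk)
    return "".join(out)

# A stream that degenerates into the failure mode described above.
stream = iter(["Here is", " a list:\n"] + ["---\n"] * 100)
text = collect_until_loop(stream)
assert "[aborted: degenerate repetition]" in text
assert text.count("---\n") < 10  # loop was cut short
```

This only limits wasted tokens; it does not answer the Unknowns question of whether the fault lies in the GGUF build, quantization, or the runtime itself.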
Watchlist
- At the time of writing, the author could not run Gemma 4 native audio input locally and suspected common local runtimes (LM Studio or Ollama) did not support it yet.
Unknowns
- What is the precise technical definition of 'effective parameters' for E2B/E4B, and how exactly do Per-Layer Embeddings change memory footprint, compute, and quality relative to conventional embeddings?
- When will common local runtimes (or other local tooling) support Gemma 4 native audio input end-to-end, and what are the supported input formats and constraints?
- Is the Gemma 4 31B GGUF looping issue reproducible across other machines, LM Studio versions, and alternate GGUF builds/quantizations, and what specific component is at fault?
- How frequently do Gemma 4 models produce structurally invalid SVG (e.g., duplicated attributes) or systematic omissions on diagram tasks, and can automated lint/repair close the gap reliably?
- What are the practical limits, pricing, quotas, or latency characteristics of AI Studio API access for the larger Gemma 4 models, and do they differ by model?