Open-Licensed Multimodal Model Release (Scope And Access Paths)
Sources: 1 • Confidence: High • Updated: 2026-04-03 03:52
Key takeaways
- Google DeepMind released four new vision-capable reasoning LLMs, Gemma 4, under the Apache 2.0 license in sizes E2B (2B effective), E4B (4B effective), 31B, and a 26B-A4B Mixture-of-Experts variant.
- E2B and E4B use per-layer embeddings: each decoder layer has its own token-embedding table consulted via quick lookups, which increases the number of tables while keeping the effective parameter count lower for on-device efficiency.
- The author encountered an SVG error ('Attribute x1 redefined') in output from Gemma 4 26B-A4B and obtained an excellent result after manually fixing it.
- The author was unable to run Gemma 4's native audio input locally and suspects that common local runtimes (LM Studio, Ollama) do not yet support audio input for these models.
- Google is positioning Gemma 4 as offering unusually high intelligence-per-parameter.
Sections
Open-Licensed Multimodal Model Release (Scope And Access Paths)
- Google DeepMind released four new vision-capable reasoning LLMs, Gemma 4, under the Apache 2.0 license in sizes E2B (2B effective), E4B (4B effective), 31B, and a 26B-A4B Mixture-of-Experts variant.
- The E2B and E4B Gemma 4 models include native audio input for speech recognition and understanding.
- Google provides API access to the larger Gemma 4 models via AI Studio, and the author added support to llm-gemini to use the 31B model through that API.
Effective-Parameter Branding And The Uncertain Mechanism Behind It
- E2B and E4B use per-layer embeddings: each decoder layer has its own token-embedding table consulted via quick lookups, which increases the number of tables while keeping the effective parameter count lower for on-device efficiency.
- The author does not fully understand the per-layer-embeddings explanation and infers that this mechanism is what the 'E' designation refers to.
- The two smaller Gemma 4 models are labeled E2B and E4B, where the 'E' denotes an effective parameter size rather than total parameters.
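The mechanism described above can be sketched as a toy model. This is an illustrative reading of "per-layer embeddings," not Google's actual implementation: each decoder layer owns its own small embedding table, and at each layer the token's row is fetched by a cheap O(1) lookup and mixed into the hidden state. The extra tables add parameters on paper, but lookups are memory reads rather than matrix multiplies, which is one plausible way the "effective" count stays low for on-device use. All names and dimensions here are invented for the sketch.

```python
import random

VOCAB, DIM, LAYERS = 1000, 8, 4  # toy sizes, not Gemma's real dimensions
random.seed(0)

def table(rows, cols):
    """Build a small random embedding table as nested lists."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

shared_embed = table(VOCAB, DIM)                               # usual input embedding
per_layer_embed = [table(VOCAB, DIM) for _ in range(LAYERS)]   # one extra table per layer

def forward(token_id):
    """Run one token through the toy stack, adding a per-layer lookup at each layer."""
    h = list(shared_embed[token_id])
    for layer in range(LAYERS):
        ple_row = per_layer_embed[layer][token_id]   # O(1) lookup, no matmul
        h = [x + p for x, p in zip(h, ple_row)]      # mix per-layer embedding into hidden state
        # ... a real decoder layer (attention + MLP) would go here ...
    return h

print(len(forward(42)))  # hidden size preserved: 8
```

The design intuition this sketch captures is that lookup tables can be streamed from slower storage on demand, so they need not be resident like the weight matrices that dominate compute.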
Capability Scaling Signals And Production-Safety Issues For Structured Outputs (SVG)
- The author encountered an SVG error ('Attribute x1 redefined') in output from Gemma 4 26B-A4B and obtained an excellent result after manually fixing it.
- In an API run with a pelican-riding-a-bicycle SVG prompt, Gemma 4 31B produced a good output but omitted the front part of the bicycle frame.
- On a pelican-riding-a-bicycle SVG task, the author observed better output quality when moving from Gemma 4 2B to 4B to 26B-A4B.
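Errors like 'Attribute x1 redefined' arise because strict XML parsers reject a tag that repeats an attribute, so the fix the author applied by hand can be automated with a pre-parse repair pass. Below is a minimal sketch, not the author's method: a regex rewrite that keeps the first occurrence of each attribute within a tag and drops repeats. A production pipeline would likely want a real parser-based round trip after this pass.

```python
import re

def dedupe_attrs(svg: str) -> str:
    """Remove repeated attributes inside each tag, keeping the first occurrence."""
    def fix_tag(m):
        head, attrs, close = m.group(1), m.group(2), m.group(3)
        seen, kept = set(), []
        # Scan name="value" / name='value' pairs in order of appearance.
        for a in re.finditer(r'([\w:-]+)\s*=\s*("[^"]*"|\'[^\']*\')', attrs):
            name = a.group(1)
            if name not in seen:
                seen.add(name)
                kept.append(f'{name}={a.group(2)}')
        return head + (' ' + ' '.join(kept) if kept else '') + close

    # Match opening/self-closing tags only; closing tags (</svg>) are left alone.
    return re.sub(r'(<[\w:-]+)([^<>]*?)(/?>)', fix_tag, svg)

broken = '<line x1="0" y1="0" x1="5" x2="10" y2="10"/>'
print(dedupe_attrs(broken))  # → <line x1="0" y1="0" x2="10" y2="10"/>
```

Keeping the first occurrence is an arbitrary choice here; a validator could instead flag the conflict for regeneration, which may matter if the duplicate values disagree.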
Local Deployment Friction: Modality Support Gaps And Runtime Failures
- The author was unable to run Gemma 4's native audio input locally and suspects that common local runtimes (LM Studio, Ollama) do not yet support audio input for these models.
- Using GGUFs in LM Studio, the author successfully ran Gemma 4 2B (4.41GB), 4B (6.33GB), and 26B-A4B (17.99GB); the 31B model (19.89GB), however, looped, emitting '---\n' in response to every prompt.
Efficiency Narrative (Intelligence-Per-Parameter) As An Explicit Positioning
- Google is positioning Gemma 4 as offering unusually high intelligence-per-parameter.
- The corpus expects that improving the usefulness of smaller models is a key focus of current research.
Watchlist
- The author was unable to run Gemma 4's native audio input locally and suspects that common local runtimes (LM Studio, Ollama) do not yet support audio input for these models.
Unknowns
- What is the precise, authoritative definition and accounting method for Gemma 4 “effective parameters” versus total parameters (including how per-layer embeddings are counted)?
- What are the actual latency, memory, and quality tradeoffs of E2B/E4B relative to similarly sized non-“effective” models under standardized inference settings?
- When will mainstream local runtimes (e.g., LM Studio, Ollama) support Gemma 4 native audio input end-to-end, and what formats/APIs will that require?
- What is the root cause of the Gemma 4 31B GGUF looping output in LM Studio, and does it reproduce across different GGUF conversions, quantizations, or runtime versions?
- How frequent are SVG validity errors (like duplicate attributes) and content omissions across prompts, models, and decoding settings, and can they be mitigated reliably with automated validation/repair?