Open-Licensed Multimodal Model Release (Scope And Access Paths)
Sources: 1 • Confidence: High • Updated: 2026-04-03 03:52
Key takeaways
- Google DeepMind released four new vision-capable reasoning LLMs, Gemma 4, under the Apache 2.0 license in sizes E2B (2B effective), E4B (4B effective), 31B, and a 26B-A4B Mixture-of-Experts variant.
- E2B and E4B use per-layer embeddings: each decoder layer has its own token-embedding table consulted via quick lookups, which increases the number of tables while keeping the effective parameter count lower for on-device efficiency.
- The author encountered an SVG error ('Attribute x1 redefined') in output from Gemma 4 26B-A4B and obtained an excellent result after manually fixing it.
- The author was unable to run Gemma 4's native audio input locally and suspects that common local runtimes (LM Studio, Ollama) do not yet support audio input for these models.
- Google is positioning Gemma 4 as offering unusually high intelligence-per-parameter.
Sections
Open-Licensed Multimodal Model Release (Scope And Access Paths)
- Google DeepMind released four new vision-capable reasoning LLMs, Gemma 4, under the Apache 2.0 license in sizes E2B (2B effective), E4B (4B effective), 31B, and a 26B-A4B Mixture-of-Experts variant.
- The E2B and E4B Gemma 4 models include native audio input for speech recognition and understanding.
- Google provides API access to the larger Gemma 4 models via AI Studio, and the author added support to llm-gemini to use the 31B model through that API.
Effective-Parameter Branding And The Uncertain Mechanism Behind It
- E2B and E4B use per-layer embeddings: each decoder layer has its own token-embedding table consulted via quick lookups, which increases the number of tables while keeping the effective parameter count lower for on-device efficiency.
- The author does not fully understand the per-layer-embeddings explanation and infers that this mechanism is what the 'E' designation refers to.
- The two smaller Gemma 4 models are labeled E2B and E4B, where the 'E' denotes an effective parameter size rather than total parameters.
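The mechanism described above can be sketched as a toy model. This is an illustrative reading of "per-layer embeddings," not Google's actual implementation: each decoder layer owns its own small embedding table, and at each layer the token's row is fetched by a cheap O(1) lookup and mixed into the hidden state. The extra tables add parameters on paper, but lookups are memory reads rather than matrix multiplies, which is one plausible way the "effective" count stays low for on-device use. All names and dimensions here are invented for the sketch.

```python
import random

VOCAB, DIM, LAYERS = 1000, 8, 4  # toy sizes, not Gemma's real dimensions
random.seed(0)

def table(rows, cols):
    """Build a small random embedding table as nested lists."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

shared_embed = table(VOCAB, DIM)                               # usual input embedding
per_layer_embed = [table(VOCAB, DIM) for _ in range(LAYERS)]   # one extra table per layer

def forward(token_id):
    """Run one token through the toy stack, adding a per-layer lookup at each layer."""
    h = list(shared_embed[token_id])
    for layer in range(LAYERS):
        ple_row = per_layer_embed[layer][token_id]   # O(1) lookup, no matmul
        h = [x + p for x, p in zip(h, ple_row)]      # mix per-layer embedding into hidden state
        # ... a real decoder layer (attention + MLP) would go here ...
    return h

print(len(forward(42)))  # hidden size preserved: 8
```

The design intuition this sketch captures is that lookup tables can be streamed from slower storage on demand, so they need not be resident like the weight matrices that dominate compute.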
Capability Scaling Signals And Production-Safety Issues For Structured Outputs (SVG)
- The author encountered an SVG error ('Attribute x1 redefined') in output from Gemma 4 26B-A4B and obtained an excellent result after manually fixing it.
- In an API run with a pelican-riding-a-bicycle SVG prompt, Gemma 4 31B produced a good output but omitted the front part of the bicycle frame.
- On a pelican-riding-a-bicycle SVG task, the author observed better output quality when moving from Gemma 4 2B to 4B to 26B-A4B.
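Errors like 'Attribute x1 redefined' arise because strict XML parsers reject a tag that repeats an attribute, so the fix the author applied by hand can be automated with a pre-parse repair pass. Below is a minimal sketch, not the author's method: a regex rewrite that keeps the first occurrence of each attribute within a tag and drops repeats. A production pipeline would likely want a real parser-based round trip after this pass.

```python
import re

def dedupe_attrs(svg: str) -> str:
    """Remove repeated attributes inside each tag, keeping the first occurrence."""
    def fix_tag(m):
        head, attrs, close = m.group(1), m.group(2), m.group(3)
        seen, kept = set(), []
        # Scan name="value" / name='value' pairs in order of appearance.
        for a in re.finditer(r'([\w:-]+)\s*=\s*("[^"]*"|\'[^\']*\')', attrs):
            name = a.group(1)
            if name not in seen:
                seen.add(name)
                kept.append(f'{name}={a.group(2)}')
        return head + (' ' + ' '.join(kept) if kept else '') + close

    # Match opening/self-closing tags only; closing tags (</svg>) are left alone.
    return re.sub(r'(<[\w:-]+)([^<>]*?)(/?>)', fix_tag, svg)

broken = '<line x1="0" y1="0" x1="5" x2="10" y2="10"/>'
print(dedupe_attrs(broken))  # → <line x1="0" y1="0" x2="10" y2="10"/>
```

Keeping the first occurrence is an arbitrary choice here; a validator could instead flag the conflict for regeneration, which may matter if the duplicate values disagree.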
Local Deployment Friction: Modality Support Gaps And Runtime Failures
- The author was unable to run Gemma 4's native audio input locally and suspects that common local runtimes (LM Studio, Ollama) do not yet support audio input for these models.
- Using GGUFs in LM Studio, the author successfully ran Gemma 4 2B (4.41GB), 4B (6.33GB), and 26B-A4B (17.99GB); the 31B model (19.89GB), however, looped, emitting '---\n' in response to every prompt.
Efficiency Narrative (Intelligence-Per-Parameter) As An Explicit Positioning
- Google is positioning Gemma 4 as offering unusually high intelligence-per-parameter.
- The corpus expects that improving the usefulness of smaller models is a key focus of current research.
Watchlist
- The author was unable to run Gemma 4's native audio input locally and suspects that common local runtimes (LM Studio, Ollama) do not yet support audio input for these models.
Unknowns
- What is the precise, authoritative definition and accounting method for Gemma 4 “effective parameters” versus total parameters (including how per-layer embeddings are counted)?
- What are the actual latency, memory, and quality tradeoffs of E2B/E4B relative to similarly sized non-“effective” models under standardized inference settings?
- When will mainstream local runtimes (e.g., LM Studio, Ollama) support Gemma 4 native audio input end-to-end, and what formats/APIs will that require?
- What is the root cause of the Gemma 4 31B GGUF looping output in LM Studio, and does it reproduce across different GGUF conversions, quantizations, or runtime versions?
- How frequent are SVG validity errors (like duplicate attributes) and content omissions across prompts, models, and decoding settings, and can they be mitigated reliably with automated validation/repair?