Controllable Reasoning Mode And API Exposure Gap
Sources: 1 • Confidence: High • Updated: 2026-04-12 10:16
Key takeaways
- At the time of writing, the author could not find Mistral API documentation for the reasoning_effort setting.
- Mistral Small 4 is described as a 119B-parameter Mixture-of-Experts model with 6B active parameters.
- Mistral states that Mistral Small 4 unifies reasoning, multimodal, and agentic coding capabilities previously associated with Magistral, Pixtral, and Devstral into one model.
- The author tested Mistral Small 4 via the Mistral API using the llm-mistral plugin and the model identifier "mistral/mistral-small-2603".
- Mistral announced Leanstral, an open-weight model tuned specifically to produce Lean 4 formally verifiable code.
Sections
Controllable Reasoning Mode And API Exposure Gap
- At the time of writing, the author could not find Mistral API documentation for the reasoning_effort setting.
- Mistral Small 4 supports a reasoning_effort setting with values "none" or "high".
- Mistral claims that setting reasoning_effort="high" produces output verbosity comparable to that of the previous Magistral models.
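Because the setting is reportedly absent from the API documentation, the sketch below only constructs what a request body might look like if reasoning_effort were exposed as a top-level field on Mistral's chat completions endpoint. The field name and its placement in the body are assumptions, not confirmed API surface; the bare model name is inferred from the plugin identifier mentioned later.

```python
import json

# Hypothetical request body for Mistral's chat completions endpoint
# (POST https://api.mistral.ai/v1/chat/completions). `reasoning_effort`
# and its placement are assumptions -- the author found no API
# documentation for this setting at the time of writing.
payload = {
    "model": "mistral-small-2603",  # inferred from "mistral/mistral-small-2603"
    "messages": [{"role": "user", "content": "Summarize the proof sketch."}],
    "reasoning_effort": "high",     # stated values: "none" or "high"
}
body = json.dumps(payload)
```

Nothing here is sent over the network; the point is only to show where such a knob would plausibly sit relative to the documented `model` and `messages` fields.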
Model Architecture And Serving Footprint
- Mistral Small 4 is described as a 119B-parameter Mixture-of-Experts model with 6B active parameters.
- The Mistral Small 4 model weights are 242GB on Hugging Face.
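A quick back-of-envelope check relates the two figures above, under the assumption (mine, not stated in the source) that the published weights are stored in a 16-bit format such as bf16:

```python
# Sanity check: 119B parameters at 2 bytes each (16-bit weights)
# should land near the stated 242GB Hugging Face download size.
params = 119e9          # total parameters (MoE; 6B active per token)
bytes_per_param = 2     # bf16/fp16 -- an assumption
size_gb = params * bytes_per_param / 1e9
print(round(size_gb))   # 238, within a few percent of the reported 242GB
```

The small remainder could be accounted for by rounding in the "119B" figure or by non-weight files in the repository, but the 16-bit assumption is at least consistent with the reported size.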
Product Line Consolidation Into A Single Model
- Mistral states that Mistral Small 4 unifies reasoning, multimodal, and agentic coding capabilities previously associated with Magistral, Pixtral, and Devstral into one model.
Practical Access And Reproducibility Path
- The author tested Mistral Small 4 via the Mistral API using the llm-mistral plugin and the model identifier "mistral/mistral-small-2603".
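The reproduction path can be sketched as follows. The llm CLI and its llm-mistral plugin are real tools, but verify the exact setup commands in the comments against the plugin's README; the helper below only builds the command line rather than executing anything.

```python
import shlex

# Reproduction sketch for querying the model via the llm CLI with the
# llm-mistral plugin. One-time setup in a shell (verify against the
# plugin README):
#   llm install llm-mistral
#   llm keys set mistral    <- paste a Mistral API key when prompted
MODEL_ID = "mistral/mistral-small-2603"  # identifier from the source text

def llm_argv(prompt: str) -> list[str]:
    """Build argv for `llm -m <model-id> <prompt>` without running it."""
    return ["llm", "-m", MODEL_ID, prompt]

print(shlex.join(llm_argv("Three facts about pelicans")))
```

Building the argv list (rather than a single shell string) avoids quoting pitfalls if the prompt contains spaces or shell metacharacters.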
Specialization Toward Formal Verification Workflows
- Mistral announced Leanstral, an open-weight model tuned specifically to produce Lean 4 formally verifiable code.
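To make concrete what "Lean 4 formally verifiable code" means, here is a minimal illustration (my own example, not output from Leanstral): a theorem whose proof is machine-checked by the Lean kernel, so that successful compilation is itself the verification.

```lean
-- A tiny example of formally verifiable Lean 4 code: if this file
-- compiles, the Lean kernel has checked the proof.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A model tuned for this target is rewarded for output the compiler accepts, which is a much stricter bar than producing plausible-looking proofs.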
Watchlist
- At the time of writing, the author could not find Mistral API documentation for the reasoning_effort setting.
Unknowns
- Is the MoE configuration (including number of experts, routing behavior, and the stated active-parameter count) confirmed in an official model card or technical report?
- How does Mistral Small 4 perform on standardized reasoning, multimodal, and coding/agent benchmarks relative to the referenced prior models?
- Is reasoning_effort exposed in the Mistral API today, and if so what are the precise parameter name, allowed values, defaults, and billing/usage implications?
- What is the measurable effect of reasoning_effort="high" on output length, quality, latency, and token usage for representative workloads?
- What deployment formats are available for the 242GB weights (sharding layout, precision, and any officially supported smaller variants), and what hardware/software serving requirements are implied?