Google's Gemma 4 open AI models use "speculative decoding" to get up to 3x faster
Up to 3x the speed with no loss of quality—is it too good to be true?
Large language models are getting incredibly powerful, but let’s be honest—their inference speed is still a massive headache for anyone trying to use them in production. Google just launched Multi-Token Prediction (MTP) drafters for the Gemma 4 model family. This specialized speculative decoding architecture can triple inference speed, all without […] The post Google AI Releases Multi-Token Prediction (MTP) Drafters for Gemma 4: Delivering Up to 3x Faster Inference Without Quality Loss appeared first on MarkTechPost.
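The announcement does not publish implementation details, but the general speculative-decoding idea behind MTP drafters can be sketched with toy stand-ins: a cheap "drafter" proposes several tokens at once, and the large target model verifies them in one pass, keeping the accepted prefix. The vocabulary, the greedy rules, and both model functions below are illustrative assumptions, not Google's code.

```python
def draft_tokens(prefix, k, vocab=("a", "b", "c")):
    # Toy "drafter": a small, fast model proposing k tokens ahead.
    # Here it deterministically cycles the vocabulary by position.
    return [vocab[(len(prefix) + i) % len(vocab)] for i in range(k)]

def target_choice(prefix, vocab=("a", "b", "c")):
    # Toy "target" model: the large model's own greedy pick at this position.
    return vocab[len(prefix) % len(vocab)]

def speculative_step(prefix, k=4):
    """One round of speculative decoding: draft k tokens cheaply, then
    verify them against the target model. Accepted tokens are appended;
    on the first mismatch we keep the target's own token and stop, so
    the final sequence is identical to plain target-only decoding."""
    drafted = draft_tokens(prefix, k)
    accepted = []
    for tok in drafted:
        want = target_choice(prefix + accepted)
        accepted.append(want)
        if tok != want:          # rejection ends the round early
            break
    return prefix + accepted

out = speculative_step(["a"], k=4)
print(out)  # → ['a', 'b', 'c', 'a', 'b']
```

Because rejected drafts are replaced by the target's choice, output quality matches the target model exactly; the speedup comes from verifying several drafted tokens per target-model pass instead of one.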
The release of Gemma 4 has added energy to the discussion of local models and their importance. Models that you can download and run on hardware you own are becoming competitive with the “frontier models” hosted by large AI providers. These models have gotten good enough for production use, good enough for tasks that until […]
Google’s Gemma 4 comes touted as the latest evolution of Google’s multi-modal model offerings. Gemma 4 not only offers reasoning and tool use, but vision and audio functionality, and it’s available in a range of model sizes that target servers and local devices. What’s striking about Gemma 4 is that even at the higher end of its size range, it’s still decently performant on personal hardware. Google claims this is due to innovations in the architecture of the model, but the proof is in the trying. Gemma 4 is quite responsive. To that end, I took Gemma 4 for a spin on my own hardware to see how it fared at its advertised tasks.

Gemma 4 model sizes

Gemma 4 comes in four basic sizes or “densities”:

- E2B: 2.3 billion effective parameters, 5.1 billion total, 128K max context window.
- E4B: 4.5 billion effective parameters, 8 billion total, 128K max context window.
- 31B: 31 billion parameters (the “dense” version), 256K max context window. (You will probably not use this one on your own machine.)
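To get a feel for why the smaller sizes run comfortably on personal hardware, you can do the back-of-the-envelope memory math for the total parameter counts above. The 4-bit quantization figure is a common local-inference assumption, not an official recommendation, and the estimate covers weights only (no KV cache or activations).

```python
def approx_weight_gib(params_billion, bits_per_weight):
    """Rough RAM/VRAM needed just to hold the weights:
    params * (bits / 8) bytes, converted to GiB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

# Total parameter counts from the size list above, at an assumed
# 4-bit quantization.
for name, total_b in [("E2B", 5.1), ("E4B", 8.0), ("31B", 31.0)]:
    print(f"{name}: ~{approx_weight_gib(total_b, 4):.1f} GiB at 4-bit")
```

By this estimate the E2B and E4B weights fit in roughly 2.4 GiB and 3.7 GiB respectively, well within consumer GPUs and laptops, while the 31B dense model needs on the order of 14 GiB before any context overhead, which is why you probably won't run it locally.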
Imagine asking your AI model, “What’s the weather in Tokyo right now?” and instead of hallucinating an answer, it calls your actual Python function, fetches live data, and responds correctly. That’s how empowering the tool call functions in the Gemma 4 from Google are. A truly exciting addition to open-weight AI: this function calling is […] The post Gemma 4 Tool Calling Explained: Build AI Agents with Function Calling (Step-by-Step Guide) appeared first on Analytics Vidhya.
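The pattern the teaser describes — the model emitting a structured call instead of a guessed answer, and your code executing the real function — can be sketched with a minimal dispatcher. The JSON call format, the `get_weather` function, and its canned response are all hypothetical stand-ins, not the actual Gemma 4 tool-calling schema.

```python
import json

# Hypothetical tool: in a real agent this would fetch live data.
def get_weather(city: str) -> str:
    return f"22°C and clear in {city}"  # canned response for the sketch

# Registry mapping tool names the model may emit to Python callables.
TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse an assumed JSON tool call ({"name": ..., "arguments": {...}})
    and invoke the matching registered function with its arguments."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# The model, instead of hallucinating an answer, emits a structured call:
result = dispatch('{"name": "get_weather", "arguments": {"city": "Tokyo"}}')
print(result)  # → 22°C and clear in Tokyo
```

In a full agent loop, the function's return value would be fed back to the model so it can compose the final natural-language answer.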
If you have ever stared at thousands of lines of integration test logs wondering which of the sixteen log files actually contains your bug, you are not alone — and Google now has data to prove it. A team of Google researchers introduced Auto-Diagnose, an LLM-powered tool that automatically reads the failure logs from a […] The post Google AI Releases Auto-Diagnose: An LLM-Based System to Diagnose Integration Test Failures at Scale appeared first on MarkTechPost.
Google has introduced Gemini 3.1 Flash TTS, a preview text-to-speech model focused on improving speech quality, expressive control, and multilingual generation. Unlike previous iterations that prioritized simple conversion, this release emphasizes natural-language audio tags, native support for more than 70 languages, and native multi-speaker dialogue. This release signals a shift from ‘black-box’ audio generation toward […] The post Google AI Launches Gemini 3.1 Flash TTS: A New Benchmark in Expressive and Controllable AI Voice appeared first on MarkTechPost.
Following in the footsteps of the recently released Gemma 4, MiniMax has now made its latest model, MiniMax M2.7, completely open-weight. In simple terms, developers can now download the model, run it on their own systems, and start building with it. This is in contrast with the model being a completely cloud-hosted AI service up […] The post MiniMax M2.7 Goes Open-Weight to Let You Run Agents Locally appeared first on Analytics Vidhya.
The open-weights model ecosystem shifted recently with the release of the <a href="https://blog.