Run Google's Gemma 4 models on your own hardware while exposing them via public API using Clarifai Local Runners. Apache 2.0 licensed, multimodal support, and production-ready.
Zyphra releases ZAYA1-8B, a reasoning Mixture of Experts model with only 760M active parameters that outperforms open-weight models many times its size on math and coding benchmarks — closing in on DeepSeek-V3.2 and surpassing Claude 4.5 Sonnet on HMMT'25 with its novel Markovian RSA test-time compute method. Trained end-to-end on AMD Instinct MI300 hardware and released under Apache 2.0, it sets a new standard for intelligence density in the small language model weight class.
The post Zyphra Releases ZAYA1-8B: A Reasoning MoE Trained on AMD Hardware That Punches Far Above Its Weight Class appeared first on MarkTechPost.
Large language models are getting incredibly powerful, but let’s be honest: their inference speed is still a massive headache for anyone trying to use them in production. Google just launched Multi-Token Prediction (MTP) drafters for the Gemma 4 model family. This specialized speculative decoding architecture can deliver up to 3x faster inference, all without […]
The post Google AI Releases Multi-Token Prediction (MTP) Drafters for Gemma 4: Delivering Up to 3x Faster Inference Without Quality Loss appeared first on MarkTechPost.
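To make the speculative decoding idea behind MTP drafters concrete, here is a toy sketch of the general draft-then-verify loop: a cheap drafter proposes a few tokens, and the expensive target model checks them, accepting the agreeing prefix. Both "models" below are stand-in functions over integer tokens, purely illustrative assumptions, not Gemma 4's actual MTP architecture.

```python
def draft_next(context):
    # Cheap drafter: a toy rule that predicts the next integer.
    return context[-1] + 1

def target_next(context):
    # Expensive target model: agrees with the drafter except that it
    # caps tokens at 5, so verification sometimes rejects drafts.
    return min(context[-1] + 1, 5)

def speculative_decode(context, k=4, steps=8):
    """Generate `steps` tokens, verifying k-token drafts per iteration."""
    out = list(context)
    while len(out) - len(context) < steps:
        # 1. Drafter proposes k tokens autoregressively (cheap).
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Target verifies each draft token; accept the agreeing prefix.
        accepted, ctx = [], list(out)
        for t in draft:
            expected = target_next(ctx)
            if expected == t:
                accepted.append(t)
                ctx.append(t)
            else:
                # 3. On the first mismatch, keep the target's own token,
                #    so every iteration still makes progress.
                accepted.append(expected)
                break
        out.extend(accepted)
    return out[len(context):len(context) + steps]
```

When drafter and target mostly agree, each expensive verification pass yields several tokens instead of one, which is where the speedup comes from; the output is identical to what the target model alone would produce.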
On May 21st, the Innodata GenAI Summit convenes in London for a single day of rigorous, practitioner-led exchange on the challenges defining frontier AI in 2026. Here is what the agenda covers, who is in the room, and why it's a must for AI professionals...
The release of Gemma 4 has added energy to the discussion of local models and their importance. Models that you can download and run on hardware you own are becoming competitive with the “frontier models” hosted by large AI providers. These models have gotten good enough for production use, good enough for tasks that until […]
Google’s Gemma 4 is touted as the latest evolution of the company’s multi-modal model offerings. Gemma 4 offers not only reasoning and tool use but also vision and audio functionality, and it’s available in a range of model sizes that target servers and local devices.
What’s striking about Gemma 4 is that even at the higher end of its size range, it’s still decently performant on personal hardware. Google claims this is due to innovations in the architecture of the model, but the proof is in the trying. Gemma 4 is quite responsive.
To that end, I took Gemma 4 for a spin on my own hardware to see how it fared for its advertised tasks.
Gemma 4 model sizes
Gemma 4 comes in four basic sizes or “densities”:
E2B: 2.3 billion effective parameters, 5.1 billion total, 128K max context window.
E4B: 4.5 billion effective parameters, 8 billion total, 128K max context window.
31B: 31 billion parameters (the “dense” version), 256K max context window. (You will probably not use this one on your own machine.)
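A quick way to reason about which size fits your hardware is the back-of-the-envelope rule bytes = params × bits / 8, applied to the *total* parameter count (the effective count drives compute, but all weights must fit in memory). The helper below is a rough sketch using the figures from the list above; the 30% overhead factor for KV cache and activations is an assumption, not a Google-published number.

```python
# Total parameter counts from the size list above (effective counts in comments).
SIZES = {
    "E2B": 5.1e9,  # 2.3B effective
    "E4B": 8.0e9,  # 4.5B effective
    "31B": 31e9,   # dense
}

def weight_gb(total_params, bits=4):
    """Approximate weight footprint in GB at a given quantization width."""
    return total_params * bits / 8 / 1e9

def fits(model, ram_gb, bits=4, overhead=1.3):
    # ~30% headroom for KV cache and activations: a rough rule of thumb,
    # and optimistic for long contexts, where the KV cache grows.
    return weight_gb(SIZES[model], bits) * overhead <= ram_gb
```

By this estimate, the E4B at 4-bit quantization needs roughly 4 GB of weights and fits comfortably in 16 GB of RAM, while the dense 31B does not, which matches the parenthetical caveat above.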
Imagine asking your AI model, “What’s the weather in Tokyo right now?” and instead of hallucinating an answer, it calls your actual Python function, fetches live data, and responds correctly. That’s what the tool-calling functions in Google’s Gemma 4 make possible. A truly exciting addition to open-weight AI: this function calling is […]
The post Gemma 4 Tool Calling Explained: Build AI Agents with Function Calling (Step-by-Step Guide) appeared first on Analytics Vidhya.
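The weather example above boils down to a dispatch loop: the model either answers in plain text or emits a structured tool call, and the runtime executes the matching Python function and feeds the result back. Here is a minimal generic sketch; the JSON call shape and the `get_weather` function are illustrative assumptions, not Gemma 4’s actual wire format.

```python
import json

def get_weather(city: str) -> str:
    # Stand-in for a real weather API call.
    return f"22C and clear in {city}"

# Registry mapping tool names (as the model would emit them) to functions.
TOOLS = {"get_weather": get_weather}

def handle_model_output(text: str) -> str:
    """If the model emitted a JSON tool call, run it; otherwise pass text through."""
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        return text  # plain answer, no tool call
    fn = TOOLS[call["name"]]
    result = fn(**call["arguments"])
    # In a real agent loop, this result would be sent back to the model as a
    # tool-response message so it can compose the final natural-language answer.
    return result
```

For example, a model output of `{"name": "get_weather", "arguments": {"city": "Tokyo"}}` would be dispatched to `get_weather("Tokyo")`, while an ordinary sentence passes through untouched.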
Following in the footsteps of the recently released Gemma 4, MiniMax has now made its latest model, MiniMax M2.7, completely open-weight. In simple terms, developers can now download the model, run it on their own systems, and start building with it. This is in contrast with the model being a completely cloud-hosted AI service up […]
The post MiniMax M2.7 Goes Open-Weight to Let You Run Agents Locally appeared first on Analytics Vidhya.