MiniMax released MSA, a sparse attention mechanism built on Grouped Query Attention (GQA). A lightweight Index Branch selects the Top-k key-value blocks for each query and GQA group, and the Main Branch attends only to those blocks. MSA matches GQA on downstream benchmarks while reducing per-token attention compute by 28.4× at a 1M-token context.
The post MiniMax Sparse Attention (MSA): a Two-Branch Block-Sparse Attention Trained on a 109B-Parameter MoE With a 3T-Token Budget appeared first on MarkTechPost.
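The two-branch idea described above can be illustrated with a small sketch. This is not MiniMax's implementation: it handles a single query and a single head, and the block-scoring rule (dot product between the query and each block's mean key) is an assumption made for illustration, since the blurb does not say how the Index Branch scores blocks. The function names select_blocks and sparse_attention are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def select_blocks(query, keys, block_size, top_k):
    """Stand-in for an index branch (assumed scoring rule):
    score each key block by q . mean(keys in block), keep the top_k blocks."""
    scores = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scores.append((dot(query, mean_key), start // block_size))
    chosen = sorted(scores, reverse=True)[:top_k]
    # Return the selected block ids in ascending position order.
    return sorted(block_id for _, block_id in chosen)

def sparse_attention(query, keys, values, block_size, top_k):
    """Stand-in for a main branch: softmax attention restricted to
    the keys and values inside the selected blocks."""
    block_ids = select_blocks(query, keys, block_size, top_k)
    idx = [i for b in block_ids
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in idx]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) / total
           for d in range(dim)]
    return out, block_ids
```

With top_k equal to the total number of blocks, the sketch reduces to ordinary dense attention; the compute saving comes from keeping top_k fixed while the context grows.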
We work with xFormers, a practical toolkit for fast, memory-efficient Transformer models on GPUs. We validate memory-efficient attention against a standard implementation, then compare speed and memory across sequence lengths. We work through causal masking, packed variable-length sequences, grouped-query attention, and custom ALiBi biases. Finally, we combine these into a trainable GPT-style model with SwiGLU layers and automatic mixed-precision training.
The post How to Build Memory-Efficient Transformers with xFormers Using Packed Sequences, GQA, ALiBi, SwiGLU, and Causal Attention appeared first on MarkTechPost.
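As a rough illustration of the causal masking and ALiBi biases mentioned above, the following sketch builds the additive bias matrix in plain Python. It does not use the xFormers API and is not taken from the tutorial; the helper names alibi_slopes and alibi_causal_bias are hypothetical, and the slope formula assumes the number of heads is a power of two, as in the original ALiBi formulation.

```python
def alibi_slopes(num_heads):
    """Per-head slopes as a geometric sequence.
    Assumes num_heads is a power of two."""
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]

def alibi_causal_bias(seq_len, slope):
    """Additive attention bias for one head:
    -slope * distance for past positions, -inf for future positions."""
    neg_inf = float("-inf")
    return [[-slope * (i - j) if j <= i else neg_inf
             for j in range(seq_len)]
            for i in range(seq_len)]
```

The bias is added to the attention logits before the softmax, so future positions receive zero weight and distant past positions are penalized linearly.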
MiniMax M3 introduces MiniMax Sparse Attention, a 1M-token context window, and native image, video, and computer use support.
The post MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding appeared first on MarkTechPost.
MiniMax's M3 model, teased with a 15.6x faster decoding speed, could reduce latency and costs for decentralized AI and improve its scalability and efficiency.
The post MiniMax teases M3 model with 15.6x faster decoding speed boost appeared first on Crypto Briefing.
The ruling highlights the growing tension between AI innovation and copyright law, potentially reshaping global AI market regulations.
The post MiniMax loses bid to end Disney copyright lawsuit over AI system appeared first on Crypto Briefing.
The case underscores the growing legal scrutiny on AI training data, impacting investor confidence and shaping future AI industry practices.
The post MiniMax loses bid to dismiss Disney copyright lawsuit over AI system appeared first on Crypto Briefing.
In this tutorial, we explore OpenMythos by building an advanced recurrent-depth transformer workflow that runs end-to-end in Google Colab. We create both MLA and GQA model variants, compare their parameter counts, and check the stability of the recurrent injection matrix through its spectral radius.
The post Build Recurrent-Depth Transformers with OpenMythos for MLA, GQA, Sparse MoE, and Loop-Scaled Reasoning appeared first on MarkTechPost.
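The stability check mentioned above can be sketched as follows. A recurrent update that repeatedly applies a matrix stays bounded when that matrix's spectral radius is below 1. This sketch estimates the radius with power iteration, which is an assumption chosen for illustration (it is not necessarily how OpenMythos computes it) and is only reliable when the matrix has a single dominant real eigenvalue. The function names are hypothetical.

```python
import math

def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

def spectral_radius(matrix, iters=200):
    """Power-iteration estimate of the largest eigenvalue magnitude.
    Assumes a single dominant real eigenvalue."""
    vec = [1.0] * len(matrix)
    estimate = 0.0
    for _ in range(iters):
        nxt = matvec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0
        estimate = norm  # ||A v|| approaches |lambda_max| once v has unit norm
        vec = [x / norm for x in nxt]
    return estimate

def is_stable(matrix):
    """Repeated application stays bounded when the radius is below 1."""
    return spectral_radius(matrix) < 1.0
```

For matrices with complex dominant eigenvalues, a full eigenvalue solver would be the safer choice.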
In this tutorial, we explore the implementation of OpenMythos, a theoretical reconstruction of the Claude Mythos architecture that enables deeper reasoning through iterative computation rather than increased parameter size. We build and analyze models using both GQA and MLA attention mechanisms, examine memory efficiency through KV-cache comparisons, and validate stability via the spectral properties of […]
The post A Coding Tutorial on OpenMythos on Recurrent-Depth Transformers with Depth Extrapolation, Adaptive Computation, and Mixture-of-Experts Routing appeared first on MarkTechPost.
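The KV-cache comparison between GQA and MLA comes down to simple arithmetic, sketched below. The formulas are assumptions based on the commonly described designs: GQA caches one key and one value vector per KV head, while MLA caches a single compressed latent per token plus a small positional (RoPE) component. The function names and all sizes in the usage example are hypothetical and are not taken from OpenMythos.

```python
def gqa_cache_elems(layers, kv_heads, head_dim, seq_len):
    """Cached elements for GQA: keys and values for every KV head."""
    return 2 * layers * kv_heads * head_dim * seq_len

def mla_cache_elems(layers, latent_dim, rope_dim, seq_len):
    """Cached elements for MLA (assumed layout): one compressed latent
    plus a decoupled positional key per token."""
    return layers * (latent_dim + rope_dim) * seq_len

# Hypothetical sizes, for illustration only.
gqa = gqa_cache_elems(layers=4, kv_heads=2, head_dim=64, seq_len=1024)
mla = mla_cache_elems(layers=4, latent_dim=64, rope_dim=16, seq_len=1024)
print(f"GQA: {gqa} elements, MLA: {mla} elements, ratio: {gqa / mla:.2f}")
```

Multiplying the element count by the bytes per element (for example, 2 for fp16) gives the cache size in bytes.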
Following in the footsteps of the recently released Gemma 4, MiniMax has now made its latest model, MiniMax M2.7, completely open-weight. In simple terms, developers can now download the model, run it on their own systems, and start building with it. This is in contrast with the model being a completely cloud-hosted AI service up […]
The post MiniMax M2.7 Goes Open-Weight to Let You Run Agents Locally appeared first on Analytics Vidhya.