MiniMax Sparse Attention (MSA): a Two-Branch Block-Sparse Attention Trained on a 109B-Parameter MoE With a 3T-Token Budget

MarkTechPost | gpt transformers gpu attention

How to Build Memory-Efficient Transformers with xFormers Using Packed Sequences, GQA, ALiBi, SwiGLU, and Causal Attention

We use xFormers, a practical toolkit for building fast, memory-efficient Transformer models on GPUs. We validate its memory-efficient attention against a standard implementation, then compare speed and memory across sequence lengths. We work through causal masking, packed variable-length sequences, grouped-query attention, and custom ALiBi biases. Finally, we combine these into a trainable GPT-style model with SwiGLU layers and automatic mixed-precision training.

Jun 17, 12:02 AM

MarkTechPost | minimax minimax m3 msa architecture minimax sparse attention

MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding

MiniMax M3 introduces MiniMax Sparse Attention, a 1M-token context window, and native image, video, and computer use support.

Jun 1, 8:40 PM

Crypto Briefing | minimax decentralized ai m3 model decoding speed

MiniMax teases M3 model with 15.6x faster decoding speed boost

MiniMax's M3 model could revolutionize decentralized AI by significantly reducing latency and costs, enhancing scalability and efficiency.

May 27, 8:03 PM

Crypto Briefing | ai system minimax disney copyright lawsuit

MiniMax loses bid to end Disney copyright lawsuit over AI system

The ruling highlights the growing tension between AI innovation and copyright law, potentially reshaping global AI market regulations.

May 27, 2:24 AM

Crypto Briefing | ai system minimax disney ai training data

MiniMax loses bid to dismiss Disney copyright lawsuit over AI system

The case underscores the growing legal scrutiny on AI training data, impacting investor confidence and shaping future AI industry practices.

May 26, 9:43 PM

MarkTechPost | sparse moe google colab openmythos recurrent-depth transformers

Build Recurrent-Depth Transformers with OpenMythos for MLA, GQA, Sparse MoE, and Loop-Scaled Reasoning

In this tutorial, we explore OpenMythos by building an advanced recurrent-depth transformer workflow that runs end-to-end in Google Colab. We create both MLA and GQA model variants, compare their parameter counts, and check the stability of the recurrent injection matrix through its spectral radius.

May 22, 7:39 AM

MarkTechPost | claude mythos kv-cache openmythos recurrent-depth transformers

A Coding Tutorial on OpenMythos on Recurrent-Depth Transformers with Depth Extrapolation, Adaptive Computation, and Mixture-of-Experts Routing

In this tutorial, we explore the implementation of OpenMythos, a theoretical reconstruction of the Claude Mythos architecture that enables deeper reasoning through iterative computation rather than increased parameter size. We build and analyze models using both GQA and MLA attention mechanisms, examine memory efficiency through KV-cache comparisons, and validate stability via the spectral properties of […]

Apr 23, 9:25 PM

Analytics Vidhya | gemma 4 minimax minimax m2.7

MiniMax M2.7 Goes Open-Weight to Let You Run Agents Locally

Following in the footsteps of the recently released Gemma 4, MiniMax has now made its latest model, MiniMax M2.7, completely open-weight. In simple terms, developers can now download the model, run it on their own systems, and start building with it. This is in contrast with the model being a completely cloud-hosted AI service up […]

Apr 14, 3:27 PM