Nous Research releases Token Superposition Training (TST), a two-phase pre-training method that cuts wall-clock training time by up to 2.5x at matched FLOPs by averaging contiguous token embeddings into bags during Phase 1 and reverting to standard next-token prediction in Phase 2 — without changing the model architecture, tokenizer, optimizer, or inference-time behavior. Validated at 270M, 600M, 3B dense, and 10B-A1B MoE scales.
The post Nous Research Releases Token Superposition Training to Speed Up LLM Pre-Training by Up to 2.5x Across 270M to 10B Parameter Models appeared first on MarkTechPost.
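The core Phase-1 operation described above — averaging contiguous token embeddings into bags so each training step processes a shorter sequence — can be sketched in a few lines. This is an illustrative reconstruction from the summary, assuming a fixed bag size and a sequence length divisible by it; `bag_embeddings` is a hypothetical helper name, not from the Nous Research release.

```python
import numpy as np

def bag_embeddings(token_embs: np.ndarray, bag_size: int) -> np.ndarray:
    """Phase 1 of TST (sketch): average each run of `bag_size` contiguous
    token embeddings into one "bag" embedding, shrinking the sequence the
    transformer sees by a factor of bag_size.

    token_embs: (seq_len, d_model), seq_len assumed divisible by bag_size.
    Returns: (seq_len // bag_size, d_model).
    """
    seq_len, d_model = token_embs.shape
    assert seq_len % bag_size == 0, "sketch assumes an even split into bags"
    bags = token_embs.reshape(seq_len // bag_size, bag_size, d_model)
    return bags.mean(axis=1)

# Phase 1: train on the bagged (shorter) sequence -> fewer FLOPs per token.
# Phase 2: drop the bagging entirely and resume standard next-token
# prediction on raw token embeddings, so inference is unchanged.
embs = np.arange(8.0).reshape(4, 2)   # 4 tokens, d_model = 2
print(bag_embeddings(embs, bag_size=2).shape)  # (2, 2)
```

Because the bagging sits purely on the input side of training, nothing about the architecture, tokenizer, or inference path has to change — which matches the summary's claim.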
Nous Research has published Lighthouse Attention, a selection-based hierarchical attention mechanism that wraps around standard scaled dot-product attention during pretraining and is removed afterward. Unlike prior methods such as NSA and HISA that pool only keys and values, Lighthouse pools Q, K, and V symmetrically across a multi-resolution pyramid, reducing the attention call from O(N·S·d) to O(S²·d) and running stock FlashAttention on a small dense sub-sequence. Tested on a 530M Llama-3-style model at 98K context, it achieves a 1.40–1.69× end-to-end wall-clock speedup against a cuDNN SDPA baseline with matching or lower final training loss.
The post Nous Research Proposes Lighthouse Attention: A Training-Only Selection-Based Hierarchical Attention That Delivers 1.4–1.7× Pretraining Speedup at Long Context appeared first on MarkTechPost.
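The complexity claim above — pooling Q, K, and V symmetrically so attention runs on a short dense sub-sequence of length S instead of the full length N — can be illustrated with plain mean pooling. This is a minimal sketch of the symmetric-pooling idea only: the actual multi-resolution pyramid and selection mechanism are not shown, `pooled_attention` is a hypothetical name, and real use would call FlashAttention on the pooled tensors rather than materializing the score matrix.

```python
import numpy as np

def pooled_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                     pool: int) -> np.ndarray:
    """Sketch: mean-pool Q, K, AND V with stride `pool` (unlike NSA/HISA,
    which pool only K and V), then run standard softmax attention on the
    short sequence. With S = N // pool, the attention call costs
    O(S^2 * d) instead of O(N * S * d)."""
    def mean_pool(x: np.ndarray) -> np.ndarray:
        n, d = x.shape
        return x[: n - n % pool].reshape(-1, pool, d).mean(axis=1)

    q_s, k_s, v_s = mean_pool(q), mean_pool(k), mean_pool(v)
    scores = q_s @ k_s.T / np.sqrt(q.shape[-1])          # (S, S)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                   # row-wise softmax
    return w @ v_s                                       # (S, d)

q = k = v = np.ones((8, 4))
print(pooled_attention(q, k, v, pool=4).shape)  # (2, 4)
```

Because the inner call is just dense attention on a shorter sequence, stock kernels (cuDNN SDPA, FlashAttention) apply unmodified — consistent with the summary's description of a training-only wrapper that is removed after pretraining.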
By letting Hermes Agent authenticate through an existing Grok subscription instead of a manually provisioned API key, the integration removes a setup step that has kept many users from deploying agents, making task automation easier to adopt.
The post Nous Research integrates Grok subscriptions into Hermes Agent, ditching API key friction appeared first on Crypto Briefing.
AI agents are moving beyond simple command-line tools into systems that can plan, schedule, call tools, and run automated workflows. Nous Research’s Hermes Agent framework offers a self-hosted runtime for building advanced agents with state management, tool integration, and secure execution. It supports multi-step planning, background task control, and real-world automation beyond single-purpose coding assistants. […]
The post Hermes Agent Guide: What is it and How to Use it? appeared first on Analytics Vidhya.
Hermes Agent, the open-source self-improving AI agent from Nous Research, has overtaken OpenClaw to claim the #1 position on OpenRouter's global daily token rankings as of May 10, 2026 — generating 224 billion daily tokens versus OpenClaw's 186 billion. The milestone places a Nous Research project ahead of an OpenAI-sponsored platform in real-world daily inference volume, just three months after launch.
The post OpenClaw vs Hermes Agent: Why Nous Research’s Self-Improving Agent Now Leads OpenRouter’s Global Rankings appeared first on MarkTechPost.
MiMo-V2.5-Pro matches frontier coders on benchmarks, ships under MIT, and burns 40-60% fewer tokens per agent run - but it's 1.02 trillion parameters of MoE and you can't run it on your gaming rig.
In this tutorial, we build an end-to-end implementation around Qwen 3.6-35B-A3B and explore how a modern multimodal MoE model can be used in practical workflows. We begin by setting up the environment, loading the model adaptively based on available GPU memory, and creating a reusable chat framework that supports both standard responses and explicit thinking […]
The post A Coding Implementation on Qwen 3.6-35B-A3B Covering Multimodal Inference, Thinking Control, Tool Calling, MoE Routing, RAG, and Session Persistence appeared first on MarkTechPost.
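The "loading the model adaptively based on available GPU memory" step mentioned in the tutorial summary typically amounts to picking a precision/quantization configuration from the free VRAM budget. The sketch below is an assumption about how such a chooser might look — the threshold values, config keys, and the `pick_load_config` name are all illustrative, not taken from the tutorial.

```python
def pick_load_config(free_gb: float) -> dict:
    """Hypothetical helper: map free GPU memory (in GB) to a loading
    config. Larger budgets get full bf16; tighter ones fall back to
    8-bit and then 4-bit quantization. Thresholds are illustrative."""
    if free_gb >= 80:
        return {"dtype": "bfloat16", "device_map": "auto"}
    if free_gb >= 40:
        return {"dtype": "bfloat16", "device_map": "auto",
                "quantization": "8bit"}
    return {"dtype": "float16", "device_map": "auto",
            "quantization": "4bit"}

print(pick_load_config(16))  # consumer GPU -> 4-bit fallback
```

A reusable chat framework would then pass the chosen config to the model loader once at startup, so the rest of the session code stays identical across hardware tiers.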