Perplexity AI announces a hybrid local-server inference orchestrator for Personal Computer, automatically routing AI tasks between on-device and cloud models.
The post Perplexity AI Introduces Hybrid Local-Server Inference Orchestrator for Personal Computer: Automatic On-Device and Cloud Task Routing appeared first on MarkTechPost.
Stanford researchers released OpenJarvis, an open-source framework that runs inference, agents, memory, and learning entirely on-device. It decomposes a personal AI system into five composable primitives — Intelligence, Engine, Agents, Tools & Memory, and Learning — and lands within 3.2 points of the best cloud model at roughly 800× lower marginal API cost.
The post Meet OpenJarvis: A Local-First Framework for On-Device Personal AI Agents with Tools, Memory, and Learning appeared first on MarkTechPost.
Nvidia's entry into the PC market with a powerful AI chip could redefine local AI processing, challenging existing tech giants and reshaping user data privacy.
The post Nvidia enters personal computer market with new AI chip that can run 120 billion parameter models locally appeared first on Crypto Briefing.
Perplexity AI open-sources a rewritten Unigram tokenizer that reduces reranker latency and cuts production CPU utilization by 5-6x.
The post Perplexity AI Open-Sources Unigram Tokenizer That Achieves 5x Lower p50 Latency Than Hugging Face tokenizers Crate appeared first on MarkTechPost.
The rise of Neocloud ETFs could significantly reshape AI infrastructure investment, emphasizing specialized GPU services over traditional cloud models.
The post Roundhill files for Neocloud ETF targeting GPU-as-a-Service infrastructure appeared first on Crypto Briefing.
Perplexity has opened its Personal Computer feature to all Mac users through a new desktop app, bringing local AI agent capabilities beyond its previous Max subscriber waitlist. The tool extends Perplexity’s cloud-based Computer product onto users’ own devices, giving AI agents access to local files, native Mac applications, over 400 connectors, and the web to […]
Snap has quietly terminated its $400 million partnership with AI search startup Perplexity, revealing the split as part of its first-quarter earnings report. The deal, announced last November, would have embedded Perplexity’s conversational AI search engine directly into Snapchat’s Chat interface. Despite limited testing with select users, the companies failed to agree on a path […]
Three key advantages of SLMs
Division of labor: Modern AI architectures use routers to send routine tasks to 7B-parameter SLMs, reserving trillion-parameter LLMs for complex reasoning.
Economic efficiency: For high-volume, repetitive tasks, SLMs can reduce cloud inference costs by up to 90% while delivering near-instant responses.
Privacy at the edge: Because SLMs can run locally on-device or on-premises, they reduce the data leakage risks inherent in sending sensitive telemetry to the public cloud.
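To make the division of labor concrete, here is a minimal sketch of a router that sends routine prompts to a small model and escalates to a large one. Everything in it is an assumption for illustration: the tier labels `small-slm-7b` and `large-llm`, the keyword list, and the word-count threshold are hypothetical, and a production router would more likely use a trained classifier than keyword matching.

```python
import re
from dataclasses import dataclass

# Hypothetical markers that hint at multi-step reasoning (illustrative only).
REASONING_MARKERS = {"prove", "derive", "plan", "debug", "why"}


@dataclass(frozen=True)
class Route:
    model: str   # hypothetical tier label, not a real model name
    reason: str  # short explanation of why this tier was chosen


def route(prompt: str, max_routine_words: int = 40) -> Route:
    """Pick a model tier for a prompt using simple, transparent heuristics."""
    # Tokenize into lowercase words so "plan" does not match "explanation".
    words = re.findall(r"[a-z]+", prompt.lower())
    hits = sorted(REASONING_MARKERS.intersection(words))
    if hits:
        return Route("large-llm", f"reasoning marker: {hits[0]}")
    if len(words) > max_routine_words:
        return Route("large-llm", "long prompt")
    return Route("small-slm-7b", "routine task")


if __name__ == "__main__":
    for text in (
        "Extract the invoice number from this email",
        "Why does this proof fail? Derive a fix.",
    ):
        print(text, "->", route(text))
```

The heuristic defaults to the small model and escalates only on an explicit signal, which matches the cost argument above: the cheap path handles the high-volume routine traffic, and the expensive path is the exception.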
Large language models (LLMs) are the workhorses of AI, supporting ever more sophisticated capabilities and workflows and approaching human-level performance.
But more isn’t always better; sometimes it’s just more. For some workflows, specialized data and limited capabilities are just fine.
This realization is driving the evolution of small language models (SLMs), rather than one-size-fits-all LLMs. SLMs — coming in the form of domain-specific models, statistical langua