LeCun's critique suggests a paradigm shift in AI research, emphasizing the need for models that integrate sensory experiences for true intelligence.
The post Yann LeCun says large language models are a dead end, gives them five years appeared first on Crypto Briefing.
Because generative AI (genAI) tools and services have become so ubiquitous (and popular), the appetite for tokens has become insatiable, and the costs of using those tools are going through the roof.
Tokens represent a common way to measure and price AI use. Just as English text is built from letters and words, large language models (LLMs) grasp a sentence or query by breaking it into tokens.
With the AI explosion well under way, tokens are now “the fundamental units of data our models process, many representing a problem being solved,” according to Google CEO Sundar Pichai. (Google, by the way, processes about 3.2 quadrillion tokens a month.)
But as the price of all those tokens adds up, business and IT execs are looking for ways to cut costs while keeping corporate productivity up. Uncontrolled token use has already landed one company with an unexpected $500 million AI bill.
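To make the link between token volume and spend concrete, here is a minimal sketch of a monthly cost estimate. The prices, request counts, and the names `TokenPrice` and `estimate_monthly_cost` are hypothetical and chosen only for illustration; real vendors publish their own rates, which often differ for input and output tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical price list, in US dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


def estimate_monthly_cost(
    price: TokenPrice,
    requests_per_month: int,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
) -> float:
    """Return the estimated monthly bill in US dollars."""
    # Total tokens sent to and received from the model over the month.
    input_tokens = requests_per_month * input_tokens_per_request
    output_tokens = requests_per_month * output_tokens_per_request
    # Prices are quoted per million tokens, so scale the sum down at the end.
    return (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    ) / 1_000_000


if __name__ == "__main__":
    # Assumed figures: $2 per 1M input tokens, $8 per 1M output tokens,
    # one million requests a month, 1,500 tokens in and 500 tokens out each.
    price = TokenPrice(input_per_million=2.0, output_per_million=8.0)
    print(estimate_monthly_cost(price, 1_000_000, 1_500, 500))
```

Under these assumed figures, output tokens account for more of the bill than input tokens even though there are fewer of them, which is why trimming response length is one lever for cutting costs.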
There are a number of ways companies can rein in the price of AI at the model, infrastructure, silicon, and business levels. Here’s
We’ve all heard the mantra from the quants in the business community: you can’t manage what you can’t measure. And if that’s true for human intelligence, it should be true for the artificial kind too.
How do we measure agents and large language models (LLMs)? We’re just beginning to come up with statistical metrics. Here are several of the most common metrics that designers and users toss about when they’re evaluating a model.
Time to first token
How long does it take to generate the first token? For real-time applications with time constraints, faster responses can be essential. It’s well known that people hate waiting even a few milliseconds. The teams that develop user interfaces learned decades ago that it’s important for the software to respond quickly when a human is waiting for an answer. Even a few seconds of delay means that the human will wander off to another window to check some email or place some bet on a prediction
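As a rough illustration of how time to first token can be measured, the sketch below times any iterable that yields tokens as they arrive. The function `time_to_first_token` and the stand-in `fake_stream` are hypothetical names, not part of any vendor's API; in practice the stream would come from a model's streaming response.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def time_to_first_token(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Consume a token stream and return (ttft, total_time, token_count).

    ttft is None if the stream ends without producing any token.
    """
    start = clock()
    ttft: Optional[float] = None
    count = 0
    for _ in stream:
        if ttft is None:
            # The first token has just arrived: record the elapsed time.
            ttft = clock() - start
        count += 1
    total = clock() - start
    return ttft, total, count


def fake_stream(n_tokens: int, first_delay: float, gap: float) -> Iterator[str]:
    """Stand-in for a model: waits, then emits tokens at a fixed pace."""
    time.sleep(first_delay)
    for i in range(n_tokens):
        if i:
            time.sleep(gap)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, count = time_to_first_token(fake_stream(5, 0.05, 0.01))
    print(f"first token after {ttft:.3f}s, {count} tokens in {total:.3f}s")
```

Keeping the first-token delay separate from the total time matters because a model can be slow to start yet fast once it is streaming, and the two delays feel very different to a person waiting.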
Enterprises are increasingly investing copious amounts of cash in AI without a lot to show for it. This could be, in part, because the wrong people are leading the change.
As I’ve argued before, AI isn’t likely to eliminate developers so much as change what we need from them. For example, we keep asking whether junior developers are needed in a world where large language models can write code faster and cheaper. What this overlooks is the reality that these younger developers and their relative inexperience may be exactly what we need to rewrite the rules of software development.
This thought hit me while reading James Governor’s riff on something Ben Griffiths wrote about our industry’s habit of confusing age with authority. Griffiths remembered sitting through a conference talk in which a speaker tried to shame a young audience for not recognizing some of the older men who had shaped computing. The irony, Ben noted, was that many of those “old men” had done their world-changing wor
Extremely powerful large language models (LLMs) still operate as though they’re typing on a keyboard, processing workloads in a simple left-to-right fashion. But in locally run, single-user scenarios, this sequential processing can leave graphics processing units (GPUs) and tensor processing units (TPUs) underutilized.
Google is betting that DiffusionGemma can get around this bottleneck. The new experimental open model generates text “exceptionally fast,” creating entire blocks of text simultaneously through diffusion techniques rather than through token-by-token processing. The company says this technique results in 4x faster inference compared to auto-regressive models that rely on sequential processing.
It can also save users money. Technology analyst Carmi Levy noted that existing pay-per-token monetization models “penalize the use of less than optimally efficient AI solutions.”
But DiffusionGemma “could herald a new generation of task-defined, efficient solutions that can enable e
Coinbase for Agents connects AI to financial execution channels to automate trading and payments directly from user portfolios. Large language models process vast quantities of data but lack direct integration with active financial portfolios. Individuals frequently employ these models to evaluate market developments or research investment opportunities. These software tools possess the capacity for complex […]
The post Coinbase for Agents: Automating portfolio trading with AI appeared first on AI News.