Tests of how well 19 large language models (LLMs) perform complicated multi-step tasks have shown that they are error-prone and, in many cases, unreliable.
The findings are contained in a preprint paper, LLMs Corrupt Your Documents When You Delegate, by Microsoft researchers Philippe Laban, Tobias Schnabel and Jennifer Neville. The paper is based on a benchmark they created, called DELEGATE-52, which let them simulate workflows that might form part of a knowledge worker's tasks. The paper is currently under review.
According to the authors, the benchmark contains 310 work environments across 52 professional domains, including coding, crystallography, genealogy and music sheet notation. Each environment consists of real documents totaling around 15,000 tokens, plus five to 10 complex editing tasks that a user might ask an LLM to perform.
In the paper's abstract, they stated: "Our analysis shows that current LLMs are unreliable delegates: they introduce sparse but severe errors …"
The post NVIDIA’s Jensen Huang Says AI Will Turn Intelligence into a Commodity for Billions appeared on BitcoinEthereumNews.com.
TLDR: Jensen Huang says AI will make intelligence a commodity accessible to billions worldwide for the first time. NVIDIA chips power data centers at Amazon, Microsoft, Google, and Meta, driving the global AI buildout. Huang argues AI automates tasks but elevates human purpose, pushing back against job displacement fears. The NVIDIA CEO urges scientists, engineers, and policymakers to advance AI capabilities and safety together.

NVIDIA chief executive Jensen Huang addressed graduates at Carnegie Mellon University on Sunday, May 10, and received an honorary doctorate at the commencement ceremony. Huang said artificial intelligence will make intelligence a commodity for everyone, arguing that the technology will reach billions of people who have never had access to computing power before. His remarks touched on jobs, safety, and America's industrial future.

AI as a Tool for Closi
Image: OpenAI CEO Sam Altman and Microsoft CTO Kevin Scott (Getty Images).
When OpenAI was busy experimenting with AI-powered gaming bots, Microsoft CEO Satya Nadella and OpenAI CEO Sam Altman were in the early days of forming an AI partnership. Court documents from the ongoing Musk v. Altman trial have provided a rare look at the communications between Microsoft's top executives about investing in OpenAI and fears the AI startup could "storm off to Amazon" and "shit-talk" Microsoft.
Just days after OpenAI showed a bot beating a Dota 2 professional in the summer of 2017, Altman responded to Nadella's congratulatory email with a proposal for a much bigger partnership with OpenAI to fund its next phase of AI resear …
Read the full story at The Verge.