Friday, June 6, 2025

Yoshua Bengio’s LawZero Aims to Detect AI Deception


AI is everywhere now, helping people move faster and work smarter. But despite its growing reputation, it's often not that intelligent. Spend enough time with a chatbot and it will eventually say something completely wrong or bizarre. A December study by Anthropic and Redwood Research found that some AI systems not only lie deliberately, but can strategically mislead their developers to avoid modification.

These developments have fueled a broader debate around two main concerns: whether AI can earn a reputation for reliable, evidence-based responses, and whether it can continue to be regulated and modified without developing autonomous resistance.


Meet Yoshua Bengio: The AI godfather driving the push for honest AI

Yoshua Bengio, a renowned computer scientist often referred to as a "godfather" of AI for his pioneering work in deep learning, is among those working to find a solution. He is set to lead a new nonprofit organization called LawZero, dedicated to building honest AI systems that can detect when other artificial intelligence systems lie to or deceive humans.

In recent years, Bengio has been not only one of the most influential minds in AI, but also a guiding voice for professionals, leading organizations and governments navigating the future of artificial intelligence. A recipient of the 2018 Turing Award—often described as the Nobel Prize of computing—Bengio was more recently commissioned by the U.K. government to lead an international AI safety report examining the risks posed by advanced AI systems. He has consistently raised alarms about a wide range of concerns, from the potential misuse of AI in misinformation and surveillance to the risks of autonomous systems acting beyond human control.

AI, driven by learned patterns and explicit instructions, increasingly functions autonomously. As such, it demands thoughtful and practical governance to prevent it from acting against human values and to ensure it remains embedded in, rather than separate from, our world.

How AI models can engage in blackmail and prioritize self-interest

Anthropic, a leading voice in the ethical debate surrounding artificial intelligence, shocked the tech world in late May when it revealed in a safety report that its Claude Opus 4 system was capable of "extreme actions," such as blackmailing engineers by threatening to leak personal information. While the company stated that such instances are rare, it acknowledged that this behavior is more common than in previous AI models.

Just a few months earlier, a similar incident emerged involving OpenAI's o1 model. In an experiment where the AI was instructed to pursue its goal at all costs, it lied to testers when it believed that telling the truth would lead to its deactivation, according to Apollo Research.

“I’m deeply concerned by the behaviors that unrestrained agentic AI systems are already beginning to exhibit—especially tendencies toward self-preservation and deception,” Bengio wrote in a blog post on Tuesday. “Is it reasonable to train AI that will be more and more agentic while we do not understand their potentially catastrophic consequences? LawZero’s research plan aims at developing a non-agentic and trustworthy AI, which I call the Scientist AI,” he further wrote.

Scientist AI will detect and target malicious AI agents that mislead humans

Backed by around $30 million in funding and a research team of over a dozen, Scientist AI will target AI agents—such as those used in customer service, trading or autonomous learning—that show signs of deception or self-preservation, particularly when they appear to deliberately mislead or resist human instructions. 

According to Bengio, part of the problem behind errors and misjudgments in AI behavior stems from how these systems are trained. Teaching AI to mimic human behavior pushes it to produce responses that aim to please and reach a conclusion rather than to be accurate or truthful. Bengio's technology intends to incorporate a broader set of probabilities into its responses and decisions, keeping it fundamentally critical and balanced.

AI's breakneck development pace requires a regulatory response that's just as flexible and determined. Unlike past industrial surges that allowed time for thoughtful strategy, governments and regulators are relying on the very executives and organizations racing through the challenges to also find the solutions. Bengio's new AI software is not built or designed like the autonomous bots intended to perform human tasks. Instead, Scientist AI will serve as a watchdog and community preserver—or, as LawZero calls it, "a selfless, idealized and platonic scientist."

Its purpose is to learn and understand the world rather than actively participate in it. In this way, the system can become a sort of arbiter of virtual right and wrong and potentially a saving grace in combating the epidemic of AI-driven misinformation and its consequences.

Photo by Gumbariya/Shutterstock
