
DSI Seminar Series | How Could We Design Aligned and Provably Safe AI?

Inside Livermore Lab

On April 19, 2024, Dr. Yoshua Bengio presented “How Could We Design Aligned and Provably Safe AI?” His talk was co-sponsored by LLNL’s Data Science Institute and the Center for Advanced Signal and Image Sciences. A Turing Award winner, Bengio is recognized as one of the world’s leading AI experts, known for his pioneering work in deep learning. He is a full professor at the University of Montreal and the founder and scientific director of Mila – Quebec AI Institute. In 2022, Bengio became the most-cited computer scientist in the world.
Evaluating the risks of a learned AI system statically seems hopeless: the number of contexts in which it could act is infinite or exponentially large, and static checks can only verify a finite, relatively small set of such contexts. If we had a runtime evaluation of risk, however, we could prevent actions whose level of risk is unacceptable.

The probability of harm produced by an action or plan, in a given context and given past data, under the true explanation of how the world works, is unknown. However, under reasonable hypotheses related to Occam’s razor, and with a nonparametric Bayesian prior (which therefore includes the true explanation), it can be shown to be bounded by quantities that can in principle be numerically approximated or estimated by large neural networks. All of this rests on a Bayesian view that captures epistemic uncertainty about what constitutes harm and about how the world works. Capturing this uncertainty is essential: the AI could otherwise be confidently wrong about what is “good” and produce catastrophic existential risks, for example through instrumental goals or by taking control of the reward mechanism (wrongly believing that the rewards recorded in the computer are what it should maximize).

The bound relies on a kind of paranoid theory: the explanation that has maximal probability given the past data and given that it predicts harm. The talk discusses the research program based on these ideas and how amortized inference with large neural networks could be used to estimate the required quantities.
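For readers who want the shape of the argument, here is a minimal sketch of the kind of bound the abstract alludes to. The notation is illustrative, not taken from the talk: t ranges over candidate explanations (theories) of the world, t* is the true one, D is past data, and harm is a binary event caused by action a in context x. Assuming a nonparametric Bayesian prior whose support contains t*:

```latex
% Illustrative derivation (our notation, not the talk's slides).
% Decompose the posterior-predictive probability of harm over theories:
\[
  P(\mathrm{harm} \mid a, x, D)
  = \sum_{t} P(\mathrm{harm} \mid a, x, t)\, P(t \mid D)
  \;\ge\; P(\mathrm{harm} \mid a, x, t^{*})\, P(t^{*} \mid D),
\]
% so the unknown true-theory risk is bounded by estimable quantities:
\[
  P(\mathrm{harm} \mid a, x, t^{*})
  \;\le\; \frac{P(\mathrm{harm} \mid a, x, D)}{P(t^{*} \mid D)}.
\]
```

Since t* itself is unknown, the “paranoid theory” of the abstract, the explanation with maximal posterior probability among those predicting harm, plausibly serves as a computable pessimistic surrogate; the exact form of the bound is developed in the talk. Below is a hypothetical sketch of the runtime gate such a bound would enable. The names `theories`, `posterior`, and `harm_prob` are stand-ins for the amortized neural estimators the talk proposes; none of them come from the talk itself.

```python
# Hypothetical sketch of a runtime risk gate: before an action executes,
# bound its probability of harm and refuse it if the bound exceeds a
# risk budget. All names here are illustrative assumptions.

def harm_bound(action, context, theories, posterior, harm_prob):
    """Pessimistic harm estimate: worst case over theories that retain
    non-negligible posterior mass, a crude proxy for the 'paranoid
    theory' described in the abstract."""
    plausible = [t for t in theories if posterior(t) > 1e-6]
    return max(harm_prob(action, context, t) for t in plausible)

def gate(action, context, theories, posterior, harm_prob, risk_budget=1e-3):
    """Allow the action only if its bounded harm probability fits the budget."""
    return harm_bound(action, context, theories, posterior, harm_prob) <= risk_budget

# Toy usage with two hand-coded "theories" of how the world works.
if __name__ == "__main__":
    theories = ["benign-world", "adversarial-world"]
    posterior = lambda t: {"benign-world": 0.98, "adversarial-world": 0.02}[t]
    harm_prob = lambda a, x, t: (
        0.9 if t == "adversarial-world" and a == "risky" else 0.001
    )
    print(gate("risky", None, theories, posterior, harm_prob))  # False: rejected
    print(gate("safe", None, theories, posterior, harm_prob))   # True: allowed
```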
LLNL-VIDEO-865371
