CITP Seminar: Jaime Fernández Fisac - Machine Bullshit: Emergent Manipulative Behavior in Language Agents
Our research group is currently trying to shed light on what we think is one of the most pressing dangers presaged by the increasing power and reach of AI technologies. The conjunction of large-scale language models like GPT-3 with advanced strategic decision-making systems like AlphaZero can bring about a plethora of extremely effective AI text-generation systems with the ability to produce compelling arguments in support of arbitrary ideas, whether true, false, benign or malicious.
Through continued interactions with many millions of users, such systems could quickly learn to produce statements that are highly likely to elicit the desired human response, belief or action. That is, these systems will reliably say whatever they need to say to achieve their goal: we call this Machine Bullshit, after Harry Frankfurt’s excellent 1986 philosophical essay “On Bullshit”. If not properly understood and mitigated, this technology could result in a large-scale behavior manipulation device far more effective than subliminal advertising, and far more damaging than “deep fakes” in the hands of malicious actors.
Our aim is to bring together insights from dynamic game theory, machine learning and human-robot interaction to better understand these risks and inform the design of safe language-enabled AI systems.
Jaime Fernández Fisac is an assistant professor in the Department of Electrical and Computer Engineering at Princeton. He is an associated faculty member in the Department of Computer Science and the Center for Statistics and Machine Learning as well as a co-director of the Princeton AI4ALL summer camp.
He is interested in ensuring the safe operation of robotic systems in the human space. Fernández Fisac’s work combines safety analysis from control theory with machine learning and artificial intelligence techniques to enable robotic systems to reason competently about their own safety in spite of using inevitably fallible models of the world and other agents. This is done by having robots monitor their own ability to understand the world around them, accounting for how the gap between their models and reality affects their ability to guarantee safety.
Much of his research uses dynamic game theory together with insights from cognitive science to enable robots to strategically plan their interaction with human beings in contexts ranging from human-robot teamwork to drone navigation and autonomous driving. His lab’s scope spans theoretical work, algorithm design, and implementation on a variety of robotic platforms.
Fernández Fisac completed his Ph.D. in electrical engineering and computer science at UC Berkeley in 2019; at the midpoint of his Ph.D., he spent six months doing R&D work at Apple. Before that, Fernández Fisac received his B.S./M.S. in electrical engineering from the Universidad Politécnica de Madrid in Spain and a master’s degree in aeronautics from Cranfield University in the UK. Before joining Princeton in fall 2020, he spent a year as a research scientist at Waymo (formerly known as Google’s Self-Driving Car project).