Advancing AI alignment through debate and shared normative reasoning
Leveraging a debate framework to assess and improve the normative reasoning skills of AI agents in a multi-agent setting.
Aligning AI systems with human values is one of the key challenges of AI safety. Gillian Hadfield will draw on insights from economics, cultural evolution, cognitive science, and political science to take a novel approach to this challenge. Using a debate framework, this project will assess and improve the normative reasoning skills of AI agents in a multi-agent reinforcement learning setting. The approach takes into account the pluralistic, heterogeneous nature of human values. It also recognizes that normative institutions have developed to reconcile competing interests and preferences, in ways that can address the challenge of alignment and allow AI agents to be integrated into human normative systems.
Collaborators
Gillian Hadfield
Vector Institute, Johns Hopkins University, University of Toronto (on leave)

