Advancing AI alignment through debate and shared normative reasoning

Leveraging a debate framework to assess and improve the normative reasoning skills of AI agents in a multi-agent setting.

April 11, 2026

Aligning AI systems with human values is one of the key challenges of AI safety. Gillian Hadfield will draw on insights from economics, cultural evolution, cognitive science and political science to take a novel approach to this challenge. Using a debate framework, this project will assess and improve the normative reasoning skills of AI agents in a multi-agent reinforcement learning setting. The approach takes into account the pluralistic, heterogeneous nature of human values. It also recognizes that normative institutions have developed to reconcile competing interests and preferences in ways that can address the challenge of alignment and allow AI agents to be integrated into human normative systems.

Collaborators

  • Gillian Hadfield

    Vector Institute, Johns Hopkins University, University of Toronto (on leave)