Advancing AI alignment through debate and shared normative reasoning

Leveraging a debate framework to assess and improve the normative reasoning skills of AI agents in a multi-agent setting.

Catalyst Project | April 11, 2026

Aligning AI systems with human values is one of the key challenges of AI safety. Gillian Hadfield will draw on insights from economics, cultural evolution, cognitive science, and political science to take a novel approach to this challenge. Using a debate framework, this project will assess and improve the normative reasoning skills of AI agents in a multi-agent reinforcement learning setting. The approach takes into account the pluralistic, heterogeneous nature of human values and recognizes that normative institutions have developed to reconcile competing interests and preferences in ways that can address the challenge of alignment and allow AI agents to be integrated into human normative systems.

Collaborators

Gillian Hadfield
Vector Institute, Johns Hopkins University, University of Toronto (on leave)

Related Research

Addressing AI-Safety through Indigenous Community-based Governance

Adversarial robustness in knowledge graphs

Adversarial robustness of large language model (LLM) safety