A unified statistical framework for quantifying rare event risks for language models

Applying standard statistical tools for rare-event problems to better compare models, set safety standards and measure model regression.

April 11, 2026

One of the central challenges of AI alignment is quantifying extremely small failure rates – probabilities so low that ordinary testing will never observe them. ‘Long-tail failures,’ such as jailbreaks, policy evasion or subtle safety violations, are where oversight is weakest, and without reliable estimates of how often they occur it is impossible to compare models, set safety standards or measure model regression. Canada CIFAR AI Chair Bei Jiang will address this issue using standard statistical tools designed for rare-event problems, which are currently underused in large language model evaluation.
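
To make the scale of the problem concrete, here is a minimal sketch of one textbook calculation: the exact one-sided upper confidence bound on a failure probability when a test run of independent prompts shows zero failures (the basis of the "rule of three"). It is an illustration only and is not drawn from the project's own methods. The function names, the 95% confidence level and the independence assumption are all choices made for this example.

```python
import math


def zero_failure_upper_bound(n_trials: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on a failure rate after 0 failures in n_trials.

    Assumes independent trials with a common failure probability p.
    Solves (1 - p)^n = alpha for p, where alpha = 1 - confidence.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1.0 - confidence
    # 1 - alpha**(1/n), written with expm1 for accuracy when n is large.
    return -math.expm1(math.log(alpha) / n_trials)


def trials_needed(target_rate: float, confidence: float = 0.95) -> int:
    """Smallest failure-free test size whose upper bound is at most target_rate."""
    if not 0.0 < target_rate < 1.0:
        raise ValueError("target_rate must be strictly between 0 and 1")
    alpha = 1.0 - confidence
    # n >= ln(alpha) / ln(1 - target_rate); log1p keeps precision for tiny rates.
    return math.ceil(math.log(alpha) / math.log1p(-target_rate))


if __name__ == "__main__":
    # 10,000 clean trials only bound the rate near 3 / 10,000.
    print(zero_failure_upper_bound(10_000))
    # Clean trials required to support a one-in-a-million claim.
    print(trials_needed(1e-6))
```

Under these assumptions, 10,000 failure-free prompts only show that the failure rate is below roughly 3 in 10,000, and supporting a one-in-a-million claim takes on the order of three million failure-free prompts. Estimating such rates by direct testing alone is therefore impractical.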

Collaborators

  • Bei Jiang

    Canada CIFAR AI Chair, Amii; University of Alberta