Scaling laws, data distributions, and learning dynamics: simulated high-energy physics data as a benchmark for data in the wild

Physics-based data generation for predicting and testing fundamental questions around how AI learns, generalizes and scales.

AI Alignment Project | April 11, 2026

Current theoretical AI research often relies on overly simple models of data, making it difficult to answer fundamental questions about scaling laws or about whether models truly learn underlying latent parameters. University of Toronto Professor Yonatan Kahn proposes a method of physics-based data generation that will provide researchers with ‘ground-truth information’, allowing them to make and test predictions about how AI learns, generalizes and scales.

Collaborators

Yonatan Kahn
University of Toronto

Related Research

A unified statistical framework for quantifying rare event risks for language models

Game-theoretic safety guarantees for advanced AI systems

Sample-efficient online fine-tuning against resistant behaviors: statistical foundations for post-training alignment